🛣️

데이터베이스 성능 최적화의 비밀: Filesort와 외부 정렬-병합 알고리즘

태그

데이터베이스

Filesort

외부정렬

디스크IO

쿼리최적화

SQL튜닝

데이터엔지니어링

공개여부

작성일자

2026/04/02

데이터베이스(DB)에서 쿼리를 작성할 때 `ORDER BY`를 무심코 사용해 본 적이 있으신가요? 데이터의 양이 적을 때는 문제가 없지만, 수백만 건의 데이터를 정렬해야 할 때 DB 내부에서는 엄청난 디스크 I/O 작업이 발생하게 됩니다. 실행 계획(Execution Plan)에서 흔히 마주치게 되는 filesort라는 단어는 바로 이러한 험난한 과정의 흔적입니다.

이 블로그 포스트에서는 데이터베이스가 실제로 어떻게 데이터를 디스크에 쓰고, 어떤 단위로 데이터를 옮기며 정렬하는지, 그 기저에 깔린 외부 정렬-병합 알고리즘(External Sort-Merge Algorithm)의 동작 원리와 수학적 비용 분석을 완벽하게 파헤쳐 보겠습니다.

---

1. 데이터베이스의 물리적 한계: 디스크 I/O와 블록(Block)

데이터베이스 시스템을 이해하려면 먼저 하드웨어의 물리적 특성을 이해해야 합니다. 데이터베이스 시스템은 궁극적으로 비트(bit) 형태의 데이터를 하나 이상의 저장 장치에 저장해야 합니다 [1]. 오늘날 대다수의 데이터베이스는 마그네틱 디스크나 플래시 기반의 SSD(Solid-State Drive)에 데이터를 저장하며, 처리를 위해 메인 메모리로 데이터를 가져옵니다 [1, 2].

여기서 성능의 핵심적인 병목이 발생합니다. 메인 메모리 접근 시간은 1마이크로초(microsecond)의 10분의 1 수준인 반면, 플래시 스토리지는 20~100마이크로초, 전통적인 마그네틱 디스크는 수십 밀리초(milliseconds)나 걸립니다 [1]. 즉, 디스크 접근은 메모리 접근보다 압도적으로 느립니다.

따라서 데이터베이스는 디스크와 메인 메모리 사이의 데이터 이동 단위로 '블록(Block)'이라는 개념을 사용합니다 [3]. 디스크에서 데이터를 한 건씩 읽는 것이 아니라, 효율성을 위해 연속된 바이트의 묶음인 블록 단위로 전송(Block transfer)을 수행합니다 [3]. 시스템의 중앙 처리 장치(CPU) 속도에 비해 데이터를 디스크에서 메인 메모리로 옮기는 속도가 너무 느리기 때문에, 데이터베이스 시스템은 디스크와 메인 메모리 간의 데이터 이동(I/O) 필요성을 최소화하도록 데이터를 구조화하는 것이 필수적입니다 [2].

2. 메모리를 초과하는 거대한 데이터: Filesort(외부 정렬)의 등장

SQL 쿼리에서 정렬이 요구되거나, 조인(Join) 등의 연산을 위해 정렬된 데이터가 필요할 때 데이터베이스는 정렬 작업을 수행합니다 [4]. 만약 정렬해야 할 릴레이션(테이블)의 크기가 작아서 메인 메모리에 모두 들어간다면, 퀵 정렬(Quick-sort)과 같은 표준적인 인메모리(In-memory) 정렬 기법을 사용하여 빠르게 처리할 수 있습니다 [5].

하지만 데이터가 너무 커서 사용 가능한 메인 메모리 공간에 들어가지 못할 때 문제가 발생합니다. 이때 데이터베이스는 디스크를 임시 작업 공간으로 활용하는 외부 정렬(External sorting)을 수행하며, 이 과정이 바로 우리가 흔히 `filesort`라고 부르는 작업입니다. 외부 정렬에 가장 널리 사용되는 기법이 바로 외부 정렬-병합 알고리즘(External Sort-Merge Algorithm)입니다 [5].

3. 외부 정렬-병합 알고리즘의 동작 단계

외부 정렬-병합 알고리즘은 데이터를 디스크와 메모리 사이에서 블록 단위로 부지런히 옮기며 정렬을 완성합니다. 사용 가능한 메인 메모리의 버퍼 블록 수를

M

이라고 할 때, 이 알고리즘은 크게 두 가지 단계로 진행됩니다 [5].

1단계: 런(Run) 생성 단계

첫 번째 단계에서는 데이터를 메모리 크기만큼 잘라서 부분적으로 정렬된 파일 조각인 '런(Run)'을 만듭니다.

1. 릴레이션에서 $M$개의 블록(또는 남은 데이터 중 더 작은 쪽)을 메모리로 읽어 들입니다 [6].

2. 메모리 내에서 퀵 정렬 등을 사용하여 해당 블록들의 레코드를 정렬합니다 [6].

3. 정렬된 데이터를 디스크에 새로운 런 파일($R_i$)로 씁니다 [6].

4. 전체 릴레이션을 다 읽을 때까지 이 과정을 반복합니다 [6].

2단계: 병합(Merge) 단계

생성된 여러 개의 런들을 하나로 합치는 과정입니다. 만약 전체 런의 개수(

N

)가 메모리 블록 수(

M

)보다 작다면, 메모리에 각 런을 위한 입력 버퍼 1개씩을 할당하고, 남은 공간을 출력 버퍼로 사용하여 단 한 번의 패스(pass)만으로 정렬을 완료할 수 있습니다(N-way merge) [7].

동작 방식은 다음과 같습니다:

N

개의 런 파일 각각에서 첫 번째 블록을 메모리 버퍼로 읽어옵니다 [7].

2. 모든 버퍼 블록을 비교하여 정렬 순서상 가장 첫 번째 튜플을 선택합니다 [7].

3. 선택된 튜플을 출력 버퍼에 쓰고, 입력 버퍼에서는 삭제합니다 [7].

4. 만약 특정 런의 입력 버퍼가 비워지면, 해당 런의 다음 블록을 디스크에서 읽어옵니다 [7].

5. 이 과정을 모든 입력 버퍼가 비워질 때까지 반복하여 최종 정렬된 결과를 만들어냅니다 [7].

하지만 런의 개수가

M

개보다 많다면 한 번에 병합할 수 없으므로, 여러 번의 다중 병합 패스(Multiple merge passes)를 거쳐야 합니다. 첫 번째 패스에서 $M-1$개의 런씩 묶어서 병합하여 새로운 런을 만들고, 이 과정을 런이 $M$개 미만으로 줄어들 때까지 반복합니다 [8].

4. 수학적 비용 분석: 정렬에 드는 디스크 I/O 비용

이 알고리즘이 디스크에 얼마나 큰 부담을 주는지 수학적으로 분석해 보겠습니다. 디스크 I/O 비용은 크게 데이터를 읽고 쓰는 블록 전송 횟수(Block transfers)와 디스크 헤드를 움직이는 디스크 탐색 횟수(Disk seeks)로 나뉩니다.

분석을 위한 변수를 정의해 보겠습니다:

•

brb_rbr​: 정렬해야 할 릴레이션의 전체 블록 수 [9]

•

MMM: 정렬에 사용 가능한 메인 메모리 버퍼의 블록 수 [5]

•

bbb_bbb​: 병합 단계에서 각 입력 및 출력 런에 할당되는 버퍼 블록 수 (탐색 횟수를 줄이기 위해 1 이상으로 설정) [9]

1) 블록 전송 비용 (Block Transfers)

•

런 생성 단계: 전체 릴레이션의 모든 블록을 한 번 읽고, 정렬 후 다시 디스크에 쓰므로 2br2b_r2br​ 번의 블록 전송이 발생합니다 [9].

•

초기 런의 개수: 메모리 크기(MMM)만큼 잘라서 정렬했으므로, 런의 개수는 ⌈br/M⌉\lceil b_r / M \rceil⌈br​/M⌉ 개가 됩니다 [9].

•

병합 패스 횟수: 각 패스마다 ⌊M/bb⌋−1\lfloor M/b_b \rfloor - 1⌊M/bb​⌋−1 개의 런을 병합하므로, 총 패스 횟수는 ⌈log⁡⌊M/bb⌋−1(br/M)⌉\lceil \log_{\lfloor M/b_b \rfloor - 1} (b_r/M) \rceil⌈log⌊M/bb​⌋−1​(br​/M)⌉ 번이 됩니다 [9].

•

총 블록 전송 횟수: 각 병합 패스마다 전체 블록을 한 번씩 읽고 써야 하므로, 최종적으로 (마지막 결과를 디스크에 쓰지 않는다고 가정할 때) 다음의 공식이 도출됩니다 [10]:

b_r (2 \lceil \log_{\lfloor M/b_b \rfloor - 1} (b_r/M) \rceil + 1)

2) 디스크 탐색 비용 (Disk Seeks)

디스크 탐색 비용은 데이터를 연속적으로 읽지 못하고 랜덤하게 접근할 때 발생합니다.

•

런 생성 단계: 데이터를 읽을 때와 쓸 때 각각 탐색이 필요하므로 약 2⌈br/M⌉2 \lceil b_r/M \rceil2⌈br​/M⌉ 번의 탐색이 발생합니다 [11].

•

병합 패스: 각 런에서 데이터를 읽어올 때 $b_b$ 단위로 읽어오므로 탐색 횟수가 줄어듭니다.

•

총 디스크 탐색 횟수: [11]

2 \lceil b_r/M \rceil + \lceil b_r/b_b \rceil (2 \lceil \log_{\lfloor M/b_b \rfloor - 1} (b_r/M) \rceil - 1)

5. 실무 관점의 DB 튜닝 인사이트 (Filesort 최적화)

수학적 비용 공식을 살펴보면, DB 관리자(DBA)나 백엔드 개발자가 어떻게 쿼리를 최적화해야 하는지 명확한 인사이트를 얻을 수 있습니다.

1. 정렬 버퍼($M$)의 크기를 늘려라:

공식에서 로그의 밑이 되는

\lfloor M/b_b \rfloor - 1

값이 커질수록, 전체 병합 패스 횟수($\log$)가 기하급수적으로 줄어듭니다. 즉, MySQL의 `sort_buffer_size`와 같은 정렬 메모리 공간을 늘려주면 디스크 I/O가 획기적으로 감소하여 성능이 개선됩니다 [9].

2. **$b_b$ (읽기/쓰기 단위)를 조정하여 Seek를 줄여라:**

버퍼를 크게 가져갈수록 헤드의 물리적 이동인 디스크 탐색(Seek) 횟수를 줄일 수 있습니다 [9, 12]. 하지만 메모리는 한정되어 있으므로 $M$과 $b_b$ 사이의 적절한 트레이드오프가 필요합니다.

3. **가장 좋은 최적화는 Filesort 자체를 피하는 것이다:**

디스크를 사용하는 외부 정렬은 아무리 최적화해도 근본적으로 비쌉니다. 애초에 인덱스를 B+-Tree 구조로 잘 설계하여, 데이터가 물리적 또는 논리적으로 이미 정렬된 상태(Index scan)로 읽어오게 한다면 정렬 비용 자체를 0으로 만들 수 있습니다 [13, 14].

데이터베이스는 블랙박스가 아닙니다. 여러분이 날린 한 줄의 `ORDER BY` 쿼리가 이처럼 정교한 수학적 계산과 디스크 I/O 최적화 알고리즘을 거쳐 실행된다는 점을 이해한다면, 훨씬 더 우아하고 성능이 뛰어난 애플리케이션을 설계할 수 있을 것입니다.

참고문헌

[1] Database-System-Concepts-7th-Edition — [Mitchell (1997)] T. M. Mitchell, Machine Learning, McGraw Hill (1997). [Witten et al. (2011)] I. H. Witten, E. Frank, and M. Hall, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 3rd edition, Morgan Kaufmann (2011). Credits The photo of the sailboats in the b…

[2] Database-System-Concepts-7th-Edition — 1.6 Database Engine A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The functional components of a database system can be broadly divided into the storage manager, the query processor components, and the transaction management componen…

[3] Database-System-Concepts-7th-Edition — Scheduling. If several blocks from a cylinder need to be transferred from disk to main memory, we may be able to save access time by requesting the blocks in the order in which they will pass under the heads. If the desired blocks are on differ-ent cylinders, it is advantageous to request the blocks…

[4] Database-System-Concepts-7th-Edition — The implementation of selections with negation conditions is left to you as an exercise (Practice Exercise 15.6). 15.4 Sorting 701 15.4 Sorting Sorting of data plays an important role in database systems for two reasons. First, SQL queries can specify that the output be sorted. Second, and equally i…

[5] Database-System-Concepts-7th-Edition — The problem of sorting has been studied extensively, both for relations that fit entirely in main memory and for relations that are bigger than memory. In the first case, standard sorting techniques such as quick-sort can be used. Here, we discuss how to handle the second case. 15.4.1 External Sort-…

[6] Database-System-Concepts-7th-Edition — 1. In the first stage, a number of sorted runs are created; each run is sorted but contains only some of the records of the relation. i = 0; repeat read M blocks of the relation, or the rest of the relation, whichever is smaller; sort the in-memory part of the relation; write the sorted data to run …

[7] Database-System-Concepts-7th-Edition — 702 Chapter 15 Query Processing read one block of each of the N files Ri into a buffer block in memory; repeat choose the first tuple (in sort order) among all buffer blocks; write the tuple to the output, and delete it from the buffer block; if the buffer block of any run Ri is empty and not end-of…

[8] Database-System-Concepts-7th-Edition — In general, if the relation is much larger than memory, there may be M or more runs generated in the first stage, and it is not possible to allocate a block for each run during the merge stage. In this case, the merge operation proceeds in multiple passes. Since there is enough memory for M−1 input …

[9] Database-System-Concepts-7th-Edition — Figure 15.4 illustrates the steps of the external sort–merge for an example relation. For illustration purposes, we assume that only one tuple fits in a block (fr = 1), and we assume that memory holds at most three blocks. During the merge stage, two blocks are used for input and one for output. 15.…

[10] Database-System-Concepts-7th-Edition — to disk. Second, there may be runs that are not read in or written out during a pass —for example, if there are ⌊M∕bb⌋ runs to be merged in a pass, ⌊M∕bb⌋ − 1 are read in and merged, and one run is not accessed during the pass. Ignoring the (relatively small) savings due to the latter effect, the to…

[11] Database-System-Concepts-7th-Edition — 6To be more precise, since we read each run separately and may get fewer than bb blocks when reading the end of a run, we may require an extra seek for each run. We ignore this detail for simplicity. 704 Chapter 15 Query Processing 2⌈br∕M⌉ + ⌈br∕bb⌉(2⌈log⌊M∕bb⌋−1(br∕M)⌉ − 1) Applying this equation t…

[12] Database-System-Concepts-7th-Edition — We also need to add the disk-seek costs. Run generation requires seeks for reading data for each of the runs as well as for writing the runs. Each merge pass requires around ⌈br∕bb⌉ seeks for reading data.6 Although the output is written sequentially, if it is on the same disk as the input runs, the…

[13] Database-System-Concepts-7th-Edition — 4Recall that if B+-tree file organizations are used to store relations, records may be moved between blocks when leaf nodes are split or merged, and when records are redistributed. 698 Chapter 15 Query Processing 15.3.2 Selections Involving Comparisons Consider a selection of the form σA≤v(r). We ca…

[14] Database-System-Concepts-7th-Edition — We can sort a relation by building an index on the sort key and then using that index to read the relation in sorted order. However, such a process orders the relation only logically, through an index, rather than physically. Hence, the reading of tuples in the sorted order may lead to a disk access…