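When you design a microservice architecture (MSA) or any modern backend system, you often find that serving a single client request requires the server to call several external services internally. For example, building a user's dashboard may require querying a payment history API, a user profile API, and a personalized recommendation API.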

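The most intuitive and easiest approach is to call each API sequentially. But sending the first request, waiting for its response, and only then sending the second causes a serious performance penalty. If the payment API takes 300 ms, the profile API 200 ms, and the recommendation API 400 ms, the user waits at least 900 ms for the data. Where user churn is tied directly to page load speed, that accumulated latency becomes a critical weakness of the service.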

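To solve this problem and maximize responsiveness, Java developers need to introduce thread-based parallelism. In this post we dig into why Java threads are essential when optimizing API performance by parallelizing external calls, from how the operating system blocks on I/O to the formula for sizing a thread pool.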

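The trap of external API calls: network I/O and thread blocking

To understand clearly why threads are needed, we have to look at how the Java Virtual Machine (JVM) and the operating system handle network communication.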

When a Java application sends a synchronous HTTP request to an external API, the executing thread stays blocked, waiting, until the network input/output (I/O) completes [1]. In a single-threaded server, the CPU sits idle and does no useful work while that thread waits for the network response [1, 2].

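This is like "waiting for the water to boil before starting to read the newspaper," when reading the newspaper (handling other work) while the water boils (waiting on the network) would use the time far better [2]. If each network request to an external API gets its own thread, then even when one thread blocks on I/O the CPU can context-switch to another runnable thread, and the remaining requests proceed concurrently [1, 3]. This is the most fundamental reason multithreading is essential for parallel API calls.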

Amdahl's Law and the mathematical basis for parallelism

How much performance you can gain by calling several APIs in parallel is described by Amdahl's Law, a core principle of computer science [4].

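According to this law, if $F$ is the fraction of the program that must execute serially and $N$ is the number of processors, the achievable speedup is bounded by [4]:

$$\text{Speedup} \le \frac{1}{F + \frac{1 - F}{N}}$$

As $N$ grows, this bound approaches $1/F$. In other words, an application's scalability and speed are strictly limited by the amount of code that must run serially [4].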
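To make the bound concrete, here is a minimal sketch that evaluates the formula for a few processor counts. The class name `AmdahlSpeedup` and the method `maxSpeedup` are hypothetical names chosen for illustration; they are not part of any library.

```java
// Illustrative helper; the class and method names are hypothetical.
final class AmdahlSpeedup {
    /**
     * Upper bound on speedup according to Amdahl's law: 1 / (F + (1 - F) / N).
     *
     * @param serialFraction F, the fraction of the work that must run serially (0..1)
     * @param processors     N, the number of processors (at least 1)
     */
    static double maxSpeedup(double serialFraction, int processors) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || processors < 1) {
            throw new IllegalArgumentException("need 0 <= F <= 1 and N >= 1");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    public static void main(String[] args) {
        // With F = 0.1 each extra processor helps less; the bound approaches 1/F = 10.
        for (int n : new int[] {1, 2, 4, 8, 1024}) {
            System.out.printf("N=%d -> speedup <= %.2f%n", n, maxSpeedup(0.1, n));
        }
    }
}
```

The smaller the serial fraction, the closer the bound gets to the processor count itself, which is why independent I/O calls are such good candidates for parallelization.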

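Fortunately, fetching data from several external systems inside a single API endpoint is highly parallelizable, because the calls do not depend on one another's results [5]. Since they do not modify shared state while fetching, the serial fraction $F$ can be kept small. Moving the I/O waits that used to run one after another into the parallel region therefore cuts the overall API response time from the sum (900 ms) to roughly the duration of the slowest single network call (400 ms).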
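The 900 ms versus 400 ms comparison can be reproduced with a small simulation. This is only a sketch: `simulatedCall` is a hypothetical stand-in that sleeps instead of making a real network call, and the latencies are the example figures from this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LatencyComparison {
    // Example latencies in ms: payment, profile, recommendation.
    static final long[] LATENCIES_MS = {300, 200, 400};

    // Hypothetical stand-in for a blocking remote call; it only sleeps.
    static long simulatedCall(long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return latencyMs;
    }

    // Calls run one after another, so the waits add up.
    static long sequentialMillis() {
        long start = System.nanoTime();
        for (long latency : LATENCIES_MS) {
            simulatedCall(latency);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Each call gets its own pool thread, so the waits overlap.
    static long parallelMillis() {
        ExecutorService pool = Executors.newFixedThreadPool(LATENCIES_MS.length);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (long latency : LATENCIES_MS) {
                tasks.add(() -> simulatedCall(latency));
            }
            long start = System.nanoTime();
            for (Future<Long> future : pool.invokeAll(tasks)) {
                future.get();
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential ms: " + sequentialMillis());
        System.out.println("parallel ms:   " + parallelMillis());
    }
}
```

On a typical machine the sequential figure lands a little above 900 ms and the parallel one a little above 400 ms; exact values vary with scheduling.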

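The danger of unbounded thread creation and the case for a thread pool

So, to improve performance, should we simply call new Thread().start() and create a new thread for every external API call? In theory that is faster than sequential execution, but in practice it is an anti-pattern that can bring the system down.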

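Thread creation and teardown overhead: Creating a thread is never free. The JVM and the operating system must allocate memory (the execution stack), and frequent thread creation adds latency to request processing [6, 7].

Resource exhaustion and reduced stability: Active threads consume system memory. When thousands of idle threads pile up waiting for network responses, they tie up memory and put heavy pressure on the garbage collector (GC), and the JVM can eventually die with an OutOfMemoryError [7, 8].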

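To optimize performance safely without this instability, Java code should use the `ExecutorService` framework (a thread pool), which decouples task submission from task execution [9]. A thread pool reuses existing threads for each external API call, which amortizes the cost of thread creation, and it bounds the maximum number of threads so that a traffic spike cannot exhaust the server's memory [10].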
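The reuse and bounding behavior can be observed directly. The following sketch (the class `BoundedPoolDemo` and its method are hypothetical names) submits many short tasks to a small fixed pool and counts how many distinct worker threads ran them.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BoundedPoolDemo {
    // Runs taskCount short tasks on a pool of poolSize threads and returns
    // how many distinct worker threads actually executed them.
    static int distinctWorkerThreads(int poolSize, int taskCount) {
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Each task records the name of the pool thread that ran it.
            pool.execute(() -> workerNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return workerNames.size();
    }

    public static void main(String[] args) {
        // 100 tasks, but never more than 4 threads.
        System.out.println("distinct threads: " + distinctWorkerThreads(4, 100));
    }
}
```

However many tasks are submitted, the count never exceeds the pool size, because excess tasks wait in the executor's queue instead of spawning new threads.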

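Hands-on implementation: timeout control with Callable and invokeAll

External API calls differ from simple background jobs: building the final API response requires the return value of the work each thread performed [11, 12]. The Runnable interface cannot return a value or throw a checked exception, so Callable is a much better fit for this kind of deferred computation [12].

Below is an example in which a flight booking platform uses Java's concurrency library to call several external airline APIs in parallel [5, 13].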
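A minimal sketch of that difference follows. The class `CallableDemo` and the method `runAndDescribe` are hypothetical; the point is that a Callable's return value and its exception both travel back through the Future.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CallableDemo {
    // Submits a Callable and describes the outcome: its value, or the message
    // of the exception it threw.
    static String runAndDescribe(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(task);
            return "value: " + future.get();
        } catch (ExecutionException e) {
            // An exception thrown inside call() arrives wrapped; getCause() recovers it.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints "value: profile-data"
        System.out.println(runAndDescribe(() -> "profile-data"));
        // prints "failed: upstream timed out"
        System.out.println(runAndDescribe(() -> {
            throw new IOException("upstream timed out");
        }));
    }
}
```

A Runnable could do neither: its run() method returns void and cannot declare IOException.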

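import java.util.*;
import java.util.concurrent.*;

public class TravelAggregatorAPI {
    // Minimal value type so the example compiles (assumed shape; records need Java 16+).
    public record TravelQuote(String airline, double price) {}

    // 1. Thread pool sized for network I/O
    private final ExecutorService executor = Executors.newFixedThreadPool(50);

    public List<TravelQuote> fetchQuotesConcurrently(List<String> airlines) {
        List<Callable<TravelQuote>> tasks = new ArrayList<>();

        // 2. Wrap each external API call in a Callable
        for (String airline : airlines) {
            tasks.add(() -> fetchFromExternalAirlineAPI(airline));
        }

        List<TravelQuote> quotes = new ArrayList<>();
        try {
            // 3. Submit all tasks in parallel via invokeAll and enforce a strict 3-second timeout
            List<Future<TravelQuote>> futures = executor.invokeAll(tasks, 3, TimeUnit.SECONDS);

            // 4. Collect the results
            for (Future<TravelQuote> future : futures) {
                if (!future.isCancelled()) {
                    try {
                        quotes.add(future.get());
                    } catch (ExecutionException e) {
                        // Keep one failed API from failing the whole response
                        logError("External API call failed", e);
                    }
                } else {
                    logWarning("Task cancelled because the external API responded too slowly.");
                }
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see the interruption
            Thread.currentThread().interrupt();
        }

        return quotes;
    }

    private TravelQuote fetchFromExternalAirlineAPI(String airline) {
        // Placeholder for synchronous network communication with the real external API (hundreds of ms)
        return new TravelQuote(airline, 450.00);
    }

    // Placeholder logging helpers (hypothetical; substitute your logging framework).
    private void logError(String message, Throwable error) {
        System.err.println(message + ": " + error.getCause());
    }

    private void logWarning(String message) {
        System.err.println(message);
    }
}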

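The most useful method for running several Callable tasks in parallel is invokeAll [5]. The point to notice in the code above is the timeout (time budget) passed to invokeAll [5]. An external API can become slow or unresponsive at any time. Without a timeout, an outage in one external service can leave our threads waiting indefinitely, and once every pool thread is stuck, other requests are starved of threads [14]. With a timeout, tasks that have not completed when the time expires are cancelled automatically by the framework (their Future reports isCancelled() as true), so the main API can give users a consistent response time [5, 14].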
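The cancellation behavior is easy to verify in isolation. In this sketch the class `InvokeAllTimeoutDemo` and its parameters are hypothetical, and the "remote calls" are simulated with sleeps; one task finishes inside the time budget and the other does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class InvokeAllTimeoutDemo {
    // Runs a fast and a slow simulated call under a shared timeout and returns
    // one label per task, in submission order: "done" or "cancelled".
    static List<String> outcomes(long fastMs, long slowMs, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            tasks.add(() -> { Thread.sleep(fastMs); return "fast"; });
            tasks.add(() -> { Thread.sleep(slowMs); return "slow"; });

            // invokeAll returns when all tasks finish or the timeout expires,
            // whichever comes first; unfinished tasks are cancelled.
            List<Future<String>> futures =
                    pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);

            List<String> labels = new ArrayList<>();
            for (Future<String> future : futures) {
                labels.add(future.isCancelled() ? "cancelled" : "done");
            }
            return labels;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // The 50 ms task completes; the 5 s task exceeds the 500 ms budget.
        System.out.println(outcomes(50, 5000, 500)); // prints [done, cancelled]
    }
}
```

Note that the caller is unblocked after about 500 ms rather than 5 seconds, which is what keeps the aggregate endpoint's response time predictable.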

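A thread pool sizing formula for I/O-bound work

The most common question when tuning API performance with threads is "So how many threads should the pool have?" Rather than guessing a number, approach it with an explicit formula that reflects the system's hardware and the nature of the work [15].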

The formula for the appropriate thread pool size $N_{threads}$ is [15]:

$$N_{threads} = N_{cpu} \cdot U_{cpu} \cdot \left(1 + \frac{W}{C}\right)$$

• $N_{cpu}$: number of available CPU cores

• $U_{cpu}$: target CPU utilization (between 0 and 1)

• $W/C$: ratio of wait time to compute time

Calling another external API from inside an API is a typical I/O-bound task. A thread spends far more time waiting for network packets ($W$) than computing on the CPU ($C$) [15].

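In this case the ratio $W/C$ is very high, so the optimal pool size should be much larger than the number of CPU cores on the server [15]. If you ignore the formula and make the pool as small as the core count, then whenever all threads are waiting on network responses the CPU sits idle and throughput drops sharply [14, 15]. Conversely, an oversized pool wastes memory on idle threads and loses CPU time to contention and context switching. Tuning from the formula while monitoring the system's actual latency and CPU utilization is what completes the optimization [15, 16].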
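As a worked example of the sizing formula, the sketch below plugs in numbers. The class `PoolSizing`, the method `threadCount`, and the 90 ms wait / 10 ms compute figures are illustrative assumptions; real values should come from measurement.

```java
final class PoolSizing {
    /**
     * Applies N_threads = N_cpu * U_cpu * (1 + W/C), rounded to the nearest
     * integer and never below 1.
     *
     * @param nCpu      number of available CPU cores
     * @param uCpu      target CPU utilization, between 0 and 1
     * @param waitMs    average time a task spends waiting (W)
     * @param computeMs average time a task spends computing (C), must be positive
     */
    static int threadCount(int nCpu, double uCpu, double waitMs, double computeMs) {
        if (nCpu < 1 || uCpu < 0.0 || uCpu > 1.0 || waitMs < 0.0 || computeMs <= 0.0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double size = nCpu * uCpu * (1.0 + waitMs / computeMs);
        return Math.max(1, (int) Math.round(size));
    }

    public static void main(String[] args) {
        // 8 cores, 50% target utilization, 90 ms waiting per 10 ms of computing:
        // 8 * 0.5 * (1 + 9) = 40 threads.
        System.out.println(threadCount(8, 0.5, 90, 10)); // prints 40

        // The same workload sized for the machine this code runs on.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(threadCount(cores, 0.5, 90, 10));
    }
}
```

With no waiting at all ($W = 0$) the formula collapses to $N_{cpu} \cdot U_{cpu}$, which matches the intuition that CPU-bound work gains nothing from extra threads.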

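Conclusion and summary

In modern backend development, the requirement that one API aggregate several external resources is unavoidable. The reason to use Java threads here is not merely to make the program work, but to put the CPU time that would otherwise be lost to I/O waits to use and to cut the accumulated latency dramatically [1, 2, 16].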

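As we have seen, parallel calls through ExecutorService and a thread pool remove the sequential bottleneck [9, 10]. But to guard against unexpected failures, they must be paired with timeouts enforced through Callable, Future, and invokeAll [5, 11] and with careful thread pool sizing based on the formula $N_{cpu} \cdot U_{cpu} \cdot (1 + W/C)$ [15]. Understood and applied correctly, these principles and techniques let you build a robust, responsive, user-centered system architecture that holds up under millions of requests.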

References

[1] Java Concurrency in Practice: 1.2.3 Simplified handling of asynchronous events A server application that accepts socket connections from multiple remote clients may be easier to develop when each connection is allocated its own thread and allowed to use synchronous I/O. If an application goes to read from a socket when no data i…

[2] Java Concurrency in Practice: Using multiple threads can also help achieve better throughput on single-processor systems. If a program is single-threaded, the processor remains idle while it waits for a synchronous I/O operation to complete. In a multithreaded program, another thread can still run while the first thread is waiti…

[3] Java Concurrency in Practice: 11.3 Costs introduced by threads Single-threaded programs incur neither scheduling nor synchronization overhead, and need not use locks to preserve the consistency of data structures. Scheduling and interthread coordination have performance costs; for threads to offer a performance improvement, the…

[4] Java Concurrency in Practice: Most concurrent programs have a lot in common with farming, consisting of a mix of parallelizable and serial portions. Amdahl’s law describes how much a program can theoretically be sped up by additional computing resources, based on the proportion of parallelizable and serial components. If F is th…

[5] Java Concurrency in Practice: Fetching a bid from one company is independent of fetching bids from another, so fetching a single bid is a sensible task boundary that allows bid retrieval to proceed concurrently. It would be easy enough to create n tasks, submit them to a thread pool, retain the Futures, and use a timed get to f…

[6] Java Concurrency in Practice: 6.1.3 Disadvantages of unbounded thread creation For production use, however, the thread-per-task approach has some practical drawbacks, especially when a large number of threads may be created: Thread lifecycle overhead. Thread creation and teardown are not free. The a…

[7] Java Concurrency in Practice: Resource consumption. Active threads consume system resources, especially memory. When there are more runnable threads than available processors, threads sit idle. Having many idle threads can tie up a lot of memory, putting pressure on the garbage collector, and having many threads competing for …

[8] Java Concurrency in Practice: Up to a certain point, more threads can improve throughput, but beyond that point creating more threads just slows down your application, and creating one thread too many can cause your entire application to crash horribly. The way to stay out of danger is to place some bound on how many threads you…

[9] Java Concurrency in Practice: 6.2 The Executor framework Tasks are logical units of work, and threads are a mechanism by which tasks can run asynchronously. We’ve examined two policies for executing tasks using threads—execute tasks sequentially in a single thread, and execute each task in its own…

[10] Java Concurrency in Practice: 3. This is analogous to one of the roles of a transaction monitor in an enterprise application: it can throttle the rate at which transactions are allowed to proceed so as not to exhaust or overstress limited resources. Executing tasks in pool threads has a number of ad…

[11] Java Concurrency in Practice: public class SingleThreadRenderer { void renderPage(CharSequence source) { renderText(source); List<ImageData> imageData = new ArrayList<ImageData>(); for (ImageInfo imageInfo : scanForImageInfo(source)) imageData.add(imageInfo.downloadImage()); for (ImageDat…

[12] Java Concurrency in Practice: Many tasks are effectively deferred computations—executing a database query, fetching a resource over the network, or computing a complicated function. For these types of tasks, Callable is a better abstraction: it expects that the main entry point, call, will return a value and anticipates that it…

[13] Java Concurrency in Practice: private class QuoteTask implements Callable<TravelQuote> { private final TravelCompany company; private final TravelInfo travelInfo; ... public TravelQuote call() throws Exception { return company.solicitQuote(travelInfo); } } public List<TravelQuote> getRankedTravelQuo…

[14] Java Concurrency in Practice: One technique that can mitigate the ill effects of long-running tasks is for tasks to use timed resource waits instead of unbounded waits. Most blocking methods in the platform libraries come in both untimed and timed versions, such as Thread.join, BlockingQueue.put, CountDownLatch.await, and Select…

[15] Java Concurrency in Practice: by running the application using several different pool sizes under a benchmark load and observing the level of CPU utilization. Given these definitions: Ncpu = number of CPUs Ucpu = target CPU utilization, 0 ≤ Ucpu ≤ 1 W/C = ratio of wait time to compute time…

[16] Java Concurrency in Practice: In using concurrency to achieve better performance, we are trying to do two things: utilize the processing resources we have more effectively, and enable our program to exploit additional processing resources if they become available. From a performance monitoring perspective, this means we are look…