Asynchronous Execution
About
Unlike `RestTemplate`, which blocks the calling thread until the response is received, `WebClient` is non-blocking and reactive by design. That means it can initiate multiple calls in parallel, freeing up threads and allowing better scalability, which is especially important in high-concurrency environments like microservices or gateways.
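As a quick orientation, here is a minimal sketch of how a shared `WebClient` is typically set up, assuming a Spring Boot application (which auto-configures a `WebClient.Builder` bean); the base URL is a placeholder:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    // A shared, non-blocking WebClient; the base URL is a placeholder for our own service.
    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        return builder.baseUrl("http://localhost:8080").build();
    }
}
```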
Why Use Asynchronous Execution?
In modern distributed systems and microservice-based architectures, asynchronous execution is a crucial strategy for building responsive, resilient, and scalable applications. Traditional synchronous HTTP calls (as seen with `RestTemplate`) block the calling thread until the remote service responds. This model works for simple use cases, but falls short under high concurrency or when integrating with slow or unreliable dependencies.
Using asynchronous execution with WebClient helps solve this by leveraging non-blocking I/O and reactive programming, which allows our application to remain responsive while waiting for external service responses.
1. Improved Scalability with Fewer Resources
In a blocking system, each HTTP call ties up a thread until a response is received. This limits scalability because threads are a limited and expensive resource. With async execution, threads are released immediately after making the call, allowing them to serve other requests.
For example: In a thread-per-request model, handling 1000 concurrent calls might need 1000 threads. With non-blocking async, we can achieve the same with just a fraction.
2. Faster Aggregate Latency
When calling multiple downstream services, synchronous calls are executed sequentially—each one adds to total latency. Asynchronous calls allow parallel execution, significantly reducing the overall response time.
For instance: Two services each take 500ms to respond. Sync execution takes ~1s, while async in parallel takes ~500ms.
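A minimal sketch that demonstrates this timing claim with Reactor directly; the `delayElement` calls stand in for two slow downstream services:

```java
import java.time.Duration;
import reactor.core.publisher.Mono;

public class ParallelLatencyDemo {

    public static void main(String[] args) {
        // Two simulated service calls, each taking ~500 ms.
        Mono<String> serviceA = Mono.just("A").delayElement(Duration.ofMillis(500));
        Mono<String> serviceB = Mono.just("B").delayElement(Duration.ofMillis(500));

        long start = System.currentTimeMillis();

        // zip subscribes to both sources together, so they run concurrently.
        Mono.zip(serviceA, serviceB)
                .doOnNext(t -> System.out.println(
                        "Both done in ~" + (System.currentTimeMillis() - start) + " ms"))
                .block(); // blocking here only to keep the demo JVM alive
    }
}
```

Running this prints roughly 500 ms, not 1000 ms, because neither call waits for the other.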
3. Higher Throughput
Applications using asynchronous execution can handle more concurrent requests under the same hardware constraints, resulting in better throughput, especially under heavy load or spikes in traffic.
4. Responsive UI and APIs
Async calls help maintain responsiveness in front-end APIs or UI components. Backend calls don't block the thread serving the user, so our system feels more reactive and snappy, even during slow downstream responses.
5. Essential for Event-Driven and Reactive Systems
In reactive architectures (e.g., using Spring WebFlux), non-blocking async is not optional—it's foundational. It enables seamless data streaming, event chaining, and composition of async pipelines.
6. Better Error Isolation
Async flows allow graceful degradation. If one service call fails, the system can recover using fallback data or skip the failed step without crashing the entire flow.
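For example, a fallback with Reactor's `onErrorResume` might look like the sketch below, reusing the promotion-service call from the example later in this section; `Discount.none()` is a hypothetical factory for an empty discount:

```java
// Sketch: if promotion-service fails, fall back to "no discount"
// instead of failing the whole order flow. Discount.none() is hypothetical.
Mono<Discount> discountMono = webClient.get()
        .uri("http://promotion-service/api/discounts/{id}", promoCode)
        .retrieve()
        .bodyToMono(Discount.class)
        .onErrorResume(ex -> Mono.just(Discount.none()));
```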
7. Optimal for Cloud and Microservices
Async communication aligns well with microservices and cloud-native design, where:
- Network latency is variable
- Services might be temporarily unavailable
- Horizontal scalability is expected
- Observability and resilience are required (timeouts, retries, backpressure)
8. Resource Efficiency in Blocking Scenarios
Even when working with blocking databases or legacy systems, combining async I/O with bounded blocking thread pools helps us avoid resource starvation and maintain control over thread usage.
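A common pattern for this is to wrap the blocking call in `Mono.fromCallable` and shift it onto Reactor's bounded-elastic scheduler. A sketch, where `legacyDao.findUser` is a hypothetical blocking JDBC/legacy call:

```java
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

// Sketch: offload a blocking legacy call to the bounded-elastic pool
// so event-loop threads are never tied up. legacyDao is hypothetical.
Mono<User> userMono = Mono.fromCallable(() -> legacyDao.findUser(userId))
        .subscribeOn(Schedulers.boundedElastic());
```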
Core Mechanism
Asynchronous execution with Spring WebClient is powered by Reactor, the reactive programming library at the core of Spring WebFlux. The key idea is to not block threads while waiting for HTTP responses, but instead use event-driven, non-blocking I/O.
What Actually Happens Internally?
When we make a WebClient call asynchronously:
```java
Mono<ResponseEntity<User>> mono = webClient.get()
        .uri("/users/42")
        .retrieve()
        .toEntity(User.class);
```
1. The request is initiated, but the current thread is not blocked.
2. A `Mono` (a publisher representing a single future result) is returned.
3. The actual HTTP call happens in the background using non-blocking Netty I/O.
4. When the response arrives, a callback is triggered to process the result.
We can then:
```java
mono.subscribe(response -> System.out.println(response.getBody()));
```
Or block for the result, which forces synchronous behavior:

```java
User user = mono.block(); // forces blocking; avoid this in reactive systems
```
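In a WebFlux application, the idiomatic alternative is to keep the chain non-blocking and let the framework subscribe for us, for example by returning the `Mono` from a controller. A sketch, assuming a `/users/{id}` endpoint and the `User` type from above:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@RestController
public class UserController {

    private final WebClient webClient;

    public UserController(WebClient webClient) {
        this.webClient = webClient;
    }

    // WebFlux subscribes to the returned Mono; no thread blocks here.
    @GetMapping("/users/{id}")
    public Mono<User> getUser(@PathVariable String id) {
        return webClient.get()
                .uri("/users/{id}", id)
                .retrieve()
                .bodyToMono(User.class);
    }
}
```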
Components in the Mechanism
| Component | Role |
| --- | --- |
| WebClient | Provides the fluent API to define HTTP requests; non-blocking by default |
| Reactor Core | Enables reactive streams using `Mono` (0..1) and `Flux` (0..N) |
| Netty HTTP Client | Underlying engine for non-blocking HTTP; uses event-loop based I/O instead of thread-per-request |
| Event Loop (Reactor Netty) | Manages I/O readiness events efficiently using few threads |
| Schedulers | Allow shifting execution to different thread pools (e.g., bounded elastic for blocking DB calls) |
| Backpressure Handling | Controls how fast data is produced/consumed; critical in streaming data |
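Backpressure is mostly relevant for `Flux` streams. A small sketch with Reactor's `limitRate`, which caps how many items are requested from upstream at a time:

```java
import reactor.core.publisher.Flux;

// Sketch: request at most 100 items from upstream at a time,
// so a fast producer cannot overwhelm a slow consumer.
Flux.range(1, 1_000)
        .limitRate(100)
        .subscribe(System.out::println);
```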
Mono vs Flux: The Core of WebClient Responses
| Type | Emits | Typical Use |
| --- | --- | --- |
| `Mono<T>` | 0 or 1 item | REST calls, single responses (e.g., `GET /user/1`) |
| `Flux<T>` | 0 to N items | Streaming APIs, Server-Sent Events (SSE) |

These types enable declarative async programming without manually using low-level `Future`s or callbacks.
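For example, consuming a Server-Sent Events stream as a `Flux` might look like this sketch; the `/events` endpoint and `Event` type are placeholders:

```java
import org.springframework.http.MediaType;
import reactor.core.publisher.Flux;

// Sketch: stream SSE as a Flux; items are processed as they arrive.
Flux<Event> events = webClient.get()
        .uri("/events")
        .accept(MediaType.TEXT_EVENT_STREAM)
        .retrieve()
        .bodyToFlux(Event.class);

events.subscribe(event -> System.out.println("Received: " + event));
```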
Typical Async Use Case in Enterprise App
Imagine an order-service in an e-commerce platform responsible for handling user orders. While processing a new order, it must:
- Fetch user profile from `user-service`
- Get product availability from `inventory-service`
- Calculate discount via `promotion-service`
These calls can be made in parallel, improving overall latency.
Async Workflow Using WebClient
```java
Mono<User> userMono = webClient.get()
        .uri("http://user-service/api/users/{id}", userId)
        .retrieve()
        .bodyToMono(User.class);

Mono<Product> productMono = webClient.get()
        .uri("http://inventory-service/api/products/{id}", productId)
        .retrieve()
        .bodyToMono(Product.class);

Mono<Discount> discountMono = webClient.get()
        .uri("http://promotion-service/api/discounts/{id}", promoCode)
        .retrieve()
        .bodyToMono(Discount.class);

// Combine responses asynchronously
Mono<OrderResponse> responseMono = Mono.zip(userMono, productMono, discountMono)
        .map(tuple -> {
            User user = tuple.getT1();
            Product product = tuple.getT2();
            Discount discount = tuple.getT3();
            return buildOrderResponse(user, product, discount);
        });
```
Explanation
- `Mono.zip(...)` combines multiple async responses.
- All three calls are subscribed together and run concurrently, not sequentially.
- Once all results are available, the transformation logic builds the final response.
- No thread is blocked while waiting for the service calls.
Using CompletableFuture for Async Integration
While `WebClient` is inherently asynchronous and reactive (based on Project Reactor), many real-world enterprise applications do not use the full reactive stack (e.g., `Flux`, `Mono`) across layers. Instead, we may prefer `CompletableFuture` for async composition, especially in layered or legacy architectures.

Spring WebClient can integrate smoothly with `CompletableFuture` by bridging the reactive and future-based async worlds.
Why Use CompletableFuture Instead of Mono/Flux Directly?

| Scenario | `Mono`/`Flux` | `CompletableFuture` |
| --- | --- | --- |
| Reactive chain across layers | Natural and efficient | Not ideal; requires adaptation |
| Integration with legacy codebases | Might be intrusive | Seamless, since `CompletableFuture` is JDK |
| Familiarity in team | Reactive APIs can have a steep learning curve | `CompletableFuture` is widely known |
| Minimal dependencies | Requires Reactor | `CompletableFuture` is part of Java |
| Blocking downstream logic | Requires careful scheduling (`boundedElastic`, etc.) | More straightforward to use imperatively |
Behavior Comparison: WebClient vs CompletableFuture
| Aspect | WebClient (`Mono`/`Flux`) | WebClient + `CompletableFuture` |
| --- | --- | --- |
| Execution model | Fully non-blocking, reactive | Non-blocking `WebClient`, wrapped in a future |
| Thread management | Event loop + schedulers | `ForkJoinPool` or custom thread pool |
| Integration fit | Best in reactive end-to-end pipelines | Easier in non-reactive or mixed codebases |
| Flow control | Reactive operators (`zip`, `flatMap`) | Java 8+ chaining (`thenCombine`, `thenApply`) |
| Backpressure | Supported | Not supported |
| Debuggability (for many devs) | Can be harder due to async chains | More familiar for developers |
When CompletableFuture Is Not Ideal

- When we are already using WebFlux or Project Reactor end-to-end
- When we need backpressure handling and advanced stream operations
- When our system must optimize for extreme throughput and responsiveness
Example: Parallel Service Calls using CompletableFuture and WebClient
Use Case
In an order processing system:
- Fetch user info from `user-service`
- Get product details from `catalog-service`
- Call both concurrently and combine the results
```java
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;

import java.util.concurrent.CompletableFuture;

@Service
public class OrderAggregationService {

    private final WebClient webClient;

    public OrderAggregationService(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("http://localhost").build();
    }

    public CompletableFuture<User> getUser(String userId) {
        return webClient.get()
                .uri("http://user-service/api/users/{id}", userId)
                .retrieve()
                .bodyToMono(User.class)
                .toFuture();
    }

    public CompletableFuture<Product> getProduct(String productId) {
        return webClient.get()
                .uri("http://catalog-service/api/products/{id}", productId)
                .retrieve()
                .bodyToMono(Product.class)
                .toFuture();
    }

    public CompletableFuture<OrderResponse> getOrderDetails(String userId, String productId) {
        CompletableFuture<User> userFuture = getUser(userId);
        CompletableFuture<Product> productFuture = getProduct(productId);

        return userFuture.thenCombine(productFuture,
                (user, product) -> new OrderResponse(user, product));
    }
}
```
Explanation
- `bodyToMono().toFuture()` bridges `Mono` to `CompletableFuture`
- `thenCombine()` merges both async results when both complete
- We can further chain additional async logic
- Thread pool usage is typically `ForkJoinPool`, but can be customized
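If we want downstream stages to run on our own pool instead of the common pool, the `...Async` variants accept an `Executor`. A sketch building on the service above; the pool size of 4 is arbitrary:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: run the combining step on a dedicated pool instead of
// ForkJoinPool.commonPool(). The pool size is arbitrary here.
ExecutorService pool = Executors.newFixedThreadPool(4);

CompletableFuture<OrderResponse> response =
        getUser(userId).thenCombineAsync(
                getProduct(productId),
                OrderResponse::new,
                pool);
```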
Threading Consideration
Spring's `WebClient` is inherently non-blocking and asynchronous. However, the actual threading behavior depends on:

- Whether we are using it with `Mono`/`Flux` (reactive pipelines) or wrapping it with `CompletableFuture`
- The broader application architecture (WebFlux vs MVC)
Understanding the threading model helps prevent common pitfalls like blocking on non-blocking threads, thread leaks, and unnecessary CPU starvation.
WebClient Execution Model: Reactor Context
By default, WebClient uses Reactor Netty and follows the reactive programming model:
- I/O threads (a.k.a. event-loop threads) are used for sending and receiving HTTP requests
- Operations like serialization, deserialization, and transformation (`map`, `flatMap`, etc.) run on Reactor-managed threads
- Heavy or blocking tasks should never run on these threads; they must be offloaded using a scheduler
```java
Mono.just("data")
        .map(this::process) // stays on the event-loop thread
        .subscribe();
```
This works for light, fast operations. For blocking ones:
```java
Mono.fromCallable(this::blockingTask)
        .subscribeOn(Schedulers.boundedElastic()) // switches to a pool meant for blocking work
        .subscribe();
```
CompletableFuture + WebClient: Thread Behavior
When we convert from `Mono` to `CompletableFuture`:
```java
webClient.get()
        .uri("/api/data")
        .retrieve()
        .bodyToMono(MyResponse.class)
        .toFuture();
```
This operation:
- Executes the request on a Netty I/O thread
- Returns immediately (non-blocking)
- The `CompletableFuture` completes when the data is available
Downstream processing with the async variants (e.g., `.thenApplyAsync(...)`) executes on `ForkJoinPool.commonPool()` by default unless a custom executor is supplied; the non-async `.thenApply(...)` runs on whichever thread completes the future.
What Can Go Wrong?

| Pitfall | Consequence |
| --- | --- |
| Calling `block()` on a reactive chain | Blocks a Netty I/O thread → kills scalability |
| Performing blocking DB or file I/O in `.map()` | Runs on the event loop → causes a performance bottleneck |
| Misconfigured `CompletableFuture` blocking on `get()` | Converts async to sync and ties up threads |
| Overloading the common pool with CPU-heavy tasks | Starves the thread pool for other async work |
Thread Pool Guidelines
| Scenario | Recommended Thread Pool |
| --- | --- |
| Reactive non-blocking pipelines | Reactor's default (event loop + internal schedulers) |
| Blocking code in a reactive chain | `Schedulers.boundedElastic()` |
| `CompletableFuture` heavy work | Custom thread pool via `ExecutorService` |
| Lightweight transformations (mapping, filtering) | OK on the common pool or event loop |