
# Caching

## About

Caching is a design technique used in software systems to temporarily store frequently accessed data in a faster storage layer so that future requests for that data can be served more quickly. Instead of fetching the data from a slower or more expensive resource (like a database, external service, or disk), the system retrieves it from a cache, which is typically memory-based and significantly faster.

A cache can reside at various levels: within the application, in a shared service like Redis, or in front of the system as a content delivery network (CDN). The basic idea is to spend memory or local compute resources in exchange for faster response times and reduced load on downstream systems.

Common examples of cached data include user profiles, configuration settings, product details, access tokens, or computed results of expensive operations.
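The read path described above can be sketched as a small cache-aside wrapper. This is a minimal, single-threaded illustration rather than a production design: the class name `CacheAsideDemo`, the `slowSource` function, and the `sourceCalls` counter are hypothetical names introduced for this example, and the "slow source" is just a lambda standing in for a database or remote service.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal cache-aside sketch: check the cache first, fall back to the slow source on a miss.
// Not thread-safe and never evicts; both are simplifications for illustration.
class CacheAsideDemo {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> slowSource; // hypothetical stand-in for a database or remote call
    int sourceCalls = 0;                                // counts how often the slow source was used

    CacheAsideDemo(Function<String, String> slowSource) {
        this.slowSource = slowSource;
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                  // cache hit: no trip to the slow source
        }
        sourceCalls++;                      // cache miss: pay the cost of the slow source
        String loaded = slowSource.apply(key);
        cache.put(key, loaded);             // keep the value for future requests
        return loaded;
    }

    public static void main(String[] args) {
        CacheAsideDemo profiles = new CacheAsideDemo(id -> "profile-of-" + id);
        profiles.get("user-1");
        profiles.get("user-1");             // served from the cache
        profiles.get("user-2");
        System.out.println("source calls: " + profiles.sourceCalls); // prints "source calls: 2"
    }
}
```

Three requests result in only two calls to the slow source, because the repeated request for `user-1` is answered from memory.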

## Why Cache?

Caching serves multiple high-level system objectives. The main drivers for using caching include:

### 1. Reducing Latency

Fetching data from an in-memory cache is typically far faster than querying a database or calling a remote service, often by an order of magnitude or more. Caching therefore reduces the end-to-end latency of a request, improving application responsiveness and user experience.

Example: Serving user profile data from a Redis cache in about 2 ms versus querying the database in 50–100 ms.
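As a rough worked example of the effect on average latency, assume every request checks the cache first and a fraction $h$ of requests hit it. The hit ratio $h = 0.9$ used here is a hypothetical value chosen for illustration; the 2 ms and 50 ms figures come from the example above. The model ignores network variance and the cost of writing a loaded value back into the cache.

$$
T_{\text{avg}} = t_{\text{cache}} + (1 - h)\, t_{\text{db}} = 2\,\text{ms} + (1 - 0.9) \times 50\,\text{ms} = 7\,\text{ms}
$$

With the slower 100 ms database figure, the same formula gives 12 ms.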

### 2. Improving Throughput

Caching offloads the backend data stores by absorbing repeated requests. With fewer calls to the underlying systems, those systems can handle more diverse queries or user requests overall.

Example: If 90% of the traffic is handled by the cache, only 10% reaches the primary data store, leaving it more capacity for other operations.

### 3. Enabling Scalability

As user traffic increases, systems need to scale. Caching supports horizontal scaling: application instances can be added without proportionally increasing the load on centralized systems such as the database. This is particularly important in distributed systems where multiple stateless application instances are running.

Example: A system with distributed caching can serve millions of requests per second with consistent performance without overloading the database.

### 4. Increasing Fault Tolerance

In certain scenarios, a cache can serve stale or fallback data if the primary data source is down, helping the system degrade gracefully instead of failing entirely.

Example: Serve a cached version of pricing or configuration during a temporary database outage.
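One way to picture this fallback behaviour is the sketch below. It assumes a hypothetical `primary` function that throws a `RuntimeException` when the data source is unavailable; the class name `StaleFallbackDemo` and the `lastKnownGood` map are likewise illustrative names, not part of any library. Unlike a performance cache, this variant always tries the primary source first and only uses the cached copy when that call fails.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of "serve stale on failure": try the primary source, fall back to the last known value.
// Single-threaded illustration; the cached value may be arbitrarily old.
class StaleFallbackDemo {
    private final Map<String, String> lastKnownGood = new HashMap<>();
    private final Function<String, String> primary; // hypothetical source that may throw when it is down

    StaleFallbackDemo(Function<String, String> primary) {
        this.primary = primary;
    }

    String get(String key) {
        try {
            String fresh = primary.apply(key);
            lastKnownGood.put(key, fresh);  // remember the latest successful read
            return fresh;
        } catch (RuntimeException sourceDown) {
            String stale = lastKnownGood.get(key);
            if (stale == null) {
                throw sourceDown;           // nothing cached: the failure has to surface
            }
            return stale;                   // degrade gracefully with possibly outdated data
        }
    }

    public static void main(String[] args) {
        boolean[] down = {false};           // toggled to simulate an outage
        StaleFallbackDemo pricing = new StaleFallbackDemo(key -> {
            if (down[0]) throw new IllegalStateException("database unavailable");
            return "price-v1";
        });
        System.out.println(pricing.get("pricing")); // prints "price-v1" (fresh)
        down[0] = true;
        System.out.println(pricing.get("pricing")); // prints "price-v1" (stale copy)
    }
}
```

Whether stale data is acceptable depends on the data: a slightly old configuration value is usually fine, while an outdated account balance may not be.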

## Where Does Caching Fit in System Architecture?

Caching can be introduced at various layers in a system's architecture depending on what data is being cached and what the performance goals are. Common placement options include:

### 1. Client-Side Caching

* Implemented in browsers or mobile apps.
* Used for static assets, session tokens, or small data payloads.

### 2. Application-Level Caching

* Typically an in-memory cache, such as a `ConcurrentHashMap` in Java or Spring’s caching abstraction (`@Cacheable`).
* Limited to a single application instance (pod-level).
* Fastest access, but no data sharing between instances.
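A minimal sketch of an instance-local cache built on `ConcurrentHashMap` is shown below. The class name `LocalCacheDemo`, the `loader` function, and the `loads` counter are hypothetical names used only for this illustration. The sketch has no size limit or expiry, which a real application-level cache would normally need.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Instance-local cache sketch: each object (think: each pod) keeps its own private map.
class LocalCacheDemo {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;   // hypothetical expensive lookup; must not return null
    final AtomicInteger loads = new AtomicInteger(); // how many times this instance called the loader

    LocalCacheDemo(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        // computeIfAbsent runs the loader only when the key is absent, atomically per key.
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    public static void main(String[] args) {
        LocalCacheDemo instanceA = new LocalCacheDemo(k -> k.toUpperCase());
        LocalCacheDemo instanceB = new LocalCacheDemo(k -> k.toUpperCase());
        instanceA.get("config");
        instanceA.get("config");   // hit in A's own cache
        instanceB.get("config");   // B cannot see A's entry and loads again
        System.out.println("loads in A: " + instanceA.loads.get()
                + ", loads in B: " + instanceB.loads.get()); // prints "loads in A: 1, loads in B: 1"
    }
}
```

The second instance has to load the same key again, which illustrates the lack of data sharing between instances.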

### 3. Distributed Caching Layer

* Used to share cached data between multiple pods or microservices.
* Technologies include Redis, Memcached, Hazelcast, and Infinispan.
* Suitable for horizontally scalable architectures.

### 4. Database Caching

* Query result caches, materialized views, or second-level ORM caches (e.g., Hibernate’s L2 cache).
* Reduces pressure on the database for repeated queries.

### 5. CDN or Edge Caching

* Used in web applications to cache static resources such as images and JS/CSS files.
* Deployed closer to the user’s location to reduce network latency.

Each caching layer has different access patterns, expiry mechanisms, and trade-offs. In modern architectures, multiple types of caches are often used in combination.
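As one illustration of an expiry mechanism, the sketch below implements a simple time-to-live (TTL) policy, where an entry stops being served once a fixed duration has passed since it was written. TTL is only one of several possible policies, and everything here is an assumption made for the example: the class name `TtlCacheDemo`, the `ttlMillis` parameter, and the injectable `clock` (used so the behaviour can be demonstrated without sleeping).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// TTL expiry sketch: entries are dropped lazily when read after their deadline.
// Single-threaded illustration with no size limit.
class TtlCacheDemo {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // hypothetical time source, e.g. System::currentTimeMillis

    TtlCacheDemo(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(String key, String value) {
        entries.put(key, new Entry(value, clock.getAsLong() + ttlMillis));
    }

    // Returns the cached value, or null if the key is missing or its entry has expired.
    String get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key);      // expired: evict so the caller reloads from the source
            return null;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        long[] now = {0};             // fake clock controlled by the demo
        TtlCacheDemo cache = new TtlCacheDemo(1000, () -> now[0]);
        cache.put("k", "v");
        now[0] = 999;
        System.out.println(cache.get("k")); // prints "v" (still within the TTL)
        now[0] = 1000;
        System.out.println(cache.get("k")); // prints "null" (expired)
    }
}
```

A short TTL keeps data fresher at the cost of more misses, while a long TTL does the opposite; the right value depends on how stale the data is allowed to be.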

