# System Characteristics

## About

System characteristics define the fundamental properties that determine how a system performs, scales, and handles failures. These characteristics help architects design robust, scalable, and fault-tolerant systems.

## **1. Scalability**

Scalability is the ability of a system to handle increasing amounts of work by adding resources. A scalable system ensures that performance does not degrade as demand grows.

### **Types of Scaling**

* **Vertical Scaling (Scaling Up)**
  * Increasing the capacity of a single server (e.g., adding more CPU, RAM, or disk).
  * Has a physical limit: hardware can only be upgraded so much.
  * Example: Upgrading a database server from 32GB RAM to 128GB RAM.
* **Horizontal Scaling (Scaling Out)**
  * Adding more servers to distribute the load.
  * Often preferred in cloud-based architectures for better redundancy.
  * Example: Adding multiple web servers behind a load balancer.
* **Auto-Scaling**
  * Dynamically adding or removing resources based on demand.
  * Used in cloud environments (AWS Auto Scaling, Kubernetes Horizontal Pod Autoscaler).

### **Challenges in Scalability**

* Data consistency across multiple nodes.
* Load balancing efficiently.
* Database sharding complexities.

## **2. Availability**

Availability refers to the **percentage of time a system remains operational and accessible**. It is usually expressed as a **percentage (e.g., 99.99%)**, often called **“nines” of availability**.
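
To make the "nines" concrete, the allowed downtime follows directly from the availability percentage. The sketch below is a minimal illustration in Python, assuming a 365-day year; the function name `downtime_seconds_per_year` is made up for this example and does not come from any library.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: leap years are ignored


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the maximum yearly downtime (in seconds) for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_seconds_per_year(target) / 60
    print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why every extra nine tends to be much harder to achieve than the previous one.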
### **Availability Levels**

| Availability (%)     | Downtime per Year | Downtime per Month |
| -------------------- | ----------------- | ------------------ |
| 99% (Two nines)      | \~3.65 days       | \~7.3 hours        |
| 99.9% (Three nines)  | \~8.76 hours      | \~43.8 minutes     |
| 99.99% (Four nines)  | \~52.6 minutes    | \~4.38 minutes     |
| 99.999% (Five nines) | \~5.26 minutes    | \~26.3 seconds     |

Monthly figures assume an average month of about 730 hours.

### **Methods to Improve Availability**

* **Redundancy:** Deploying backup servers to avoid single points of failure.
* **Failover Mechanisms:** Switching to standby resources if the primary system fails.
* **Load Balancing:** Distributing traffic across multiple servers.
* **Replication:** Keeping multiple copies of data so that it stays accessible when a node fails.

### **Trade-offs**

* High availability often comes at the cost of **complexity and additional resources**.

## **3. Reliability**

Reliability is the ability of a system to **perform correctly and consistently over time** without failures. A reliable system **minimizes unexpected downtimes and data inconsistencies**.

### **Factors Affecting Reliability**

* **Hardware Failures:** Server crashes, disk failures.
* **Software Bugs:** Memory leaks, race conditions, deadlocks.
* **Network Failures:** Packet loss, connection timeouts.

### **Techniques to Improve Reliability**

* **Error Handling and Recovery:** Implementing retry mechanisms and circuit breakers.
* **Data Replication:** Ensuring backups exist in case of failures.
* **Testing Strategies:** Unit tests, integration tests, and chaos engineering.

### **Difference Between Availability and Reliability**


| Aspect  | Availability                       | Reliability                                  |
| ------- | ---------------------------------- | -------------------------------------------- |
| Focus   | Ensuring system is operational     | Ensuring system works correctly over time    |
| Metric  | Uptime percentage (e.g., 99.99%)   | Mean Time Between Failures (MTBF)            |
| Example | A website is up 99.99% of the time | A website never crashes due to software bugs |
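
The two metrics in the table are related. A common approximation, which brings in Mean Time To Repair (MTTR), a quantity not mentioned above, is availability = MTBF / (MTBF + MTTR). The following Python sketch uses hypothetical function names and made-up numbers purely for illustration.

```python
def mtbf_hours(total_uptime_hours: float, failure_count: int) -> float:
    """Mean Time Between Failures: operating time divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("failure_count must be positive")
    return total_uptime_hours / failure_count


def steady_state_availability(mtbf: float, mttr: float) -> float:
    """Fraction of time the system is up, given mean time between failures and mean time to repair."""
    return mtbf / (mtbf + mttr)


# Hypothetical service: 8,750 hours of uptime, 5 failures, 2 hours to repair each.
mtbf = mtbf_hours(8750, 5)
availability = steady_state_availability(mtbf, mttr=2.0)
print(f"MTBF: {mtbf:.0f} h, availability: {availability:.4%}")
```

Two systems can reach the same availability in different ways: one fails rarely but takes long to repair, the other fails often but recovers quickly. The second is equally available yet less reliable.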

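The retry mechanisms listed under the reliability techniques can be sketched as follows. This is a minimal Python illustration, assuming that transient failures surface as `ConnectionError`; the helper `retry_with_backoff`, its parameters, and `flaky_fetch` are hypothetical names chosen for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # The delay doubles on every attempt; random jitter keeps many
            # clients from retrying at exactly the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
            attempt += 1


# Usage: a simulated operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = retry_with_backoff(flaky_fetch, sleep=lambda _seconds: None)
print(result, calls["count"])  # prints: ok 3
```

Retries only help with failures that are likely to clear on their own, and they are only safe for operations that can be repeated without side effects. For persistent failures, a circuit breaker that stops calling the failing dependency is usually the better fit.
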
## **4. Fault Tolerance**

Fault tolerance is the **system's ability to continue operating even when components fail**. A fault-tolerant system does not crash completely due to failures.

### **Types of Faults**

* **Transient Faults:** Temporary network failures, server timeouts.
* **Intermittent Faults:** Faults that come and go unpredictably, such as occasional hardware glitches.
* **Permanent Faults:** Hardware crashes, disk corruption.

### **Fault Tolerance Mechanisms**

* **Redundant Components:** Standby servers, multiple database replicas.
* **Graceful Degradation:** Partial functionality when some services fail.
* **Self-Healing Systems:** Detecting and automatically recovering from failures.

### **Example**

A **fault-tolerant database** might use **leader-follower replication**. If the leader node fails, a follower takes over automatically.

## **5. Consistency**

Consistency ensures that **all clients see the same data at any given time**.

### **Types of Consistency**

* **Strong Consistency:** Every read receives the latest write.
* **Eventual Consistency:** Replicas converge to the same value over time, but reads might return stale data for a short period (common in many NoSQL databases).
* **Causal Consistency:** Guarantees that causally related updates appear in the correct order.

### **Trade-offs: CAP Theorem**

According to the **CAP Theorem**, a distributed system can guarantee at most **two out of three** properties at the same time:

1. **Consistency** (C): All nodes return the same data.
2. **Availability** (A): The system remains responsive.
3. **Partition Tolerance** (P): The system can function even when network partitions occur.

Because network partitions cannot be ruled out in a distributed system, the practical choice is between consistency and availability while a partition lasts.

### **Example**

* Distributed SQL databases typically prioritize Consistency and Partition Tolerance (CP).
* Many NoSQL databases, such as Cassandra and DynamoDB, prioritize Availability and Partition Tolerance (AP), although some, such as MongoDB and HBase, lean towards CP.

These labels are generalizations; actual behavior depends on the product and its configuration.

## **6. Durability**

Durability ensures that **once a transaction is committed, it remains permanently stored** even in case of failures.

### **Durability Mechanisms**

* **Write-Ahead Logging (WAL):** Recording every write operation in a persistent log before applying it, so that it can be replayed after a crash.
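* **Data Replication:** Copying data to multiple locations.
* **Snapshots and Backups:** Periodic copies of the data that can be restored after a loss.

### **Example**

A bank transaction that **deducts money from one account and adds it to another** must be **durable**. Once the transfer has been committed, a power outage must not lose it: when the system restarts, it restores the committed transfer, for example by replaying its write-ahead log. If the outage occurs before the commit, the partial deduction is rolled back instead; that guarantee is atomicity rather than durability.

## Comparison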


| Characteristic  | Definition                                    | Key Considerations                           |
| --------------- | --------------------------------------------- | -------------------------------------------- |
| Scalability     | Ability to handle increased load              | Vertical vs. Horizontal Scaling              |
| Availability    | Uptime percentage                             | Redundancy, Failover, Load Balancing         |
| Reliability     | Correct and consistent performance over time  | Error Handling, Testing, Replication         |
| Fault Tolerance | System's ability to function despite failures | Redundant components, Self-healing systems   |
| Consistency     | Ensures all users see the same data           | CAP Theorem, Strong vs. Eventual Consistency |
| Durability      | Data remains intact after crashes             | Write-Ahead Logging, Data Replication        |
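
As a closing illustration of the write-ahead logging entry in the durability row, the idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular database implements its log: the class `TinyKeyValueStore`, the JSON-lines log format, and the file name are all hypothetical.

```python
import json
import os
import tempfile


class TinyKeyValueStore:
    """Toy in-memory store that logs each write to disk before applying it."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self) -> None:
        """Rebuild the in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the intended change to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply the change to the in-memory state.
        self.data[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "store.wal")
store = TinyKeyValueStore(log_path)
store.put("balance:alice", "90")
store.put("balance:bob", "110")

# Simulate a crash: the in-memory state is gone, only the log file survives.
recovered = TinyKeyValueStore(log_path)
print(recovered.data)
```

The sketch leaves out details that real write-ahead logs need, such as checksums to detect a partially written last record, log truncation after checkpoints, and grouping several writes into one transaction.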
