> For the complete documentation index, see [llms.txt](https://www.pranaypourkar.co.in/the-programmers-guide/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://www.pranaypourkar.co.in/the-programmers-guide/system-design/observability/distributed-tracing/fundamentals.md).

# Fundamentals

## **Trace**

A **trace** represents the complete journey of a request as it propagates through various services in a distributed system. It is composed of one or more **spans**, each representing a specific operation within the request.

## **Span**

A **span** is a unit of work or operation within a trace.

Each span contains:

* A **Span ID**: A unique identifier for the span.
* **Start and End Timestamps**: To measure the duration of the operation.
* **Attributes**: Metadata such as service name, method, and status code.
* **Parent Span ID**: The span ID of the parent operation, linking spans hierarchically.
* **Logs**: Events or errors that occurred during the span.

## **Trace ID**

A globally unique identifier that ties together all spans within a single trace. It is propagated across service boundaries to maintain continuity in the trace.

## **Context Propagation**

Refers to the mechanism of passing the **Trace ID** and **Span ID** between services.

Commonly done via headers, such as:

* `X-B3-TraceId`
* `X-B3-SpanId`
* `X-B3-ParentSpanId`

## **Root Span**

The first span in a trace, typically representing the entry point of the request, such as an API gateway or front-end service.

### **Child Span**

A span that represents a sub-operation within the trace. It is linked to its parent span using the **Parent Span ID**.

### **Error Span**

A span that records an operation that failed or encountered an issue. Typically contains additional logs and annotations about the failure.

## **Sampling**

The process of deciding whether to trace a particular request.

Types:

* **Always Sampling**: Traces every request (useful in development).
* **Probability Sampling**: Traces only a percentage of requests (e.g., 1%).

## **Annotations/Tags**

Metadata associated with a span, providing context about the operation.

Examples:

* `http.method`: "GET"
* `http.url`: "/api/orders"
* `status`: "200 OK"

## **Logs**

Structured or unstructured events recorded within a span.

Useful for debugging and capturing key moments, like:

* A database query start/end.
* An error occurrence.

## **Parent-Child Relationship**

Defines the hierarchical structure of spans within a trace. A **parent span** can have multiple **child spans**, representing sequential or parallel operations.

## **Baggage**

Metadata or data that propagates along with the trace context across service boundaries. Useful for passing information, such as a tenant ID or user session, without including it in the span itself.

## **Service Dependency Graph**

A visualization of the interactions between services in a distributed system. Derived from spans and traces, showing how services depend on each other.

## **Root Cause Analysis**

The process of using traces to identify the origin of a failure or bottleneck in a distributed system.

## **Instrumentation**

The process of adding tracing logic to code, typically using libraries like Spring Cloud Sleuth, OpenTelemetry, or custom SDKs.

* **Automatic Instrumentation**: Provided by frameworks and tools.
* **Manual Instrumentation**: Developers explicitly add tracing logic.

### 1. **Manual Instrumentation**

We explicitly write code to create spans or log trace information.

**Example**:\
We wrap a method with a "start span" and "end span" to measure how long it takes and report it to our tracing system.

**When used:**

* For custom business logic
* For parts not automatically instrumented (e.g., low-level or third-party libraries)

### 2. **Automatic Instrumentation**

Frameworks or libraries automatically capture telemetry data without us writing trace code.

**When used:**

* For standard components like HTTP clients, REST controllers, database connections, etc.
* Tools like Spring Cloud Sleuth, OpenTelemetry Java Agent, or New Relic auto-instrument these layers

## **Distributed Context**

A collection of information (Trace ID, Span ID, Baggage) shared across services in a distributed trace.

## **Propagation Formats**

Standards for propagating trace and span information:

* **B3 Propagation**: Used by Zipkin.
* **W3C Trace Context**: An open standard supported by OpenTelemetry.

## **Latency**

The time taken for a span to complete its operation. Helps identify slow operations in a service.

## **Trace Sampling Rate**

The proportion of requests traced to reduce overhead. For example, setting the rate to **0.1** means 10% of requests are sampled.

## **Trace Aggregation**

The process of collecting traces from multiple services into a centralized system, like Zipkin, Jaeger, or OpenTelemetry backends.

## **Dependency Heatmap**

A visualization tool that highlights services with high latency or error rates.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://www.pranaypourkar.co.in/the-programmers-guide/system-design/observability/distributed-tracing/fundamentals.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
