PromQL
About
Prometheus is a time-series database and monitoring system that collects metrics from our applications and infrastructure. It uses a powerful query language called PromQL (Prometheus Query Language) to extract, filter, and aggregate metrics over time.
In OpenShift, Prometheus is often used as the backend for monitoring the cluster, workloads, nodes, and custom applications. We can execute PromQL queries directly from the OpenShift Web Console (under Observe → Metrics) or via the Prometheus UI.
Purpose of Prometheus Queries in OpenShift
Monitor pod or node CPU/memory usage
Track container restarts or failures
Measure request rates, latencies, and errors
Create custom dashboards or alerts
Troubleshoot performance or resource issues
Basic Structure of PromQL
Prometheus queries work on metrics, each optionally labeled with key-value pairs for filtering. A query generally looks like:
metric_name{label1="value1", label2="value2"}[range]
You can use:
Instant vector queries (current value):
metric_name{...}
Range vector queries (over time):
metric_name{...}[5m]
Functions for aggregation, rates, comparisons, etc.
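As a concrete illustration of the three forms above, here is a minimal sketch using the built-in up metric, which Prometheus records for every scrape target (1 if the last scrape succeeded, 0 otherwise). The job value "my-job" is a hypothetical placeholder. Each expression is meant to be run on its own.

```promql
# Instant vector: the latest value of "up" for each target of the job
up{job="my-job"}

# Range vector: all samples from the last 5 minutes (tabular output only, cannot be graphed directly)
up{job="my-job"}[5m]

# Function over a range vector: fraction of successful scrapes in the last 5 minutes
avg_over_time(up{job="my-job"}[5m])
```

Besides =, label matchers also accept != (not equal), =~ (regex match), and !~ (regex non-match).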
Executing Queries in OpenShift Console
Open OpenShift Console
Navigate to: Observe → Metrics
In the "Expression" box, enter your PromQL query
Choose:
“Run” to get instant values
“Show Graph” for visual trends over time
Optional: Filter by namespace, label, or duration

Common Prometheus Queries in OpenShift
1. CPU Usage
Query:
Returns the average CPU usage (in cores) over 5 minutes for the pod. Helps detect if the pod is under or over-utilizing CPU. Useful for auto-scaling and performance monitoring.
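A query matching this description could look like the sketch below. It assumes the standard cAdvisor metric container_cpu_usage_seconds_total is being scraped (it is part of the default OpenShift monitoring stack); "my-namespace" and "my-pod" are hypothetical placeholders to replace with our own values.

```promql
# Average CPU usage in cores over the last 5 minutes, summed across the pod's containers.
# container!="" drops the pod-level cgroup series so usage is not counted twice.
sum(rate(container_cpu_usage_seconds_total{namespace="my-namespace", pod="my-pod", container!=""}[5m]))
```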

2. Memory Usage
Query:
Shows how much memory (in bytes) the pod is using. Helps detect memory leaks, over-consumption, or the need for adjusting memory requests/limits.
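One possible form of this query is sketched below, assuming the cAdvisor metric container_memory_working_set_bytes; the namespace and pod values are hypothetical placeholders. Working-set memory is generally the figure to compare against memory limits, since it is what the kernel considers when deciding on OOM kills.

```promql
# Current working-set memory in bytes, summed across the pod's containers
sum(container_memory_working_set_bytes{namespace="my-namespace", pod="my-pod", container!=""})
```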

3. Pod Restarts
Query:
Displays the total number of container restarts for the pod. Useful for identifying unstable or crash-looping containers.
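A sketch of such a query, assuming kube-state-metrics is deployed (as it is in the default OpenShift monitoring stack) and using hypothetical namespace and pod values:

```promql
# Total container restarts recorded for the pod, summed across its containers
sum(kube_pod_container_status_restarts_total{namespace="my-namespace", pod="my-pod"})
```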

4. Thread Count per JVM State
Query:
Breaks down the number of JVM threads by state (e.g., RUNNABLE, BLOCKED, TIMED_WAITING). This is useful for diagnosing thread leaks or blocked thread pools in Java applications.
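The metric name here depends on how the application is instrumented, so the following is only a sketch. It assumes the Prometheus Java client's JVM exports, which expose jvm_threads_state with a state label; applications instrumented with Micrometer typically expose jvm_threads_states_threads instead. The namespace and pod values are hypothetical placeholders.

```promql
# Number of JVM threads per state for the pod
sum by (state) (jvm_threads_state{namespace="my-namespace", pod="my-pod"})
```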

5. Running Pods (for cross-check)
Query:
Confirms that the pod is currently running. You can use this to validate rollout status, pod health, or replicas during deployments.
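A sketch of this check, assuming the kube-state-metrics metric kube_pod_status_phase and hypothetical namespace and pod values:

```promql
# Returns a series with value 1 only while the pod is in the Running phase;
# an empty result means the pod is in another phase or does not exist.
kube_pod_status_phase{namespace="my-namespace", pod="my-pod", phase="Running"} == 1
```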

6. CPU Requests vs Usage
Requested CPU:
Used CPU:
These queries help compare how much CPU was requested vs how much is being used. Critical for detecting over-requesting or under-provisioning.
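The pair of queries might look like the sketch below. It assumes a kube-state-metrics version that exposes kube_pod_container_resource_requests with a resource label (older versions used separate per-resource metric names); the namespace and pod values are hypothetical placeholders. Run each expression separately.

```promql
# Requested CPU in cores, summed across the pod's containers
sum(kube_pod_container_resource_requests{namespace="my-namespace", pod="my-pod", resource="cpu"})

# Used CPU in cores, averaged over the last 5 minutes
sum(rate(container_cpu_usage_seconds_total{namespace="my-namespace", pod="my-pod", container!=""}[5m]))
```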
7. Memory Requests vs Usage
Requested Memory:
Used Memory:
Helps understand whether our pod is requesting more or less memory than it actually needs. Important for resource optimization and avoiding OOM errors.
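A sketch of the two queries, under the same assumptions as for CPU (kube-state-metrics exposing kube_pod_container_resource_requests with a resource label, and hypothetical namespace and pod values). Run each expression separately.

```promql
# Requested memory in bytes, summed across the pod's containers
sum(kube_pod_container_resource_requests{namespace="my-namespace", pod="my-pod", resource="memory"})

# Used (working-set) memory in bytes
sum(container_memory_working_set_bytes{namespace="my-namespace", pod="my-pod", container!=""})
```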
8. Network Usage
Receive:
Transmit:
Monitors how much network traffic (in bytes/sec) the pod is receiving and sending. Useful for identifying chatty services, bottlenecks, or unexpected spikes in network load.
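These could be written as in the sketch below, assuming the cAdvisor network counters are available; the namespace and pod values are hypothetical placeholders. Run each expression separately.

```promql
# Receive rate in bytes/sec over the last 5 minutes, summed across interfaces
sum(rate(container_network_receive_bytes_total{namespace="my-namespace", pod="my-pod"}[5m]))

# Transmit rate in bytes/sec over the last 5 minutes, summed across interfaces
sum(rate(container_network_transmit_bytes_total{namespace="my-namespace", pod="my-pod"}[5m]))
```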
9. Container Restarts Over Time
Query:
Tracks restart count increases over time. Helps correlate instability with specific time windows or deployments.
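A sketch of such a query, assuming kube-state-metrics and hypothetical namespace and pod values. The 1h window is an arbitrary choice for illustration and can be adjusted to the period we want to inspect.

```promql
# Number of restarts added during the last hour, per container.
# Graphing this shows when restarts happened rather than only the running total.
increase(kube_pod_container_status_restarts_total{namespace="my-namespace", pod="my-pod"}[1h])
```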
10. Error Rate (If We Have HTTP Metrics)
Query (example, requires instrumentation):
Shows how many 5xx errors occurred in the last 5 minutes. Helps identify backend failures, circuit breaker trips, or timeouts in dependent services.
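Because this depends entirely on how the application is instrumented, the following is only a sketch. The metric name http_requests_total and the status label are hypothetical; our application may expose a different counter name or label the response code as code or status_code. Run each expression separately.

```promql
# Approximate number of 5xx responses in the last 5 minutes
sum(increase(http_requests_total{namespace="my-namespace", pod="my-pod", status=~"5.."}[5m]))

# The same signal as a per-second error rate
sum(rate(http_requests_total{namespace="my-namespace", pod="my-pod", status=~"5.."}[5m]))
```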