P95/P99 Percentiles

Performance

Statistical measures of tail latency. P95 means 95% of requests are faster than this value. P99 catches the slowest 1%. Essential for understanding real user experience beyond misleading averages.

Updated Mar 17, 2026

Full Explanation

Percentiles tell you what your slowest users actually experience. The P95 latency is the value where 95% of requests complete faster. P99 is where 99% are faster. If your P95 is 200ms and your P99 is 800ms, then 5% of requests take longer than 200ms and 1% take longer than 800ms.

Averages are misleading for latency measurements because latency distributions are heavily skewed. A service might have a 50ms average but a P99 of 2 seconds. The average looks great because the fast majority drowns out the slow minority. But that slow minority might represent your most valuable customers (they're doing complex operations) or users on degraded network paths.
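
A quick sketch in plain Python, with made-up numbers, shows how a skewed distribution produces a comfortable average alongside an awful tail:

```python
import math
import statistics

# 980 fast requests at 30ms, 20 slow ones at 2000ms (cold cache, far origin)
latencies_ms = [30] * 980 + [2000] * 20

mean = statistics.mean(latencies_ms)
# Nearest-rank P99: the value at position ceil(N * 0.99) in the sorted list
p99 = sorted(latencies_ms)[math.ceil(len(latencies_ms) * 0.99) - 1]

print(f"mean: {mean:.1f}ms")  # 69.4ms -- looks healthy
print(f"P99:  {p99}ms")       # 2000ms -- the tail waits two full seconds
```

The mean barely moves because 98% of samples are fast; the P99 lands squarely on the slow 2%.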

For CDNs, percentiles reveal problems that averages hide. Your average TTFB might be 30ms (served from cache at the nearest edge), but the P99 might be 500ms (cache misses going to origin, users in regions without a nearby PoP, or TLS handshake storms). The P99 is where you find origin performance issues, cold cache paths, and geographic coverage gaps.

Amazon's oft-cited finding is instructive: every additional 100ms of latency cost roughly 1% in sales. The lesson applies most sharply at the tail, because the tail represents real users having bad experiences, and those users make purchasing decisions based on that experience.

To calculate percentiles from access logs, sort all response times in ascending order and take the value at position N*P, rounded up (1-indexed), where N is the total count and P is the percentile as a fraction (0.95, 0.99). For 10,000 requests sorted by latency, P95 is the 9,500th value and P99 is the 9,900th. This is the nearest-rank method; tools like NumPy interpolate between neighboring values by default, so results can differ slightly on small samples.
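
The steps above can be sketched as a minimal nearest-rank implementation in Python (production systems usually estimate percentiles from histograms rather than sorting every sample; passing the percentile as an integer keeps the rank arithmetic exact):

```python
import math

def nearest_rank_percentile(samples, pct):
    """Value at position ceil(N * pct/100), 1-indexed, in the sorted samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * pct / 100)  # 10,000 samples, pct=95 -> 9,500
    return ordered[max(rank - 1, 0)]            # convert to 0-indexed

# Stand-in for parsed $request_time values: 1..10000
response_times = list(range(1, 10001))
print(nearest_rank_percentile(response_times, 95))  # 9500
print(nearest_rank_percentile(response_times, 99))  # 9900
```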

When monitoring CDN performance, track at minimum: P50 (median, the typical experience), P95 (where problems start showing), and P99 (the worst common experience). Some teams also track P99.9 for critical paths. Compare these across geographic regions, ISPs, and device types to find specific problem areas.
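
A sketch of that breakdown, using hypothetical (region, TTFB) records and a tiny sample purely for illustration (real analysis needs thousands of samples per group for the tail percentiles to mean anything):

```python
import math
from collections import defaultdict

def pctl(samples, pct):
    """Nearest-rank percentile of a list."""
    ordered = sorted(samples)
    return ordered[max(math.ceil(len(ordered) * pct / 100) - 1, 0)]

# Hypothetical per-request records: (region, ttfb_ms)
requests = [
    ("us-east", 25), ("us-east", 30), ("us-east", 28), ("us-east", 180),
    ("ap-south", 90), ("ap-south", 110), ("ap-south", 95), ("ap-south", 600),
]

by_region = defaultdict(list)
for region, ttfb_ms in requests:
    by_region[region].append(ttfb_ms)

for region, samples in sorted(by_region.items()):
    print(f"{region}: P50={pctl(samples, 50)}ms  "
          f"P95={pctl(samples, 95)}ms  P99={pctl(samples, 99)}ms")
```

The same grouping works for any dimension you log: swap the region key for ISP, device type, or cache status.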

CDN SLAs increasingly use percentiles. A CDN might guarantee P95 TTFB under 100ms. This is a much stronger guarantee than an average-based SLA because it means even the tail must perform well. When evaluating CDN providers, ask for percentile metrics, not just averages.
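
Checking measured data against such an SLA is straightforward; here is a minimal sketch with an assumed 100ms P95 TTFB target and stand-in measurements:

```python
import math

SLA_P95_TTFB_MS = 100  # hypothetical contract: P95 TTFB under 100ms

def p95(samples):
    """Nearest-rank P95 of a list of samples."""
    ordered = sorted(samples)
    return ordered[max(math.ceil(len(ordered) * 95 / 100) - 1, 0)]

# One day of measured TTFBs (stand-in data): 6% of requests are slow
ttfbs_ms = [20] * 90 + [60] * 4 + [250] * 6

observed = p95(ttfbs_ms)
status = "PASS" if observed < SLA_P95_TTFB_MS else "BREACH"
print(f"P95 TTFB: {observed}ms -> {status}")  # P95 TTFB: 250ms -> BREACH
```

Because more than 5% of requests are slow, the P95 lands inside the slow group and the check fails, even though the average here is well under 100ms.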

Examples

# Calculate P95 and P99 from Nginx access log
# Assuming $request_time is the last field
awk '{print $NF}' /var/log/nginx/access.log | \
  sort -n | \
  awk 'BEGIN{n=0} {a[n++]=$1} END{
    printf "P50: %.3f\n", a[int(n*0.50)];
    printf "P95: %.3f\n", a[int(n*0.95)];
    printf "P99: %.3f\n", a[int(n*0.99)];
    printf "Max: %.3f\n", a[n-1];
  }'

# Prometheus query for P99 latency
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)

# CloudFront real-time logs P95 analysis
aws athena start-query-execution --query-string "
  SELECT
    approx_percentile(time_to_first_byte, 0.50) as p50_ttfb,
    approx_percentile(time_to_first_byte, 0.95) as p95_ttfb,
    approx_percentile(time_to_first_byte, 0.99) as p99_ttfb
  FROM cloudfront_logs
  WHERE date = '2026-03-15'
"

# Python: calculate percentiles
import numpy as np
latencies = [12, 15, 18, 22, 25, 30, 45, 80, 150, 800]
print(f"P50: {np.percentile(latencies, 50):.1f}ms")  # 27.5ms
print(f"P95: {np.percentile(latencies, 95):.1f}ms")  # 507.5ms
print(f"P99: {np.percentile(latencies, 99):.1f}ms")  # 741.5ms
print(f"Avg: {np.mean(latencies):.1f}ms")            # 119.7ms
# The average says ~120ms, but the P95 shows the real story


Related CDN concepts include:

  • Latency — The time delay between a request and the start of its response. For CDNs, it's …
  • Throughput — The actual amount of data transferred per unit of time. Unlike bandwidth (maximum capacity), throughput …
  • TTFB (Time To First Byte) — The time from the start of a request to receiving the first byte of the …