Building a QA Reliability Dashboard in Grafana

Data Ingestion & Metric Standardization #

Standardize test output formats across CI runners before visualization. Export Jest, Playwright, or Cypress results to structured JSON and forward to Prometheus or Loki. Tag every metric with repo, branch, test_suite, and environment. This foundational step aligns ingestion pipelines with broader Flaky Test Detection & Quarantine Engineering workflows.

Core Panel Configuration #

Deploy four primary panels: a Stat panel for daily pass rate, a Time Series graph for flakiness trends, a Table for quarantined tests, and a Gauge for stability scores. Use PromQL to calculate rolling 7-day averages, smoothing daily CI noise. Apply conditional formatting to highlight tests exceeding a 5% flakiness threshold.

sum(rate(test_flaky_total{status="failed_retry"}[7d])) / sum(rate(test_executions_total[7d])) * 100

{
 "fieldConfig": {
 "defaults": {
 "thresholds": {
 "mode": "absolute",
 "steps": [
 { "color": "green", "value": null },
 { "color": "yellow", "value": 3 },
 { "color": "red", "value": 5 }
 ]
 }
 }
 }
}

Dynamic Filtering & Drill-Downs #

Implement dashboard template variables for repo, branch, and test_status. Link panels using these variables so clicking a quarantined test automatically filters the time series. Ensure all drill-downs preserve CI execution context to prevent false positives during pipeline debugging.

Common Pitfalls & Troubleshooting #

Aggregating metrics across unrelated test suites masks localized flakiness.
Omitting CI environment variables causes dashboard noise during staging vs. production runs.
Static thresholds fail to account for seasonal test volume spikes.
Tracking only pass/fail counts ignores retry frequency and execution duration variance.

Core Reliability Metrics #

Flakiness Rate: (Failed Retries / Total Executions) * 100
Quarantine Hit Rate: Tests auto-quarantined / Total flagged tests
Test Stability Score: 100 - (Flakiness Rate + Timeout Rate)
MTTR: Average CI duration from flakiness detection to fix merge

Troubleshooting FAQ #

How do I prevent Grafana from displaying stale metrics after a pipeline failure? Configure a last_seen metric with a 24-hour expiry. Use Grafana’s transform plugin to filter out series where last_seen exceeds the threshold.

Can I correlate flakiness spikes with specific frontend dependency updates? Tag Prometheus metrics with dependency_version. Use Grafana’s annotation feature to overlay package update timestamps onto the flakiness time series.

What is the recommended refresh interval for a QA reliability dashboard? Set the dashboard refresh to 5m or 10m. Faster intervals increase backend load without adding value, as CI pipelines typically run in 15–30 minute batches.