Data Ingestion & Metric Standardization
Standardize test output formats across CI runners before building any visualization. Export Jest, Playwright, or Cypress results to structured JSON, then forward them to Prometheus (via the Pushgateway or a custom exporter) or to Loki (for log-based metrics). Tag every metric with repo, branch, test_suite, and environment so that panels can filter and group consistently. This foundational step aligns ingestion pipelines with broader Flaky Test Detection & Quarantine Engineering workflows.
A minimal Prometheus exporter pattern for test results:
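// scripts/push-metrics.js
// Pushes the flaky-test count for one CI run to the Prometheus Pushgateway.
const { Gauge, Pushgateway, Registry } = require('prom-client');

const registry = new Registry();

// A gauge rather than a counter: each CI run overwrites the previous value.
const flakyGauge = new Gauge({
  name: 'test_flaky_total',
  help: 'Number of flaky test executions',
  labelNames: ['suite', 'branch', 'status'],
  registers: [registry],
});

async function pushMetrics({ suite, branch, flakyCount }) {
  // The status label matches the selector used by the dashboard query.
  flakyGauge.set({ suite, branch, status: 'passed_on_retry' }, flakyCount);
  // The second argument is an options object for the HTTP request.
  const gw = new Pushgateway('http://pushgateway:9091', {}, registry);
  // Group by suite and branch so one push does not replace another suite's metrics.
  await gw.push({ jobName: 'ci_test_results', groupings: { suite, branch } });
}

module.exports = { pushMetrics };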
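The `flakyCount` passed to `pushMetrics` has to be derived from runner output first. The sketch below shows one way to do that, assuming results have already been normalized into records of the hypothetical shape `{ suite, name, attempts }`, where `attempts` lists the status of each attempt in order. Both that shape and the `summarizeRuns` helper are illustrative and are not part of Jest, Playwright, or Cypress.

```javascript
// Hypothetical normalized record: one entry per test, with the status of
// each attempt in order. Real runner output must be mapped to this shape first.
function summarizeRuns(records) {
  const bySuite = new Map();
  for (const { suite, attempts } of records) {
    const entry = bySuite.get(suite) ?? { suite, executions: 0, flakyCount: 0 };
    entry.executions += attempts.length;
    // Flaky here means: failed at least once, but the final attempt passed.
    const passedLast = attempts[attempts.length - 1] === 'passed';
    if (passedLast && attempts.includes('failed')) entry.flakyCount += 1;
    bySuite.set(suite, entry);
  }
  return [...bySuite.values()];
}

// Invented sample data for illustration.
const summary = summarizeRuns([
  { suite: 'checkout', name: 'applies coupon', attempts: ['failed', 'passed'] },
  { suite: 'checkout', name: 'rejects expired card', attempts: ['passed'] },
  { suite: 'search', name: 'filters by tag', attempts: ['failed', 'failed'] },
]);
console.log(summary);
```

Each summary entry can then be handed to `pushMetrics` together with the branch name taken from the CI environment.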
Core Panel Configuration
Deploy four primary panels: a Stat panel for daily pass rate, a Time Series graph for flakiness trends, a Table for quarantined tests, and a Gauge for stability scores. Use PromQL to compute rates over a rolling 7-day window, which smooths day-to-day CI noise. Apply threshold-based coloring to highlight tests whose flakiness rate exceeds 5%.
# Flakiness rate (%): executions that passed only after a retry, divided by
# all test executions, over the past 7 days.
# increase() assumes both series are cumulative counters.
(
  sum(increase(test_flaky_total{status="passed_on_retry"}[7d]))
  /
  sum(increase(test_executions_total[7d]))
) * 100
Grafana threshold configuration (JSON panel excerpt):
{
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "green", "value": null },
          { "color": "yellow", "value": 3 },
          { "color": "red", "value": 5 }
        ]
      }
    }
  }
}
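In `absolute` mode, a value takes the color of the last step whose `value` it meets or exceeds, and the step with `value: null` acts as the base color. The helper below mimics that lookup to show which flakiness rates map to which color. `colorFor` is an illustrative function, not part of Grafana, and it assumes the steps are listed in ascending order.

```javascript
const steps = [
  { color: 'green', value: null },
  { color: 'yellow', value: 3 },
  { color: 'red', value: 5 },
];

// Returns the color of the last step whose lower bound the value reaches.
function colorFor(value, thresholdSteps) {
  let color = null;
  for (const step of thresholdSteps) {
    const lowerBound = step.value === null ? -Infinity : step.value;
    if (value >= lowerBound) color = step.color;
  }
  return color;
}

console.log(colorFor(1.2, steps)); // green
console.log(colorFor(3, steps));   // yellow
console.log(colorFor(7.5, steps)); // red
```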
Dynamic Filtering & Drill-Downs
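Implement dashboard template variables for repo, branch, and test_status. Reference these variables in every panel query so that the panels stay linked and clicking a quarantined test automatically filters the time series. Ensure all drill-downs carry the CI execution context (for example, the run ID) so that a failure is investigated against the run that produced it, which helps prevent false positives during pipeline debugging.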
Use Grafana’s Data Links feature to connect a failing test row in the Table panel to the CI artifact URL:
/ci/artifacts?suite=${__data.fields.suite}&run=${__data.fields.run_id}
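To see what such a link resolves to for a given table row, the sketch below imitates the substitution of `${__data.fields.<name>}` placeholders. `renderDataLink` is a hypothetical helper written for illustration, and the row values are invented. Grafana performs its own interpolation, and its encoding rules may differ from `encodeURIComponent`.

```javascript
// Replaces ${__data.fields.<name>} placeholders with URL-encoded row values.
// Placeholders without a matching field are left untouched.
function renderDataLink(template, row) {
  return template.replace(
    /\$\{__data\.fields\.([A-Za-z0-9_]+)\}/g,
    (match, field) => (field in row ? encodeURIComponent(String(row[field])) : match)
  );
}

// Single-quoted on purpose: this is a plain string, not a JS template literal.
const template = '/ci/artifacts?suite=${__data.fields.suite}&run=${__data.fields.run_id}';
const url = renderDataLink(template, { suite: 'checkout e2e', run_id: 48213 });
console.log(url); // /ci/artifacts?suite=checkout%20e2e&run=48213
```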
Common Pitfalls & Troubleshooting
- Aggregating metrics across unrelated test suites masks localized flakiness.
- Omitting the environment label mixes staging and production runs in the same series, which adds noise to every panel.
- Static thresholds fail to account for seasonal test volume spikes; prefer dynamic baselines.
- Tracking only pass/fail counts ignores retry frequency and execution duration variance.
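One way to read "dynamic baseline" is a threshold derived from recent history instead of a fixed number. The sketch below flags a day whose flakiness rate exceeds the mean of the preceding window by more than `k` standard deviations. The window length, the default `k = 2`, and the function names are assumptions for illustration, and the sample rates are invented.

```javascript
// Returns mean + k * population standard deviation of a non-empty history.
function dynamicThreshold(history, k = 2) {
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  const variance =
    history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / history.length;
  return mean + k * Math.sqrt(variance);
}

// True when today's rate is above the baseline derived from the history.
function exceedsBaseline(history, today, k = 2) {
  return today > dynamicThreshold(history, k);
}

// Invented daily flakiness rates (%) for the previous eight days.
const recentRates = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(dynamicThreshold(recentRates));    // 9 (mean 5, standard deviation 2)
console.log(exceedsBaseline(recentRates, 10)); // true
console.log(exceedsBaseline(recentRates, 8));  // false
```

A mean-plus-deviation rule is only one option. A percentile of the recent window would be less sensitive to a single outlier day.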
Core Reliability Metrics
- Flakiness Rate: (Tests passing on retry / Total executions) * 100
- Quarantine Hit Rate: Tests auto-quarantined / Total flagged tests
- Test Stability Score: 100 - Flakiness Rate
- MTTR: Average time from flakiness detection to fix merge
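As a worked example of these definitions, the sketch below computes all four metrics from raw counts. The function name, the input field names, the choice of hours as the MTTR unit, and the sample numbers are assumptions made for illustration.

```javascript
// Computes the four reliability metrics from raw counts.
// fixDurationsHours: hours from flakiness detection to fix merge, one per fixed test.
function reliabilityMetrics({
  passedOnRetry,
  totalExecutions,
  autoQuarantined,
  totalFlagged,
  fixDurationsHours,
}) {
  const flakinessRate = (passedOnRetry / totalExecutions) * 100;
  const totalFixHours = fixDurationsHours.reduce((sum, h) => sum + h, 0);
  return {
    flakinessRate,
    quarantineHitRate: autoQuarantined / totalFlagged,
    stabilityScore: 100 - flakinessRate,
    mttrHours: totalFixHours / fixDurationsHours.length,
  };
}

// Invented sample counts.
const metrics = reliabilityMetrics({
  passedOnRetry: 25,
  totalExecutions: 800,
  autoQuarantined: 6,
  totalFlagged: 8,
  fixDurationsHours: [12, 36, 24],
});
console.log(metrics);
// { flakinessRate: 3.125, quarantineHitRate: 0.75, stabilityScore: 96.875, mttrHours: 24 }
```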
Troubleshooting FAQ
How do I prevent Grafana from displaying stale metrics after a pipeline failure?
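Prometheus stops returning a series once its newest sample is older than the query lookback window. That window defaults to 5 minutes and is set with the server flag --query.lookback-delta, not in the scrape configuration. If your CI runs less frequently, use the last_over_time PromQL function to forward-fill values within a window you choose. Also note that the Pushgateway keeps exposing the last pushed values until their group is overwritten or deleted, so results from before a failed pipeline persist unless the pipeline cleans them up.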
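A cleanup step is one way to handle values left behind in the Pushgateway by an earlier run, since it keeps serving pushed metrics until their group is deleted. The sketch below builds the group URL and issues an HTTP DELETE against it. The host name, the helper names, and the grouping labels are assumptions. Pass the same grouping key that was used for the push, or no labels if the push used only a job name.

```javascript
// Builds the Pushgateway group URL: /metrics/job/<job>{/<label>/<value>}.
// Label values containing "/" need the Pushgateway's base64 path form,
// which this sketch does not handle.
function groupUrl(baseUrl, jobName, labels = {}) {
  const parts = [baseUrl.replace(/\/+$/, ''), 'metrics', 'job', encodeURIComponent(jobName)];
  for (const [name, value] of Object.entries(labels)) {
    parts.push(encodeURIComponent(name), encodeURIComponent(value));
  }
  return parts.join('/');
}

// Deletes all metrics in one group. fetchImpl defaults to the global fetch
// (available in Node 18+) and can be replaced in tests.
async function deleteGroup(baseUrl, jobName, labels = {}, fetchImpl = fetch) {
  const response = await fetchImpl(groupUrl(baseUrl, jobName, labels), { method: 'DELETE' });
  if (!response.ok) {
    throw new Error(`Pushgateway delete failed with status ${response.status}`);
  }
}

console.log(groupUrl('http://pushgateway:9091', 'ci_test_results', { suite: 'checkout' }));
// http://pushgateway:9091/metrics/job/ci_test_results/suite/checkout
```

In a pipeline, `deleteGroup` would typically run in a failure or cleanup hook, so that panels show no data instead of results from the previous run.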
Can I correlate flakiness spikes with specific frontend dependency updates?
Tag Prometheus metrics with a dependency_version label at push time, keeping the number of distinct values small to limit series cardinality. Use Grafana’s Annotations feature to overlay package update timestamps (stored as events in a SQL or Loki datasource) onto the flakiness time series.
What is the recommended refresh interval for a QA reliability dashboard?
Set the dashboard refresh to 5m or 10m. Faster intervals increase backend load without adding value, as CI pipelines typically run in 15–30 minute batches.