Building a QA Reliability Dashboard in Grafana

A centralized view of test execution data is essential to any effort to reduce flakiness in JavaScript test suites. This guide walks through the architecture and panel configuration for a QA reliability dashboard in Grafana, with a focus on turning raw CI/CD logs into actionable reliability signals for QA and DevOps teams.

Data Ingestion & Metric Standardization

Standardize test output formats across CI runners before building any panels. Export Jest, Playwright, or Cypress results to structured JSON, then forward them to Prometheus (via the Pushgateway or a custom exporter) or to Loki (for log-based metrics). Tag every metric with repo, branch, test_suite, and environment so that panels can filter and group consistently. Keeping these labels uniform also lets the same data feed broader flaky test detection and quarantine workflows.
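As a rough sketch of the standardization step, the function below collapses per-test attempt records into one tagged summary row per suite. The input shape (`name`, `suite`, `attempts`) and the name `summarizeResults` are assumptions made for illustration; they are not the native report format of Jest, Playwright, or Cypress, so a small adapter per runner would be needed to map its JSON report into this shape.

```javascript
// Hypothetical normalized record: { name, suite, attempts: ['failed', 'passed'] }
// `attempts` lists the outcome of each try of one test, in order.
function summarizeResults(records, tags) {
  const bySuite = new Map();
  for (const { suite, attempts } of records) {
    if (!bySuite.has(suite)) {
      // Copy the shared tags (repo, branch, environment) onto every row.
      bySuite.set(suite, { ...tags, suite, executions: 0, passed: 0, failed: 0, flaky: 0 });
    }
    const row = bySuite.get(suite);
    const finalStatus = attempts[attempts.length - 1];
    // Each test counts as one execution, however many attempts it needed.
    row.executions += 1;
    if (finalStatus === 'passed') {
      row.passed += 1;
      // Passed only after an earlier failed attempt: count it as flaky.
      if (attempts.slice(0, -1).includes('failed')) row.flaky += 1;
    } else {
      row.failed += 1;
    }
  }
  return [...bySuite.values()];
}
```

Each returned row carries the same tag set, so it can be turned into labelled metrics or structured log lines without further reshaping.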

A minimal Prometheus exporter pattern for test results:

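// scripts/push-metrics.js
// Pushes the flaky test count for a CI run to the Prometheus Pushgateway.
const { Gauge, Pushgateway, Registry } = require('prom-client');

const registry = new Registry();
const flakyGauge = new Gauge({
  name: 'test_flaky_total',
  help: 'Number of flaky test executions',
  // `status` is included so the series matches the status="passed_on_retry"
  // selector used by the dashboard query.
  labelNames: ['suite', 'branch', 'status'],
  registers: [registry]
});

// The second argument is an options object; the URL depends on your environment.
const gw = new Pushgateway('http://pushgateway:9091', {}, registry);

async function pushMetrics({ suite, branch, flakyCount }) {
  flakyGauge.set({ suite, branch, status: 'passed_on_retry' }, flakyCount);
  // push() replaces every metric in this job's group, so the gateway
  // exposes only the most recently pushed values.
  await gw.push({ jobName: 'ci_test_results' });
}

module.exports = { pushMetrics };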

Core Panel Configuration

Deploy four primary panels: a Stat panel for daily pass rate, a Time Series graph for flakiness trends, a Table for quarantined tests, and a Gauge for stability scores. Use PromQL to calculate rolling 7-day averages, smoothing daily CI noise. Apply conditional formatting to highlight tests exceeding a 5% flakiness threshold.

# Flakiness rate (%): tests that failed, were retried, and eventually passed,
# as a share of all test executions over the past 7 days.
# increase() assumes both series behave as monotonically increasing counters.
(
  sum(increase(test_flaky_total{status="passed_on_retry"}[7d]))
  /
  sum(increase(test_executions_total[7d]))
) * 100
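To make the rolling calculation concrete, here is a minimal sketch of the same ratio computed over plain daily totals. The input shape (`date`, `flaky`, `executions`) and the name `rollingFlakiness` are hypothetical. Like the query, it divides the summed flaky count by the summed executions for the window, rather than averaging daily percentages, so low-volume days do not dominate the result.

```javascript
// Each entry is one day of CI totals (hypothetical shape):
// { date: '2024-05-01', flaky: 3, executions: 120 }
function rollingFlakiness(days, windowSize = 7) {
  return days.map((day, i) => {
    // Take up to `windowSize` days ending at the current day.
    const window = days.slice(Math.max(0, i - windowSize + 1), i + 1);
    const flaky = window.reduce((sum, d) => sum + d.flaky, 0);
    const executions = window.reduce((sum, d) => sum + d.executions, 0);
    // Guard against windows with no runs to avoid dividing by zero.
    const rate = executions === 0 ? 0 : (flaky / executions) * 100;
    return { date: day.date, rate };
  });
}
```

The first few entries use a shorter window because fewer than `windowSize` days are available yet.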

Grafana threshold configuration (JSON panel excerpt):

{
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "green", "value": null },
          { "color": "yellow", "value": 3 },
          { "color": "red", "value": 5 }
        ]
      }
    }
  }
}
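The following sketch shows how a step list like the one above is commonly understood to resolve in "absolute" mode: a value takes the color of the highest step whose bound it meets or exceeds, and the `null` step acts as the base color. The function name `colorFor` is an assumption for illustration and is not part of any Grafana API.

```javascript
// Same steps as the panel excerpt; assumed to be sorted by ascending value.
const steps = [
  { color: 'green', value: null },
  { color: 'yellow', value: 3 },
  { color: 'red', value: 5 },
];

function colorFor(value, thresholdSteps) {
  let color = thresholdSteps[0].color;
  for (const step of thresholdSteps) {
    // A null bound means "no lower limit", i.e. the base step.
    const lowerBound = step.value === null ? -Infinity : step.value;
    if (value >= lowerBound) color = step.color;
  }
  return color;
}
```

Under this reading, a flakiness rate of exactly 5 already renders red, which is worth keeping in mind when choosing between "exceeds 5%" and "reaches 5%" as the alerting rule.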

Dynamic Filtering & Drill-Downs

Define dashboard template variables for repo, branch, and test_status, and reference them in every panel query so that selecting a quarantined test filters the time series to match. Make sure each drill-down link carries the CI execution context, such as the run ID and branch, so that a failure is investigated against the run that produced it rather than a different one.

Use Grafana’s Data Links feature to connect a failing test row in the Table panel to the CI artifact URL:

/ci/artifacts?suite=${__data.fields.suite}&run=${__data.fields.run_id}

Common Pitfalls & Troubleshooting

  • Aggregating metrics across unrelated test suites masks localized flakiness.
  • Omitting CI environment variables causes dashboard noise during staging vs. production runs.
  • Static thresholds fail to account for seasonal test volume spikes; prefer dynamic baselines.
  • Tracking only pass/fail counts ignores retry frequency and execution duration variance.
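One simple way to approximate the dynamic baseline mentioned above is to compare today's flakiness rate with the mean of a trailing window plus a multiple of its standard deviation. This is a sketch of one possible technique, not a prescribed method; the function names and the default multiplier `k = 2` are assumptions chosen for illustration.

```javascript
// history: recent daily flakiness rates (percent), oldest first.
function dynamicThreshold(history, k = 2) {
  if (history.length === 0) return null;
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  // Population variance of the trailing window.
  const variance =
    history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / history.length;
  return mean + k * Math.sqrt(variance);
}

function isAnomalous(todayRate, history, k = 2) {
  const threshold = dynamicThreshold(history, k);
  // With no history there is no baseline, so nothing is flagged.
  return threshold !== null && todayRate > threshold;
}
```

A suite that normally hovers around 1% is flagged sooner than one that routinely swings between 2% and 6%, which a single static threshold cannot express.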

Core Reliability Metrics

  • Flakiness Rate: (Tests passing on retry / Total executions) * 100
  • Quarantine Hit Rate: Tests auto-quarantined / Total flagged tests
  • Test Stability Score: 100 - Flakiness Rate
  • MTTR: Average time from flakiness detection to fix merge
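The four definitions above can be sketched as a single function over raw counts. The parameter names and the choice of hours as the MTTR unit are assumptions for illustration; the formulas follow the list directly.

```javascript
function reliabilityMetrics({
  passedOnRetry,     // tests that passed only after a retry
  executions,        // total test executions
  autoQuarantined,   // tests auto-quarantined
  flagged,           // total tests flagged as flaky
  fixDurationsHours, // hours from detection to fix merge, one entry per fixed test
}) {
  // Return 0 instead of NaN when a denominator is zero.
  const ratio = (num, den) => (den === 0 ? 0 : num / den);
  const flakinessRate = ratio(passedOnRetry, executions) * 100;
  const totalFixHours = fixDurationsHours.reduce((sum, h) => sum + h, 0);
  return {
    flakinessRate,
    quarantineHitRate: ratio(autoQuarantined, flagged),
    stabilityScore: 100 - flakinessRate,
    mttrHours: ratio(totalFixHours, fixDurationsHours.length),
  };
}
```

Note that the quarantine hit rate is returned as a fraction between 0 and 1, matching the definition above, while the flakiness rate and stability score are percentages.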

Troubleshooting FAQ

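How do I prevent Grafana from displaying stale metrics after a pipeline failure? Prometheus stops returning a series once its latest sample is older than the query lookback window, which defaults to 5 minutes and is set with the --query.lookback-delta server flag rather than in the scrape configuration. Metrics pushed to a Pushgateway are an exception: the gateway keeps exposing the last pushed value until the group is deleted, so delete stale groups or track the time of the last successful push. If your CI runs less often than the lookback window, use the last_over_time PromQL function to forward-fill values within a longer window.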
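To illustrate what forward-filling within a window means, the sketch below mimics the idea behind last_over_time on a plain array of samples: it returns the most recent value inside the window, or nothing if every sample is too old. The function name `lastOverTime` and the sample shape are assumptions, and exact window boundary handling in Prometheus may differ between versions.

```javascript
// samples: [{ t: <unix seconds>, v: <number> }], sorted by ascending t.
function lastOverTime(samples, evalTime, windowSeconds) {
  const windowStart = evalTime - windowSeconds;
  for (let i = samples.length - 1; i >= 0; i--) {
    const { t, v } = samples[i];
    // Newest sample that is not in the future and still inside the window.
    if (t <= evalTime && t > windowStart) return v;
    // Samples are sorted, so everything earlier is outside the window too.
    if (t <= windowStart) break;
  }
  return null; // nothing recent enough: the panel shows a gap
}
```

Widening the window keeps the last known value visible between infrequent CI runs, at the cost of showing older data as if it were current.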

Can I correlate flakiness spikes with specific frontend dependency updates? Yes. Tag Prometheus metrics with a dependency_version label at push time, then use Grafana’s Annotations feature to overlay package update timestamps (stored as events in a SQL or Loki datasource) onto the flakiness time series.

What is the recommended refresh interval for a QA reliability dashboard? Set the dashboard refresh to 5m or 10m. Faster intervals increase backend load without adding value, as CI pipelines typically run in 15–30 minute batches.