1. CI-First Detection Architecture
Reliable detection must occur at the pipeline layer, not locally. Configure parallel retry logic with deterministic seed tracking to separate true regressions from environmental noise. Integrate Automated Flaky Test Detection Tools directly into your CI runner to capture execution variance, timing drift, and resource contention at scale. Set threshold-based triggers that only activate quarantine when flakiness exceeds defined SLOs.
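The retry-and-threshold idea can be sketched in a few lines of JavaScript. This is a minimal illustration, not a detection tool's actual API: the function names, the `'stable'`/`'flaky'`/`'regression'` labels, and the 20% threshold in the usage note are all assumptions. It also assumes every attempt ran against the same commit and seed, so that mixed outcomes point to non-determinism rather than a code change.

```javascript
// Classify one test from its ordered CI attempts (true = pass, false = fail).
// Assumes all attempts ran on the same commit with the same seed.
function classifyAttempts(attempts) {
  if (attempts.length === 0) return 'unknown';
  if (attempts.every((passed) => passed)) return 'stable';
  if (attempts.every((passed) => !passed)) return 'regression';
  return 'flaky'; // mixed outcomes without a code change
}

// Check whether a test's recent history breaches a flake-rate SLO.
// history: classifications from the last N pipeline runs.
// maxFlakeRate: hypothetical SLO as a fraction, e.g. 0.2 for 20%.
function exceedsFlakeSlo(history, maxFlakeRate) {
  if (history.length === 0) return false;
  const flakyRuns = history.filter((label) => label === 'flaky').length;
  return flakyRuns / history.length > maxFlakeRate;
}
```

For example, `classifyAttempts([false, true])` yields `'flaky'`, and a history with one flaky run out of four exceeds a 20% threshold. Note that a test failing on every attempt is labeled a regression here, although a persistent environmental fault would look the same; a real pipeline would need extra signals to tell them apart.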
2. Production-Ready Quarantine Workflows
Quarantine must be automated, auditable, and reversible. Use dynamic test exclusion lists and metadata tagging to route unstable suites to isolated execution pools. Follow Building Auto-Quarantine Workflows to implement GitOps-managed quarantine states with automated re-validation cycles and precise ownership routing.
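As a rough sketch of a dynamic exclusion list with ownership metadata, the snippet below assumes a hypothetical manifest format (`test`, `owner`, `quarantinedAt`) that would be versioned in the repository. The test titles, team names, and helper names are illustrative only.

```javascript
// Hypothetical quarantine manifest; in a GitOps setup this would be a versioned file.
const manifest = [
  { test: 'checkout applies coupon', owner: 'team-payments', quarantinedAt: '2024-05-01T00:00:00Z' },
  { test: 'search autocompletes', owner: 'team-search', quarantinedAt: '2024-05-03T12:00:00Z' },
];

// Escape regex metacharacters so test titles match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one pattern that matches every quarantined title.
function buildExclusionPattern(entries) {
  return entries.map((entry) => escapeRegExp(entry.test)).join('|');
}

// Return entries whose quarantine age has reached the re-validation window.
// now: timestamp in milliseconds; windowHours: length of the window.
function dueForRevalidation(entries, now, windowHours) {
  const windowMs = windowHours * 60 * 60 * 1000;
  return entries.filter((entry) => now - Date.parse(entry.quarantinedAt) >= windowMs);
}
```

The resulting pattern could be handed to a runner's exclusion filter (Playwright's `--grep-invert` flag is one option), while `dueForRevalidation` feeds the list of tests to re-run, along with the owner to notify.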
3. Measurable Stability & Observability
Quarantine without telemetry creates hidden technical debt. Track flake rates, mean time to quarantine (MTTQ), and pass-rate recovery curves across sprint cycles. Implement Historical Flakiness Tracking & Analytics to correlate test instability with dependency bumps, infrastructure changes, and test authorship patterns. Surface these KPIs via Reliability Dashboards for QA Teams to align engineering and QA on data-driven stability targets.
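Two of these metrics are simple enough to sketch directly. The definitions below are assumptions, since teams define them differently: flake rate is taken as flaky runs over total runs, and MTTQ as the average gap between the first observed flake and the moment of quarantine. The field names `firstFlakeAt` and `quarantinedAt` are hypothetical.

```javascript
// Flake rate as a percentage: flaky runs divided by total runs.
function flakeRatePercent(flakyRuns, totalRuns) {
  if (totalRuns === 0) return 0;
  return (flakyRuns * 100) / totalRuns;
}

// MTTQ in hours: mean gap between first observed flake and quarantine.
// events: [{ firstFlakeAt: ISO string, quarantinedAt: ISO string }, ...]
function meanTimeToQuarantineHours(events) {
  if (events.length === 0) return 0;
  const totalMs = events.reduce(
    (sum, event) => sum + (Date.parse(event.quarantinedAt) - Date.parse(event.firstFlakeAt)),
    0
  );
  return totalMs / events.length / (60 * 60 * 1000);
}
```

Computing both per sprint and storing the results alongside dependency and infrastructure change logs is one way to support the correlation analysis described above.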
4. Shift-Left Integration & PR Gates
Prevent flaky tests from merging by embedding stability gates early in the development lifecycle. Configure progressive PR checks that differentiate between true failures and environmental noise. Enforce deterministic test patterns, so that external services are mocked and async handling is standardized before code reaches main.
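A progressive gate can be pictured as a three-way decision rather than a pass/fail switch. The sketch below is one possible shape, assuming the pipeline already knows which tests are tracked as flaky on main; `evaluateGate` and its return format are hypothetical.

```javascript
// Progressive gate: returns 'pass', 'warn', or 'block' for a pull request.
// failures: names of tests that still failed on the PR after CI retries.
// knownFlaky: Set of test names currently tracked as flaky on main.
function evaluateGate(failures, knownFlaky) {
  if (failures.length === 0) return { decision: 'pass', blocking: [] };
  // Failures outside the known-flaky set are treated as true regressions.
  const blocking = failures.filter((name) => !knownFlaky.has(name));
  return blocking.length > 0
    ? { decision: 'block', blocking }
    : { decision: 'warn', blocking: [] };
}
```

With this shape, a PR whose only failures are known-flaky tests gets a warning that the author can acknowledge, while any unrecognized failure blocks the merge and lists the offending tests.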
Production Configuration Examples
Jest CI Retry Configuration
// jest.config.js
// jest-circus (the default runner since Jest 27) supports retries via jest.retryTimes().
// There is no built-in flakyTestConfig key; quarantine logic lives in custom reporters.
module.exports = {
  // Trade-off: retryTimes masks underlying race conditions.
  // Use strictly for CI isolation; disable for local dev.
  testRunner: 'jest-circus/runner',
  reporters: ['default'],
};

// jest.retryTimes() is called from test code, not from this config object:
//   jest.retryTimes(3, { logErrorsBeforeRetry: true });
// In a setup file listed under setupFilesAfterEnv (e.g., jest.setup.js),
// the call applies to every test file. For targeted retries, call it at the
// top level of an individual test file or inside a describe block instead
// (not inside beforeEach).
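Since the comments above place quarantine logic in custom reporters, here is a minimal sketch of what such a reporter might look like. It assumes that jest-circus populates an `invocations` count on each assertion result when retries are enabled; the file name `flaky-reporter.js` and the class name are hypothetical, and the reporter only logs candidates rather than quarantining anything.

```javascript
// flaky-reporter.js (hypothetical file name)
// Collects tests that passed only after at least one retry.
class FlakyReporter {
  constructor(globalConfig, reporterOptions) {
    this.flaky = [];
  }

  // Called by Jest after each test file finishes.
  onTestResult(test, testResult) {
    for (const result of testResult.testResults) {
      // Assumption: `invocations` counts attempts, so > 1 means a retry happened.
      if (result.status === 'passed' && (result.invocations || 1) > 1) {
        this.flaky.push({
          file: testResult.testFilePath,
          name: result.fullName,
          invocations: result.invocations,
        });
      }
    }
  }

  // Called once after the whole run; prints the candidates for triage.
  onRunComplete() {
    if (this.flaky.length > 0) {
      console.log(`Passed after retry: ${JSON.stringify(this.flaky, null, 2)}`);
    }
  }
}

module.exports = FlakyReporter;
```

To try it, the reporter would be registered next to the default one, for example `reporters: ['default', '<rootDir>/flaky-reporter.js']`, and its output fed into whatever process maintains the quarantine list.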
Playwright GitHub Actions Quarantine Step
# .github/workflows/quarantine.yml
- name: Run Quarantined Tests
  run: npx playwright test --grep @quarantined --reporter=html
  env:
    # Trade-off: Running quarantined suites in parallel increases CI compute cost.
    # Offset by scheduling during off-peak hours to maintain budget.
    CI: true
Cypress CI YAML Isolation
# .github/workflows/cypress-quarantine.yml
- name: Cypress Quarantine Execution
  run: npx cypress run --spec "cypress/e2e/quarantine/**/*"
  env:
    # Trade-off: Headless mode reduces browser overhead but may hide
    # rendering-specific flakiness. Enable video for post-mortem analysis.
    CYPRESS_VIDEO: true
    CYPRESS_RETRIES: 2
Common Pitfalls
- Over-relying on local retries instead of CI-level detection
- Quarantining tests without automated re-validation windows
- Ignoring environmental variance (CPU throttling, network latency) as root causes
- Blocking PRs without clear flakiness SLOs or exception workflows
- Failing to tag quarantined tests with ownership and remediation deadlines
- Treating quarantine as permanent deletion instead of a temporary isolation state
Reliability Metrics
- Flake Rate (%)
- Mean Time to Quarantine (MTTQ)
- Quarantine Re-validation Pass Rate
- Pipeline Stability Index (PSI)
- Test Execution Variance (ms)
- Flake-to-Fix Ratio
FAQ
What is the acceptable flakiness threshold for production CI?
A common target is a flake rate below 1% on main-branch pipelines. Quarantine triggers should activate after 2–3 consecutive non-deterministic failures (failures that do not reproduce when the same commit is retried), or when execution variance exceeds 15% of the baseline duration.
How do we prevent quarantined tests from becoming permanent technical debt?
Enforce automated re-validation windows (e.g., 72 hours), assign remediation owners via metadata tags, and block new feature merges if the quarantine backlog exceeds defined SLOs.
Should flaky tests block pull requests?
Yes, but only when integrated with deterministic PR checks that differentiate between true regressions and environmental noise. Use progressive gating rather than hard blocks to maintain developer velocity.