Root cause
A Playwright test becomes a quarantine candidate when retries keep classifying it as flaky run after run — the non-determinism is real and recurring, usually an auto-waiting timeout, a parallel-worker state collision, or a network response that lands outside the expected window. Letting that test stay in the blocking suite means every red run could be the flaky test rather than a genuine regression, which is exactly what erodes trust in CI.
The fix is structural, not a code patch on the test. Playwright lets you tag a test (@quarantine), filter on that tag with grep, and select the tagged tests into a dedicated project. Running quarantined tests in their own project keeps their pass/fail data flowing for trend analysis while excluding that project from the gate that blocks merges. The quarantine list itself can be a tag in source, a test.fixme annotation, or a generated grep file produced by your detection step.
Step-by-step fix
1. Tag flaky tests
Apply the @quarantine tag to tests your detection step flagged. Tags live in the title or the tag option and are greppable.
// checkout.spec.ts
import { test, expect } from '@playwright/test';

// The tag option marks this test without changing its title text.
test('applies discount code', { tag: '@quarantine' }, async ({ page }) => {
  await page.goto('/cart');
  await expect(page.getByTestId('total')).toHaveText('$90');
});

// Trade-off: tagging in source is explicit and reviewable but requires a
// commit; a generated grep file (step 3) avoids commits at higher risk
// of drift between the list and the code.
2. Add a non-blocking quarantine project
Define two projects: the main one excludes the tag, the quarantine one includes only the tag. The quarantine project gets generous retries because its job is observation, not gating.
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [['json', { outputFile: 'results.json' }]],
  projects: [
    {
      name: 'main',
      grepInvert: /@quarantine/, // blocking suite skips quarantined tests
    },
    {
      name: 'quarantine',
      grep: /@quarantine/, // observe-only suite runs them in isolation
      retries: 3,
    },
  ],
});

// Trade-off: grepInvert keeps the gate fast and clean, but a tag typo
// silently leaves a flaky test in the blocking suite, so lint the tags.
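The tag lint mentioned in the trade-off could look something like the sketch below. It is not part of Playwright: findSuspectTags, editDistance, and the maxDistance threshold are hypothetical names chosen for illustration, and the approach (flag any @tag within a small edit distance of @quarantine) is only one plausible way to catch typos.

```typescript
// Hypothetical lint helper: flags tags that look like a misspelling of @quarantine.
const QUARANTINE_TAG = '@quarantine';

// Levenshtein edit distance, computed with a single rolling row.
function editDistance(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Returns each distinct @tag that is close to, but not exactly, @quarantine.
// maxDistance is an assumed threshold; tune it for your own tag vocabulary.
function findSuspectTags(source: string, maxDistance = 2): string[] {
  const tags = source.match(/@[\w-]+/g) ?? [];
  const suspects = tags.filter(
    (tag) =>
      tag !== QUARANTINE_TAG &&
      editDistance(tag.toLowerCase(), QUARANTINE_TAG) <= maxDistance,
  );
  return suspects.filter((tag, i) => suspects.indexOf(tag) === i);
}

const sampleSource = `test('a', { tag: '@quarentine' }, async () => {});`;
console.log(findSuspectTags(sampleSource)); // [ '@quarentine' ]
```

A CI step could run this over each spec file's text and fail when the returned list is non-empty. A case mismatch such as @Quarantine is also reported, since the project regex is case-sensitive.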
3. Generate the quarantine grep from detection output
Instead of editing source, write the flaky set to a grep file and have CI pass its contents to --grep, which takes a regular expression rather than a file path. This couples quarantine to your detection JSON.
// build-quarantine-grep.js
const fs = require('fs');
const flaky = require('./flaky-playwright.json'); // from the detection step

// Escape titles into an alternation for --grep.
const pattern = flaky
  .map((f) => f.title.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
  .join('|');

// '__none__' is a placeholder that matches no test when the list is empty.
fs.writeFileSync('quarantine.grep', pattern || '__none__');

// Trade-off: a generated list never drifts from detection data, but a
// huge alternation is slow to match, so cap the list and quarantine in bulk.
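One way to cap the list, sketched as a pure function so it can be unit-tested without touching the file system. The FlakyTest shape carries only the title field used by the script above; buildQuarantinePattern and the maxEntries default of 50 are assumptions for illustration, not values from the detection step.

```typescript
// Shape of one detection record; only `title` is used by the script above.
interface FlakyTest {
  title: string;
}

// Escape regex metacharacters so a title is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build an alternation of at most `maxEntries` titles (hypothetical cap).
function buildQuarantinePattern(flaky: FlakyTest[], maxEntries = 50): string {
  const titles = flaky.slice(0, maxEntries).map((f) => escapeRegExp(f.title));
  // '__none__' is a placeholder meant to match no real test title.
  return titles.length > 0 ? titles.join('|') : '__none__';
}

const cappedPattern = buildQuarantinePattern([
  { title: 'applies discount code' },
  { title: 'shows total (with tax)' },
]);
const titleMatcher = new RegExp(cappedPattern);
console.log(titleMatcher.test('checkout.spec.ts shows total (with tax)')); // true
console.log(titleMatcher.test('checkout.spec.ts removes item')); // false
```

Note that slice keeps the first entries in whatever order the detection output provides. If the cap is ever hit, sorting by flake rate before slicing is probably the safer policy, and the overflow should be reported rather than dropped silently.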
4. Gate the merge in GitHub Actions
Run the main project as the required check and the quarantine project with continue-on-error so its result is recorded but never blocks.
# .github/workflows/playwright.yml
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Playwright needs its own browser binaries installed.
      - run: npx playwright install --with-deps
      # Required check: only the blocking project can fail the job.
      - name: Run main suite
        run: npx playwright test --project=main
      # Observe-only: failures here are recorded but do not block merge.
      - name: Run quarantine suite
        id: quarantine
        continue-on-error: true
        run: npx playwright test --project=quarantine
      # outcome is the step's raw result, before continue-on-error is applied;
      # job.status would not reflect a quarantine failure.
      - name: Annotate quarantine result
        if: always()
        run: echo "Quarantine step outcome: ${{ steps.quarantine.outcome }}"
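To turn the recorded quarantine run into trend data, a post-processing script can count statuses from the results.json that the JSON reporter in step 2 writes. The sketch below assumes a minimal subset of that report (nested suites, each spec holding tests with projectName and status); the interface names are hypothetical and the field names should be verified against a real results.json from your Playwright version.

```typescript
// Minimal, assumed subset of the JSON reporter output; verify against a real file.
interface ReportTest {
  projectName: string;
  status: string; // assumed values: 'expected' | 'unexpected' | 'flaky' | 'skipped'
}
interface ReportSpec {
  title: string;
  tests: ReportTest[];
}
interface ReportSuite {
  title: string;
  specs?: ReportSpec[];
  suites?: ReportSuite[];
}
interface JsonReport {
  suites: ReportSuite[];
}

// Count test statuses for one project by walking the nested suites.
function countStatuses(data: JsonReport, project: string): Record<string, number> {
  const counts: Record<string, number> = {};
  const visit = (suite: ReportSuite): void => {
    for (const spec of suite.specs ?? []) {
      for (const t of spec.tests) {
        if (t.projectName !== project) continue;
        counts[t.status] = (counts[t.status] ?? 0) + 1;
      }
    }
    for (const child of suite.suites ?? []) visit(child);
  };
  data.suites.forEach((suite) => visit(suite));
  return counts;
}

const sampleReport: JsonReport = {
  suites: [
    {
      title: 'checkout.spec.ts',
      specs: [
        {
          title: 'applies discount code',
          tests: [{ projectName: 'quarantine', status: 'flaky' }],
        },
      ],
    },
  ],
};
console.log(countStatuses(sampleReport, 'quarantine')); // { flaky: 1 }
```

Because both steps write to the same results.json, the second run overwrites the first. Filtering by projectName keeps the count honest either way, but archiving the file after each step is worth considering if you want both histories.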
Pitfalls
- A tag typo leaves a flaky test in the blocking suite. Mitigation: lint tags in CI and assert the quarantine project matched at least the expected count.
- Quarantine becomes a graveyard. Mitigation: enforce an exit policy — 10+ consecutive passes in the quarantine project before removing the tag.
- continue-on-error masks an infrastructure outage. Mitigation: still surface the quarantine conclusion in the job summary and alert on a spike.
- Coverage silently drops as tests move to quarantine. Mitigation: track quarantine count as its own metric and cap it with an SLO.
- Using test.fixme instead of a project: fixme skips entirely, so you lose the pass/fail data. Mitigation: prefer a non-blocking project so observation continues.
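The exit policy from the list above (10+ consecutive passes before removing the tag) can be sketched as a streak check over a test's run history. The Outcome type, the function names, and the oldest-first ordering are assumptions for illustration; how you store per-test history is up to your detection pipeline.

```typescript
// One quarantine-project run of a single test.
type Outcome = 'passed' | 'failed';

// Count consecutive passes at the end of the history (oldest run first).
function trailingPasses(runs: Outcome[]): number {
  let streak = 0;
  for (let i = runs.length - 1; i >= 0 && runs[i] === 'passed'; i--) {
    streak++;
  }
  return streak;
}

// A test may leave quarantine once its current pass streak reaches the threshold.
function canExitQuarantine(runs: Outcome[], requiredPasses = 10): boolean {
  return trailingPasses(runs) >= requiredPasses;
}

const recentRuns: Outcome[] = ['failed', 'passed', 'passed', 'passed'];
console.log(trailingPasses(recentRuns)); // 3
console.log(canExitQuarantine(recentRuns)); // false
```

Since the quarantine project retries, it is probably safest to record a run as 'passed' only when the first attempt succeeded; a pass that needed a retry is still evidence of flakiness.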
Reliability targets
| Metric | Target |
|---|---|
| Quarantined tests (% of suite) | < 2% |
| Time in quarantine before fix | < 2 sprints |
| Consecutive passes to exit | ≥ 10 |
| Main-suite flake rate | < 1% |
| CI pass rate (main project) | ≥ 99.5% |
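Two of these targets reduce to simple ratios, so they can be checked automatically. The sketch below covers only the quarantined share and the main-project pass rate; SuiteMetrics and checkTargets are hypothetical names, and the inputs are assumed to come from your own CI history.

```typescript
// Hypothetical input: counts gathered from CI history.
interface SuiteMetrics {
  totalTests: number;
  quarantinedTests: number;
  mainRuns: number;
  mainRunsPassed: number;
}

// Returns a message for each target from the table that is missed.
function checkTargets(m: SuiteMetrics): string[] {
  const violations: string[] = [];
  if (m.totalTests > 0) {
    const quarantinedPct = (m.quarantinedTests / m.totalTests) * 100;
    if (quarantinedPct >= 2) {
      violations.push(`quarantined ${quarantinedPct.toFixed(1)}% (target < 2%)`);
    }
  }
  if (m.mainRuns > 0) {
    const passRatePct = (m.mainRunsPassed / m.mainRuns) * 100;
    if (passRatePct < 99.5) {
      violations.push(`main pass rate ${passRatePct.toFixed(1)}% (target >= 99.5%)`);
    }
  }
  return violations;
}

// 12 of 500 tests quarantined is 2.4%, which misses the < 2% target;
// 996 of 1000 passing main runs is 99.6%, which meets the >= 99.5% target.
console.log(
  checkTargets({ totalTests: 500, quarantinedTests: 12, mainRuns: 1000, mainRunsPassed: 996 }),
);
```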
Frequently Asked Questions
Q: Should I use test.fixme or a quarantine project?
A: A non-blocking project. test.fixme skips the test entirely, so you stop gathering the data you need to know whether it is fixed; a separate project keeps running it without blocking the gate.
Q: How do I make only the main project a required check?
A: Run the projects as separate steps (or jobs) and apply continue-on-error to the quarantine step, then mark only the main step’s job as a required status check in branch protection.
Q: How is this different from the Cypress approach?
A: The concept matches but the mechanism differs: Cypress uses a runtime skip list. See how to auto-quarantine flaky Cypress tests for that pattern.