Auto-Quarantining Flaky Playwright Tests

Tagged tests run in a separate project whose result the merge gate deliberately ignores.

Root cause

A Playwright test becomes a quarantine candidate when, run after run, its retries keep classifying it as flaky: it fails on the first attempt and passes on a retry. That non-determinism is real and recurring. The usual causes are an auto-waiting timeout, a state collision between parallel workers, or a network response that arrives outside the expected window. Leaving the test in the blocking suite means any red run might be caused by the flaky test rather than a genuine regression, and that ambiguity is exactly what erodes trust in CI.
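Playwright reports a test as flaky when it failed on its first attempt but passed on a retry. The sketch below models that rule and adds a recurrence threshold for "run after run." The attempt statuses mirror Playwright's reporter values. The `history` shape and the "3 flaky runs out of the last 10" threshold are assumptions for illustration, not Playwright APIs.

```typescript
// Statuses of each attempt of one test in one CI run: first attempt, then retries.
type AttemptStatus = "passed" | "failed" | "timedOut" | "skipped" | "interrupted";
type RunOutcome = "passed" | "flaky" | "failed" | "skipped";

// Mirrors Playwright's rule: a failure followed by a passing retry is "flaky".
function classifyRun(attempts: AttemptStatus[]): RunOutcome {
  if (attempts.length === 0 || attempts.every((s) => s === "skipped")) return "skipped";
  const last = attempts[attempts.length - 1];
  if (last !== "passed") return "failed";
  // Retries only happen after a failure, so more than one attempt means flaky.
  return attempts.length > 1 ? "flaky" : "passed";
}

// Assumed policy: quarantine a test that was flaky in at least `minFlaky`
// of its last `window` runs. Tune both numbers for your own suite.
function isQuarantineCandidate(history: AttemptStatus[][], window = 10, minFlaky = 3): boolean {
  const recent = history.slice(-window);
  const flakyRuns = recent.filter((attempts) => classifyRun(attempts) === "flaky").length;
  return flakyRuns >= minFlaky;
}

const history: AttemptStatus[][] = [
  ["passed"],
  ["timedOut", "passed"],
  ["passed"],
  ["failed", "passed"],
  ["failed", "failed", "passed"],
];
console.log(history.map(classifyRun)); // outcomes: passed, flaky, passed, flaky, flaky
console.log(isQuarantineCandidate(history)); // true: three flaky runs in the window
```

A sliding window rather than an all-time count lets a test that was fixed stop qualifying once its old flaky runs age out.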

The fix is structural rather than a code patch on the test. Playwright lets you tag a test (@quarantine), filter on that tag with grep, and give a dedicated project a grep that selects only the tagged tests. Running quarantined tests in their own project keeps their pass/fail data flowing into the trend while excluding that project from the gate that blocks merges. The quarantine list itself can be a tag in source, a test.fixme annotation (which skips the test instead of running it; see Pitfalls), or a generated grep file produced by your detection step.

Step-by-step fix

1. Tag flaky tests

Apply the @quarantine tag to tests your detection step flagged. Tags live in the title or the tag option and are greppable.

```typescript
// checkout.spec.ts
import { test, expect } from '@playwright/test';

// The tag option marks this test without changing its title text.
test('applies discount code', { tag: '@quarantine' }, async ({ page }) => {
  await page.goto('/cart');
  await expect(page.getByTestId('total')).toHaveText('$90');
});
// Trade-off: tagging in source is explicit and reviewable but requires a
// commit; a generated grep file (step 3) avoids commits at higher risk
// of drift between the list and the code.
```
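Both tagging styles end up in the text that a grep pattern is tested against. According to the Playwright docs, that text combines the project name, file name, describe titles, test title and tags, separated by spaces. The sketch below is a simplified, hypothetical model of that string; Playwright's exact string may differ in detail. `GrepTarget` and `grepString` are illustrative names, not Playwright APIs.

```typescript
// Simplified, hypothetical model of the text that a grep pattern is tested against.
interface GrepTarget {
  project: string;
  file: string;
  describe: string[];
  title: string;
  tag?: string | string[]; // the `tag` option takes one tag or an array of tags
}

function grepString(t: GrepTarget): string {
  const optionTags = t.tag === undefined ? [] : Array.isArray(t.tag) ? t.tag : [t.tag];
  return [t.project, t.file, ...t.describe, t.title, ...optionTags].join(" ");
}

// Tag via the option: the title text stays clean.
const viaOption = grepString({
  project: "quarantine", file: "checkout.spec.ts", describe: [],
  title: "applies discount code", tag: "@quarantine",
});
// Tag via the title: the same token still reaches the matched string.
const viaTitle = grepString({
  project: "quarantine", file: "checkout.spec.ts", describe: [],
  title: "applies discount code @quarantine",
});

console.log(/@quarantine/.test(viaOption), /@quarantine/.test(viaTitle)); // true true
```

Because either form satisfies `/@quarantine/`, a team can switch between title tags and the tag option without touching the project config.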

2. Add a non-blocking quarantine project

Define two projects: the main one excludes the tag, the quarantine one includes only the tag. The quarantine project gets generous retries because its job is observation, not gating.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // PW_JSON_OUTPUT is a hypothetical variable name: set it per CI step so the
  // main and quarantine runs do not overwrite each other's results.json.
  reporter: [['json', { outputFile: process.env.PW_JSON_OUTPUT ?? 'results.json' }]],
  projects: [
    {
      name: 'main',
      grepInvert: /@quarantine/, // blocking suite skips quarantined tests
    },
    {
      name: 'quarantine',
      grep: /@quarantine/,        // observe-only suite runs only the tagged tests
      retries: 3,                 // extra attempts help separate flaky from broken
    },
  ],
});
// Trade-off: grepInvert keeps the gate fast and clean, but a tag typo
// silently leaves a flaky test in the blocking suite, so lint the tags.
```
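Because main uses `grepInvert` and quarantine uses `grep` with the same pattern, every test lands in exactly one project. That same rule is why a misspelled tag silently stays in the blocking suite. The sketch below partitions a few sample grep strings the way the two projects would and adds the count guard suggested under Pitfalls. `partition`, the sample titles and `expectedQuarantined` are hypothetical.

```typescript
const QUARANTINE = /@quarantine/; // same pattern as both projects use

// Mirrors the two projects: main = grepInvert, quarantine = grep.
function partition(tests: string[]): { main: string[]; quarantine: string[] } {
  const main: string[] = [];
  const quarantine: string[] = [];
  for (const t of tests) {
    (QUARANTINE.test(t) ? quarantine : main).push(t);
  }
  return { main, quarantine };
}

const titles = [
  "applies discount code @quarantine",
  "shows empty cart",
  "removes item @quarantne", // typo: stays in the blocking suite
];
const { main, quarantine } = partition(titles);
// main gets "shows empty cart" and the misspelled "removes item @quarantne";
// quarantine gets only "applies discount code @quarantine".

// Count guard: fewer matches than the detection step flagged points to a typo.
const expectedQuarantined = 2; // hypothetical count from the detection output
if (quarantine.length < expectedQuarantined) {
  console.error(`only ${quarantine.length} of ${expectedQuarantined} quarantined tests matched the tag`);
}
```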

3. Generate the quarantine grep from detection output

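Instead of editing source, have a script write the flaky set to a grep file. CI then passes its contents to --grep-invert for the blocking run and to --grep for the observe-only run, which ties quarantine directly to your detection JSON. Since the tag no longer appears in source, a project grep of /@quarantine/ on its own would select nothing, so use this approach with projects that do not filter on the tag.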

```javascript
// build-quarantine-grep.js
const fs = require('fs');
// Assumes the detection step writes an array of objects with a `title` field.
const flaky = require('./flaky-playwright.json');
// Escape regex metacharacters in each title, then join them into one alternation.
const pattern = flaky.map((f) => f.title.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join('|');
// '__none__' keeps the file non-empty when nothing is flaky; it should match no title.
fs.writeFileSync('quarantine.grep', pattern || '__none__');
// Trade-off: a generated list never drifts from detection data, but a
// huge alternation is slow to match, so cap the list and quarantine in bulk.
```
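The escaping and the suggested cap can be checked in isolation. This TypeScript sketch of the same build step deduplicates titles, keeps at most `maxEntries` of them, and reports the overflow so it can be handled in bulk. It also confirms that a title containing regex metacharacters matches literally. `buildQuarantinePattern`, `maxEntries` and the default of 50 are hypothetical choices.

```typescript
// Escape regex metacharacters so each title matches literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the pattern plus any titles dropped by the cap.
function buildQuarantinePattern(
  titles: string[],
  maxEntries = 50,
): { pattern: string; overflow: string[] } {
  const unique = Array.from(new Set(titles));
  const kept = unique.slice(0, maxEntries);
  const pattern = kept.map(escapeRegExp).join("|") || "__none__";
  return { pattern, overflow: unique.slice(maxEntries) };
}

const { pattern, overflow } = buildQuarantinePattern(["applies discount code", "shows total ($90)"]);
const re = new RegExp(pattern);
console.log(re.test("checkout.spec.ts shows total ($90)")); // true: "($90)" matched literally
console.log(re.test("checkout.spec.ts shows total 90"));    // false
console.log(overflow.length);                                // 0
```

Returning the overflow instead of silently truncating keeps the cap visible, so CI can fail or alert when detection flags more tests than the grep is allowed to carry.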

4. Gate the merge in GitHub Actions

Run the main project as the required check and the quarantine project with continue-on-error so its result is recorded but never blocks.

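```yaml
# .github/workflows/playwright.yml
name: playwright
on: pull_request
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: lts/*
      - run: npm ci
      # Playwright browsers are not on the runner by default; install them with OS deps.
      - run: npx playwright install --with-deps
      # Required check: only the blocking project can fail the job.
      - name: Run main suite
        run: npx playwright test --project=main

      # Observe-only: failures here are recorded but do not block merge.
      # !cancelled() keeps collecting quarantine data even when main fails.
      - name: Run quarantine suite
        id: quarantine
        if: ${{ !cancelled() }}
        continue-on-error: true
        run: npx playwright test --project=quarantine

      # job.status ignores a quarantine failure because of continue-on-error;
      # the step's outcome is its result before continue-on-error is applied.
      - name: Annotate quarantine result
        if: always()
        run: echo "Quarantine suite outcome: ${{ steps.quarantine.outcome }}" >> "$GITHUB_STEP_SUMMARY"
```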
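The annotation step should read the quarantine step's own result rather than `job.status`. GitHub Actions records a step's `outcome` before `continue-on-error` is applied and its `conclusion` after. A failing continue-on-error step therefore has outcome `failure` but conclusion `success`, and the job still passes. The TypeScript sketch below models that documented rule. `stepConclusion` and `jobPasses` are hypothetical names, and `jobPasses` is simplified to consider only step failures.

```typescript
type StepResult = "success" | "failure" | "cancelled" | "skipped";

interface Step {
  name: string;
  outcome: StepResult;      // result before continue-on-error is applied
  continueOnError: boolean;
}

// Documented rule: a failing continue-on-error step concludes as success.
function stepConclusion(step: Step): StepResult {
  return step.continueOnError && step.outcome === "failure" ? "success" : step.outcome;
}

// Simplified: the job fails only if some step concludes as failure.
function jobPasses(steps: Step[]): boolean {
  return steps.every((s) => stepConclusion(s) !== "failure");
}

const steps: Step[] = [
  { name: "Run main suite", outcome: "success", continueOnError: false },
  { name: "Run quarantine suite", outcome: "failure", continueOnError: true },
];
console.log(jobPasses(steps));          // true: the merge gate stays green
console.log(stepConclusion(steps[1]));  // success: hides the failure
console.log(steps[1].outcome);          // failure: what the summary should report
```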

Pitfalls

A tag typo leaves a flaky test in the blocking suite. Mitigation: lint tags in CI and assert the quarantine project matched at least the expected count.
Quarantine becomes a graveyard that tests never leave. Mitigation: enforce an exit policy, such as 10+ consecutive passes in the quarantine project before the tag is removed.
continue-on-error masks an infrastructure outage. Mitigation: still surface the quarantine step's outcome (its result before continue-on-error is applied) in the job summary and alert on a spike.
Coverage silently drops as tests move to quarantine. Mitigation: track quarantine count as its own metric and cap it with an SLO.
Using test.fixme instead of a project: fixme skips entirely, so you lose the pass/fail data. Mitigation: prefer a non-blocking project so observation continues.
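Two of these mitigations are small enough to sketch. The first is a tag lint that flags any tag outside an allowlist, which catches typos such as `@quarantne`. The second is the 10-consecutive-passes exit rule, where a flaky run resets the streak. The allowlist contents, the `RunOutcome` values and the function names are assumptions for illustration.

```typescript
type RunOutcome = "passed" | "flaky" | "failed" | "skipped";

// Hypothetical allowlist; any other "@..." token is reported as unknown.
const ALLOWED_TAGS = new Set(["@quarantine", "@smoke", "@slow"]);

// Collects tags written in the title plus those passed via the tag option.
function unknownTags(title: string, tagOption: string[] = []): string[] {
  const fromTitle = title.match(/@[\w-]+/g) ?? [];
  return [...fromTitle, ...tagOption].filter((t) => !ALLOWED_TAGS.has(t));
}

// Exit rule: the most recent `required` runs must all be clean passes.
// A "flaky" run (failed, then passed on retry) or any other outcome resets the streak.
function readyToExit(history: RunOutcome[], required = 10): boolean {
  let streak = 0;
  for (let i = history.length - 1; i >= 0 && history[i] === "passed"; i--) streak++;
  return streak >= required;
}

console.log(unknownTags("removes item @quarantne")); // one unknown tag: @quarantne

const history: RunOutcome[] = ["flaky", ...new Array<RunOutcome>(10).fill("passed")];
console.log(readyToExit(history));               // true: ten clean passes in a row
console.log(readyToExit([...history, "flaky"])); // false: the streak was reset
```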

Reliability targets

Metric	Target
Quarantined tests (% of suite)	< 2%
Time in quarantine before fix	< 2 sprints
Consecutive passes to exit	≥ 10
Main-suite flake rate	< 1%
CI pass rate (main project)	≥ 99.5%
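Three of these targets are ratios over counts that CI already records, so they can be checked mechanically. The sketch below assumes you can supply the raw counts shown. It treats the main-suite flake rate as flaky results divided by all test results in the main project, which is an assumption about how that metric is defined. The `SuiteStats` fields and the sample inputs are hypothetical, not measured data.

```typescript
interface SuiteStats {
  totalTests: number;
  quarantinedTests: number;
  mainRuns: number;         // CI runs of the main project in the window
  mainRunsPassed: number;
  mainTestResults: number;  // individual test results in the main project
  mainFlakyResults: number;
}

interface TargetReport {
  quarantineShare: boolean; // quarantined tests < 2% of the suite
  mainFlakeRate: boolean;   // main-suite flake rate < 1%
  mainPassRate: boolean;    // main-project CI pass rate >= 99.5%
}

function checkTargets(s: SuiteStats): TargetReport {
  return {
    quarantineShare: s.quarantinedTests / s.totalTests < 0.02,
    mainFlakeRate: s.mainFlakyResults / s.mainTestResults < 0.01,
    mainPassRate: s.mainRunsPassed / s.mainRuns >= 0.995,
  };
}

// Illustrative inputs only.
const report = checkTargets({
  totalTests: 400, quarantinedTests: 10,
  mainRuns: 1000, mainRunsPassed: 996,
  mainTestResults: 380000, mainFlakyResults: 1200,
});
console.log(report); // quarantineShare is false (2.5% exceeds 2%); the other two are true
```

Time in quarantine and the exit streak need per-test history rather than aggregate counts, so they are left to the exit-rule check.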

Frequently Asked Questions

Q: Should I use test.fixme or a quarantine project? A: A non-blocking project. test.fixme skips the test entirely, so you stop gathering the data you need to know whether it is fixed; a separate project keeps running it without blocking the gate.

Q: How do I make only the main project a required check? A: Run the projects as separate steps (or jobs) and set continue-on-error on the quarantine step. If both run in one job, that job is the required check and only the main step can fail it. If they run as separate jobs, mark only the main job as a required status check in branch protection.

Q: How is this different from the Cypress approach? A: The concept matches but the mechanism differs — Cypress uses a runtime skip list. See how to auto-quarantine flaky Cypress tests for that pattern.