
Auto-Quarantining Flaky Playwright Tests

Quarantining a flaky Playwright test means keeping it running for visibility while stopping it from blocking the pipeline. Playwright's built-in tags, annotations, and project filtering make this straightforward to automate. This guide is part of Building Auto-Quarantine Workflows and shows how to mark tests with a @quarantine tag, route them into a non-blocking project, and gate the merge in GitHub Actions so quarantined tests never fail the build.

Figure: Quarantine routing in Playwright. The test suite is grepped by tag: untagged tests run in the blocking main project, @quarantine tests run in the non-blocking quarantine project, and the CI gate fails on main only while ignoring the quarantine result.
Tagged tests run in a separate project whose result the merge gate deliberately ignores.

Root cause

A Playwright test becomes a quarantine candidate when the reporter keeps marking it as flaky (failing, then passing on retry) run after run. At that point the non-determinism is real and recurring. The usual causes are an auto-waiting timeout, a state collision between parallel workers, or a network response that lands outside the expected window. Leaving that test in the blocking suite means any red run could be the flaky test rather than a genuine regression, and that is exactly what erodes trust in CI.
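The "run after run" criterion can be made concrete. The following is a minimal TypeScript sketch, not part of Playwright. It assumes you keep a per-test history of the status values Playwright's JSON reporter records (`expected`, `unexpected`, `flaky`, `skipped`). The function name `isQuarantineCandidate` and its default thresholds (3 flaky runs in the last 20) are hypothetical and should be tuned to your suite.

```typescript
// Per-run outcome of one test, using the status values Playwright's JSON
// reporter records per test ("flaky" = failed, then passed on a retry).
type RunOutcome = "expected" | "unexpected" | "flaky" | "skipped";

// Hypothetical policy: a test is a quarantine candidate when at least
// `minFlaky` of its last `windowSize` runs were flaky. Tune both values.
function isQuarantineCandidate(
  history: RunOutcome[], // oldest run first
  windowSize: number = 20,
  minFlaky: number = 3
): boolean {
  const recent = history.slice(-windowSize);
  let flakyRuns = 0;
  for (const outcome of recent) {
    // Count only flaky runs; hard failures ("unexpected") point to a
    // regression and should keep failing the blocking suite.
    if (outcome === "flaky") flakyRuns++;
  }
  return flakyRuns >= minFlaky;
}

// Usage: one flaky run is noise, three within the window is a pattern.
const oneOff: RunOutcome[] = ["expected", "flaky", "expected", "expected"];
const recurring: RunOutcome[] = ["flaky", "expected", "flaky", "expected", "flaky"];
console.log(isQuarantineCandidate(oneOff)); // false
console.log(isQuarantineCandidate(recurring)); // true
```

Hard failures are left out of the count on purpose. A test that fails on every attempt is more likely a genuine regression than a flake, so it belongs in the blocking suite.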

The fix is structural, not a code patch on the test. Playwright lets you tag a test (@quarantine), filter on that tag with grep, and give a dedicated project a grep that selects only the tagged tests. Quarantined tests then run in their own project, so you keep collecting their pass/fail data for trend analysis while that project stays out of the gate that blocks merges. The quarantine list itself can be a tag in source, a test.fixme annotation, or a generated grep file produced by your detection step.

Step-by-step fix

1. Tag flaky tests

Apply the @quarantine tag to the tests your detection step flagged. You can write a tag into the title as an @-prefixed word or pass it through the tag option. Either way, grep matches it.

// checkout.spec.ts
import { test, expect } from '@playwright/test';

// The tag option (Playwright 1.42+) marks this test without changing its title text.
test('applies discount code', { tag: '@quarantine' }, async ({ page }) => {
  await page.goto('/cart');
  await expect(page.getByTestId('total')).toHaveText('$90');
});
// Trade-off: tagging in source is explicit and reviewable but requires a
// commit; a generated grep file (step 3) avoids commits at higher risk
// of drift between the list and the code.
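It helps to model the string that grep runs against, because that explains why the tag option alone is enough for the projects in the next step to find the test. Playwright's documentation describes this string as the project name, file name, describe titles, test title, and tags joined by spaces. The TypeScript sketch below approximates that string and leaves out the project name. It also mirrors the main/quarantine routing. `TestInfoLike` and `projectFor` are hypothetical names for illustration, not Playwright's implementation.

```typescript
// Hypothetical, simplified view of a test as grep sees it.
interface TestInfoLike {
  file: string;
  describe: string[]; // enclosing test.describe titles, outermost first
  title: string; // any @-words written in the title stay part of it
  tags: string[]; // from the `tag` option
}

// Approximates the documented grep target (file, describe titles, title,
// tags joined by spaces), leaving out the project name for brevity.
function grepTarget(t: TestInfoLike): string {
  return [t.file, ...t.describe, t.title, ...t.tags].join(" ");
}

const QUARANTINE_TAG = /@quarantine/;

// Mirrors the config: `main` uses grepInvert and `quarantine` uses grep with
// the same regex, so each test lands in exactly one of the two projects.
function projectFor(t: TestInfoLike): "main" | "quarantine" {
  return QUARANTINE_TAG.test(grepTarget(t)) ? "quarantine" : "main";
}

const discount: TestInfoLike = {
  file: "checkout.spec.ts",
  describe: [],
  title: "applies discount code",
  tags: ["@quarantine"],
};
console.log(grepTarget(discount)); // checkout.spec.ts applies discount code @quarantine
console.log(projectFor(discount)); // quarantine
```

Describe titles are part of the matched string. So an @quarantine in an enclosing test.describe title routes every test inside that block to the quarantine project, which may be broader than you intended.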

2. Add a non-blocking quarantine project

Define two projects. The main project excludes tagged tests, and the quarantine project runs only tagged tests. The quarantine project gets generous retries because its job is observation, not gating.

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [['json', { outputFile: 'results.json' }]],
  projects: [
    {
      name: 'main',
      grepInvert: /@quarantine/, // blocking suite skips quarantined tests
    },
    {
      name: 'quarantine',
      grep: /@quarantine/,        // observe-only suite runs them in isolation
      retries: 3,
    },
  ],
});
// Trade-off: grepInvert keeps the gate fast and clean, but a tag typo
// silently leaves a flaky test in the blocking suite — lint the tags.
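The advice to lint the tags can start as a simple near-miss check. This hypothetical TypeScript sketch flags tag strings that are within a small edit distance of `@quarantine` but not exactly equal to it, such as `@quarantin` or `@Quarantine`. The case-sensitive `/@quarantine/` regex in the config would not match either of those. The sketch does not cover how you collect the tag strings from your specs, and the distance threshold of 2 is an assumption.

```typescript
// Levenshtein edit distance (insertions, deletions, substitutions),
// computed row by row to keep memory linear in b.length.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // delete a[i - 1]
          curr[j - 1] + 1, // insert b[j - 1]
          prev[j - 1] + cost // substitute, or keep on a match
        )
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

const CANONICAL_TAG = "@quarantine";

// Hypothetical lint helper: flags near misses of the canonical tag.
// Lowercasing first means "@Quarantine" is caught, since the config's
// case-sensitive regex would silently leave that test in the main project.
function suspiciousTags(tags: string[], maxDistance: number = 2): string[] {
  return tags.filter(
    (tag) =>
      tag !== CANONICAL_TAG &&
      editDistance(tag.toLowerCase(), CANONICAL_TAG) <= maxDistance
  );
}

const found = suspiciousTags(["@quarantine", "@quarantin", "@Quarantine", "@smoke", "quarantine"]);
console.log(found.join(", ")); // @quarantin, @Quarantine, quarantine
```

To turn this into the CI lint the trade-off comment asks for, fail the job whenever the returned list is non-empty.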

3. Generate the quarantine grep from detection output

Instead of editing source, write the flaky set to a grep file your CI passes to --grep. This couples quarantine to your detection JSON.

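// build-quarantine-grep.js
const fs = require('fs');
// Output of the detection step; assumed shape: [{ "title": "applies discount code" }, ...]
const flaky = require('./flaky-playwright.json');
// Escape regex metacharacters in each title, then join into an alternation for --grep.
const pattern = flaky.map((f) => f.title.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join('|');
// With an empty list, write a placeholder that matches no real test title.
fs.writeFileSync('quarantine.grep', pattern || '__none__');
// Trade-off: a generated list never drifts from detection data, but a long
// alternation is hard to review, and a short title also matches any longer
// title that contains it; cap the list and quarantine in bulk.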
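The cap suggested in the trade-off comment could look like the following TypeScript sketch. `buildQuarantineGrep` and its default limit of 50 titles are hypothetical. The function removes duplicate titles and keeps them in input order, on the assumption that the detection step already sorts them by priority (for example, highest flake rate first). It escapes titles with the same character class as build-quarantine-grep.js. It also returns the overflow, so CI can report those titles instead of dropping them silently.

```typescript
// Escape regex metacharacters so each title matches literally
// (same character class as build-quarantine-grep.js).
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

interface QuarantineGrep {
  pattern: string; // alternation to pass to --grep
  included: string[];
  dropped: string[]; // over the cap: report these rather than lose them
}

// Hypothetical helper: dedupe titles (keeping input order, assumed to be
// priority order from the detection step) and cap the list at `maxTitles`.
function buildQuarantineGrep(titles: string[], maxTitles: number = 50): QuarantineGrep {
  const unique: string[] = [];
  for (const title of titles) {
    if (unique.indexOf(title) === -1) unique.push(title);
  }
  const included = unique.slice(0, maxTitles);
  const dropped = unique.slice(maxTitles);
  // Same empty-list placeholder as the original script.
  const pattern = included.length > 0 ? included.map(escapeRegExp).join("|") : "__none__";
  return { pattern, included, dropped };
}

const grep = buildQuarantineGrep(
  ["applies discount code", "total (incl. tax)", "applies discount code", "login"],
  2
);
console.log(grep.pattern); // applies discount code|total \(incl\. tax\)
console.log(grep.dropped.join(", ")); // login
```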

4. Gate the merge in GitHub Actions

Run the main project as the required check and the quarantine project with continue-on-error so its result is recorded but never blocks.

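# .github/workflows/playwright.yml
on: [pull_request]
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: lts/*
      - run: npm ci
      # Playwright browsers are not preinstalled on the runner.
      - run: npx playwright install --with-deps
      # Required check: only the blocking project can fail the job.
      - name: Run main suite
        run: npx playwright test --project=main

      # Observe-only: !cancelled() keeps it running when main failed, and
      # continue-on-error stops its failures from failing the job.
      - name: Run quarantine suite
        id: quarantine
        if: ${{ !cancelled() }}
        continue-on-error: true
        run: npx playwright test --project=quarantine

      # steps.<id>.outcome is the result before continue-on-error is applied;
      # job.status would not reflect the quarantine suite at all.
      - name: Annotate quarantine result
        if: ${{ !cancelled() }}
        run: echo "Quarantine suite outcome: ${{ steps.quarantine.outcome }}" >> "$GITHUB_STEP_SUMMARY"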
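A single pass/fail outcome for the quarantine step is coarse. The JSON report written by the json reporter in step 2 can be tallied per project to feed a job summary or a trend store. The sketch below reads only a minimal slice of that report: top-level and nested `suites`, `specs`, and each test's `projectName` and `status`. These field names reflect Playwright's JSON reporter output as we understand it, so check them against your version. `tallyProject` and `formatTally` are hypothetical helpers.

```typescript
// Minimal slice of the Playwright JSON report read here (field names as we
// understand the reporter's output; verify against your version).
type TestStatus = "expected" | "unexpected" | "flaky" | "skipped";
interface JsonTest { projectName: string; status: TestStatus }
interface JsonSpec { title: string; tests: JsonTest[] }
interface JsonSuite { title: string; specs: JsonSpec[]; suites?: JsonSuite[] }
interface JsonReport { suites: JsonSuite[] }

type Tally = Record<TestStatus, number>;

// Hypothetical helper: walk nested describe suites and count outcomes
// for a single project.
function tallyProject(report: JsonReport, project: string): Tally {
  const tally: Tally = { expected: 0, unexpected: 0, flaky: 0, skipped: 0 };
  const visit = (suite: JsonSuite): void => {
    for (const spec of suite.specs) {
      for (const entry of spec.tests) {
        if (entry.projectName === project) tally[entry.status]++;
      }
    }
    for (const child of suite.suites ?? []) visit(child);
  };
  report.suites.forEach(visit);
  return tally;
}

// Hypothetical helper: one line suitable for the job summary.
function formatTally(project: string, t: Tally): string {
  return `${project}: ${t.expected} passed, ${t.flaky} flaky, ${t.unexpected} failed, ${t.skipped} skipped`;
}

// Illustrative report, not real output.
const sample: JsonReport = {
  suites: [
    {
      title: "checkout.spec.ts",
      specs: [{ title: "applies discount code", tests: [{ projectName: "quarantine", status: "flaky" }] }],
      suites: [
        {
          title: "cart",
          specs: [
            { title: "updates quantity", tests: [{ projectName: "quarantine", status: "unexpected" }] },
            { title: "removes item", tests: [{ projectName: "main", status: "expected" }] },
          ],
        },
      ],
    },
  ],
};
console.log(formatTally("quarantine", tallyProject(sample, "quarantine")));
// quarantine: 0 passed, 1 flaky, 1 failed, 0 skipped
```

In CI, you would parse results.json (for example with Node's fs.readFileSync and JSON.parse) and append the formatted line to the file named by $GITHUB_STEP_SUMMARY.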

Pitfalls

  • A tag typo leaves a flaky test in the blocking suite. Mitigation: lint tags in CI and assert the quarantine project matched at least the expected count.
  • Quarantine becomes a graveyard. Mitigation: enforce an exit policy — 10+ consecutive passes in the quarantine project before removing the tag.
  • continue-on-error masks an infrastructure outage. Mitigation: still surface the quarantine conclusion in the job summary and alert on a spike.
  • Coverage silently drops as tests move to quarantine. Mitigation: track quarantine count as its own metric and cap it with an SLO.
  • Using test.fixme instead of a project: fixme skips entirely, so you lose the pass/fail data. Mitigation: prefer a non-blocking project so observation continues.
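The exit policy can be checked mechanically. In this TypeScript sketch, the hypothetical `readyToExit` counts the unbroken streak of passes at the end of a test's quarantine-project history and compares it with the 10-pass threshold used in this guide. It treats a `flaky` result as breaking the streak. That assumption matters here: the quarantine project runs with `retries: 3`, so a test that only passes on retry would otherwise look healthy.

```typescript
// Outcome of one quarantine-project run for a single test.
type Outcome = "passed" | "failed" | "flaky";

// Length of the unbroken run of clean passes at the end of the history.
// "flaky" (failed, then passed on retry) breaks the streak on purpose.
function trailingPasses(history: Outcome[]): number {
  let streak = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i] !== "passed") break;
    streak++;
  }
  return streak;
}

// Hypothetical helper implementing the exit policy: at least 10
// consecutive passes before the @quarantine tag is removed.
function readyToExit(history: Outcome[], required: number = 10): boolean {
  return trailingPasses(history) >= required;
}

// Builds a history of n clean passes for the examples below.
function passes(n: number): Outcome[] {
  const runs: Outcome[] = [];
  for (let i = 0; i < n; i++) runs.push("passed");
  return runs;
}

console.log(readyToExit(["flaky", ...passes(10)])); // true
console.log(readyToExit([...passes(12), "flaky"])); // false
```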

Reliability targets

Metric | Target
--- | ---
Quarantined tests (% of suite) | < 2%
Time in quarantine before fix | < 2 sprints
Consecutive passes to exit | ≥ 10
Main-suite flake rate | < 1%
CI pass rate (main project) | ≥ 99.5%
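A worked example shows how the two ratio targets are applied. A suite of 1,200 tests with 18 in quarantine sits at 18 / 1,200 = 1.5%, inside the < 2% target. By contrast, 397 green main-project runs out of 400 is a 99.25% pass rate, below the 99.5% target. The TypeScript sketch below encodes those two checks. `SuiteSnapshot` and `targetBreaches` are hypothetical names, and the counts are illustrative, not measured data.

```typescript
// Hypothetical snapshot of the counts behind two of the targets.
interface SuiteSnapshot {
  totalTests: number;
  quarantinedTests: number;
  mainRuns: number; // CI runs of the main project in the period
  mainPassedRuns: number; // of those, runs that passed
}

const percent = (part: number, whole: number): number => (100 * part) / whole;

// Returns one message per target the snapshot misses.
function targetBreaches(s: SuiteSnapshot): string[] {
  const breaches: string[] = [];
  const quarantined = percent(s.quarantinedTests, s.totalTests);
  const passRate = percent(s.mainPassedRuns, s.mainRuns);
  if (quarantined >= 2) breaches.push(`quarantined ${quarantined}% (target < 2%)`);
  if (passRate < 99.5) breaches.push(`main pass rate ${passRate}% (target >= 99.5%)`);
  return breaches;
}

// The worked example: 1.5% quarantined is fine, 99.25% pass rate is not.
const example: SuiteSnapshot = { totalTests: 1200, quarantinedTests: 18, mainRuns: 400, mainPassedRuns: 397 };
console.log(targetBreaches(example).join("; ")); // main pass rate 99.25% (target >= 99.5%)
```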

Frequently Asked Questions

Q: Should I use test.fixme or a quarantine project? A: A non-blocking project. test.fixme skips the test entirely, so you stop gathering the data you need to know whether it is fixed; a separate project keeps running it without blocking the gate.

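Q: How do I make only the main project a required check? A: Branch protection applies to checks reported per job, not per step. You have two options. One is to run both projects as steps in one job with continue-on-error on the quarantine step, so the job's result depends only on the main suite. The other is to split them into two jobs and mark only the main-project job as a required status check.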

Q: How is this different from the Cypress approach? A: The concept matches but the mechanism differs — Cypress uses a runtime skip list. See how to auto-quarantine flaky Cypress tests for that pattern.