Detecting Flaky Tests with Jest

A pass-on-retry is the signal worth recording: it is neither a clean pass nor a hard failure.

Root cause #

A Jest test that fails on its first attempt and passes on a retry is, by definition, non-deterministic. In a single-process Jest run the most common drivers are leaked async state between tests, a timer that resolves at a different point relative to an assertion, or shared module state that the previous test left dirty. None of these are fixed by the retry — the retry only proves the outcome depends on timing or order rather than on the code under test.

The danger with jest.retryTimes is that it defaults to silence. If a test passes on attempt two, Jest reports the suite as green and you never learn the test is flaky. The whole point of using it as a detection tool is to invert that: keep the retries so CI stays unblocked, but emit a record every time a retry was needed so the flakiness shows up in your historical flakiness tracking instead of vanishing.

Step-by-step fix #

1. Enable retries with error logging #

Call jest.retryTimes in a setup file so it applies to every suite. The logErrorsBeforeRetry option prints the failing error before each retry, which is what makes the retry observable instead of invisible.

// jest.setup.js  — referenced from setupFilesAfterEach in jest.config
// retryTimes is a global; it must run before the tests in each file.
jest.retryTimes(2, { logErrorsBeforeRetry: true });
// Trade-off: 2 retries caps CI cost; higher counts mask deeper bugs
// and inflate runtime, so keep n small and treat retries as a signal.

2. Count retried-but-passed tests with a custom reporter #

A Jest reporter sees each test result, including how many invocations it took. testResult exposes invocations (total attempts) — when a test ends passed but took more than one invocation, it is flaky.

// flaky-reporter.js
const fs = require('fs');

class FlakyReporter {
  constructor() { this.flaky = []; }

  onTestResult(_test, testResult) {
    for (const r of testResult.testResults) {
      // r.invocations > 1 means a retry happened; status 'passed' means it recovered.
      if (r.status === 'passed' && r.invocations > 1) {
        this.flaky.push({
          file: testResult.testFilePath,
          title: r.fullName,
          attempts: r.invocations,
        });
      }
    }
  }

  onRunComplete() {
    // Persist as JSON for later aggregation, not stdout noise.
    fs.writeFileSync('flaky-jest.json', JSON.stringify(this.flaky, null, 2));
    if (this.flaky.length) {
      console.warn(`Detected ${this.flaky.length} flaky test(s) via retry.`);
    }
  }
}
module.exports = FlakyReporter;
// Trade-off: writing a file per shard is cheap, but you must merge
// shard outputs before computing a real flake rate (CI cost is low).

3. Register the reporter and fail the build on regressions #

Wire the reporter into Jest config alongside the default reporter so you keep normal output, then decide whether a newly flaky test should fail the build.

// jest.config.js
module.exports = {
  setupFilesAfterEach: ['<rootDir>/jest.setup.js'],
  reporters: ['default', '<rootDir>/flaky-reporter.js'],
};
// Trade-off: keeping CI green on retry keeps velocity, but pair it with
// a budget check (step below) so flakiness cannot grow unbounded.

4. Gate on a flaky budget #

Add a tiny post-run check that compares the detected flaky count against a budget. This keeps retries from becoming a permanent crutch.

// check-flaky-budget.js
const flaky = require('./flaky-jest.json');
const BUDGET = 3; // max tolerated flaky tests per run
if (flaky.length > BUDGET) {
  console.error(`Flaky budget exceeded: ${flaky.length} > ${BUDGET}`);
  process.exit(1); // fail the job so the regression is addressed
}
// Trade-off: a hard budget surfaces drift early but can block merges;
// start lenient, then ratchet down as you stabilize the suite.

Pitfalls #

Treating retries as a fix: passing on retry hides the bug. Mitigation: always emit a record and review it; never let a green retry close the loop.
Setting retryTimes too high: 5+ retries can turn a 90%-failing test green. Mitigation: cap at 1-2 and rely on the reporter to flag recurrence.
Losing data across shards: each worker writes its own JSON. Mitigation: merge all flaky-jest.json artifacts before computing rates.
Order-dependent flakiness: a test only flakes after another runs first. Mitigation: run with --randomize periodically so retries surface ordering bugs.
Confusing invocations with numPassingAsserts: only invocations > 1 indicates a retry. Mitigation: assert on the attempt count, not assertion count.

Reliability targets #

Metric	Target
Retry count (`retryTimes` n)	1-2
Flaky tests detected per run	≤ 3 (budgeted)
Pass-on-retry rate per suite	< 1%
Time-to-quarantine after detection	< 1 sprint
CI pass rate (post-retry)	≥ 99%

Frequently Asked Questions #

Q: Does jest.retryTimes hide real failures? A: It can, which is why detection matters. A test that fails all retries still fails the build; only pass-on-retry outcomes are softened, and the reporter records each one so they are never truly hidden.

Q: Where do I call jest.retryTimes? A: In a setupFilesAfterEach file so it executes before each test file. It is a global and has no effect if called from inside an individual test body.

Q: How do I turn detection into a trend? A: Persist the reporter’s JSON per run and aggregate it over time. See tracking test flakiness trends over time for the rolling-window approach.