Article · Flaky Test Detection & Quarantine Engineering

Detecting Flaky Tests with Jest retryTimes

Jest's jest.retryTimes re-runs a failed test in the same process, and when used deliberately it becomes a detection signal rather than a way to paper over instability. This guide is part of Automated Flaky Test Detection Tools and shows how to capture every test that failed once but later passed, count those retried-but-passed outcomes, and feed them into your flakiness records instead of silently swallowing them.

10 sections URL: /flaky-test-detection-quarantine-engineering/automated-flaky-test-detection-tools/detecting-flaky-tests-with-jest-retrytimes/
Retry as a detection signal A test run flows through an initial attempt, a retry on failure, and a reporter that classifies the result as stable, flaky, or failing. Attempt 1 runs test fail Retry re-runs test Pass on 1st = stable Pass on retry = FLAKY All retries fail = failing Reporter records flaky
A pass-on-retry is the signal worth recording: it is neither a clean pass nor a hard failure.

Root cause #

A Jest test that fails on its first attempt and passes on a retry is, by definition, non-deterministic. In a single-process Jest run the most common drivers are leaked async state between tests, a timer that resolves at a different point relative to an assertion, or shared module state that the previous test left dirty. None of these are fixed by the retry — the retry only proves the outcome depends on timing or order rather than on the code under test.

The danger with jest.retryTimes is that it defaults to silence. If a test passes on attempt two, Jest reports the suite as green and you never learn the test is flaky. The whole point of using it as a detection tool is to invert that: keep the retries so CI stays unblocked, but emit a record every time a retry was needed so the flakiness shows up in your historical flakiness tracking instead of vanishing.

Step-by-step fix #

1. Enable retries with error logging #

Call jest.retryTimes in a setup file so it applies to every suite. The logErrorsBeforeRetry option prints the failing error before each retry, which is what makes the retry observable instead of invisible.

// jest.setup.js  — referenced from setupFilesAfterEach in jest.config
// retryTimes is a global; it must run before the tests in each file.
jest.retryTimes(2, { logErrorsBeforeRetry: true });
// Trade-off: 2 retries caps CI cost; higher counts mask deeper bugs
// and inflate runtime, so keep n small and treat retries as a signal.

2. Count retried-but-passed tests with a custom reporter #

A Jest reporter sees each test result, including how many invocations it took. testResult exposes invocations (total attempts) — when a test ends passed but took more than one invocation, it is flaky.

// flaky-reporter.js
const fs = require('fs');

class FlakyReporter {
  constructor() { this.flaky = []; }

  onTestResult(_test, testResult) {
    for (const r of testResult.testResults) {
      // r.invocations > 1 means a retry happened; status 'passed' means it recovered.
      if (r.status === 'passed' && r.invocations > 1) {
        this.flaky.push({
          file: testResult.testFilePath,
          title: r.fullName,
          attempts: r.invocations,
        });
      }
    }
  }

  onRunComplete() {
    // Persist as JSON for later aggregation, not stdout noise.
    fs.writeFileSync('flaky-jest.json', JSON.stringify(this.flaky, null, 2));
    if (this.flaky.length) {
      console.warn(`Detected ${this.flaky.length} flaky test(s) via retry.`);
    }
  }
}
module.exports = FlakyReporter;
// Trade-off: writing a file per shard is cheap, but you must merge
// shard outputs before computing a real flake rate (CI cost is low).

3. Register the reporter and fail the build on regressions #

Wire the reporter into Jest config alongside the default reporter so you keep normal output, then decide whether a newly flaky test should fail the build.

// jest.config.js
module.exports = {
  setupFilesAfterEach: ['<rootDir>/jest.setup.js'],
  reporters: ['default', '<rootDir>/flaky-reporter.js'],
};
// Trade-off: keeping CI green on retry keeps velocity, but pair it with
// a budget check (step below) so flakiness cannot grow unbounded.

4. Gate on a flaky budget #

Add a tiny post-run check that compares the detected flaky count against a budget. This keeps retries from becoming a permanent crutch.

// check-flaky-budget.js
const flaky = require('./flaky-jest.json');
const BUDGET = 3; // max tolerated flaky tests per run
if (flaky.length > BUDGET) {
  console.error(`Flaky budget exceeded: ${flaky.length} > ${BUDGET}`);
  process.exit(1); // fail the job so the regression is addressed
}
// Trade-off: a hard budget surfaces drift early but can block merges;
// start lenient, then ratchet down as you stabilize the suite.

Pitfalls #

  • Treating retries as a fix: passing on retry hides the bug. Mitigation: always emit a record and review it; never let a green retry close the loop.
  • Setting retryTimes too high: 5+ retries can turn a 90%-failing test green. Mitigation: cap at 1-2 and rely on the reporter to flag recurrence.
  • Losing data across shards: each worker writes its own JSON. Mitigation: merge all flaky-jest.json artifacts before computing rates.
  • Order-dependent flakiness: a test only flakes after another runs first. Mitigation: run with --randomize periodically so retries surface ordering bugs.
  • Confusing invocations with numPassingAsserts: only invocations > 1 indicates a retry. Mitigation: assert on the attempt count, not assertion count.

Reliability targets #

Metric Target
Retry count (retryTimes n) 1-2
Flaky tests detected per run ≤ 3 (budgeted)
Pass-on-retry rate per suite < 1%
Time-to-quarantine after detection < 1 sprint
CI pass rate (post-retry) ≥ 99%

Frequently Asked Questions #

Q: Does jest.retryTimes hide real failures? A: It can, which is why detection matters. A test that fails all retries still fails the build; only pass-on-retry outcomes are softened, and the reporter records each one so they are never truly hidden.

Q: Where do I call jest.retryTimes? A: In a setupFilesAfterEach file so it executes before each test file. It is a global and has no effect if called from inside an individual test body.

Q: How do I turn detection into a trend? A: Persist the reporter’s JSON per run and aggregate it over time. See tracking test flakiness trends over time for the rolling-window approach.