You push a new feature, all local tests pass, and you open a pull request. The continuous integration (CI) pipeline kicks off, but a few minutes later, you see a dreaded red ‘X’. A test failed. You scrutinize your code, find nothing wrong, and re-run the job. This time, it passes with a green checkmark. If this scenario feels familiar, you’ve encountered a flaky test.
A flaky test is a test that exhibits non-deterministic behavior—it can both pass and fail across multiple runs without any changes to the code or its environment. While it might seem like a minor annoyance, test flakiness is a significant problem that can erode your team’s confidence in your test suite, slow down development velocity, and ultimately allow real bugs to slip into production. Understanding what causes these inconsistencies is the first step toward building a more reliable and trustworthy testing process.
What Causes Flaky Tests?
Flaky tests are rarely caused by a single, obvious issue. They often stem from subtle problems in the test code, its interaction with the application, or the environment in which it runs. Pinpointing the exact cause of a flaky test requires looking at several common culprits.
Concurrency and Race Conditions
One of the most common causes of flakiness is incorrect assumptions about execution order. This happens when multiple tests, or different threads within a single test, compete for shared resources like a database entry, a file on disk, or a piece of memory. A race condition occurs when the test’s success depends on the unpredictable sequence of these operations. If a test only accepts one specific outcome when multiple are possible and correct, its result will be unreliable.
For example, imagine two tests running in parallel:
- `Test_CreateUser` creates a user with `ID=123`.
- `Test_DeleteUser` deletes the user with `ID=123`.
If `Test_DeleteUser` runs before `Test_CreateUser`, it will fail because the user doesn’t exist yet. If they run in the intended order, both pass. This is a classic test order dependency, a frequent source of flaky tests.
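A minimal Python sketch of this kind of order dependency, using a module-level dictionary as a stand-in for a shared database table (the test and field names are illustrative):

```python
import unittest

# Shared mutable state that both tests touch; in a real suite this would be
# a row in a shared test database. Sharing state like this is the root of
# the order dependency.
users = {}

class TestCreateUser(unittest.TestCase):
    def test_create_user(self):
        users[123] = {"name": "alice"}   # creates the user with ID=123
        self.assertIn(123, users)

class TestDeleteUser(unittest.TestCase):
    def test_delete_user(self):
        # Raises KeyError if it runs before test_create_user,
        # because user 123 does not exist yet.
        del users[123]
        self.assertNotIn(123, users)

if __name__ == "__main__":
    unittest.main()
```

Run sequentially in the default order, the suite passes; run shuffled or in parallel, the delete can happen first and the build turns red for reasons unrelated to the code under test.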
Asynchronous Operations and Timing Issues
Modern applications are full of asynchronous operations, from API calls to database writes. A test might trigger an action and then immediately check for the result before the operation has had time to complete.
A common anti-pattern is to add a fixed `sleep()` call to the test, like `sleep(2)`. This might work on your local machine, but in a resource-constrained CI environment, the operation might take longer than two seconds, causing the test to fail. Conversely, a long sleep time unnecessarily slows down the entire test suite. Handling these asynchronous waits poorly is a major contributor to test flakiness.
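As an illustration, here is a hedged Python sketch of the fixed-sleep pattern; `start_background_export` is a made-up helper standing in for any asynchronous operation of unpredictable duration:

```python
import os
import threading
import time

def start_background_export(path):
    """Illustrative stand-in for asynchronous work whose duration varies."""
    def work():
        time.sleep(1.5)   # on a busy CI runner this can easily exceed 2 seconds
        with open(path, "w") as f:
            f.write("report data\n")
    threading.Thread(target=work, daemon=True).start()

def test_export_creates_file():
    start_background_export("/tmp/report.csv")
    time.sleep(2)   # the anti-pattern: a fixed wait and a hope
    assert os.path.exists("/tmp/report.csv")
```

A condition-based wait, covered later in the fixing section, removes this guesswork.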
External Dependencies and Unstable Environments
Tests that rely on external systems are inherently more prone to flakiness. The reliability of these tests is tied to factors outside your control:
- Third-Party APIs: An external API might be slow, temporarily unavailable, or return unexpected rate-limiting errors.
- Network Latency: Fluctuations in network performance can cause timeouts.
- Inconsistent State: If the test environment is not properly isolated, the remnants of a previous test run (e.g., data left in a database) can influence the outcome of the current test.
A robust test suite requires a controlled, isolated, and consistent environment for every single run.
Poorly Written Tests and Flawed Logic
Sometimes, the bug is in the test itself. A flaky test can be a symptom of:
- Insufficient Assertions: The test doesn’t check all relevant aspects of the expected outcome, leaving room for false passes or failures.
- Non-deterministic Code: The test relies on unpredictable elements like random number generators (without a fixed seed) or the current system time. For example, a test that checks for an exact timestamp is almost guaranteed to be flaky.
- Flawed Logic: The test contains simple bugs, typos, or logical errors that affect its validity under certain conditions.
The High Cost of Ignoring Flaky Tests
Flaky tests are more than just a nuisance; they have tangible, negative consequences for software quality and team productivity. A study by Google found that flaky tests were a major drain on resources, taking significantly longer to fix than non-flaky ones. Another study from Microsoft estimated that flaky tests cost them over a million dollars annually in developer time alone.
The costs manifest in several ways:
- Eroded Confidence: When tests fail randomly, developers start to distrust the test suite. This “cry wolf” syndrome leads to teams ignoring genuine failures, assuming they are just more flakiness. This is how critical regressions make it to production.
- Decreased Velocity: Teams waste countless hours re-running CI jobs, manually verifying failures, and debugging problems that don’t actually exist in the application code. This friction slows down the entire development and deployment pipeline.
- Poor Team Morale: Constant battles with an unreliable CI/CD pipeline are frustrating and demotivating. It distracts engineers from their primary goal of building and shipping valuable features.
How to Detect Flaky Tests
You can’t fix what you can’t find. Detecting flaky tests requires a systematic approach, as they may not fail consistently. Here are some effective strategies.
Monitor and Analyze Test History
The most reliable way to identify a flaky test is by analyzing its history. A test that frequently switches between pass and fail statuses over time is a clear red flag. Many modern CI/CD platforms provide analytics that can automatically surface this information. Look for tools that track:
- The number of times a test has failed.
- The “flakiness rate” (pass/fail percentage) over the last N runs; a simple way to compute this is sketched after the list.
- The tests that are most frequently re-run by developers.
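As a rough illustration of the flakiness-rate idea, here is a small Python sketch that flags tests whose recent history mixes passes and failures; it assumes you can export per-test outcomes from your CI system as simple (name, outcome) pairs:

```python
from collections import defaultdict

def flakiness_report(results, last_n=50):
    """results: iterable of (test_name, outcome) tuples, oldest first,
    where outcome is "pass" or "fail". Returns {test_name: flakiness_rate}."""
    history = defaultdict(list)
    for name, outcome in results:
        history[name].append(outcome)

    report = {}
    for name, outcomes in history.items():
        recent = outcomes[-last_n:]
        failures = recent.count("fail")
        # A test that both passes and fails recently is a flakiness candidate;
        # a test that always fails is simply broken and belongs elsewhere.
        if 0 < failures < len(recent):
            report[name] = failures / len(recent)
    return report

runs = [("test_login", "pass"), ("test_login", "fail"),
        ("test_login", "pass"), ("test_checkout", "fail")]
print(flakiness_report(runs))   # test_login failed 1 of its last 3 runs
```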
Use Automated Test Retries
A common feature in CI tools is the ability to automatically re-run a failed test a certain number of times. If a test fails on its first attempt but passes on a subsequent retry, it is highly likely to be flaky. While this can keep your pipeline from being blocked, it’s crucial to use this data. A test that consistently requires retries should be quarantined and investigated, not just ignored.
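Most CI platforms and test-runner plugins implement retries for you; as a framework-neutral sketch, the Python decorator below re-runs a failing test and records how many attempts it needed, so retries generate flakiness data instead of hiding it (the decorator name and log format are illustrative):

```python
import functools

def retry_and_record(max_attempts=3, attempt_log=None):
    """Re-run a failing test up to max_attempts times and record the attempt
    count; any test that needs more than one attempt is a flakiness candidate."""
    attempt_log = attempt_log if attempt_log is not None else {}

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    result = test_fn(*args, **kwargs)
                    attempt_log[test_fn.__name__] = attempt   # 1 means stable
                    return result
                except Exception as exc:   # broad catch keeps the sketch simple
                    last_error = exc
            attempt_log[test_fn.__name__] = max_attempts
            raise last_error
        return wrapper
    return decorator

# Usage sketch (hypothetical test and log dict):
#   @retry_and_record(max_attempts=3, attempt_log=flaky_candidates)
#   def test_checkout_flow():
#       ...
```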
Run Tests in Parallel and Random Order
To uncover hidden dependencies, intentionally run your test suite in a non-sequential manner.
- Parallel Execution: Running tests simultaneously will quickly expose race conditions and issues with shared resources.
- Randomized Order: Shuffling the execution order before each run will break any implicit dependencies between tests that assume a specific sequence.
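Most runners offer both through flags or plugins; as a minimal, framework-free Python sketch, shuffling a list of test callables with a logged seed makes any order-dependent failure reproducible (the function names are illustrative):

```python
import random
import sys

def run_in_random_order(tests, seed=None):
    """Run test callables in a shuffled order, printing the seed so a
    failing order can be replayed exactly."""
    seed = seed if seed is not None else random.randrange(sys.maxsize)
    print(f"test order seed: {seed}")
    rng = random.Random(seed)
    shuffled = list(tests)
    rng.shuffle(shuffled)
    for test in shuffled:
        test()   # an order-dependent test will eventually fail on some shuffle

# Usage sketch: run_in_random_order([test_create_user, test_delete_user])
```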
A Practical Guide to Fixing Flaky Tests
Once you’ve identified a flaky test, the next step is to fix it. Fixing flaky tests involves making them more isolated, deterministic, and resilient.
1. Isolate the Test from External Factors
A test should not depend on or affect any other test or external system. It must be entirely self-contained.
- Use Mocks and Stubs: Replace calls to external services (like third-party APIs) with mock objects that return predictable, consistent responses. This eliminates network and API instability as a source of flakiness.
- Create Dedicated Test Resources: Each test run should have its own isolated environment. For database interactions, this could mean running each test in a transaction that is rolled back afterward or using a fresh in-memory database for each run.
- Practice Setup and Teardown: Ensure each test explicitly sets up the state it needs and cleans up after itself, leaving no artifacts behind.
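A minimal Python sketch that combines these three ideas; `fetch_exchange_rate` stands in for a hypothetical third-party API call, and a fresh in-memory SQLite database gives every test its own isolated state:

```python
import sqlite3
import unittest
from unittest.mock import patch

def fetch_exchange_rate(currency):
    """Stand-in for a call to a third-party pricing API (hypothetical)."""
    raise RuntimeError("real network call; tests must never reach this")

def order_total_in(currency, amount):
    return amount * fetch_exchange_rate(currency)

class OrderTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database for every test: no leftover state.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (id INTEGER, total REAL)")
        self.db.execute("INSERT INTO orders VALUES (1, 100.0)")

    def tearDown(self):
        # Explicit cleanup, leaving no artifacts behind.
        self.db.close()

    @patch(__name__ + ".fetch_exchange_rate", return_value=1.25)
    def test_total_in_local_currency(self, mock_rate):
        # The external API is mocked, so the result is fully deterministic.
        (amount,) = self.db.execute(
            "SELECT total FROM orders WHERE id = 1").fetchone()
        self.assertEqual(order_total_in("EUR", amount), 125.0)
        mock_rate.assert_called_once_with("EUR")

if __name__ == "__main__":
    unittest.main()
```

Because the external call is mocked and the database is recreated in `setUp`, this test produces the same result on every run, regardless of network conditions or leftover data.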
2. Eliminate Randomness and Non-Determinism
Your test logic should be as deterministic as possible.
- Control Random Data: If you need to generate random data, use a fixed seed for your random number generator so it produces the same sequence of “random” values every time.
- Manage Time: Avoid using `DateTime.now()`. If your logic is time-sensitive, use a library that allows you to “freeze” or control time so your test runs against a consistent timestamp.
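A brief sketch of both ideas in Python; rather than reaching for the system clock or a global random generator, the (illustrative) `make_token` function takes them as parameters so the test can pin them down:

```python
import random
from datetime import datetime, timezone

def make_token(rng, now):
    """Illustrative: builds an ID from "random" data and a timestamp.
    Both sources of non-determinism are passed in, not reached for."""
    return f"{now:%Y%m%d}-{rng.randint(0, 9999):04d}"

def test_make_token_is_deterministic():
    rng = random.Random(42)                                  # fixed seed
    frozen_now = datetime(2024, 1, 1, tzinfo=timezone.utc)   # frozen clock
    assert make_token(rng, frozen_now) == make_token(random.Random(42), frozen_now)

test_make_token_is_deterministic()
```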
3. Handle Asynchronous Operations Intelligently
Avoiding flaky tests in asynchronous applications means moving away from fixed `sleep()` calls in favor of more robust waiting strategies. Most modern testing frameworks provide “explicit wait” or “polling” mechanisms: instead of waiting for a fixed duration, you write code that waits until a specific condition is met. For instance, rather than clicking a button and waiting an arbitrary number of seconds in the hope that the action completes, instruct the test to wait until a specific element, like a “Success!” message, becomes visible. This approach is both faster (it proceeds as soon as the condition is met) and more reliable (it tolerates slower response times up to a defined timeout).
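Testing frameworks ship their own explicit-wait helpers, so treat the following Python sketch as an illustration of the idea: poll a condition until it holds or a timeout expires (the `page.contains` call in the usage comment is hypothetical):

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds elapse.
    Returns as soon as the condition holds instead of sleeping a fixed time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout:.1f}s")

# Usage sketch: instead of sleep(2) after clicking a button, wait for the outcome:
#   wait_until(lambda: page.contains("Success!"))
# where `page.contains` stands in for whatever visibility check your framework offers.
```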
4. Simplify and Strengthen Test Logic
Review the test code itself for clarity and correctness.
- Write Clear Assertions: Assert the specific outcome you’re testing for, but avoid being overly brittle. For example, instead of checking for an exact timestamp, check that it falls within a reasonable range (see the sketch after this list).
- Refactor Complex Tests: If a test has become too long and complicated, break it down into smaller, more focused units. Simple, concise tests are easier to debug and maintain.
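For instance, a small sketch of the timestamp case, asserting a window rather than an exact instant (`created_at` stands in for whatever timestamp the code under test produced):

```python
from datetime import datetime, timedelta, timezone

def test_created_at_is_recent():
    # `created_at` stands in for the value returned by the code under test.
    created_at = datetime.now(timezone.utc)

    # Assert a window, not an exact instant; exact matches are almost always flaky.
    now = datetime.now(timezone.utc)
    assert now - timedelta(seconds=5) <= created_at <= now
```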
Tackling flaky tests is a continuous effort, not a one-time fix. It requires a cultural shift where teams prioritize test reliability as highly as they do new features. By systematically detecting, isolating, and fixing sources of flakiness, you can build a robust CI/CD pipeline that accelerates development and gives you genuine confidence in your code.
For a deeper view into system performance that could be contributing to environmental flakiness—such as CPU contention or network issues—a comprehensive monitoring solution is invaluable. Explore how Netdata provides real-time, high-granularity insights into your entire infrastructure to help you build more stable systems from the ground up. Sign up for free today.