Enhancing Test Stability With Test Doubles for Flaky Test Management

9 min read · Apr 29, 2024

Ever spent hours writing a test, only to have it fail mysteriously the next day? If so, you are not alone. It is a very common frustration for developers of all experience levels.

Let me share a little of my experience. Right before an update for our app, we ran our tests over the weekend to make sure everything worked. The tests for purchasing items had passed earlier, but on Sunday they suddenly failed. After looking into it, we figured out that our tests depended on the response from a mock payment, programmed to mimic weekdays only, which overlooked weekend transaction delays.

We never noticed this before because we never ran these tests on a weekend. It was a good lesson that our tests need to be more like real life, including what happens on weekends, to catch problems like this.

Goals

Understanding flaky tests means coming to terms with the fact that your application can be perfectly fine and still fail a test. It doesn’t matter whether the test has passed before; it can still fail. There are specific reasons for this, and there are ways to enhance test stability. One of them is implementing test doubles and knowing exactly when to use them, and that is the sole purpose of this article.

What Is a Flaky Test and Why Test Doubles?

Similar to how stunt doubles do dangerous work in movies, we use test doubles to replace troublemakers and make tests easier to write. ~ Jani Hartikainen

A test that exhibits both passing and failing outcomes under the same configuration is known as a flaky test. This tends to be very frustrating, yet enhancing test stability and managing flaky tests is often treated as secondary in the software development cycle.

Flaky tests reduce confidence in the testing team and lead to wasted time and resources. A few of the most common causes of flakiness are concurrency issues, dependency on external services, leaked state, platform or environmental differences, randomness, and fixed time dependency.

Using test doubles is a good way to improve test stability and make your tests more reliable.

They help by taking the place of unpredictable parts of your code, like external services, so your tests don’t fail unexpectedly. This approach saves time and makes sure your tests really check what they’re supposed to.

Test doubles are stand-in objects that imitate the behavior of real components in your application. In other words, test doubles are like fake parts in a system that act like the real ones, but in a way we can control.

They’re used in tests to take the place of real parts to make sure tests run the same way every time. “Test double” covers different kinds: stubs, mocks, fakes, spies, and dummies.

Types of Test Doubles

Test doubles can be confusing at times: it is not always clear when to use one and, more importantly, which kind to use.

Dummy Objects

Dummy objects, also known as dummies, are usually passed as parameters or arguments. They are passed around to satisfy the API but never actually used. Put differently, they are placeholders for parts of your code that need something to be present, even though you don’t use those parts for anything important.

Think of them as the extras in a movie scene: they’re there to fill the space so everything looks right, but they don’t have lines or affect the scene. They help your code run without errors when it expects certain pieces to be there, even though those pieces aren’t doing anything.

Here is a basic implementation of a dummy in JavaScript:

const dummyObject = {};

// Test using the dummy
test('dummy test', () => {
  // Dummy is not used in this test
});

In the code example above, we defined a dummy object named ‘dummyObject’, which is usually empty. The purpose of this object is to fulfill a parameter requirement without having any effect on the test.

It is important to note that using a dummy object doesn’t directly address flakiness, but it helps ensure that the test remains consistent and predictable.
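To make the placeholder role more concrete, here is a minimal sketch in which a dummy is actually passed as an argument. The function `calculateTotal` and its `logger` parameter are hypothetical names invented for this illustration, and the code is plain Node.js so it runs without a test framework.

```javascript
// Hypothetical function under test: it requires a logger argument,
// but only uses it when the cart is empty.
function calculateTotal(items, logger) {
  if (items.length === 0) {
    logger.warn("Cart is empty");
    return 0;
  }
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

// The dummy only fills the `logger` slot; the code path below never touches it.
const dummyLogger = {};

const total = calculateTotal(
  [
    { price: 10, quantity: 2 },
    { price: 5, quantity: 1 },
  ],
  dummyLogger
);

console.log(total); // 25
```

If a test took the empty-cart path instead, the call to `logger.warn` would throw because the dummy has no such method. That is a useful signal that a dummy is no longer the right double and a stub or spy is needed.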

Spy Objects

Spies are stubs that record some information based on how they were called. They can be used to verify that certain methods were called. In testing, you can liken spies to undercover agents: they watch and register what happens in the code without really getting involved. While they can mimic certain actions like stubs, their main job is to keep track of how and when they’re called.

Let’s look at a simple implementation of Spies:

// TEST WITH JEST

const inventoryService = {
  checkStock: () => Promise.resolve(true),
};

// inventoryCheck() is defined later in this article
test("spy test", async () => {
  const spy = jest.spyOn(inventoryService, "checkStock");
  const isInStock = await inventoryCheck("item123", inventoryService);

  expect(spy).toHaveBeenCalledWith("item123");
  expect(isInStock).toBe(true);

  spy.mockRestore();
});

Inside the test above, we use jest.spyOn() to create a spy on the checkStock() method of inventoryService. Because jest.spyOn() calls through to the original implementation by default, checkStock() still resolves to true while the spy records how it was called.

By verifying expected behavior and function calls, spies contribute to test stability and reduce the likelihood of flakiness in our tests.
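As a rough illustration of what a spy records, here is a hand-rolled version that needs no framework. The helper `createSpy` is a hypothetical name made up for this sketch; `jest.spyOn()` offers far more, but the core idea of wrapping a method, logging its calls, and delegating to the original is similar.

```javascript
// Hypothetical helper: wraps obj[methodName], records each call's arguments,
// then delegates to the original implementation.
function createSpy(obj, methodName) {
  const original = obj[methodName];
  const calls = [];
  obj[methodName] = function (...args) {
    calls.push(args);
    return original.apply(this, args);
  };
  return {
    calls,
    // Puts the original method back so later tests are unaffected.
    restore() {
      obj[methodName] = original;
    },
  };
}

// Synchronous stand-in for the inventory service, to keep the sketch short.
const inventoryService = {
  checkStock: (itemId) => itemId === "item123",
};

const spy = createSpy(inventoryService, "checkStock");
const inStock = inventoryService.checkStock("item123");

console.log(inStock); // true
console.log(spy.calls.length); // 1
console.log(spy.calls[0][0]); // item123

spy.restore();
```

Restoring the original method at the end plays the same role as `spy.mockRestore()` in the Jest example: state that leaks from one test into the next is itself a common source of flakiness.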

Fake Objects

Fakes have a working implementation, but they usually take shortcuts that make them unsuitable for production. They work and give results like the real thing, but those shortcuts make them easier to use in tests.

For example, using an in-memory database instead of a real one speeds up tests, but it wouldn’t work for a live app because it’s too simple or not robust enough. Unlike spies, fakes are more about simplifying things by acting as a lightweight version of the real deal, making tests easier to manage.

Let’s look at a simple implementation of fakes:

// Fake Example with Jest

class InventoryServiceFake {
  checkStock(itemId) {
    return Promise.resolve(true);
  }
}
test("fake test", () => {
  const inventoryServiceFake = new InventoryServiceFake();
  return inventoryCheck("item123", inventoryServiceFake).then((isInStock) => {
    expect(isInStock).toBe(true);
  });
});

In the example above, we define a simple implementation of the InventoryService class called InventoryServiceFake. This fake implementation provides a simplified version of the checkStock method, which always resolves to true.

By using a fake object with a constant predictable behavior, we eliminate dependencies on external systems that might introduce flakiness into our test. The test then becomes more deterministic as it always expects true from the fake service. Is it a good practice? Of course it is.
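The fake above always answers true, which keeps the example short. A fake closer to the in-memory database idea would carry a small working implementation. The sketch below assumes a hypothetical `InMemoryInventoryService` class with an `addStock` helper; neither name comes from a real library.

```javascript
// A fake with real (but simplified) behavior: stock levels live in a Map
// instead of a remote inventory system.
class InMemoryInventoryService {
  constructor() {
    this.stock = new Map();
  }

  // Test-only helper for seeding data.
  addStock(itemId, quantity) {
    this.stock.set(itemId, (this.stock.get(itemId) || 0) + quantity);
  }

  // Same shape as the service used in this article: resolves to a boolean.
  checkStock(itemId) {
    return Promise.resolve((this.stock.get(itemId) || 0) > 0);
  }
}

const fakeInventory = new InMemoryInventoryService();
fakeInventory.addStock("item123", 3);

const results = Promise.all([
  fakeInventory.checkStock("item123"),
  fakeInventory.checkStock("item999"),
]);

results.then(([known, unknown]) => {
  console.log(known, unknown); // true false
});
```

Because the fake answers differently for seeded and unseeded items, one instance can back several tests, including out-of-stock cases that a constant `true` cannot express.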

Mock Objects

Mocks in tests act like parts of your code that expect certain actions to happen. They’re set up to watch for specific methods or functions to be called in a certain way. Instead of returning data like some other test doubles, mocks check if the right interactions occur. This means you’re testing how different parts of your code talk to each other, making sure everything is connected correctly.

Let’s look at a simple implementation of mocks in Jest. Imagine we have a class that retrieves user data from our API. This class uses Axios to make the API call and then returns the users from the data attribute of the response. This could potentially be flaky because it depends on an external network request via Axios.

import axios from "axios";

class Users {
  static async all() {
    try {
      const response = await axios.get("https://api.example.com/users");
      return response.data.users;
    } catch (error) {
      console.error("Error fetching users:", error);
      return [];
    }
  }
}

export default Users;

Now, in order to test this method in complete isolation from external factors, we will use jest.mock("axios") to mock the entire axios module.

We then use axios.get.mockResolvedValue(resp) to make the get method of Axios resolve with a mock response. The test asserts that the fetched users match the expected users, which makes it less prone to flakiness.

import Users from "./Users";
import axios from "axios";

jest.mock("axios");

describe("Users", () => {
  test("fetching users should return expected data", async () => {
    const users = [{ name: "Alice" }];
    const resp = { data: { users } };
    axios.get.mockResolvedValue(resp);

    const fetchedUsers = await Users.all();
    expect(fetchedUsers).toEqual(users);
  });
});
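The Jest test above checks the returned data. To illustrate the interaction-checking side of mocks described earlier, here is a framework-free sketch. It assumes a hypothetical `createHttpMock` helper and a `fetchAllUsers` function that receives its HTTP client as a parameter (a variation on `Users.all()` chosen so that no module mocking is needed).

```javascript
// Hypothetical hand-rolled mock HTTP client: returns a canned response and
// records requested URLs so the interaction can be verified afterwards.
function createHttpMock(cannedResponse) {
  const requestedUrls = [];
  return {
    requestedUrls,
    get(url) {
      requestedUrls.push(url);
      return Promise.resolve(cannedResponse);
    },
    // Throws if the recorded calls differ from the expected ones.
    verify(expectedUrls) {
      const actual = JSON.stringify(requestedUrls);
      const expected = JSON.stringify(expectedUrls);
      if (actual !== expected) {
        throw new Error(`Expected calls ${expected}, got ${actual}`);
      }
    },
  };
}

// Variant of Users.all() that takes its HTTP client as an argument
// (an assumption made for this sketch).
async function fetchAllUsers(http) {
  const response = await http.get("https://api.example.com/users");
  return response.data.users;
}

const httpMock = createHttpMock({ data: { users: [{ name: "Alice" }] } });

const fetched = fetchAllUsers(httpMock).then((users) => {
  // The mock fails the run if the code talked to the wrong endpoint.
  httpMock.verify(["https://api.example.com/users"]);
  return users;
});
```

In a Jest suite, the same check would typically be written as an assertion on the mocked function, for example `expect(axios.get).toHaveBeenCalledWith(...)`.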

Stub Objects

Stubs in tests can be likened to actors who follow a script. You tell them what to say (or how to respond) when they’re asked something specific during a test. They don’t do anything unexpected; they just give the responses you’ve set up in advance.

This helps make your tests much more predictable by ensuring that parts of your code that are not being tested don’t cause any unexpected results. Fakes are close relatives of stubs and are sometimes used in their place. A stub example appears later in this article.

Implementing Test Doubles

In my opinion, there are four steps to implementing test doubles for the best results:

Identify tests that are flaky due to external dependencies or complex setups.
Choose the most appropriate test double for the job, based on the nature of the test and the dependency.
Replace the external dependency with the chosen test double.
Refactor the test to use the test double, focusing on clarity and ensuring that the test accurately reflects the intended behavior. After implementing the test double, monitor the test for improvements and make adjustments if necessary.

Problems and Test Double Solutions

In this section, we will look at four theoretical test problems and their test double solutions.

1. Isolation from External Dependencies

Problem: Tests that interact with external systems, such as databases and APIs, can fail because of unpredictable issues outside the control of the test environment. It could be a network issue or downtime in an external service.

Solution: Stubs and Fakes can simulate these external systems, ensuring that tests run consistently without being affected by external factors.

2. Controlled Behavior

Problem: One of the causes of flakiness in tests is dynamic data, which can change between test runs.

Solution: Mocks and stubs can be programmed with a fixed response, ensuring that tests receive the same input and follow the same execution path every time, eliminating different results in tests.
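A minimal sketch of this idea for one kind of dynamic data, randomness, follows. The function `pickPromoCode` and its injected `random` parameter are hypothetical; the point is only that a stubbed source of randomness pins the execution path.

```javascript
// Hypothetical function under test: picks a promo code using an injected
// random source (defaults to Math.random outside of tests).
function pickPromoCode(codes, random = Math.random) {
  const index = Math.floor(random() * codes.length);
  return codes[index];
}

const codes = ["SPRING", "SUMMER", "WINTER"];

// Stub: always returns 0.5, so index = Math.floor(0.5 * 3) = 1.
const stubRandom = () => 0.5;

const picked = pickPromoCode(codes, stubRandom);
console.log(picked); // SUMMER
```

With the real `Math.random`, an assertion on a specific code would pass only about a third of the time; with the stub it passes on every run.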

3. Time-based Flakiness

Problem: Tests relying on real-time clocks or waiting for certain time-based events can fail if the timing isn’t precise on every run. Factors like system load, network delays, or scheduling differences can cause tests to fail even if the code itself is correct.

Solution: Using test doubles like stubs and spies to simulate time or events allows tests to run consistently regardless of real-world timing issues.
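The weekend failure from the introduction is a time-based problem of this kind. One way to handle it, sketched below, is to inject the clock so a test can pin the date. The function `settlementDelayDays` and the two-day delay are invented for illustration and are not taken from a real payment system.

```javascript
// Hypothetical function under test: settlement is delayed on weekends.
// `now` is injected so tests can control the date.
function settlementDelayDays(now = () => new Date()) {
  const day = now().getUTCDay(); // 0 = Sunday, 6 = Saturday
  return day === 0 || day === 6 ? 2 : 0;
}

// Stubbed clocks pinned to known dates (UTC).
const sundayClock = () => new Date("2024-04-28T12:00:00Z"); // a Sunday
const mondayClock = () => new Date("2024-04-29T12:00:00Z"); // a Monday

console.log(settlementDelayDays(sundayClock)); // 2
console.log(settlementDelayDays(mondayClock)); // 0
```

Both branches are now exercised on every run, regardless of the day the suite happens to execute. Jest users can get a similar effect with its fake timers instead of injecting a clock by hand.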

4. Resource-intensive tests

Problem: Some tests may be flaky because they exhaust system resources (memory, CPU) when running under certain conditions.

Solution: Fakes can be employed to provide lightweight implementations of heavy dependencies, reducing the resource load and avoiding conditions that lead to flakiness.

This is a good way to address resource-intensive tests, but it has limitations. If the flakiness stems from resource consumption within your own code, fakes won’t directly solve the problem. You will then need to optimize your code’s resource usage.

Practical Application of Test Doubles to Enhance Test Stability

In this section, we will briefly look at a practical implementation of a test double that can enhance test stability and thereby help manage flakiness in our tests:

Problem: Network Dependency and External Service Reliability

To address the problem of network dependency and external service reliability, we’ll use a stub. Take, for example, a function called inventoryCheck(). This function checks whether an item is in stock by calling an external inventory service, which could be flaky. This is what the example looks like:

function inventoryCheck(itemId, inventoryService) {
  return inventoryService
    .checkStock(itemId)
    .then((isInStock) => {
      return isInStock;
    })
    .catch((error) => {
      console.error("Error checking inventory:", error);
      return false;
    });
}

Let’s go ahead and employ a stub:

class InventoryServiceStub {
  checkStock(itemId) {
    return new Promise((resolve) => {
      resolve(true);
    });
  }
}

// Test using the stub
test("inventoryCheck with stub should return true for any item", () => {
  const inventoryServiceStub = new InventoryServiceStub();
  return inventoryCheck("item123", inventoryServiceStub).then((isInStock) => {
    expect(isInStock).toBe(true);
  });
});

The test above is written with the Jest framework. The InventoryServiceStub class is a stub for the real inventory service. It simulates the behavior of an inventory service, but in a simplified manner.

The checkStock() method in this stub doesn’t perform any real inventory check. Instead, it returns a promise that resolves with true, indicating that any item passed to it is considered to be in stock. This stub is used in testing to isolate the inventoryCheck function from external dependencies, providing a controlled environment for testing the function’s logic and thereby managing flakiness in our test.
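A stub can also script the failure case, which is hard to trigger on demand against a real service. The sketch below repeats the logic of inventoryCheck() so it is self-contained and adds a hypothetical `FailingInventoryServiceStub` that always rejects, simulating an outage.

```javascript
// Same logic as the article's inventoryCheck(), repeated here so the
// sketch runs on its own.
function inventoryCheck(itemId, inventoryService) {
  return inventoryService
    .checkStock(itemId)
    .then((isInStock) => isInStock)
    .catch((error) => {
      console.error("Error checking inventory:", error);
      return false;
    });
}

// Hypothetical stub that simulates an outage by always rejecting.
class FailingInventoryServiceStub {
  checkStock(itemId) {
    return Promise.reject(new Error(`Service unavailable for ${itemId}`));
  }
}

const outcome = inventoryCheck("item123", new FailingInventoryServiceStub());

outcome.then((isInStock) => {
  console.log(isInStock); // false
});
```

This exercises the catch branch deterministically, so the fallback to false is verified on every run rather than only when the real service happens to be down.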

Thoughts To Note

Will test doubles help solve flakiness in your tests? The answer is yes. But while test doubles are a valuable tool for managing flaky tests, keep the following in mind:

They should be used together with techniques such as identifying the root cause of flakiness (which could be network issues or external system bugs) and refactoring code to be more deterministic and less dependent on external factors.
Misusing test doubles can lead to cascading failures. If the doubles don’t accurately mimic the behavior of the components they replace, tests might pass erroneously, hiding real issues that could cause failures in production.
Over-reliance on test doubles is not advised. It’s crucial to balance the use of test doubles with integration and end-to-end tests to mitigate these risks.
Test doubles won’t detect API contract changes, so make sure you have separate integration tests that interact with the real API.

Closing Thoughts

Test doubles help to a great extent. In fact, used properly, they can resolve the vast majority of flakiness, but understanding the underlying cause of a test failure also goes a long way. Thank you for staying this long; I really appreciate your time. Let’s keep testing!

Originally published at https://semaphoreci.com on April 29, 2024.