Once upon a time there were several software development teams that worked on a fairly mature product. Because the product was so mature, there were thousands of automated tests – a mix of unit and web UI tests – that covered the entire product. Any time they wanted to release a new version of the product, which was only a few times a year, they’d run the full suite of automated tests and several of the web UI tests would fail. This was because those tests were only executed during the specified regression testing period, and of course a lot had changed within the product over the course of several months.

The company wanted to be more competitive and release more often, so the various development teams began looking into continuous integration (CI) where the automated tests would run as part of every code check-in. But…there were thousands of tests. And although there were so many tests, the teams were really careful about choosing which tests to automate, so they were fairly confident that all of their tests provided value. So, they ran all of them – as part of every single build.

It didn’t take long for the team to complain about how much time it took the builds to execute. And rightfully so, as one of the benefits they were hoping to realize from CI was fast feedback. They were sold a promise that they’d be able to check in their code and within only a few minutes they’d receive feedback as to whether their check-in contained breaking changes. However, each build took hours to complete. And once the build was finally done, they’d also need to spend additional time investigating any test failures. This became especially annoying when the failures were in areas of the application that different teams worked on. This didn’t seem like the glorious Continuous Integration that they heard such great things about.

Divide and Conquer

Having a comprehensive test suite is good for covering the entire product, however, it posed quite the challenge for continuous integration builds. The engineers looked at how they themselves were divided up into smaller agile teams and decided to break their test suite up to better reflect this division.

Each area of the product was known as a work zone, and if anyone was working on a particular part of the application, that was considered an active work zone. Areas that were not under actively development, were considered dormant work zones.

The builds were broken up for the respective work zones. Each build would share common tests such as build verification tests and smoke tests, but the other tests in a given build would be only the ones related to that work zone. For example, the Registration feature of the application was considered a work zone, and therefore there was a build that would only run the tests that were related to Registration. This provided nicely scoped builds with relevant tests and reduced execution time.

In additional to the various work zone builds, there was still the main build with all of the tests, but this build was not used for continuous integration. Instead, this build would run periodically throughout the day. This provided information about how changes may have impacted dormant work zones which did not have active builds running.

Assigning tests to work zones

All web UI tests lived in a common repository, regardless of the specific functional area. This allowed tests to share common utilities. The teams decided to keep this approach and use tagging to indicate which functional area(s) a given test covered. For example, for a test that verified a product listing, this test would be tagged for the “Shopping” work zone. And for a test that adds a product to a cart, this one spanned multiple work zones and was therefore tagged as “Shopping” and “Cart”. Tests that were tagged for multiple work zones would run as part of multiple builds.

To tag the tests, the teams used their test runner such as TestNG or JUnit and made use of the annotation feature of these runners.

@Test(groups={WorkArea.SHOPPING, WorkArea.CART})
public void testAddProductToCart()
{
    ...
}

Test runners also typically allow a means to configure which tests run. The team decided not to create these configuration files within the code repository because it did not allow for quick changes, as they’d need to check the change in, have it reviewed, etc. So, instead the configuration was done at the CI job level.

mvn test -Dgroups=cart

With this, if someone was checking in a feature that touched multiple work zones, they could quickly configure the build to pull in tests from all relevant zones. Also, it allowed for teams to change their build’s needs as their sprints changed. For example, the Shopping area may be an active work zone one sprint but a dormant work zone in the next. So, while the builds were focused on a specific work zone, they really were more aligned with the sprint team and their current needs at any given time.

Limitations

While this approach eliminated the complaints of the build being too slow or containing unrelated test failures, there were still limitations.

Bugs can be missed

By reducing the scope of the build, the team was not testing everything. This means that unintentional bugs in other work zones could creep in with a check in. However, to mitigate this risk, remember, the teams kept the main build which ran all tests several times a day. Initially they set this to run only once a day but found that wasn’t often enough. So, they increased this to run every 6 hours. If this build failed, it would be from a check-in made within the last 6 hours which helped narrow down the problem area.

Also, this system relied heavily on the tests being tagged properly. If someone forgot to tag a test or mis-tagged it, that would not be run as part of the appropriate work zone build. Usually these were caught by the main build and this gave an opportunity to fix the tagging.

Tests must be reliable

The web UI tests were not originally part of continuous integration. Instead they were run periodically throughout the year (during the dedicated regression testing time) on someone’s local machine. That person would then investigate the failures and could easily dismiss flaky tests that failed with unjust cause, unbeknownst to the rest of the team.

However, this sort of immaturity is unacceptable when a test needs to run as part of continuous integration. It has to be reliable. So before this new CI process could work flawlessly, the team had to invest time into enhancing the quality of their tests so that they only failed when they were supposed to.

Not every test failure is a show-stopper

The teams went through the very important process of identifying the most valuable tests to automate. Which would make you think that if any of them fail, the integration should be canceled. This sounds right in theory, but was different in practice.

Sometimes tests would fail, the team would investigate, then determine they still wanted to integrate the feature. So, they opened a bug, disabled the test, and integrated the feature.

Is this wrong? Why have the test if you’re going to still integrate in the event of a failure?

The team decided that the information was still valuable to them. Knowing this gave them information about the risks they were taking, and they could discuss as a team if they were willing to take the risk of introducing this new feature knowing that it breaks an existing feature. In some cases, it was worth it, and they opened bugs to eventually address those failures.

That’s the role of tests: to provide the team with fast feedback so that they can make informed decisions.

Happily Ever After

Preparing for continuous integration certainly took a fair amount of investment. The team learned a valuable lesson: you don’t just jump into CI. A proper testing strategy is needed to ensure you’re able to realize the benefits of CI, namely fast and reliable feedback. After experiencing bumps and bruises along the way, the team finally figured out a system that worked for them.

Test Management within Continuous Integration Pipelines