Value Added Unit Testing

An old, well-known, and not particularly pleasant joke goes as follows:

A Harvard man and a Yale man are at the urinal. They finish and zip up. The Harvard man goes to the sink to wash his hands, while the Yale man immediately heads for the exit. Mister Harvard says, “At Harvard they teach us to wash our hands after we urinate”. Mister Yale retorts, “At Yale they teach us not to pee on our hands”.

If American university rivalries do not interest you, then you can substitute almost any pair of rivals for the two characters in the joke. For example, I have found versions for the navy and the marines, and with Winston Churchill and an opposition politician.

A less amusing unit-testing variant of the joke might go something like this:

An agile developer and a traditional developer are checking code into source control. They finish checking in their changes. The agile developer starts to check in copious amounts of unit test code, while the traditional developer immediately heads for the coffee machine. The agile developer says, “At xUnit school they taught us to always write unit tests for our code”. The traditional developer replies, “At my school they taught us to write code that works”.

The traditional developer is obviously missing the point of writing unit test code, just as the Yale man has missed the whole point of washing his hands. However, unlike the hand washing example, the agile developer may also be in danger of missing the point.

Yes, we should always unit test our code, and whenever possible automate those tests so we can run them whenever we need to. Nevertheless, we should not overplay the significance of having a large number of automated unit tests that are passing:

Quis custodiet ipsos custodes?

Who tests the test code? Automated tests can show as passing when they should be failing, giving a false sense of confidence. I have seen tests that checked the contents of a set of objects returned by an operation but forgot to check that the set contained any objects in the first place. The tests all passed, missing serious problems in the objects returned, because the test code was generating an empty set. In addition, unit tests using stub code and mocks may build in incorrect assumptions about how the mocked/stubbed items really work. And so on. Unit test code may contain as many bugs as the code it is testing.
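As a minimal sketch of that empty-set trap (all names here, such as Order and findOverdueOrders(), are invented for illustration), consider a JUnit test whose assertions sit inside a loop over the returned collection. With an empty collection the loop body never runs, so the test passes while checking nothing; only the final guard exposes the problem.

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import java.util.ArrayList;
    import java.util.List;
    import org.junit.Test;

    public class OverdueOrdersTest {

        static class Order {
            final long balance;
            Order(long balance) { this.balance = balance; }
            long getBalance() { return balance; }
        }

        // Stands in for the operation under test; a bug somewhere in the
        // fixture or production code leaves the result empty.
        private List<Order> findOverdueOrders() {
            return new ArrayList<>();
        }

        @Test
        public void everyOverdueOrderHasAPositiveBalance() {
            List<Order> overdue = findOverdueOrders();

            // Passes vacuously: with an empty list the loop body, and so
            // every assertion in it, is simply skipped.
            for (Order order : overdue) {
                assertTrue(order.getBalance() > 0);
            }

            // The missing guard; with it, the test fails and exposes the bug.
            assertFalse("expected at least one overdue order", overdue.isEmpty());
        }
    }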

Writing unit tests should not be a substitute for thinking through a design.

Over-emphasis on writing JUnit-style tests can lead to a kind of ‘trial and error’ style of development, where developers simply keep trying different changes to their code until all their tests pass. Writing code by trial and error until it passes a suite of unit tests is no substitute for thinking through the low-level design of a feature or user story before diving into code. Identifying the simplest design that works usually takes more thought than immediately implementing the first vague solution that comes to mind.

Modern unit tests are of little value in continuous integration/delivery builds

In the dim and distant past of the last century when I was starting my first programming job, unit testing meant testing the code that I was writing before it was integrated with other team members’ code. It was testing a chunk of software in isolation from the rest of the application, system or component.

Unit testing in the JUnit era has slowly come to mean the testing of the smallest practical chunk of software. For mainstream object-oriented languages such as Java and C# this has come to mean testing the individual methods within a particular class. Everything else is considered too wide in scope to be a unit test and should be called a functional or integration test instead.
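In JUnit terms, that smallest practical chunk usually looks something like the following sketch: one test method exercising one method of one class, nothing wider (PriceFormatter is an invented name for illustration).

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class PriceFormatterTest {

        // The class under test, inlined here to keep the sketch self-contained.
        static class PriceFormatter {
            String format(long cents) {
                return String.format("%d.%02d EUR", cents / 100, cents % 100);
            }
        }

        // One method of one class: anything wider in scope is, by this
        // definition, a functional or integration test.
        @Test
        public void formatsWholeAndFractionalEuros() {
            assertEquals("12.05 EUR", new PriceFormatter().format(1205));
        }
    }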

In turn, this definition of unit testing has led to rules appearing about what unit tests may not do:

  • talk to the database
  • communicate across the network
  • touch the file system
  • run at the same time as any of your other unit tests
  • require special tweaks to your environment (such as editing config files)

All such behavior must be mocked or stubbed out in a modern unit test.
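Here is a sketch of what such a rule-compliant test typically looks like, using Mockito to stub out the database access; all names, such as CustomerRepository and DiscountCalculator, are invented for illustration. No database, network, file system or special configuration is touched, so the test runs fast and in isolation.

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.junit.Test;

    public class DiscountCalculatorTest {

        // In real code this would be backed by the database; here it is
        // replaced by a mock so the test never leaves the JVM.
        interface CustomerRepository {
            int ordersPlacedBy(String customerId);
        }

        static class DiscountCalculator {
            private final CustomerRepository repository;

            DiscountCalculator(CustomerRepository repository) {
                this.repository = repository;
            }

            int discountPercentFor(String customerId) {
                // Loyal customers (ten or more orders) get 10%, others nothing.
                return repository.ordersPlacedBy(customerId) >= 10 ? 10 : 0;
            }
        }

        @Test
        public void loyalCustomersGetTenPercent() {
            CustomerRepository repository = mock(CustomerRepository.class);
            when(repository.ordersPlacedBy("c42")).thenReturn(12);

            assertEquals(10, new DiscountCalculator(repository).discountPercentFor("c42"));
        }
    }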

Of course, unit tests complying with these sorts of rules are, by definition, not useful as part of a regular or continuous integration (CI) build. It is impossible for a correctly written unit test of this kind that passes on a developer’s own computer to fail in a CI environment. Running such a unit test in a CI environment is more a test of the developer than of the quality of their code; we are checking whether the developer was conscientious and professional enough to run the unit tests before committing their changes. If they were, their unit tests will always pass in the CI build.

Proponents argue that unit tests written following these rules identify precisely where a problem is when they fail, and indicate precisely which code is working when they pass. In turn, this reduces the time that would be spent diagnosing problems should a wider-scoped test fail. In reality, the overriding aim of these rules is to keep the unit tests fast, so that they do not impede a developer’s productivity when run every few minutes, or before committing any code to source control.

Unit testing is never comprehensive

It can be very time-consuming to try to write unit tests for every new code path that a project typically adds to a hybris-based system. It is almost certainly impractical to write unit tests covering every possible combination of data inputs used by the new code.

Additionally, modern unit tests miss whole classes of defects arising from errors and bad assumptions in the way classes, components and modules work with each other. Unit tests do not check that components integrate with each other and work correctly together; by definition, tests with a wider scope are required for that. Developers need to write automated integration or functional tests for integration builds in addition to unit tests.
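To illustrate, here is a hypothetical sketch, continuing the invented discount example above, of the kind of mismatch only a wider-scoped test catches: the mocked unit test silently assumed that an unknown customer yields an order count of zero (Mockito’s default for an unstubbed int method), but the real repository throws instead. Wiring the real components together exposes the bad assumption.

    import static org.junit.Assert.assertEquals;

    import java.util.HashMap;
    import java.util.Map;
    import org.junit.Test;

    public class DiscountIntegrationTest {

        interface CustomerRepository {
            int ordersPlacedBy(String customerId);
        }

        static class InMemoryCustomerRepository implements CustomerRepository {
            private final Map<String, Integer> orders = new HashMap<>();

            public int ordersPlacedBy(String customerId) {
                Integer count = orders.get(customerId);
                if (count == null) {
                    // The mock quietly returned 0 here; the real component
                    // refuses unknown customers.
                    throw new IllegalArgumentException("unknown customer: " + customerId);
                }
                return count;
            }
        }

        static class DiscountCalculator {
            private final CustomerRepository repository;

            DiscountCalculator(CustomerRepository repository) {
                this.repository = repository;
            }

            int discountPercentFor(String customerId) {
                return repository.ordersPlacedBy(customerId) >= 10 ? 10 : 0;
            }
        }

        @Test
        public void unknownCustomersGetNoDiscount() {
            DiscountCalculator calculator = new DiscountCalculator(new InMemoryCustomerRepository());

            // Fails with IllegalArgumentException where the mocked unit test
            // passed: the two components disagree about unknown customers.
            assertEquals(0, calculator.discountPercentFor("unknown"));
        }
    }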

As Edsger Dijkstra once said, “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence.”

Numerous trivial unit tests as ‘waste’

To be genuinely useful, the benefit of writing, frequently running, and constantly maintaining a large number of automated unit tests needs to outweigh the costs of doing so. For projects with serious deadlines to meet, each unit test must earn its keep.

Many managers like to see their developers active, typing away and producing more code. Many programmers prefer writing code to thinking through design details thoroughly. It is tempting to write more and more low-value unit tests that give the positive feedback of a green light when run.

Are large numbers of such tests truly earning their keep? Is the time actually saved by catching defects earlier greater than the time spent designing, writing and maintaining the test code? If not, then writing and maintaining those tests is a form of waste that, according to lean development principles, should be eliminated from the process.

Of course, accurately measuring the true cost and benefit of intensive unit testing is something few teams even attempt. In the end, it is left to the development team to make judgement calls based on experience, and to individual developers to think carefully about the value of each automated test they write, rather than blindly churning out large numbers of trivial tests of very little real value to the project.

Concluding pyramids

Many use the recent fascination with Ancient Egyptian-style architecture in software testing to justify writing copious amounts of automated unit tests. The original intent of the automated testing pyramid, however, was to point out the need for more automated service or API-level integration tests.

Nevertheless, this simple illustration with its very simple message seems to have taken on a life of its own. James Crisp’s article, Automated Testing and the Test Pyramid, has acceptance tests at the top of the pyramid, and led to Dean Cornish’s counter-argument in On “The Testing Pyramid”. Alister Scott’s pyramid in Yet another software testing pyramid puts automated UI tests back at the top but adds more layers, and an eye. Scott followed this up with a post on an upside-down pyramid that he calls an ice-cream cone anti-pattern; ice-cream obviously being something you would want after a hot day toiling up and down pyramids. Scott’s pyramid is quoted in a presentation by Google entitled Move Fast, Don’t Break Things, which in turn is quoted in a blog post by Mike Wacker called Just Say No to More End-to-End Tests; note, though, that E2E tests now sit at the peak of the pyramid instead of automated UI tests. That article is in turn challenged by Bryan Pendleton in On testing strategies, and end-to-end testing.

If the unit testing motto of the 1990s was, “We should write automated unit tests”, and the unit testing motto of the 2000s was, “We need to write lots more unit tests”, then the unit testing motto of this decade should be, “We need to write genuinely useful unit tests, and plenty of automated component integration and API-level tests too”.
