Value Added Unit Testing

An old, well-known, and not particularly pleasant joke goes as follows:

A Harvard man and a Yale man are at the urinal. They finish and zip up. The Harvard man goes to the sink to wash his hands, while the Yale man immediately heads for the exit. Mister Harvard says, “At Harvard they teach us to wash our hands after we urinate”. Mister Yale retorts, “At Yale they teach us not to pee on our hands”.

If American university rivalries do not interest you, then you can substitute almost any pair of rivals for the two characters in the joke. For example, I have found versions for the navy and the marines, and with Winston Churchill and an opposition politician.

A less amusing unit testing variant of the joke might go something like this:

An agile developer and a traditional developer are checking code into source control. They finish checking in their changes. The agile developer starts to check in copious amounts of unit test code, while the traditional developer immediately heads for the coffee machine. The agile developer says, “At xUnit school they taught us to always write unit tests for our code”. The traditional developer replies, “At my school they taught us to write code that works”.

The traditional developer is obviously missing the point of writing unit test code, just as the Yale man has missed the whole point of washing his hands. However, unlike the hand washing example, the agile developer may also be in danger of missing the point.

Yes, we should always unit test our code, and whenever possible automate those tests so we can run them whenever we need to. Nevertheless, we should not overplay the significance of having a large number of automated unit tests that are passing:

Quis custodiet ipsos custodes?

Who tests the test code? Automated tests could be showing as passing when they should be showing as failing, giving a false sense of confidence. I have seen tests that only checked the contents of a set of objects returned by an operation, forgetting to check that the set contained any objects in the first place. The tests all passed, missing serious problems in the objects returned because the test code was generating an empty set. In addition, unit tests using stub code and mocks may build in incorrect assumptions about how the mocked/stubbed items really work. And so on. Unit test code may contain as many bugs as the code it is testing.
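
The empty-set problem is easy to reproduce. What follows is a minimal sketch in Python, assuming a hypothetical discounted_prices operation and hypothetical check functions; none of these names come from a real project. The weak check loops over the results, so it passes vacuously when a mistake in the test setup means that nothing is returned at all.

```python
def discounted_prices(products, min_discount):
    """Hypothetical operation under test. It contains a deliberate bug:
    the discount is added to the price instead of subtracted."""
    return [
        {"name": p["name"], "price": p["price"] + p["discount"]}  # bug
        for p in products
        if p["discount"] >= min_discount
    ]


def check_prices_only(results, products):
    """Weak check: inspects each returned object, but never checks
    that anything was returned. An empty list passes vacuously."""
    original = {p["name"]: p["price"] for p in products}
    for r in results:
        assert r["price"] <= original[r["name"]]


def check_count_then_prices(results, products, expected_count):
    """Stronger check: first assert that the expected number of
    objects came back, then inspect their contents."""
    assert len(results) == expected_count
    check_prices_only(results, products)


def passes(check, *args):
    """Run a check and report whether it passed."""
    try:
        check(*args)
        return True
    except AssertionError:
        return False


products = [
    {"name": "book", "price": 20, "discount": 5},
    {"name": "pen", "price": 4, "discount": 1},
]

# Mistake in the test setup: no product has a discount of 50 or more,
# so the operation returns an empty list.
empty_results = discounted_prices(products, min_discount=50)

print(passes(check_prices_only, empty_results, products))           # True
print(passes(check_count_then_prices, empty_results, products, 2))  # False
```

The first check reports success even though the pricing bug is still there; the second fails immediately because it insists on seeing the expected number of results before looking at their contents.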

Writing unit tests should not be a substitute for thinking through a design.

Overemphasis on writing JUnit-style tests can lead to a ‘trial and error’ style of development where developers simply keep trying different changes to their code until all their tests pass. Writing code by trial and error until it passes a suite of unit tests is no substitute for thinking through the low-level design of a feature or user story before diving into code. Identifying the simplest design that works usually takes more thought than trying to immediately implement the first vague solution that comes to mind.

Modern unit tests are of little value in Continuous Integration/Delivery Builds

In the dim and distant past of the last century when I was starting my first programming job, unit testing meant testing the code that I was writing before it was integrated with other team members’ code. It was testing a chunk of software in isolation from the rest of the application, system or component.

Unit testing in the JUnit era has slowly come to mean the testing of the smallest practical chunk of software. For mainstream object-oriented languages such as Java and C# this has come to mean testing the individual methods within a particular class. Everything else is considered too wide in scope to be a unit test and should be called a functional or integration test instead.

In turn, this definition of unit testing has led to rules appearing about what unit tests may not do:

  • talk to the database
  • communicate across the network
  • touch the file system
  • run at the same time as any of your other unit tests
  • require special tweaks to your environment (such as editing config files)

All behavior of this sort must be mocked or stubbed out in a modern unit test.
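
As a rough illustration of what these rules look like in practice, here is a minimal sketch using Mock from Python's standard unittest.mock module. The GreetingService class and its find_name repository method are hypothetical names invented for this example. The repository would talk to a database in production, so the unit tests replace it with a mock.

```python
from unittest.mock import Mock


class GreetingService:
    """Hypothetical class under test. In production its repository
    would query a database; the unit test must not do that."""

    def __init__(self, repository):
        self._repository = repository

    def greeting_for(self, user_id):
        name = self._repository.find_name(user_id)
        return f"Hello, {name}!" if name else "Hello, guest!"


def test_greets_known_user():
    repository = Mock()                       # stands in for the database
    repository.find_name.return_value = "Ada"
    service = GreetingService(repository)

    assert service.greeting_for(42) == "Hello, Ada!"
    repository.find_name.assert_called_once_with(42)


def test_greets_unknown_user_as_guest():
    repository = Mock()
    repository.find_name.return_value = None  # user not found
    service = GreetingService(repository)

    assert service.greeting_for(7) == "Hello, guest!"


test_greets_known_user()
test_greets_unknown_user_as_guest()
```

Tests like these touch no database, network or file system, so they run in milliseconds. They also say nothing about whether the real repository behaves the way the mock does.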

Of course, unit tests complying with these sorts of rules are, by definition, not useful as part of a regular or continuous integration (CI) build. It is impossible for a correctly written unit test of this kind that passes on a developer’s own computer to fail in a CI environment. Running such a unit test in a CI environment is more a test of the developer than of the quality of their code; we are checking to see if the developer was conscientious and professional enough to run the unit tests before they committed their changes. If they were, their unit tests will always pass in the CI build.

Proponents argue that unit tests written following these rules precisely identify where a problem is when they fail, and indicate precisely which code is working when they pass. In turn, this reduces the time that would be spent diagnosing problems should a wider-scoped test fail. In reality, the overriding aim of these rules is to keep the unit tests fast so that they do not impede a developer’s productivity when run every few minutes, or before committing any code to a source-control environment.

Unit testing is never comprehensive

It can be very time-consuming to try to write unit tests for every new code path that a project typically adds to a hybris-based system. It is almost certainly impractical to write unit tests that test every possible combination of data inputs used by the new code.
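
A little arithmetic shows how quickly input combinations multiply. The sketch below assumes a hypothetical checkout operation with five independent inputs; the input names and their values are invented purely for illustration.

```python
from itertools import product
from math import prod

# Hypothetical input dimensions for a single checkout operation;
# the names and sizes are invented for illustration.
input_values = {
    "customer_type": ["guest", "registered", "b2b", "employee", "vip"],
    "currency": ["USD", "EUR", "GBP", "JPY"],
    "shipping": ["standard", "express", "pickup", "freight", "digital", "none"],
    "has_coupon": [True, False],
    "is_gift": [True, False],
}

# Number of distinct combinations: 5 * 4 * 6 * 2 * 2
combinations = prod(len(values) for values in input_values.values())
print(combinations)  # 480

# Enumerating every case confirms the count.
all_cases = list(product(*input_values.values()))
assert len(all_cases) == combinations
```

Even this small operation has 480 input combinations, and adding one more three-valued input would triple that figure.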

Additionally, modern unit tests miss whole classes of defects arising from errors and bad assumptions in the way classes, components and modules work with each other. By definition, tests with wider scope are required in integration scenarios. Unit tests do not check that components integrate with each other and work correctly together. Developers need to write automated integration or functional tests for integration builds in addition to unit tests.
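
To make this concrete, here is a minimal sketch of two components that each pass their own unit tests yet fail when wired together. All of the names are hypothetical. The stub used to unit test the formatter assumes prices are in decimal currency units, while the real catalog returns integer cents.

```python
class PriceCatalog:
    """Hypothetical real component: returns prices as integer cents."""

    def price_of(self, sku):
        return {"BOOK": 1999}[sku]


class StubCatalog:
    """Stub written for the formatter's unit test. It bakes in a wrong
    assumption: that prices are decimal currency units, not cents."""

    def price_of(self, sku):
        return 19.99


def format_price(catalog, sku):
    """Hypothetical second component: formats a price for display."""
    return f"${catalog.price_of(sku):.2f}"


# Unit tests: each component passes in isolation.
assert PriceCatalog().price_of("BOOK") == 1999
assert format_price(StubCatalog(), "BOOK") == "$19.99"

# Wider-scoped test: wire the real components together.
integrated = format_price(PriceCatalog(), "BOOK")
print(integrated)  # $1999.00, not the $19.99 the unit tests implied
```

Neither unit test is wrong on its own terms. Only a test that exercises the real catalog and the formatter together exposes the mismatch.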

As Edsger Dijkstra once said, “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence.”

Numerous trivial unit tests as ‘waste’

To be genuinely useful, the benefit of writing, frequently running, and constantly maintaining a large number of automated unit tests needs to outweigh the costs of doing so. For projects with serious deadlines to meet, each unit test must earn its keep.

Many managers like to see their developers active, typing away producing more code. Many programmers prefer writing code to thinking through design details thoroughly. It is tempting to write more and more low-value unit tests that give the positive feedback of a green light when run.

Are large numbers of such tests truly earning their keep? Is the actual time saved from catching defects earlier because of the unit tests greater than the time spent designing, writing and maintaining the test code? If not, then writing and maintaining those tests is a form of waste that according to lean development principles should be eliminated from the process.

Of course, accurately measuring the true cost and benefit of intensive unit testing is something that few teams even attempt. In the end, it is left to the development team to make judgement calls based on experience, and for individual developers to think carefully about the value of each automated test they write, and not blindly churn out large numbers of trivial tests that are of very little real value to the project.

Concluding pyramids

Many use the recent fascination with Ancient Egyptian-style architecture in software testing to justify writing copious numbers of automated unit tests. The original intent of the automated testing pyramid, however, was to point out the need for more automated service or API-level integration tests.

Nevertheless, this simple illustration with its very simple message seems to have taken on a life of its own. James Crisp’s article, Automated Testing and the Test Pyramid, has acceptance tests at the top of the pyramid, and led to Dean Cornish’s counterargument in On “The Testing Pyramid”. Alister Scott’s pyramid in Yet another software testing pyramid puts automated UI tests back at the top but adds more layers, and an eye. Alister followed this up with a post on an upside-down pyramid that he calls an ice-cream cone anti-pattern; ice cream obviously being something you would want after a hot day toiling up and down pyramids. Scott’s pyramid is quoted in a presentation by Google entitled Move Fast, Don’t Break Things, and that in turn is quoted in a blog post by Mike Wacker called Just Say No to More End-to-End Tests; but note that E2E tests are now at the peak of the pyramid instead of automated UI tests. In turn, this article is challenged by Bryan Pendleton in On testing strategies, and end-to-end testing.

If the unit testing motto of the 1990s was, “We should write automated unit tests”, and the unit testing motto of the 2000s was, “We need to write lots more unit tests”, then the unit testing motto of this decade should be, “We need to write genuinely useful unit tests, and plenty of automated component integration and API-level tests too”.

Play the Quality Game

Looking for something different to do with your team in a retrospective or as an ice-breaker or warm-up exercise at the beginning of a planning, story writing or other team activity? Running through this simple quality game can kick start some good process improvement discussions.

How to Play

  1. As a whole team or in groups of three or four, spend five minutes listing the characteristics that make a piece of software ‘high quality’ in the eyes of its users. In other words, look for words or short phrases to complete the sentence, “High quality software is …”.
    For example: fast, robust, easy to use, …
  2. Then do the same again but this time list the characteristics that make source code ‘high quality’ in the eyes of developers. In other words, look for words to complete the sentence, “High quality source code is …”.
    For example: extensible, portable, easy to understand, …
  3. Next spend a couple of minutes listing different QA methods and techniques available to the team. Group them under the headings of automated testing, manual testing, static analysis, and peer reviews.
  4. Now spend five to ten minutes identifying which of the four categories of QA methods and techniques from step 3 are useful for checking or measuring each of the quality attributes identified in steps 1 and 2.
  5. Finally, spend a few minutes reviewing the results as a whole team. Consider how you can improve the way the team works to increase the quality of your software and source code. In addition, identify any existing quality assurance activities that don’t contribute effectively to checking any of the quality attributes that you have identified, and either justify continuing those activities or consider dropping them.

How to Cheat

Occasionally a team might decide to state that high quality software is ‘software that meets the requirements’ or high quality source code is ‘source code that complies with coding standards’ and stop there. Combat this by asking them to list some of the characteristics that should be covered by requirements and some of the reasons why coding standards exist and are important.

As the facilitator of the quality game you can use the following ‘cheat sheet’ to prompt and suggest additional ideas to help teams:

(the cheat sheet is also available as TheQualityGameCheatSheet)

Step 1: Characteristics of High Quality Software

  • Correct – does the software do what it says on the tin under normal circumstances?
  • Reliable – does the software work correctly every time?
  • Robust – can the software handle abnormal conditions gracefully and appropriately?
  • Consistent – are similar tasks done in similar ways; both from a user interface perspective and from a design and implementation perspective?
  • Fast – does the software do what it says on the tin quickly enough?
  • Efficient – does the software avoid consuming too many computing resources e.g. processor, RAM, disk I/O, network I/O, etc.? Can other software run at the same time on an average end user’s machine?
  • Secure – a subset of ‘correct’ but important and specialized enough to be worth a separate mention
  • Simple – is there any unnecessary complexity, restriction, or over-complication in the user interface? Can necessary complexity be hidden by better abstractions?

I consider that the characteristics above can be summed up as software that is genuinely useful or entertaining and a delight to use. For example, if the software is not truly beneficial to someone then it probably is not doing what it ought to be doing, not doing it fast enough, or crashing too often. Can the intended users of the software easily learn and remember how to use it? Can they do frequently needed tasks in a small number of steps? Do frequent error situations mean a lot of extra work for users? And so on.

Step 2: Characteristics of High Quality Source Code

  • Modular – is the software organised into logical chunks rather than a single monolithic heap of spaghetti code?
  • Loosely coupled – is the number of dependencies between modules kept reasonably low?
  • Highly Cohesive – does each module provide a small number of highly-related features?
  • Standards Compliant – does the code comply with agreed design and coding standards?
  • Simple – is there any unnecessary complexity, cleverness, or over-complication in the code? Can necessary complexity be hidden behind simpler interfaces and facades or encapsulated within better abstractions?
  • Reusable – are the features provided by key modules used by all the other modules needing that functionality?
  • Extensible – is it easy to add new features to the software?
  • Well-documented – does the code have useful comments? Is there enough additional functional, design, and user documentation?
  • Compatible – does the software play well with other standards-compliant software?
  • Adaptable – can the software be used in different situations easily?
  • Portable – can the software be easily run in different environments?

To me, these characteristics can be summed up as source code that is easy to understand and modify. Martin Fowler says in his book, Refactoring, “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” And what cannot be communicated in the code should be clear from readily available documentation.

Step 3: Kinds of QA activity and technique

Most software development quality assurance techniques fall into one of four categories:

  1. Manual testing (MT) – includes ad hoc testing, exploratory testing, informal bug hunts, and user interface walkthroughs, in addition to formal execution of manual test cases.

  2. Automated testing (AT) – the execution of test scripts using tools like the xUnit family of tools, Spock, Geb, Selenium, Jasmine, Protractor, etc.

  3. Static analysis (SA) – includes compiler warning levels and source code and compiled code analysis tools like SonarQube, PMD, Checkstyle, FindBugs, etc.

  4. Peer review and inspections (PR) – visual inspection of requirements, plans, design and code, etc. by other members of the team or experts from outside the team.

Step 4: Checking Each Quality Attribute

  • Genuinely Useful or Entertaining: MT
    There is no automated way to tell if a piece of software is genuinely useful or entertaining. We hope that the software will prove useful if it functions as we understand to be required, and the specified requirements truly reflect the needs of the end-users.
  • A Delight to Use: MT
    There is no automated way to tell if a piece of software is actually delightful or even easy to use.
  • Correct: AT, MT, SA, PR
    All four methods can contribute in different ways to improving correctness. Various studies over the last four decades have repeatedly shown that static analysis and peer review find different defects than testing does.
  • Reliable: AT, MT, SA, PR
    All four methods can contribute in different ways to improving reliability.
  • Robust: AT, MT, SA, PR
    All four methods can contribute in different ways to improving robustness.
  • Consistent: MT, PR
    Currently, there is no automated way to tell you if a piece of software does things consistently.
  • Fast: AT, MT
    SA and PR cannot measure actual performance.
  • Efficient: AT, MT, SA, PR
    SA and PR cannot measure actual efficiency but can identify some typically inefficient coding constructs.
  • Secure: AT, MT, SA, PR
    There are certain SA tools designed to check for specific coding constructs that may cause security problems.
  • Simple: MT, SA, PR
    Only MT can decide if a user interface is simple to use or not. SA can provide measures of code complexity but only PR can decide if it is appropriate or not.
  • Easy to Understand: SA, PR
    SA tools can identify specific contributing issues but only actual peer review of the code can determine if it is easy to understand.
  • Modular: PR
    You cannot test or automatically analyze for modularity.
  • Loosely Coupled: SA, PR
    SA can measure coupling but only PR can decide if it is appropriate or not.
  • Highly Cohesive: SA, PR
    SA can measure cohesion but only PR can decide if it is appropriate or not.
  • Standards Compliant: SA, PR
    SA should be used for code layout and other basic coding standards, leaving PR for things that SA cannot fully check, such as meaningful variable and method names. It is important to agree the standards that SA will check and to configure it appropriately before using it. It is also wise to agree standards such as code layout rules that can be supported easily within whatever IDEs your team prefer to use.
  • Reusable: PR
    You cannot really test or automatically analyze for reusability.
  • Extensible: PR
    You cannot test or automatically analyze for extensibility.
  • Well-documented: SA, PR
    SA can check things like javadoc contents for the right types of entry but the value of comments and documentation is something that only a review can determine.
  • Compatible: AT, MT
    Compatibility is usually measured by compliance with a specific set of test cases, automated and/or manual.
  • Adaptable: PR
    You cannot really test or automatically analyze for adaptability.
  • Portable: AT, MT, SA, PR
    All four methods can contribute in different ways to checking portability.
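
A facilitator who wants to record the step 4 results could capture the mapping above as a small lookup table. The following Python sketch does this; the CHECKED_BY table simply transcribes the list above, while the unchecked function is a hypothetical helper added for illustration. It reports which quality attributes are left unchecked by a given set of QA methods.

```python
# Step 4 results from the cheat sheet, keyed by quality attribute.
# AT = automated testing, MT = manual testing,
# SA = static analysis, PR = peer review.
CHECKED_BY = {
    "Genuinely Useful or Entertaining": {"MT"},
    "A Delight to Use": {"MT"},
    "Correct": {"AT", "MT", "SA", "PR"},
    "Reliable": {"AT", "MT", "SA", "PR"},
    "Robust": {"AT", "MT", "SA", "PR"},
    "Consistent": {"MT", "PR"},
    "Fast": {"AT", "MT"},
    "Efficient": {"AT", "MT", "SA", "PR"},
    "Secure": {"AT", "MT", "SA", "PR"},
    "Simple": {"MT", "SA", "PR"},
    "Easy to Understand": {"SA", "PR"},
    "Modular": {"PR"},
    "Loosely Coupled": {"SA", "PR"},
    "Highly Cohesive": {"SA", "PR"},
    "Standards Compliant": {"SA", "PR"},
    "Reusable": {"PR"},
    "Extensible": {"PR"},
    "Well-documented": {"SA", "PR"},
    "Compatible": {"AT", "MT"},
    "Adaptable": {"PR"},
    "Portable": {"AT", "MT", "SA", "PR"},
}


def unchecked(methods_in_use):
    """Return the attributes that none of the given methods can check."""
    in_use = set(methods_in_use)
    return [attr for attr, methods in CHECKED_BY.items()
            if not methods & in_use]


# A team that relies on testing alone:
print(unchecked({"AT", "MT"}))

# The same team after adding static analysis:
print(unchecked({"AT", "MT", "SA"}))
```

With this table, testing alone leaves nine of the 21 attributes unchecked. Adding static analysis still leaves Modular, Reusable, Extensible and Adaptable, which only peer review covers.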

Step 5: Improving the Ways We Work

If the team does not practise any form of peer code review then this exercise should convince them that it is something worth trying. If they are already doing code reviews then it is worth looking for ways to improve the effectiveness of these. Good use of static analysis tools should eliminate much of the tedious stuff from code reviews.

Ask the team how they can achieve a better balance of automated and manual testing. The usual answer to the question, “How much automated testing should we be doing?” is, “More than we are currently doing”.

End-users and project sponsors value quality attributes such as correctness, performance and ease of use, but good development teams know that those aspects eventually depend upon maintaining high quality source code even though that is not visible to end-users and project sponsors. As Bertrand Meyer states in Object-Oriented Software Construction, “In the end, only external factors matter. … But the key to achieving these external factors is the internal ones”. This is especially true in agile, iterative development approaches where a team is repeatedly building on results delivered in previous iterations and releases.

Prioritising effort to increase quality from one person’s point of view at the cost of reduced effort from another’s point of view is often a tough balancing act. Too much emphasis on externally visible quality at the cost of internal quality can lead to a build-up of technical debt that eventually undermines the ability to maintain externally visible quality. Conversely, overly emphasising internal quality without due emphasis on externally visible quality runs the risk of the software becoming less and less useful and relevant to the people paying for it.

For those working with micro-services, you might want to adapt the exercise to incorporate the infamous 12 factors.


The various attributes of high quality software and source code need different combinations of tools, strategies and techniques to monitor and improve them. No single technique or tool is sufficient. In particular, reliance on testing without any form of peer review leaves many quality attributes unchecked. Nevertheless, to avoid consuming too much time doing peer reviews, always select the appropriate level of formality for each piece of work (plans, requirements, design, code and test cases), from brief sanity check to formal inspection to pair-programming.
