UI Development with BDD and Approval Testing

By Seb Rose

Overload, 33(188):12-13, August 2025


Testing with confidence. Seb Rose shows a way to approach UI testing.

This article explores the challenges of applying a Behaviour-Driven Development (BDD) approach to UI development. In addition to giving a high-level introduction to BDD, I’ll describe a technique called Approval Testing that complements traditional assertion-based testing to give developers clearer visibility of the correctness of their implementation.

What is BDD?

BDD is an agile development approach in which three practices are applied to each story/backlog item: Discovery, Formulation, and Automation. Much has been written about BDD and there are many good introductory articles available, but here I’d like to stress that these practices must be applied in the correct order. Discovery, then Formulation, then Automation. [Rose24]

In the context of BDD, Automation means the writing of test code before the code under test has been written. Just like Test-Driven Development (TDD), the failing automated test drives the development of the production code. There are two implications of this approach:

  • Every test should be seen to fail when it is written and seen to pass when the correct production code is implemented. Never trust a test that you haven’t seen fail.
  • The automation code must be written either by the people who will write the production code or by someone who is collaborating very closely with them.

Should automation be end-to-end?

There’s a common misconception that all BDD scenarios will be automated end to end, exercising the entire application stack. Since each BDD scenario is only intended to specify a single business behaviour, using an end-to-end approach for each one would be incredibly wasteful:

  • Runtime – end-to-end tests take longer to run than tests that exercise specific areas of the code.
  • Noise – the more of the code each test exercises, the more likely it is that many tests will all hit the same code paths. So, an error in that code path will cause all the tests that use it to fail, even if that part of the code has nothing to do with the business behaviour the scenario was created to illustrate. In the face of multiple failing scenarios, it’s hard to diagnose which behaviour has deviated from specification.

The ‘Test Automation Pyramid’ [Rose20] is a common metaphor that suggests most tests should not be end to end. Applying this metaphor to BDD automation means choosing, for each scenario, the most appropriate level at which to verify that the behaviour has been implemented as specified.

How should the UI be tested?

BDD scenarios that describe how the UI should behave are usually automated using tools such as Selenium. Such tests can be slow and brittle because the UI is often tightly coupled with the rest of the application. However, conceptually, the UI is a component that interacts with other application components. It should therefore be possible to test the UI in isolation, near the bottom of the Test Automation Pyramid, by providing test doubles for the application components that it depends on.

Many applications are architected in such a way that the UI can only be exercised as part of an end-to-end test. Whenever possible, a more flexible architecture (that allows the UI to be exercised with the rest of the application ‘stubbed out’) should be preferred.
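The idea of exercising the UI with the rest of the application stubbed out can be sketched as follows. All the names here (`BalanceView`, `AccountService`) are hypothetical, invented purely for illustration:

```python
class AccountService:
    """Production component: would normally query a real backend."""
    def balance(self, account_id: str) -> int:
        raise NotImplementedError("talks to the real backend")

class BalanceView:
    """UI component under test: depends only on the service interface."""
    def __init__(self, service: AccountService):
        self.service = service

    def render(self, account_id: str) -> str:
        return f"Balance for {account_id}: {self.service.balance(account_id)}"

class StubAccountService(AccountService):
    """Test double: returns canned data, no backend required."""
    def balance(self, account_id: str) -> int:
        return 42

# The UI behaviour can now be verified without the rest of the stack.
view = BalanceView(StubAccountService())
print(view.render("ACC-1"))  # Balance for ACC-1: 42
```

Because the view depends only on the service's interface, the same test double technique works whether the real service is a database, a REST API, or another process.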

Is that all?

Even with a flexible architecture and automation that conforms to the Test Automation Pyramid, there are challenges. Most test frameworks come with simple assertion libraries that verify that text or numerical values are set as expected. If you need to validate all the fields in a report, you will need an assertion for each of them. This approach leads to verbose automation code that is time-consuming to write and difficult to maintain. Additionally, as soon as one assertion fails, the whole test fails without checking any subsequent assertions.
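To make the problem concrete, here is the field-by-field style for a hypothetical report: every field needs its own assertion, and execution stops at the first failure, hiding any further mismatches:

```python
# Hypothetical report output under test.
report = {"title": "Q3 Sales", "total": 1200, "currency": "GBP", "rows": 3}

# One assertion per field: verbose, and the test aborts at the first
# failing assertion, so later mismatches go unreported.
assert report["title"] == "Q3 Sales"
assert report["total"] == 1200
assert report["currency"] == "GBP"
assert report["rows"] == 3  # never reached if an earlier assertion fails
```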

For many years, a technique called Approval Testing has been used in these situations, and several tools have been developed to help teams incorporate approval testing into their software development processes. The mechanics of how the tools work vary, but their approach is the same:

  1. The first time the test is run, the output is checked manually. If correct, this output is stored as the ‘approved’ output. If not, fix the software and repeat until the output produced is correct.
  2. On subsequent test runs, the tool will compare the output produced to the ‘approved’ output previously recorded. If they are found to be the same, then the approval test has passed. If not, the approval test has failed. [Falco08]
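The two-step workflow above can be sketched in a few lines of Python. The file naming and layout here are illustrative, not those of any particular tool:

```python
from pathlib import Path

def verify(test_name: str, received: str, approved_dir: Path = Path(".")) -> None:
    """Minimal approval check: store output on first run, compare thereafter."""
    approved_file = approved_dir / f"{test_name}.approved.txt"
    if not approved_file.exists():
        # Step 1: first run - record the output and ask for manual review.
        approved_file.write_text(received)
        raise AssertionError(
            f"No approved output yet; wrote {approved_file} for manual review")
    # Step 2: subsequent runs - compare against the approved output.
    if received != approved_file.read_text():
        raise AssertionError(f"Received output differs from {approved_file}")
```

Real tools add conveniences on top of this core loop, such as launching a diff viewer on failure and a command to promote the received output to approved status.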

Naturally, it’s not quite as simple as that. For example, if the complex output that we’re comparing includes timestamps, these will likely be different each time the test is run. Therefore, approval testing tools typically include mechanisms to specify parts of the output that should not be compared. These exclusions will be specified when the approval test is first created and stored alongside the test.
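One common exclusion mechanism is a ‘scrubber’ that normalises volatile values before comparison, so only meaningful differences cause the approval test to fail. A minimal sketch (the timestamp format shown is an assumption):

```python
import re

# Matches timestamps like "2025-08-01 09:30:15" (an assumed format).
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def scrub(output: str) -> str:
    """Replace volatile timestamps with a stable placeholder."""
    return TIMESTAMP.sub("<timestamp>", output)

print(scrub("Report generated 2025-08-01 09:30:15"))
# Report generated <timestamp>
```

Both the received and the approved output are passed through the same scrubber, so the comparison ignores the excluded regions while still checking everything else.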

Does approval testing work for UIs?

Simply specifying areas of the output that should not be compared is insufficient if we’re trying to automatically verify the correctness of a visual component. Perhaps a text field is now too close to the border of a control or one visual element is overlaying/obscuring another one.

In these situations, machine learning (ML) and artificial intelligence (AI) can deliver huge benefits. Our tests can leverage these technologies to identify hard-to-spot issues with a precision that the human eye cannot match. But they take time – and slow feedback from a build is the enemy of automated testing.

Instead, AI/ML powered visual tests should be run in a separate stage in the build pipeline, after the faster automated checks have already passed. This ensures that developers get the fast feedback they require while also delivering confidence that the UI is free of visual defects.

If the visual tests pass, then all is well. If there’s a failure during the visual tests, then manual investigation is required – because not all failures indicate a defect in the code.

When is a failure not a fail?

We normally think of a test as having a binary outcome. It either passes or fails. Life in software development is rarely that simple. To ensure that the software we ship satisfies our customers’ needs, we want to minimize false positives. So, when a test passes, we need to be confident that the behaviour being verified is implemented correctly.

When a test fails, it doesn’t necessarily mean that the behaviour has been implemented incorrectly. There are three common situations that cause a test to fail:

  1. Incorrect implementation: this could be caused by a misunderstanding of the specification or an error in the implementation.
  2. Incorrect specification: the test is performing the wrong check(s) or the check(s) are being carried out incorrectly.
  3. BDD/TDD: the test has been written before the behaviour it’s designed to check has been implemented.

When any sort of failure happens in a build, investigation is required. If you find that Situation 1 or 2 has occurred, fix the defect (either in the implementation or the specification) and run the build again.

Situation 3 is a signal to the development team that the work is incomplete. Seeing an automated test fail is an important part of all BDD/TDD workflows. Usually, we would like the failure to be seen in the developers’ environment and made to pass before being pushed to CI. However, some workflows may see the tests committed and pushed before the behaviour being verified is implemented.

AI/ML powered visual approval testing

There are several popular, free, open source approval testing tools available ([TextTest], [ApprovalTests]). Their support for visual comparison is limited (absent in the case of TextTest), but there are techniques that, used in conjunction, may be sufficient for your needs (see the Printer section in this article by Emily Bache [Bache19] for example).

With the increasing availability of AI/ML techniques, a number of visual testing tools are now available that incorporate AI functionality to facilitate the validation of complex graphical applications. Applitools is possibly the most popular commercial offering [Applitools], but there are many others available with competing functionality and pricing.

Conclusion

The development team need regular, fast feedback to give them visibility of important aspects of their software’s quality. They need confidence that they’re developing the right thing in the right way.

BDD and TDD are techniques that give developers that confidence (among other benefits). Currently, most organisations that adopt these approaches use popular assertion-based tools to verify that the code being developed satisfies the specifications. This focus on assertion-based testing is unsuitable for some of the subtle and complex issues that occur when developing today’s applications.

Approval testing in all its flavours can help bridge the gap between automated assertion checking and time-consuming manual testing. Existing approval testing libraries are excellent when dealing with complicated textual outputs and simple graphical components. Visual testing tools are emerging that leverage AI/ML to bring approval testing for modern UIs within reach of reliable automated testing.

References

[Applitools] Applitools: https://applitools.com/

[ApprovalTests] Approval Tests: https://approvaltests.com/

[Bache19] Emily Bache ‘Approval Testing’, published at https://approvaltests.com/ on 23 July 2019.

[Falco08] Llewellyn Falco ‘Approval Tests (a picture worth a 1000 tests)’ posted at https://llewellynfalco.blogspot.com/2008/10/approval-tests.html on 13 October 2008.

[Rose20] Seb Rose ‘Eviscerating the Test Automation Pyramid’, available at https://cucumber.io/blog/bdd/eviscerating-the-test-automation-pyramid/, posted 7 February 2020.

[Rose24] Seb Rose ‘Behaviour-Driven Development’, available at https://cucumber.io/docs/bdd, last updated November 2024.

[TextTest] TextTest: https://www.texttest.org/

This article was first published on Seb Rose’s blog on 8 February 2023: https://cucumber.io/blog/bdd/bdd-approval-testing-and-visualtest/. It has been reviewed and updated for Overload.

Seb Rose has been a consultant, coach, designer, analyst and developer for over 40 years. He is co-author of the BDD Books series Discovery and Formulation (Leanpub), lead author of The Cucumber for Java Book (Pragmatic Programmers), and contributing author to 97 Things Every Programmer Should Know (O’Reilly).





