The Forgotten Layer of the Test Automation Pyramid

Even before the ascendancy of agile methodologies like Scrum, we knew we should automate our tests. But we didn’t. Automated tests were considered expensive to write and were often written months, or in some cases years, after a feature had been programmed. One reason teams found it difficult to write tests sooner was that they were automating at the wrong level. An effective test automation strategy calls for automating tests at three different levels, as shown in the figure below, which depicts the test automation pyramid.

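[Figure: The test automation pyramid, with unit tests at the base, service tests in the middle, and user interface tests at the top.]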

At the base of the test automation pyramid is unit testing. Unit testing should be the foundation of a solid test automation strategy and as such represents the largest part of the pyramid. Automated unit tests are wonderful because they give specific data to a programmer—there is a bug and it’s on line 47. Programmers have learned that the bug may really be on line 51 or 42, but it’s much nicer to have an automated unit test narrow it down than it is to have a tester say, “There’s a bug in how you’re retrieving member records from the database,” which might represent 1,000 or more lines of code. Also, because unit tests are usually written in the same language as the system, programmers are often most comfortable writing them.

Let’s skip for a moment the middle of the test automation pyramid and jump right to the top: the user interface level. Automated user interface testing is placed at the top of the test automation pyramid because we want to do as little of it as possible.

Suppose we wish to test a very simple calculator that allows a user to enter two integers, click either a multiply or divide button, and then see the result of that operation. To test this through the user interface, we would script a series of tests to drive the user interface, type the appropriate values into the fields, press the multiply or divide button, and then compare expected and actual values. Testing in this manner would certainly work but would be brittle, expensive, and time-consuming.

Additionally, testing an application this way is partially redundant—think about how many times a suite of tests like this will test the user interface. Each test case will invoke the code that connects the multiply or divide button to the code in the guts of the application that does the math. Each test case will also test the code that displays results. And so on. Testing through the user interface like this is expensive and should be minimized. Although there are many test cases that need to be invoked, not all need to be run through the user interface.
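
To make the redundancy concrete, here is a minimal Python sketch. `CalculatorScreen` is a hypothetical in-memory stand-in for the calculator’s user interface (a real suite would drive the actual UI with a testing tool), and the `calls` counter exists only to show how often each piece of code runs when every test case goes through the screen. None of these names come from a real library.

```python
from collections import Counter

# Counts how often each part of the application runs (illustration only).
calls = Counter()


def multiply(a: int, b: int) -> int:
    """The code in the guts of the application that does the math."""
    calls["multiply service"] += 1
    return a * b


class CalculatorScreen:
    """Hypothetical stand-in for the calculator's user interface."""

    def __init__(self) -> None:
        self.left = ""      # text typed into the first field
        self.right = ""     # text typed into the second field
        self.display = ""   # text shown in the result area

    def press_multiply(self) -> None:
        # Wiring between the button and the service.
        calls["button wiring"] += 1
        self._show(multiply(int(self.left), int(self.right)))

    def _show(self, value: int) -> None:
        # Code that displays the result.
        calls["result display"] += 1
        self.display = str(value)


# Every multiplication case is pushed through the (simulated) UI.
UI_CASES = [(2, 3, "6"), (0, 5, "0"), (-4, 2, "-8"), (7, 1, "7")]

for left, right, expected in UI_CASES:
    screen = CalculatorScreen()
    screen.left, screen.right = str(left), str(right)
    screen.press_multiply()
    assert screen.display == expected

print(dict(calls))
```

With four cases routed through the screen, the button wiring and the display code each run four times, although a single check of each would establish that they work. Only the arithmetic inputs actually vary from case to case.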

And this is where the service layer of the test automation pyramid comes in. Although I refer to the middle layer of the test automation pyramid as the service layer, I am not restricting us to using only a service-oriented architecture. All applications are made up of various services. In the way I’m using it, a service is something the application does in response to some input or set of inputs. Our example calculator involves two services: multiply and divide.

Service-level testing is about testing the services of an application separately from its user interface. So instead of running a dozen or so multiplication test cases through the calculator’s user interface, we instead perform those tests at the service level.
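
As a sketch of what that might look like, assume the calculator’s two services are exposed as plain Python functions named `multiply` and `divide`. These names are hypothetical, and the behavior chosen for division (a non-integer result where needed, and an error on division by zero) is an assumption made for illustration rather than something the calculator example specifies.

```python
def multiply(a: int, b: int) -> int:
    """Multiply service: takes two integers and returns their product."""
    return a * b


def divide(a: int, b: int) -> float:
    """Divide service (assumed semantics): true division, error on zero."""
    if b == 0:
        raise ZeroDivisionError("cannot divide by zero")
    return a / b


# Each row is (first input, second input, expected result).
MULTIPLY_CASES = [(2, 3, 6), (0, 5, 0), (-4, 2, -8), (-3, -3, 9)]
DIVIDE_CASES = [(6, 3, 2.0), (7, 2, 3.5), (-8, 4, -2.0), (0, 5, 0.0)]


def run_service_tests() -> int:
    """Run every case directly against the services; return the number of checks."""
    for a, b, expected in MULTIPLY_CASES:
        assert multiply(a, b) == expected, f"multiply({a}, {b})"
    for a, b, expected in DIVIDE_CASES:
        assert divide(a, b) == expected, f"divide({a}, {b})"

    # Dividing by zero should be rejected by the service itself.
    try:
        divide(1, 0)
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("divide(1, 0) should have raised")

    return len(MULTIPLY_CASES) + len(DIVIDE_CASES) + 1


if __name__ == "__main__":
    print(f"{run_service_tests()} service-level checks passed")
```

None of these checks touch a button, a text field, or a display, so adding another case costs one more row in a table. A small number of user interface tests can then confirm that the buttons are wired to these services.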

Where many organizations have gone wrong in their test automation efforts over the years has been in ignoring this whole middle layer of service testing. Although automated unit testing is wonderful, it can cover only so much of an application’s testing needs. Without service-level testing to fill the gap between unit and user interface testing, all other testing ends up being performed through the user interface, resulting in tests that are expensive to run, expensive to write, and brittle.

For more on Scrum and agile testing, pick up a copy of Succeeding with Agile.

17 Responses to “The Forgotten Layer of the Test Automation Pyramid”

  1. This seems to be what web testing has been doing quite well for the last couple of years.

    With the advent of BDD most of the test suites supporting it (Cucumber etc) work exactly via a series of automated calls to controllers without browser interface.

    I’ve seen far more organisations implementing something like Cucumber than the more in depth browser interfaces like Selenium, for exactly the reasons that you outline here. Perhaps my view is biased by mostly hanging around Rails shops rather than places using slower moving languages though.

  2. Laurent Bristiel says:

    Thanks for the explanation.

    I recently used your pyramid metaphor in my company as a QA manager trying to describe the testing strategy I want to follow.

    I ended up replacing “UI” with “End-to-end” for a few reasons:
    - “UI” makes people think of a GUI (as in your example), which does not exist for many products (e.g. batch systems).
    - The top of the pyramid is rather “testing the whole system the way it will be used” or “testing in a pre-production environment”. For some people that will mean using the system through the UI, which I did not use as the entry point in my “Service Layer”. For others it will mean “test my system on the real database instead of an in-memory one”. For others still it will mean testing the system with real data instead of QA test cases.
    - I could have a “unit test” or “service test” that tests the UI itself. For example, I could check that the multiply button appears on the screen, or is clickable, without checking the results. In that case I would not be at the top of your pyramid.

    To me, going up the pyramid is testing bigger bricks of the whole wall rather than going closer to UI.

    I would be interested to know if others share my experience of using the pyramid.

    Thanks,
    Laurent Bristiel

  3. Laurent,

    I also ran into some problems, because our software has a different layout. We have class, component, system, GUI-level and customer-facing tests. There are five to ten separate software components, each made up of several classes. Together the components form a complete system with a separate GUI. In the end we customize the product to fit the customer’s needs. Therefore there are multiple ways to apply the pyramid.

    In my context, the classes need the largest number of tests, so they’re also at the bottom. Then some higher-level component tests form a basis for the system tests. GUI-level tests are mostly done manually, and in the end we rely on system testing having happened before customer-based testing. This approach does not reflect the testing pyramid completely, but it’s a model that may work to some degree.

    Hope this helped.

    Kind regards
    Markus Gärtner

  4. Hey there!

    Yes, my experience is the same as Laurent & Markus’s. I prefer to talk about “end-to-end” rather than GUI when referring to the top bit of the pyramid.

    This also helps me talk about testing aspects of the GUI such as making sure that the wiring is correct, e.g. that the Foo button calls foo(), and the look, e.g. css classes/attributes.

    In short, not all GUI tests absolutely need to be end-to-end, and not all end-to-end tests need to go through the GUI.

    Elisabeth

  5. Mike Cohn says:

    Hi Laurent, Markus and Elisabeth–
    Very good suggestion. I probably should change the top from “UI” to “End to End.” I think I’ll start doing that from now on. I know that when I first drew this pyramid (6-7 years ago) I was thinking more about the tools we would use. So “UI” at the top meant we’d “use our UI testing tools” (any type of capture/playback type thing although generally scripted rather than captured). So even though it was a UI testing tool we were testing end-to-end, which is definitely a better term.

    Another possible way of thinking about the levels is to think about how deep each pushes into the application. The unit tests test a unit; the service-level tests test from a service down; the top tests test from end-to-end (or from UI down).

    Thanks for your suggestions.

  6. Surbhi says:

    Hi Mike,
    My understanding of end-to-end automated test cases is that they exercise a complete piece of application functionality in all possible ways or, rather, in all frequently used ways. They might have been expensive and brittle some years ago, but since the advent of lightweight, easy-to-use automation tools like Selenium and WATIR, automation has become very simple and robust. My experience says that when you get to the 5th or 6th sprint of your project, you would love to run the end-to-end automation for things you developed back in your 2nd and 3rd sprints. So when it comes to regression testing, end-to-end automation is your best bet.
    So I feel that only if you have been good at your service layer can you have a clear view of your app from the top (UI).
    Am I getting it right?

  7. Jonathan Perret says:

    Mike, I think this post is a very good summary of the value there is in testing software below the UI level.
    What I would like to hear more opinions about is how this pyramid “intersects” with the acceptance/developer testing divide. That is, at the bottom it is clear that unit tests cannot serve as acceptance tests since they are inscrutable to the customer. End-to-end tests, on the other hand, directly reflect the requirements of the whole system, so they can work as acceptance tests. However, for service-level tests, I find the distinction more subtle: some of these could reflect actual requirements (e.g. 2+2=4) but others can be developer integration tests, written only to check the wiring of components.
    Any insights on that?
    Cheers,
    –Jonathan

  8. Rakesh Patel says:

    Hi,

    As much as I agree with this post, I strongly urge people to watch this presentation about the problems with integration tests, which map to the ‘service’ layer of the pyramid.

    http://www.infoq.com/presentations/integration-tests-scam

    R

  9. Cuan says:

    hi Mike

    Good overview and summary for pre-go-live testing. Once live, though, UI or end-to-end tests provide great smoke tests to ensure the user experience is maintained after a patch release or an upgrade.

    C

  10. Matt Robb says:

    I would definitely change UI to End-to-End, mainly because you can test the UI as a “service layer” independently of the layers it interacts with by mocking those layers just as you would with any other service layer.

  11. Duane Wesley says:

    Mike,

    I agree that the service layer, as you describe it, is often ignored. As for the top layer, whether you call it UI or End-To-End, is it not the View of the MVC architecture? And, is not the service layer the Controller wherein services and business logic may reside? Depending upon the system, the degree to which the software maps to MVC can vary widely–one can imagine various degenerate cases, for example–yet, I nonetheless believe that the MVC is an instructive, alternative “view” of the analysis you are presenting.

    Thanks for getting me thinking!

    Best Regards,

    Duane

  12. Mike Cohn says:

    Hi Duane–
    Yes, I do think these can be thought of as mapping closely to MVC in an MVC architecture.

  13. Sohan says:

    I think your layers and the comments strongly hint at the Rails test layers:
    1. Unit test
    2. Functional test
    3. Integration test
    I really liked the functional test part, because they made it so simple. But can you share your views on end-to-end or integration testing? This seems to be the hardest thing to keep in sync within a sprint cycle. We tried it a few times, but we found that it was hard for QA people to produce the test code in parallel with the UI implementation. And too often you need to spend a lot of time fixing the integration test code just because of changes in the UI.

  14. Kerry Kimbrough says:

    Elisabeth makes a great point. There’s a bit of ambiguity in the pyramid because it could illustrate two different concerns. One concern is the structure of the “testables” as a stack of layers, each of which could be the target of a different set of tests. Another concern is the “depth” of a test: how many layers would a test traverse? In other words, is it end-to-end, one layer only, etc.? There are other concerns for testers, too (for example, the nature of the “use case” for this test, etc.). Clearly, the pyramid can’t illustrate them all. I think the pyramid picture most clearly shows the “structure of testables” dimension, in which case “External Interface” might be an accurate name for the top level. I’ve worked with many “headless” systems in which this top level was an API or RESTful protocol, etc. But I must say that every system has a UI. Inasmuch as every system exists to benefit some person(s), there must always be a way for those people to interact with the system, if only to check that something beneficial happened!

  15. Kerry Kimbrough says:

    I’ve also seen this 3-level structure in interactive systems, but from a different perspective. There tends to be (should be!) a layer that is the *model* of the UI. That is, a model of the elements and behaviors of the system *as experienced by the user*. This is typically not identical to the domain model of the system; instead, it is a model of the work people do when using this system within this domain model. It might be the M of your MVC architecture, but only if you understand that there is an M(ui) that wraps your M(domain). But it also encompasses the C of your MVC because the work model governs the task sequence structure. I think this work model is somewhat broader than a collection of “services”, since it also governs how services are used. At any rate, I think Mike’s main point is the crucial one: this UI work model can be (should be!) represented as an API that can be tested like any other (i.e. independently of any particular View). And if those tests are thorough, the remaining View-specific tests required can be (should be!) much smaller.

  16. todd says:

    I have been using Mike’s Test Automation Pyramid to explain to clients, testers and developers how to structure a test strategy. It has proved the most effective rubric (compared with, say, Brian Marick’s Quadrants model, as further evolved by Crispin and Gregory) for getting people to think about what is going on in testing the actual application and its stress points. I want to add that JB Rainsberger’s talk mentioned above is crucial to understanding why that top-level set of tests can’t prove the integrity of the product by itself.

    It has got me thinking that perhaps we need to rethink some assumptions behind these labels. The difference of opinion in this blog also suggests this. So I thought I would spend some time talking about how I use the pyramid and then come back to rethinking its underlying assumptions. I’d love any feedback.

    I have renamed some parts of the pyramid so that at first glance it is easily recognisable by clients. This particular renaming is in the context of writing MVC web applications. I get teams to draw what their pyramid looks like for their project, or what they might want it to be, because it is often upside down.

    My layers:

    – System (smoke, acceptance)
    – Integration
    – Unit

    I also add a cloud on top (I think from Crispin and Gregory) for exploratory testing. This is important for two reasons: (1) I want automated testing so that I can allow more time for manual testing and to emphasise that (2) there should be no manual regression tests. This supports Rainsberger’s argument not to use the top-level testing as proof of the system’s integrity – to me the proof is in the use of the system. Put alternatively, automated tests are neither automating your tester’s testing nor are they a silver bullet. So if I don’t have a cloud people forget that manual testing is part of the automated test strategy (plus with a cloud when the pyramid is inverted it makes a good picture of ice cream in a cone and you can have the image of a person licking the ice cream and it falling off ;-) .)

    In the context of an MVC application, this type of pyramid has led me to some interesting findings at the code base level. Like everyone is saying, we want to drive testing down towards the unit tests because they are foundational, discrete and cheapest. To do this, I need to create units that can be tested without boundary crossing. For an ASP.NET MVC application (just like Rails), this means that I can unit test (with the aid of isolation frameworks):

    – models and validations (particularly using MotherObjects)
    – routes coming in
    – controller rendering of actions/views
    – controller redirection to actions/views
    – validation handling (from errors from models/repositories)
    – all my jQuery plugin code for UI-based rendering
    – any HTML generation from HtmlHelpers (although I find this of little value and brittle)
    – and of course all my business “services”

    I am always surprised at how many dependencies I can break throughout my application to make unit tests – in all of these cases I do not need my application to be running in a webserver (IIS or Cassini). They are quick to write, quick to fail. They also require additional code to be written or libraries to be provided (eg MvcContrib Test Helpers).

    For integration tests, I now find that the only piece of the application that still requires a dependency is the connection to the database. Put more technically, I need to check that my repository pattern correctly manages my object’s lifecycle and its identity; it is also ensuring that I correctly code the impedance mismatch between the object layer of my domain and the relational layer of the database. In practice, this is ensuring a whole load of housekeeping rather than business logic: eg that my migration scripts are in place (eg schema changes, stored procs); that my mapping code (eg ORM) is in place; and that the code links all this up correctly. Interestingly, I now find that this layer, in terms of lines of code, is smaller than the pyramid suggests because there is a lot of code in a repository service that can be unit tested – it is really only the code that checks identity that requires a real database. The integration tests left then tend to map linearly to the CRUD functions. I follow the rule: one test per dependency. If my integration tests get more complicated it is often time to go looking for domain smells – in the domain-driven design sense, I haven’t got that bounded context right for the current state/size of the application.

    For the top layer, like others I see it as the end-to-end tests and it covers any number of dependencies to satisfy the test across scenarios.

    I have also found that there are actually different types of tests inside this layer. Because it is a web application, there is the smoke test – some critical path routes that show that all the ducks are lined up – selenium, watir/n and even Steve Sanderson’s MVCIntegationTest are all fine. I might use these tests to target parts of the application that are known to be problematic so that I get as early a warning as possible.

    Then there are the acceptance tests. This is where I find the most value, not only because it links customer abstractions of workflow with code but also, just as importantly, because it makes me attend to code design. I find that to run maintainable acceptance tests you need to create yet another abstraction. Rarely can you just hook up the SUT API and it works. You need setup/teardown data and various helper methods. To do this, I explicitly create “profiles” in code for the setup of data and exercising of the system. For example, when I wrote a Banner delivery tool for a client (think OpenX or GoogleAds) I needed to create a “Configurator” and an “Actionator” profile. The Configurator was able to create a number of banner ads in the system (eg an html banner on this site, a text banner on that site) and the Actionator then invoked 10,000 users on this page on that site. In both cases, I wrote C# code to do the job (think an internal DSL as a fluent interface) rather than, say, in FitNesse.

    Why are these distinctions important? A few reasons. The first is that the acceptance tests in this form are a test of the design of the code rather than the function. I always have to rewrite parts of my code so that the acceptance tests can hook in. It has only ever improved my design, such as separation of concerns, and it has often given me greater insight into my domain model and its bounded contexts. For me, these acceptance tests are yet another conversation with my code – but by the time I have had unit, integration and acceptance test conversations about the problem the consensus decision isn’t a bad point to be at.

    Second is that I can easily leverage my DSL for performance testing. This is going to help me in the non-functional testing (or the fourth quarter of the Test Quadrants model).

    Third is that this is precisely the setup you need for a client demo. So at any point, I can crank up the demo data for the demo or exploratory testing. I think it is at this point that we have a closed loop: desired function specified, code to run, and data to run against.

    Hopefully, that all makes some sense. Now back to thinking about the underlying assumptions of what is going on at each layer. I think we are still not clear on what we are really testing at each layer in the pyramid: most interpretations tend to be around the physical layers, the logical layers or the roles within the team. For example, some are mapping it to MVC, particularly because the V maps closely to the UI. Others are staying with a traditional unit, functional and integration split, partly because of the separation of roles within a team.

    I want to suggest that complexity is a better underlying organisation. Happy to leave the nomenclature alone: the bottom is where there are no dependencies (unit), the second has one dependency (integration) and the top has as many as you need to make it work (system). It seems to me that the bottom two layers require you to have a very clear understanding of your physical and logical architecture, expressed in terms of boxes and directed lines, to ensure that you test each line for every boundary.

    If you look back to my unit tests, they identified logical parts of the application and tested at boundaries. Here’s one you might not expect. The UI is often seen as a low-value place to test. Yet frameworks like jQuery suggest otherwise and break down our layering: I can unit test a lot of the browser code which is traditionally seen as the UI layer. I can widgetize any significant interactions or isolate any specific logic and unit test this outside the context of the application running (StoryQ has done this).

    The integration tests tested across a logical and often physical boundary. It has really only one dependency. Because there is one dependency the nature of complexity here is still linear. One dependency equals no interaction with other contexts.

    The top level is all about putting it together so that people across different roles can play with the application and use complex heuristics to check its coding. But I don’t think the top level is really about the user interface per se. It only looks that way because the GUI is the most generalised abstraction through which we believe customers and testers understand the workings of the software. Working software and the GUI should not be conflated. Complexity at the top-most level is that of many dependencies interacting with each other – context is everything. Complexity here is not linear. We need automated system testing to follow critical paths that create combinations or interactions that we can prove do not have negative side effects. We also need exploratory testing which is deep, calculative yet ad hoc, and which attempts to create negative side effects that we can then automate. Neither strategy aspires to elusive, exhaustive testing, which, as JB Rainsberger argues, is the scam of integration testing.

    There’s a drawback when you interpret the pyramid along these lines. Test automation requires a high level of understanding of your solution architecture, its boundaries and interfaces, the impedance mismatches in the movement between them, and a variety of toolsets required to solve each of these problems. And I find it requires a team with a code focus. Many teams and managers I work with find the hump of learning and its associated costs too high. I like the pyramid because I can slowly introduce more subtle understandings of the pyramid as the team gets more experience.

    Cheers todd

  17. Mike Cohn says:

    Hi Todd–
    Thanks for the detailed comments. I can see the value in your renaming of the levels. And I agree this pyramid can be used in various ways to help teams at different points in their learning about testing. The first time I drew it was for a team I wanted to educate about UI testing. They were trying to do all testing through the UI and I wanted to show them how they could avoid that. Perhaps since that was my first use of it, that’s the use I’ve stuck with most commonly.
