In Chapter 8, “Using Models to Help Plan,” we reintroduced the agile testing quadrants. In the section on testing early, we covered Quadrant 1 and 2 tests commonly used by agile teams. In Chapter 12, “Exploratory Testing,” we outlined several approaches to exploratory testing. There’s a seemingly endless list of different types of tests that may be appropriate for your application. For example, some domains require unique approaches, such as the specialized testing techniques needed for business intelligence and data warehousing software, which we will cover along with contexts in Part VII, “What Is Your Context?” In this chapter, we’ll cover a few different types of testing other than functional testing and focus on ones we don’t normally talk about in agile teams but that we think are important.
Remember that the numbering of the Quadrants doesn’t indicate when different types of testing should be done. Q4 tests should be considered as soon as you start discussing a new feature or theme. For example, if there is a constraint that all pages in the application must respond in less than two seconds, make sure everyone on the team is aware of it. Capture that requirement as tests, and keep it in mind during development.
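A constraint like this can be captured directly as an automated check. Here is one minimal sketch in Python; the page URL and the use of the requests library are our own illustrative assumptions, not part of any particular team's suite:

```python
import requests

MAX_RESPONSE_SECONDS = 2.0  # the agreed-upon constraint for every page

def test_page_responds_within_two_seconds():
    # Hypothetical page URL; substitute the pages your team has agreed to cover.
    response = requests.get("https://example.com/dashboard", timeout=10)
    assert response.status_code == 200
    # requests records how long the server took to start answering.
    assert response.elapsed.total_seconds() < MAX_RESPONSE_SECONDS
```

A check like this can run with every build, so the team finds out immediately if a change pushes a page past the agreed limit.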
In Discover to Deliver (Gottesdiener and Gorman, 2012), Ellen Gottesdiener and Mary Gorman represent the quality attribute dimension in two different areas: operations and development. In Chapter 11 of Agile Testing, we covered some of the basic operational attributes that need testing, such as security, reliability, availability, interoperability, and performance. That is not a complete list, but it gives you an idea of the attributes and constraints we need to consider. The Quadrants can help you brainstorm about the different kinds of testing you need to do on your particular feature or product.
Our first Extreme Programming (XP)/agile teams back in the late 1990s and early 2000s were focused on finding out what our customers wanted and then delivering that functionality. We and our teammates tended to be generalizing specialists, and we sometimes neglected areas such as security, reliability, accessibility, performance, and internationalization—at our own peril. It became clear that agile teams must be diligent about all aspects of software quality, even if our business stakeholders don’t mention them.
New types of testing have accompanied the proliferation of platforms, devices, and technology. Many consumer products have software that must be tested. Fitness machines have embedded software. The Tesla automobile has an API! At the same time, outside forces require more test coverage. Security threats to software systems increase all the time, so we must find better ways to test security. As hardware advances in processing speed and graphical display quality, we have to ensure that our products perform well and look good.
Your team may need additional infrastructure to accommodate some types of testing. Let’s say you have a reliability requirement of no more than one failure in 1,000 transactions, or the need to do soak testing. Your team may want a second test environment to be able to drop stable code for reliability testing. You may need extra monitoring tools for watching overnight performance against minimum requirements. If you’re testing an Android app, you’ll need lots of devices in your test lab, and possibly some outside help.
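To make a reliability target such as "no more than one failure in 1,000 transactions" concrete, a soak-style check can exercise a transaction repeatedly and report the failure rate. The sketch below is only an illustration; the transaction function, iteration count, and threshold are placeholders you would replace with your own:

```python
def run_soak_test(do_transaction, iterations=10_000, max_failure_rate=1 / 1000):
    """Run a transaction repeatedly and report whether the failure rate
    stays within the agreed reliability requirement."""
    failures = 0
    for _ in range(iterations):
        try:
            do_transaction()  # e.g., place an order through the API
        except Exception:
            failures += 1
    failure_rate = failures / iterations
    print(f"{failures} failures in {iterations} transactions ({failure_rate:.4%})")
    return failure_rate <= max_failure_rate
```

Running something like this overnight in a dedicated environment, together with monitoring, gives the team a trend over time rather than a single pass/fail answer.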
We’ve heard of lists of more than 100 kinds of testing (see the bibliography in Part V, “Investigative Testing,” for an example), although we find that those lists often contain some duplication. They include the usual ones you would expect to see, such as load testing or browser compatibility testing. However, they also tend to list “types” such as “agile testing,” which really isn’t a type but an approach. When you start thinking about all the different kinds of testing, you might wonder how you’ll possibly have the skills, much less the time, to do it all; it can be overwhelming, especially to new testers. One of the problems we see with so many different types is vocabulary misalignment. For example, when new people come into the organization and talk about integration tests, everyone else assumes they share the same definition, when in fact integration testing may mean something entirely different to the newcomers.
We suggest keeping your list of test types as simple as possible and creating a common understanding within your organization. You might start with the types that Wikipedia lists (Wikipedia, 2014l) and differentiate among types, levels, and methods. Bernice Niel Ruhland (Ruhland, 2014) maintains a list of testing types specific to her team, which includes an explanation of how her teams do or do not use a technique. She finds keeping a list helpful for onboarding new testers, and it provides a standard vocabulary shared by everyone in the department. Here’s one example from Bernice’s list of test types:
End-to-end testing is used to test whether the flow of an application or product performs as designed from start to finish. It typically represents real-world use cases by walking through steps similar to those an end user would take. The purpose of performing end-to-end tests is to identify system dependencies and to ensure that the correct information is passed between database tables and the application’s modules and features. We typically perform end-to-end testing for custom-development projects.
Let’s look at a few different types of testing that people often neglect or struggle with on their agile teams. This is not meant to be an all-encompassing list. The goal is to make sure that we remember to do certain important types of testing that often get overlooked in the rush to deliver new features frequently. In Chapter 18, “Agile Testing in the Enterprise,” we will look at how some of these crosscutting concerns affect development organizations with multiple teams.
Web and mobile applications enhance usability by providing easy ways to make updates, such as dragging and dropping items. When multiple users are looking at the same view of data, we need to verify that when one user makes a change, the other users see that change. If two users update the same piece of data at the same time, who “wins”?
Lisa’s Story
The ability for one person to see updates made by another user in real time is an important feature of our product. For our new autosave feature, where changes to an input field should persist as soon as focus moves out of the field, testing concurrent changes was a must. We recognized this as a high-risk area, since users want to feel certain their changes are persisted. We had automated regression tests for concurrent updates but wanted to explore this area in more detail because we weren’t feeling confident about the feature quality.
We scheduled what our team calls a “group hug,” where multiple team members test at the same time, as described in Chapter 12, “Exploratory Testing.” We time-boxed this session to one hour and wrote up test charters and scenarios in advance. These charters captured normal use as well as extreme worst-case scenarios.
We typed notes on the bugs we found and other observations in a shared document. Learning about a bug that one person found often inspired me to think up another test to try.
Here’s a sampling of issues we noted that would be harder to find while testing individually:
• Changing an epic label while someone else has the epic open causes the user making the change to see an error, and the label blanks out.
• Starting or moving a story while someone else is updating the description causes the description to be overwritten and its history to disappear.
One interesting discovery was that we couldn’t reliably reproduce some problems, which pointed to timing issues. We marked these to explore later, to see whether the issue was in the implementation or in the way we tested. They turned out to be the tip of the iceberg of a bad bug that spurred a major code design change.
The group hug confirmed our suspicions that the feature still needed a lot of work. Our team needed to do some redesign, coding changes, and more exploring before we could consider releasing the feature for beta test.
During the group hug, we realized we still had questions about the desired behavior of the autosave feature. This led to further discussions within the whole team, and within a few days, design improvements and development stories were under way. We knew that there would be more concurrency testing group hugs for this feature.
If your product risks losing updates when simultaneous updates occur or when users need to see updates by other users in real time, concurrency is an important area to think about before coding, but also to cover in your exploratory testing. Solutions for this type of requirement often involve caching updates, which can lead to both performance and transaction integrity issues. Automated regression tests for these scenarios are essential because it is difficult to test them manually; however, timing issues can be subtle and hard to find. Experiment with creative ways to test your features in the same way that customers will use them in production. Take advantage of regular events; for example, check your system’s response time while the Olympics are being streamed live.
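One way to automate a concurrency check is to release two simulated users' updates at the same moment and then assert that the system ends up in a state everyone agrees is acceptable. The client object and its save_label/get_label calls in this Python sketch are hypothetical placeholders for your own test drivers:

```python
import threading

def test_concurrent_label_updates(client_a, client_b):
    """Two users save different labels for the same epic at the same time.
    The system may keep either value, but it must not mangle the data or
    leave the two users seeing different results."""
    barrier = threading.Barrier(2)

    def save(client, value):
        barrier.wait()  # release both writes together
        client.save_label("epic-42", value)

    t1 = threading.Thread(target=save, args=(client_a, "Checkout rewrite"))
    t2 = threading.Thread(target=save, args=(client_b, "Payments rework"))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    final_a = client_a.get_label("epic-42")
    final_b = client_b.get_label("epic-42")
    assert final_a == final_b  # both users see the same value
    assert final_a in {"Checkout rewrite", "Payments rework"}  # neither update is corrupted
```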
With global markets come global challenges. Many organizations must support multiple languages due to changing business needs, and that creates challenges for agile teams. In traditional projects, the application is built, and then the strings that are to be translated are sent off to experts. The translators have the full context of the application, so the translations are generally correct. In agile teams, we have quick releases with the possibility of releasing to the customer every iteration or perhaps even continuously. Traditional methods do not work in that environment.
If your product has a global clientele, do what you can to avoid frustrating customers who use other languages and character sets. Support globalization (g11n), which includes internationalization and localization, with tests that guide development, exploratory testing, and perhaps other types of testing, such as linguistic testing. Internationalization is a constraint that developers must consider as they are coding and incorporate into every story. For example, it would include encoding, formatting, and externalizing strings across the code base. If you are working on legacy code, perhaps you have stories specifically to address some of the i18n requirements.
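As a small illustration of that constraint, the sketch below externalizes strings and locale-sensitive formatting instead of hard-coding them. It assumes Python's standard gettext module and the third-party Babel library; the catalog name, directory, and locale codes are placeholders for this example:

```python
import gettext
from datetime import date
from babel.dates import format_date  # Babel handles locale-aware formatting

# Strings live in external message catalogs, never hard-coded in the UI layer.
# 'locale/' and the 'messages' domain are placeholder names for this sketch;
# fallback=True keeps the code working before any translations exist.
t = gettext.translation("messages", localedir="locale", languages=["de"], fallback=True)
_ = t.gettext

def greeting_banner(today: date) -> str:
    # The translated string and the locale-formatted date are looked up at runtime,
    # so adding a new language is a translation task, not a code change.
    return f"{_('Welcome back')} - {format_date(today, locale='de_DE')}"

print(greeting_banner(date(2014, 7, 1)))
```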
Localization is the translation piece, not only of the language itself, but often of nuances for specific locales. Terminology needs to be established, but perhaps not all of it needs to be done up front. The point when teams break features into stories might be a good time to think of the terms that should be used. Maybe the first story is for an analyst to determine what words will be used for consistency across the product. Frameworks to support localizing and translating can speed up the process, but they don’t eliminate the need to verify the translations and suitability for specific cultures and regions.
In Figure 13-1, Lisa shows the terminology for the parts of a donkey harness. To Janet, it seems like a foreign language, but at least now we have common terms and can point to the same piece of harness and mean the same thing. Even writing this book, we had to compromise between EN-US and EN-CA spellings. Since our audience is global, we tried to avoid using slang and colloquialisms.
Your organizational culture may have a large impact on how you approach localization on your team. As Paul Carvalho mentions in his story, having local support for languages is ideal. However, that may not always be possible, since it is expensive to have many experts available at all times for translations. There are alternative solutions that may give you faster feedback than is usual on a phased-and-gated-style project. Perhaps you can use machine translations, which are getting better all the time. The trade-off is the need for significant post-editing to make sure you catch the errors in the automated conversions.
Another approach might be a hybrid, where agile teams create a “drop” for translators. This drop would include a whole feature, rather than specific strings to translate from each story, and would allow translators to have some context for their translations. If your release cycle is six months long, perhaps a drop of every four weeks could give you fast enough feedback to correct any mistakes before the end game, while minimizing the overhead of translations. Figure 13-2 shows this alternative. It is all about balancing speed with quality, time, and functionality. Remember, if you are using third-party vendors, have agreed-upon time frames and expectations, and take advantage of automation in file drops for consistency in the process.
It is a different mindset to embrace rapid release cycles, and we encourage teams to think about how they can make changes to their process to get fast feedback. Perhaps, just perhaps, the local experts exist on some of your global teams.
First, we give a quick definition of what we mean by regression testing because it can be a contentious term. To us, regression tests are those tests that run regularly to give you confidence that changes made to the code do not affect existing functionality unexpectedly. We believe that automated checks can do this with the fastest feedback. At a minimum, these should run nightly.
Now that many teams release once a week, several times per week, or several times per day, regression testing is an even bigger challenge. Even if you automate your regression tests, they may not all run fast enough to complete in time for the next release.
Many companies faced with this problem turn to “testing in production.” Seth Eliot has written and presented extensively on this subject. He defines testing in production this way (Eliot, 2012):
Testing in production (TiP) is a set of software testing methodologies that utilizes real users and production environments in a way that both leverages the diversity of production, while mitigating risks to end users. By leveraging the diversity of production we are able to exercise code paths and use cases that we were unable to achieve in our test lab, or did not anticipate in our test planning.
Lisa’s team has good coverage from automated test suites running in the continuous integration system at all levels: unit, functional, and user interface (UI). They also spend lots of time doing exploratory testing, but it’s hard to cover every scenario. For new major versions, they use a TiP approach. They enable the new version for a small percentage of users and monitor production logs to see if users are experiencing errors. As they identify new defects, they fix them as needed. They have a rollback plan to disable the feature if there are unacceptable results. When the new features appear to be stable, the team enables them for more users, continuing to watch logs carefully, until all users are able to use them.
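A common way to implement this kind of gradual rollout is a percentage-based feature flag keyed on a stable user ID, so the same users keep seeing the new version while you watch the logs. This is a hedged sketch of that general technique, not the actual mechanism Lisa's team uses:

```python
import hashlib

def is_enabled(feature: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in or out of the rollout group,
    so the same user always gets the same experience as the percentage grows."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # 0..99
    return bucket < rollout_percent

# Start with a small percentage of users, monitor production logs,
# then raise the percentage as confidence grows.
if is_enabled("new-dashboard", user_id="user-123", rollout_percent=5):
    ...  # serve the new version
else:
    ...  # serve the current version
```

In practice a flag like this is paired with monitoring and a rollback switch, so the feature can be turned off quickly if the logs show unacceptable results.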
As much as we automate regression tests, there may be tests that are difficult to automate in a way that provides proper feedback, and other tests that are too costly to automate. It’s also a challenge to do manual regression tests for quality attributes such as look and feel when releasing so frequently. The testers on Lisa’s team created wiki pages with manual regression test checklists for different areas of the product based on risk. Often, programmers use the checklists to do the manual regression testing, or at least help with it. The team regularly audits the checklists to see if any tests can be or have been automated. They keep the lists as concise as possible, focused on the high-risk areas. In Chapter 12, “Exploratory Testing,” there were a couple of examples of managing regression testing with session-based test management or thread-based test management if you do not have automation. How you manage your regression suite will depend on your context, the risks within your system, and how often you release to production.
User acceptance testing (UAT) is part of Quadrant 3, business-facing tests that critique the product. We describe user acceptance testing as “making sure the actual users can do their job.” UAT may be performed by product managers, but preferably it’s done by the actual end users of the product. As we explained in Agile Testing, UAT is often done as part of a post-development testing cycle (pp. 467–68), but this does not mean it has to be left to the end game. Janet encourages companies she works with to think of ways to bring it earlier into the development cycle.
Janet’s Story
I started with a team that was on three-month delivery cycles, which should have meant that they released to the customer every three months. However, the UAT testing took another six weeks, so new features weren’t actually in production for almost six months. I found out that the reason for such a long UAT cycle was that the person (I’ll call her Betty) doing the testing had to do it in addition to her own job. She had to “fit it in” around her regular duties, and six weeks was how long it took.
I suggested that we have Betty sit with the team at the end of each two-week iteration and play with the new features we were developing. I paired with her for a while to show her the features and then let her be. At the end of the three months when we were ready to deliver the new features, Betty asked for only three weeks (instead of the usual six) for UAT.
This was a substantial improvement, but I wanted it to be even better. We got her a workstation in the team work area, and Betty started coming in every Friday afternoon. She gained confidence in our work and what we were delivering. At the end of that release, we included one full dedicated day of UAT and were able to put the release into production within the three-month delivery cycle.
Lisa’s current team develops a software-as-a-service (SaaS) product that is also used internally by the entire company. This provides a great opportunity to release new features for internal-only beta and get feedback from actual users who happen to be in the same company. Problems with real-life use are identified and fixed before the new features are made available to paying customers.
Understand your customers, your real users, and brainstorm ways to get them using the system so you can make sure they can do their jobs and use it appropriately. Customers are the ultimate judges of software value. End users are probably the best people to critique your product.
A/B or split testing is often used in lean startup products or in existing products that are changing their look. It is a different type of testing in that it validates a business idea, so in some ways it is a Q2 type of test. However, it is done in production by real customers, so it’s really a way of critiquing the product.
The idea is to develop two distinct implementations, each representing a different hypothesis about user behavior, and put them out for production customers to use. For example, you can move UI elements around or change the steps of a UI wizard. The company monitors statistics on which customers “click through” and which leave right away. In this way, companies can base decisions on real results and can improve their applications based on continued A/B experiments. When appropriately done, A/B testing can help companies make decisions about everything from user experience (UX) design to pricing plans.
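Mechanically, A/B testing comes down to assigning each visitor to one variant and recording whether they click through so the conversion rates can be compared. The sketch below shows one simple, deterministic way to do that; the function names and in-memory counters are illustrative only, since real systems record these events in analytics tooling:

```python
import hashlib
from collections import Counter

views = Counter()   # how many visitors saw each variant
clicks = Counter()  # how many of them clicked through

def assign_variant(visitor_id: str) -> str:
    # Deterministic split so a returning visitor always sees the same variant.
    return "A" if int(hashlib.md5(visitor_id.encode()).hexdigest(), 16) % 2 == 0 else "B"

def record_view(visitor_id: str) -> str:
    variant = assign_variant(visitor_id)
    views[variant] += 1
    return variant

def record_click(visitor_id: str) -> None:
    clicks[assign_variant(visitor_id)] += 1

def conversion_rates() -> dict:
    # Compare click-through rates to decide which design wins.
    return {v: clicks[v] / views[v] for v in views if views[v]}
```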
A/B testing is definitely not limited to agile projects, but it fits agile values well. The goal of A/B testing is to identify which changes to your website will have the greatest impact. This type of testing is all about iterating with fast feedback to select a design or achieve and maintain other quality characteristics that help the business achieve its goals. If you think your team might benefit from A/B testing, check the links in the bibliography for Part V to learn more.
User experience (UX) designers have many ways to get feedback on functional website design and designs in progress. They use methods commonly found in industrial design and anthropology to test designs and design concepts before producing anything on screen or even on paper.
User experience testing is an example of a type of testing that can fit into more than one of the agile testing quadrants. You can get feedback from users before any coding is started, using paper prototypes and other techniques such as the ones Drew describes. This is a form of Quadrant 2 testing, creating customer-facing examples and information that will guide development. You can also do usability testing after coding is complete or monitor the product already in production to learn whether the current design and functionality are adequate. Those are examples of Quadrant 3 activities, evaluating the software from a business and user perspective. Feedback from this may result in new features and stories to be done in the future.
Nordstrom Labs recorded its in-store innovation testing efforts for an iPad app that would help customers choose eyeglass frames (Nordstrom, 2011). The video shows testing activities ongoing throughout iterations that lasted minutes rather than days. Designers and testers on Lisa’s team have done usability testing with both internal company users of a product and people outside the company, taking advantage of user group meet-ups. Look for ways to test with real end users rather than speculate about how they will use a particular feature.
Sit with your product’s existing users and see where they struggle. Take your new design to your local coffee shop and see what people think of it. Involve current and desired customers early and often. Chapter 20, “Agile Testing for Mobile and Embedded Systems,” has a bit more on user experience but focuses more on how it relates to mobile apps. Check the Part V bibliography for more links on these different types of testing.
Don’t repeat the mistake of our early XP/agile teams by focusing exclusively on functional testing. In this chapter, we discussed some of the types of testing that fall outside the scope of what is generally known as functional testing. Many of these tests can be done with an exploratory approach.
• Use the Quadrants to think about all the different types of testing your product requires.
• Talk with your business stakeholders to learn their expectations for attributes such as stability, performance, security, usability, and other “ilities.”
• Concurrency testing is key for products whose users may update the same data simultaneously. Use both automated and exploratory tests to ensure that updates are reflected correctly and in a timely manner.
• Internationalization and localization testing requires looking at cultural differences as well as languages and character sets. Specialists in this field are needed, just as many software products require security, performance, or UX testing experts.
• Testing (monitoring) in production is one approach to finding defects that are missed by checking and exploring during development.
• Completing user acceptance testing during development, rather than after, shortens the UAT cycle needed during the prerelease end game.
• A/B testing is one way to get fast feedback from production users about aspects of the application design, using a series of experiments, each one building on what was learned from the last.
• Usability testing can be done simply and productively with paper prototypes and conversations with users to help your team refine your designs and features.