It’s time to move past the nuts and bolts of builds and Continuous Integration (CI) and focus on closing an easily overlooked gap in the age of Agile: the gap between how things should work versus how they actually work.
In recent years, a movement has been brewing to do away with Quality Assurance (QA) altogether. The argument is generally that the Agile focus on unit tests guarantees that software works as intended in production. This is problematic for many reasons. First, unit tests often check only what developers think to test. Sometimes, it can take a fresh set of eyes and a different perspective to spot bugs.
Further, unit tests can check many things, but are by nature designed to run in isolation and execute quickly. This means they often mock database and other such connections. They’re great for validating individual bits of code, but are a lousy solution for other testing needs.
Continuous Testing (CT) is such a broad field, often varying widely from industry to industry — even project to project — that we’re not going to build a lot of specific machinery. Instead, in this article, you’ll select the strategies and tools best suited to your own circumstances.
High-level Testing Strategies
Let’s begin with the broadest outlines of high-level testing strategies. Your first line of defence is static testing. This takes its name from the fact that your code remains “at rest.”
Having a second pair of eyes often catches things that would have been missed first time around. Techniques such as pair programming, mandatory code reviews, static analysis tools, and so forth can be useful in verifying that the code “looks right” for its intended purpose. You may know this by another name, as static testing is often considered verification.
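To see what a static analysis tool does in spirit, here is a toy sketch that inspects source code without running it. The check (flagging functions that lack docstrings) is deliberately simple; real tools such as pylint or flake8 apply hundreds of such rules:

```python
import ast

def find_undocumented_functions(source):
    """Toy static check: flag functions that lack a docstring.

    The code under inspection is parsed, never executed -- that is
    what makes this 'static' testing.
    """
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and ast.get_docstring(node) is None
    ]

sample = '''
def documented():
    """Has a docstring."""
    return 1

def undocumented():
    return 2
'''
print(find_undocumented_functions(sample))  # ['undocumented']
```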
But of course, any static testing is limited by the simple fact that it doesn’t involve executing the code. Even the best musicians and composers gain a new understanding of a piece when it’s performed, and the same is true of software.
That’s why executing code, or dynamic testing, is also critical for success. And you may know this by another name: dynamic testing is often considered validation. In the end, your users don’t care about the code, they want the software to work as expected.
Exploring Testing Types
Dynamic testing invariably leads to a variety of important questions. First, from whose perspective do we test? The way you answer that question tells you whether you’re embracing:
- White-box Testing: leverages detailed knowledge of internals.
- Black-box Testing: focuses wholly on the “surface”, avoiding any knowledge of internal details.
- Gray-box Testing: is a mix of white- and black-box testing.
It may seem like a subtle or even useless distinction, but you’ll easily find arguments for and against each choice.
In more depth…
White-box testing can be particularly useful because you have detailed internal knowledge, which allows you to construct test cases that cover all logical code paths. Black-box testing can be especially useful when dealing with software components, since what usually matters most in the end is that they produce the proper outputs for a given set of inputs. And in some cases, gray-box testing may be necessary to ensure that a set of operations works as expected given some initial state (e.g., certain known or expected records in a database table or other storage).
What your project does and how you want to verify that it’s doing it correctly should guide your choice when it comes to these options. The “golden rule” we can easily recommend is:
- To pay careful attention to how much knowledge of the system’s operation is essential for high-quality tests.
- Balance that against the expectations with which you intend to saddle your users.
In a highly technical application, you can require a lot from your users. However, in a dirt-simple app aimed at all mobile phone users, it’s a completely different story.
Testing in Layers
Speaking of different stories, layers of testing are also relevant. It doesn’t make sense, for example, to test low-level driver code that a user is never going to interact with the same way you test some custom-built user-interface element that users will “abuse” in all sorts of unexpected ways. In short, there are a number of different layers, levels, or scopes at which it can be relevant to test. In order of increasing complexity, there are:
- Unit testing
- Integration testing
- Interface testing
- System testing
- Acceptance testing
We’ll talk about each of these in turn, beginning with unit testing.
As already mentioned, unit tests are intended to exercise and validate the operation of some small bit of code as quickly as possible. Developers working with object-oriented languages, for example, often create a suite (or suites) of tests to exercise each individual “class” they create. Unit tests are often white-box in nature because these are bits of code expected to be used only by developers, though black-box and gray-box testing can also prove helpful. The key point is that unit tests ensure a single thing works as intended.
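A minimal sketch of that “single thing” idea: one small class, one suite of assertions covering its behavior. The `Counter` class is purely illustrative:

```python
class Counter:
    """A tiny class used only to illustrate a unit-test suite."""
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1
    def reset(self):
        self.value = 0

def test_counter():
    c = Counter()
    assert c.value == 0   # initial state
    c.increment()
    c.increment()
    assert c.value == 2   # each call adds exactly one
    c.reset()
    assert c.value == 0   # reset restores the initial state

test_counter()
```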
In contrast, integration testing is inherently focused at a higher layer/level. It ensures that the classes, modules, components, etc. that comprise some larger logical unit of software work together. In effect, integration testing by definition seeks to ensure that multiple things work together as intended.
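By way of contrast with the mocked unit test, an integration test lets real pieces talk to each other. The sketch below wires a hypothetical service layer to a storage layer backed by a real (in-memory) SQLite database; all class names are assumptions for illustration:

```python
import sqlite3

class UserStore:
    """Storage layer: talks to an actual SQLite database."""
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE users (name TEXT, active INTEGER)")
    def add(self, name, active=True):
        self.conn.execute("INSERT INTO users VALUES (?, ?)", (name, int(active)))
    def active_names(self):
        rows = self.conn.execute("SELECT name FROM users WHERE active = 1")
        return sorted(r[0] for r in rows)

class UserService:
    """Service layer: depends on the storage layer."""
    def __init__(self, store):
        self.store = store
    def register(self, name):
        self.store.add(name)

def test_service_and_store_together():
    # No mocks: both layers run for real, against in-memory SQLite.
    store = UserStore(sqlite3.connect(":memory:"))
    service = UserService(store)
    service.register("ada")
    service.register("grace")
    assert store.active_names() == ["ada", "grace"]

test_service_and_store_together()
```

Note the trade-off: this test is slower and has more moving parts than a unit test, but it can catch a malformed SQL statement that a mock would happily accept.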
Interface testing is similar to integration testing. It typically involves exercising multiple things, but with a particular focus on making sure that any communications occur correctly between larger units of software. In software, ‘interface’ is typically defined as some shared boundary that divides separate components. The goal of interface testing is to ensure that all data and operations shared between those larger units of software are correct.
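One common way to exercise a shared boundary is a contract test: a single suite of checks applied to every component that claims to honor the interface. The cache interface below is a hypothetical example:

```python
class InMemoryCache:
    """One implementation of a hypothetical cache interface."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class ListBackedCache:
    """A second, independent implementation of the same interface."""
    def __init__(self):
        self._pairs = []
    def put(self, key, value):
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))
    def get(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        return None

def check_cache_contract(cache):
    """The shared-boundary test: any conforming cache must pass it."""
    assert cache.get("missing") is None
    cache.put("a", 1)
    assert cache.get("a") == 1
    cache.put("a", 2)           # overwriting must take effect
    assert cache.get("a") == 2

for implementation in (InMemoryCache, ListBackedCache):
    check_cache_contract(implementation())
```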
The next step up the ladder of testing complexity is system testing, which can be thought of as integration testing on a larger scale. Instead of bringing several lower-level bits of code (often components) together and making sure they work, system testing aims at validating the operation of some higher-level unit of software that can be thought of as a thing in its own right.
The example of client-server software illustrates this. In such an architecture, a server program can typically be considered a complete system in its own right. It accepts commands or other interactions from a client, and then carries out tasks and/or responds with the proper outputs. Clients that “know” how to utilize the server can also be thought of as systems in their own right.
Acceptance testing begins when you bring them together to verify that everything works properly at the highest level possible: user interactions with the product as a whole. System testing shows you defects in a server, but acceptance testing shows you whether the overall process is good enough, from a user perspective, to ship.
It is common to leverage automated testing tools for the lowest three (unit, integration, and interface) testing levels. It’s often harder (and generally more expensive) to acquire good tools for system and acceptance testing. Although progress has been made in recent years, you’ll still have to get your hands dirty if you commit to higher-level testing.
Types of Testing
The following are ordered roughly in terms of commonality, highest to lowest, descending into testing obscurity as we go.
Alpha and beta testing
The most commonly known types of testing are arguably alpha testing and beta testing. By software tradition, an alpha release is a feature-incomplete version of a product or service that is ready for at least some review. Alpha testing is usually conducted internally. Typically it is only for those with the kind of high level of understanding and patience necessary to navigate the unfinished software. It can be helpful in making sure the project is on the right path before getting too far along the development calendar.
In contrast, a beta release is typically a feature-complete version of a product that is ready for at least some external review and perhaps even production use. Beta testing is usually conducted with a limited set of users, often important customers seeking special influence over product development. It provides a final gut-check of the planned feature set before a more general release of a given product or service.
Next is smoke testing, or sanity testing, which is almost the opposite of unit testing. Whereas unit tests often comprise highly specific suites that rigorously exercise the individual bits that go into an application, smoke testing often involves a few limited high-level tests to ensure that nothing has gone horribly wrong — usually implemented from the end-user’s standpoint. Smoke testing is usually the most basic hurdle for a build to vault.
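A smoke test can be as simple as the sketch below: start the thing and confirm it answers at all. The `Application` class is a stand-in; in practice you would hit a real health endpoint or launch the actual binary:

```python
class Application:
    """Stand-in for a deployable application (hypothetical)."""
    def start(self):
        self.running = True
    def health(self):
        return "ok" if getattr(self, "running", False) else "down"

def smoke_test(app):
    # Deliberately not thorough: just prove nothing is horribly wrong.
    app.start()
    assert app.health() == "ok"

smoke_test(Application())
```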
Regression testing is another very common type, focusing on making sure that some new change hasn’t broken features that were already working. This type of testing typically occurs right before a new release or when some crucial fix is performed after a disastrous report from the field. A one-line code change can cause chaos, and regression testing usually catches unintended consequences before release.
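A common idiom is to turn every bug report into a permanent test so the defect can never silently return. The bug and function below are hypothetical, but the shape is typical:

```python
def parse_price(text):
    """Parse a price string such as '$1,299.99' into a float.

    Hypothetical history: an earlier version forgot to strip the
    thousands separator, so '$1,299.99' came back as 1.0.
    """
    return float(text.replace("$", "").replace(",", ""))

def test_regression_comma_in_price():
    # Exactly the input from the original (hypothetical) bug report.
    assert parse_price("$1,299.99") == 1299.99

test_regression_comma_in_price()
```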
Performance testing is also common, particularly with software designed for high-volume applications or real-time processing. This type of testing includes sub-types such as load testing, which validates that software performs acceptably under a large amount of concurrent work in progress, and stress testing, which validates that the software’s functionality will degrade gracefully under unexpected conditions (e.g., memory or storage scarcity). For the most demanding applications, real-time testing validates that a given system (often a hardware/software hybrid device) can execute tasks within a strict time limit. This can be crucial for medical devices, for example, or other high-risk products that must always deliver timely results.
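In its simplest form, a performance test runs a workload and asserts that it finishes within a budget. The sketch below is a toy version of that idea; the workload size and time budget are assumptions, and real tools (JMeter, Locust, and the like) add concurrency, ramp-up, and reporting:

```python
import time

def lookup(table, key):
    """The (trivial) operation whose performance we care about."""
    return table.get(key)

def load_test(n=100_000, budget_seconds=2.0):
    """Run n lookups and fail if the total time exceeds the budget."""
    table = {i: i * i for i in range(n)}
    start = time.perf_counter()
    for i in range(n):
        lookup(table, i)
    elapsed = time.perf_counter() - start
    assert elapsed < budget_seconds, f"too slow: {elapsed:.3f}s"
    return elapsed

load_test()
```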
A/B testing and more
Especially popular today is A/B testing. The ubiquity of the web, and its ease of deployment, have made it possible for various web applications and services to offer multiple variants of a given page or function to users in a controlled environment. This allows organizations to collect data on usage, practicality, and utility of two different ways of doing something. The less desirable can be easily retired once the data is gathered. This kind of testing improves perceived user value and prevents bad design and implementation decisions by utilizing real-time user feedback.
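The mechanics behind an A/B split are often just stable hashing: derive a bucket from a user identifier so the same user always sees the same variant. This sketch assumes a 50/50 split and an illustrative experiment name:

```python
import hashlib

def assign_variant(user_id, experiment="checkout-button"):
    """Deterministically map a user to variant 'A' or 'B'.

    Hashing the experiment name together with the user id keeps
    assignments independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The assignment is sticky: the same user always gets the same variant.
assert assign_variant("user-123") == assign_variant("user-123")
```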
And of course the list goes on. Install/uninstall testing verifies that a product can be added to or removed from a computer or user account. Security testing comes in many shapes and forms, from checking for known exploits (e.g., SQL injection) to hammering randomly at various interfaces to see what breaks. Internationalization testing focuses on accurate linguistic translations as well as cultural norms and standards (e.g., currency formats, calendar representations, etc.).
The Reality of Testing
It’s simply not possible to test everything thoroughly. Testing is often only partially directed at validating proper function. Its more important role, particularly in our increasingly litigious society, is often reducing the risk of legal exposure. Whatever your core concerns, do your homework and choose the testing most useful for assuaging them.
A final caveat is relevant for those wondering why truly complete testing remains impossible, even in today’s age of increasing computer automation. It’s not difficult to see once you consider how thoroughly insurmountable the problem really is.
Many lines of code involve potential logical branching; i.e., making a decision based on some data and choosing an alternate path of subsequent execution as a result. Even relatively trivial software can involve hundreds of thousands of lines of code. So the sheer number of total execution paths to test is enormous.
Beyond lines of code
And that’s just execution. Consider also that for any input field on a user interface, a user can enter whatever characters are allowed. That often means the entire alphabet as well as numbers, symbols, and perhaps even extended characters or entire other character sets in the era of Unicode. In short, there is often an effectively infinite set of possible input data to test.
And that doesn’t include the variety of interactions possible between applications today or the complexity of the environments in which they execute. Cross-platform software these days operates on Windows, macOS, Linux/Unix, and even various mobile operating systems such as Android and/or iOS.
Combinatorially speaking, put all of this together and even the proverbial infinite monkeys are not going to be able to test your software completely prior to the heat death of our universe.
Testing is a bit like a software version of the famous Gordian knot. Unravelling all the threads is impossible. Your best bet is to trim off only the bits you care about. Choose what gives you the biggest-quality-value bang for your test-resources buck and expand from there as needed.
Thus ends our prelude to tackling CT. Next time, we’ll cover it a little more specifically, using our sample application as a guide. We’ll discuss how managing testing environments is crucial for reliable, repeatable, high-quality results.