November / December 2002 Feature Article
Common Mistakes in Test Automation

Abstract: Automating the execution of tests is becoming more and more popular as the need to improve software quality amidst increasing system complexity becomes ever stronger. The appeal of having the computer run the tests in a fraction of the time it takes to perform them manually has led many organisations to attempt test automation without a clear understanding of all that is involved. Consequently, many attempts have failed to achieve real or lasting benefits. This paper highlights a few of the more common mistakes that have contributed to these failures and offers some thoughts on how they may be avoided.

Keywords: Automation, Test Automation, Testing, Testware.

1. Confusing automation and testing

Testing is a skill. While this may come as a surprise to some people, it is a simple fact. For any system there is an astronomical number of possible test cases, although we have time to run only a very small number of them. Yet this small number of test cases is expected to find most of the bugs in the software, so the job of selecting which test cases to build and run is an important one. Both experiment and experience have told us that selecting test cases at random is not an effective approach to testing. A more thoughtful approach is required if good test cases are to be developed.

For an effective and efficient automated set of tests (tests that have a low cost but a high probability of finding bugs) you have to start with the raw ingredient of a good test set, a set of tests skilfully designed by a tester to exercise the most important things. You then have to apply automation skills to automate the tests in such a way that they can be created and maintained at a reasonable cost.

2. Believe capture/replay = automation

Capture / replay technology is indeed a useful part of test automation but it is only a very small part of it. The ability to capture all the keystrokes and mouse movements a tester makes is an enticing proposition, particularly when these exact keystrokes and mouse movements can be replayed by the tool time and time again. The test tool records the information in a file called an automated test script. When it is replayed, the tool reads the script and passes the same inputs on to the software under test (SUT) that usually has no idea it is a tool controlling it rather than a real person sitting at a computer. In addition, the test tool generates a log file, recording precise information on when the replay was performed and perhaps some details of the machine. Figure 2 depicts the replay of a single test case.

Figure 2

For many people this is all that is required to automate tests. After all, what else is there to testing but entering a whole series of inputs? However, merely replaying the captured input to the SUT does not amount to performing a complete test. There is no verification of the results. How will we know if the software generated the same outputs? If the tester is required to sit and watch each test being replayed he or she might as well have been typing them in since they are unlikely to be able to keep up with the progress of the tool, particularly if it is a long test.

It is necessary for the tool to perform some checking of the output from the application to determine that its behaviour is the same as when the inputs were first recorded. This implies that as well as recording the inputs the tool must record at least some of the output from the SUT. But which particular outputs should be recorded? How often should the outputs be recorded? Which characteristics of the output should be recorded? These are questions that have to be answered by the tester as the inputs are captured or, depending on the particular test tool in use, during a replay.

Alternatively, the testers may prefer to edit the script, inserting the required instructions for the tool to perform a comparison between the actual output from the SUT and the expected output now determined by the tester. This presupposes that the tester will be able to understand the script sufficiently well to make the right changes in the right places. It also assumes that the tester will know exactly what instructions to edit in the script, their precise syntax, and how to specify the expected output.

In either approach, the tests themselves may not end up as particularly good tests. Even if a test was thought out carefully at the start, the omission of just one important comparison or the inclusion of one unnecessary or erroneous comparison can destroy a good test. Such tests may never spot that important bug or may repeatedly fail good software.

Scripts generated by testing tools are usually not very readable. Will the whole series of individual actions really convey what has been going on and where comparison instructions are to be inserted? Scripts are written in a programming language so anyone editing them has to have some understanding of programming. Also, while it may be possible for the person who has just recorded the script to understand it immediately afterwards, after some time has elapsed or for anyone else this may be more difficult.

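To make the point about comparisons concrete, the following is a minimal Python sketch of a replay loop that checks output only at the points a tester has chosen. It is not how any particular tool works; `Step`, `replay` and `toy_sut` are hypothetical names invented for this illustration, and the "SUT" is a stand-in function that simply upper-cases its input.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One recorded action; all names here are hypothetical."""
    user_input: str
    # Expected output chosen by the tester; None means "do not compare here".
    expected_output: Optional[str] = None


def replay(script: List[Step], sut: Callable[[str], str]) -> List[str]:
    """Feed each recorded input to the SUT and compare at the chosen checkpoints."""
    failures = []
    for number, step in enumerate(script, start=1):
        actual = sut(step.user_input)
        if step.expected_output is not None and actual != step.expected_output:
            failures.append(
                f"step {number}: expected {step.expected_output!r}, got {actual!r}"
            )
    return failures


def toy_sut(text: str) -> str:
    """Stand-in for the software under test: it simply upper-cases its input."""
    return text.upper()


script = [
    Step("login"),                                 # replayed, never checked
    Step("add item", expected_output="ADD ITEM"),  # checkpoint that passes
    Step("total", expected_output="TOTAL: 1"),     # checkpoint that fails
]
failures = replay(script, toy_sut)
print(failures)
```

Note that the first step is replayed without any check at all, so a fault there would go unnoticed. Deciding where the `expected_output` values go, and what they contain, is exactly the testing decision that the tool cannot make for the tester.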
Even if the comparison instructions are inserted by the tool under the tester's control, the script is likely to need editing at some stage in its life. This is most likely when the SUT changes. A new field here, a new window there, will soon cause untold misery for testers who then have to review each script looking for the places that need updating. Of course, the scripts could be re-recorded but this defeats the objective of recording them in the first place.

Recording test cases that are performed once manually so they can be replayed is a low cost way of starting test automation. That is probably why it is so appealing to those who opt for this approach. However, the cost of maintaining automated scripts created in this way becomes prohibitive as soon as the software changes. If we are to minimise maintenance costs, it is necessary to invest more effort up front implementing automated scripts. Figure 3 depicts this in the form of a graph.
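The trade-off between up-front effort and maintenance cost can be sketched with a very simple linear model. This is only an illustration under stated assumptions: the function names are hypothetical, the cost figures are invented effort units rather than measurements, and real maintenance costs are unlikely to be perfectly linear.

```python
from typing import Optional, Tuple


def cumulative_cost(upfront: float, per_release: float, releases: int) -> float:
    """Total effort after a number of releases under a simple linear model."""
    return upfront + per_release * releases


def break_even_release(
    simple: Tuple[float, float],
    structured: Tuple[float, float],
    max_releases: int = 100,
) -> Optional[int]:
    """First release at which the structured approach is no more costly overall.

    Each argument is an (upfront, per_release) pair; returns None if the
    structured approach never catches up within max_releases.
    """
    for n in range(1, max_releases + 1):
        if cumulative_cost(*structured, n) <= cumulative_cost(*simple, n):
            return n
    return None


# Hypothetical effort units: capture/replay is cheap to start but costly to
# maintain; a more structured approach costs more up front but less per release.
capture_replay = (1.0, 4.0)
structured_scripts = (5.0, 1.0)

release = break_even_release(capture_replay, structured_scripts)
print(release)
```

With these invented figures the structured approach becomes the cheaper of the two at the second release. The actual crossover point depends entirely on the numbers for a given project.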
Figure 3

3. Verify only screen based information

Testers are often seen in front of a computer screen so it is perhaps natural to assume that only the output to the screen by the SUT is checked. This view is further strengthened by many of the testing tools that make it particularly easy to check information that appears on the screen both during and after a test has been executed. This assumes that a correct screen display indicates success, but it is often the output that ends up elsewhere (in an output file or a database for example) that is more important. Just because information appears on the screen correctly it does not always guarantee that it will be recorded elsewhere correctly.

For good testing it is often necessary to check these other outputs from the SUT. Perhaps not only the files and database records that have been created and changed, but also those that have not been changed and those that have (or at least should have) been deleted or removed. Checking some of these aspects of the outcome of a test (rather than merely the output) will make tests more sensitive to unexpected changes and help ensure that more bugs are found.

Without a good mechanism to enable comparison of results other than those that appear on the screen, tests that undertake these comparisons can become very complex and unwieldy. A common solution is to have the information presented on the screen after the test has completed. This is the subject of the next common mistake.

4. Use only screen based comparisons

Many testing tools make screen based comparisons very easy indeed. It is a simple matter of capturing the display on a screen or a portion of it and instructing the tool to make the same capture at the same point in the test and compare the result with the original version. As described at the end of the previous common mistake, this can easily be used to compare information that did not originally appear on the screen but was a part of the overall outcome of the test. However, the amount of information in files and databases is often huge and to display it all on the screen one page at a time is usually impractical if not impossible. Thus, compromise sets in. Because it becomes so difficult to do, little comparison of the tests' true outcome is performed. Where a tester does labour long and hard to ensure that the important information is checked, the test becomes complex and unwieldy once again, and worse, very sensitive to a wide range of changes that frequently occur with each new release of the SUT. Of course, this in turn adversely impacts the maintenance costs for the test.

In one case, I came across a situation where a PC-based tool vendor had struggled long and hard to perform a comparison of a large file generated on a mainframe computer. The file was brought down to the PC one page at a time where the tool then performed a comparison with the original version. It turned out that the file comprised records that exceeded the maximum record length that the tool could handle. This, together with the length of time the whole process took, caused the automated comparison of this file to be abandoned.

In this case, and many others like it, it would have been relatively simple to invoke a comparison process on the mainframe computer to compare the whole file (or just a part of it) in one pass. This would have been completed in a matter of seconds (compared with something exceeding an hour when downloaded to the PC).

5. Let testware architecture evolve naturally

Like a number of other common mistakes, this one isn't made through a deliberate decision (by choice); rather, it is made through a lack of understanding. The problem, commonly and unwittingly ignored, is the lack of a consistent and well organised home for all the data files, databases, scripts, expected results, etc. Everything that makes up the tests and is required to run them, the results from their execution, and other information comprise the 'testware'. Where and how these artefacts are stored (e.g. grouped by test case, grouped by artefact type, or not grouped at all) is called the testware architecture.

There are three key issues to address: scale, re-use, and multiple versions. Scale is simply the number of things that comprise the testware. For any one test there can be several (ten, fifteen or even twenty) unique things (files and records containing test input, test data, scripts, expected results, actual results and differences, log files, audit trails and reports). Figure 4 depicts one such test case.
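As a rough illustration of the scale point, the sketch below builds the locations of the artefacts for a single test case under one possible convention, grouping by test case. The directory names, the `artefact_paths` function and the test case identifier are all hypothetical; they are not a recommended standard, only an example of what a consistent convention might look like.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical artefact categories, loosely following the kinds of file
# listed above (scripts, data, expected and actual results, differences, logs).
ARTEFACT_DIRS = ("scripts", "data", "expected", "actual", "diffs", "logs")


def artefact_paths(root: str, test_case: str) -> Dict[str, PurePosixPath]:
    """Return one directory per artefact type, grouped under the test case."""
    base = PurePosixPath(root) / test_case
    return {kind: base / kind for kind in ARTEFACT_DIRS}


paths = artefact_paths("testware", "TC001")
for kind, location in paths.items():
    print(f"{kind:10} {location}")
```

The value of such a convention lies less in the particular names chosen than in the fact that every tester, and every script, can work out where an artefact lives without having to search for it.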
Figure 4

Re-use is an important consideration for efficient automation. The ability to share scripts and test data not only reduces the effort required to build new tests but also reduces the effort required for maintenance. But re-use will only be possible if testers can easily (and quickly) find out what there is to re-use, quickly locate it, and understand how to use it. I'm told that a programmer will spend no more than two minutes looking for a re-usable function before he or she will give up and write their own. I'm sure this behaviour applies to testers and that it may be a lot less than two minutes. Of course, while test automation is implemented by only one or two people this will not be much of a problem, at least while those people remain on the automation team. But once more people become involved, either on the same project or on other projects, the need for a more formal testware architecture (indeed a standard / common architecture) becomes much greater.

Multiple versions can be a real problem in environments where previous versions of software have to be supported while a new version is being prepared. When an emergency bug fix is undertaken, we would like to run as many of our automated tests as seems appropriate to ensure that the bug fix has not had any adverse effects on the rest of the software. But if we have had to change our tests to make them compatible with the new version of the software this will not be possible unless we have saved the old versions of the tests. Of course the problem becomes even worse if we have to manage more than one older version or software system.

If we have only a few automated tests it will be practical to simply copy the whole set of automated tests for each new version of the software. Bug fixes to the tests themselves may then have to be repeated across two or more sets but this should be a relatively rare occurrence. However, if we have a large number of tests this approach soon becomes impractical.
In this case, we have to look to configuration management for an effective answer.

6. Trying to automate too much

There are two aspects to this common mistake: automating too much too soon, and automating too much, full stop.

Automating too much too soon leaves you with a lot of poorly automated tests that are difficult (and therefore costly) to maintain. It is much better to start small. Identify a few good, but diverse, tests (say ten or twenty tests, or two to three hours' worth of interactive testing) and automate them on an old (stable) version of software, perhaps a number of times, exploring different techniques and approaches. The aim here should be to find out just what the tool can do and how different tests can best be automated taking into account the end quality of the automation (that is, how easy it is to implement, analyse, and maintain). Next, run the tests on a later (but still stable) version of the software to explore the test maintenance issues. This may cause you to look for different ways of implementing automated tests that avoid or at least reduce some of the maintenance costs. Then run the tests on an unstable version of the software so you can learn what is involved in analysing failures and explore further implementation enhancements to make this task easier and therefore reduce the analysis effort.

The other aspect, that of automating too much, full stop, may at first seem unlikely. Intuitively, the more tests that are automated the better. But this may not be the case. Continually adding more and more automated tests can result in unnecessary duplication, redundancy, and/or a cumulative maintenance cost. James Bach has an excellent way of describing this [BACH97]. James points out that eventually the test suite will take on a life of its own: testers will depart, new testers will arrive and the test suite will grow ever larger.
Nobody will know exactly what all the tests do and nobody will be willing to remove any of them, just in case they are important.

In this situation many inappropriate tests will be automated as automation becomes an end in itself. People will automate tests because "that's what we do here - automate tests" regardless of the relative benefits of doing so.

James Bach [BACH97] reports a case history in which it was discovered that 80% of the bugs found by testing were found by manual tests and not the automated tests despite the fact that the automated tests had been developed over a number of years and formed a large part of the testing that took place. A sobering thought indeed.

7. Automating the Wrong Tests

Not every test case should be automated because the benefit of automating some tests is outweighed by the cost of doing so. Indeed, some test cases cannot be automated but this fact does not stop some people trying (at high cost but with no benefit gained).

Once we have some experience of automating tests it will be possible to estimate reasonably well the time it will take to automate a particular test. A crude but adequate measure of the likely savings can be calculated by multiplying the manual test effort by the number of times it is likely to be run.

The decision as to which test cases to automate, and which of these to automate first, has to be based on the potential payback, that is, the extent of the benefits gained by automating one test case compared with the benefits gained by automating a different test case.

The characteristics that would make a test case a likely candidate for test automation are given below:
The characteristics that would make a test case an unlikely candidate for test automation are given below:
Two further considerations are:
Where full automation is not warranted, consider partial automation. For example, it may be difficult to automate the execution of a particular complex test case but it may be possible and beneficial to automate the comparison of some of the results of the test case with the expected results. Conversely, where the execution of a test case cannot be automated (say for technical reasons) it may be possible and beneficial to automate some parts of it (such as data preparation and clear-up).

8. Conclusion

Appreciating that automation is a separate task from testing is important for successful test automation. Automation is neither easy nor straightforward; it has to be worked at and is rarely successful when undertaken as an incidental task. If insufficient resources are dedicated to automation, it will not deliver the significant benefits that are possible.

Simple approaches to automation like capture/replay are a low cost way of starting automation but then incur a high maintenance cost. More sophisticated approaches cost more in time and effort to start with but incur only a fraction of the maintenance costs of the simple approaches.

When automating testing, automating the execution of tests is only part of the job. Verifying correctly that test cases passed or failed requires a number of important decisions to be made as to how often checks are to be made, what is to be checked, and how much of it is to be checked. If the wrong choices are made, good tests can easily be compromised.

Another lesson many organisations have learnt the hard way concerns testware architecture: the structure of the testware, the things we use and create when testing (such as scripts, data, expected results, etc.). A good architecture will encourage reuse (thereby reducing automated test build and maintenance costs) and be easier to work with (resulting in fewer errors being made when working with automated tests).

When starting test automation, there is a huge learning curve and it is best not to automate a lot of test cases to start with since they are not likely to be as good as the ones we automate later on after we have learnt more about good practices. It is better to focus on relatively few tests, trying out different implementations and assessing their relative strengths and weaknesses before going on to automate large numbers of tests.

Good test automation does take time and effort and where time is limited it is particularly important that success-threatening problems be avoided since there will be less time to backtrack and have another go.

There are many pitfalls that impair or destroy well-intentioned attempts to automate testing. Knowledge of the most common ones should help organisations steer away from them and will hopefully help make them vigilant as to other problems that may similarly compromise test automation efforts.

9. References

[BACH97] James Bach, "Test Automation Snake Oil", presented at the 14th International Conference on Testing Computer Software, Washington, USA.

Mark Fewster