December 2000 Feature Article

Testing For Software Reliability

This article introduces different software testing techniques that have been used to evaluate software reliability.

Testing is the process of executing a program with the intention of finding design errors in a given environment. Testing can only prove the incorrectness of software but not it's correctness. Some people have argued that testing is useless because what we want is the correctness of software instead of its incorrectness. This idea has led to the research of techniques for developing correct software and proving software correctness. It has been defined that a program is correct, if it meets its specification for all valid inputs. The reliability of a program is 1 if it is correct and 0 of it is incorrect, taken into account that a correct program will not be worn out.

Unfortunately, progress in this direction is far from the stage where techniques for developing error-free software or proving correctness can be used in practice to handle today's large software systems. Obviously, a more strict development process can reduce the possibility of design errors, but it can never guarantee 100% error-free software, as long as humans engineer software.

Since none of today's large software systems are error-free, it does not make sense to define software reliability according to the correctness. A more practical approach is to define software reliability as the probability that no failure (an incorrect output) will occur in a specified environment, during a specified exposure period. For some programs the appropriate time unit of exposure is the calendar of CPU time, and for other programs the appropriate time unit of exposure period is an operational run corresponding with the selection of an input from the input domain of the program.

This definition implies that software reliability is evaluated during the execution of a program in a specific environment either during testing or during operation. Different kinds of reliability models are available to evaluate software reliability. According to the nature of failure process, Goel [1] classifies reliability models into four kinds:

  • input domain based models

  •  
  • times-between-failure models

  •  
  • failure-count models

  •  
  • fault seeding models

  •  

Random Testing The first software reliability model is input-domain based with random selection of inputs, i.e., it randomly generates inputs for testing. Assume that n inputs have been used for testing in which ƒ tests have detected failures. The reliability of the program under testing in an operational run is evaluated by:

The reliability in t operational runs is then:

The problem with random testing is that we do not know whether all parts of the program have been tested. If the random generation of inputs is uneven for some reason, we may end up testing only a small part of the program, no matter how many tests have been executed.

Partition Testing

Partition testing tries to address the problem of random testing by dividing the input domain into m classes. The critical part of partition testing is to divide the input domain in such a way that inputs in different classes would execute different parts of the program. For example, we can put all inputs that will execute the same path in program flowchart into one class.

Suppose ni inputs are generated from class ti where ni = 1, and fi failures are detected when these inputs are applied for testing, then the reliability of the program under testing is evaluated by:

Normally, partition testing is conducted with rounds of testing. In each testing round, m tests are executed with one input selected from each class. Input selection within the class is still random. After k rounds of testing is conducted, the reliability is:

Where gj is the number of failures detected in round J. Let's consider the two extreme cases of partition testing. Case 1: If the number of classes is equal to 1, partition testing is in fact random testing. Case 2: If the number of classes is equal to the number of inputs, i.e., each input forms a class, partition testing is equivalent to exhaustive testing that uses all possible inputs for testing.

Testing and Operational Profiles

The most generic way of generating inputs for testing is to use a probability density function on the input domain to guide the input generation. In other words, each input in the input domain is associated to a probability with which the input is selected for testing. The sum of probability values associated with all possible inputs is equal to 1, therefore the probability distribution forms a probability density function. This function is called the testing profile.

Similarly, during the application (operation) of a program, there is a probability that associates with each input. The probability distribution forms a density function, which is called the operational profile of a program.

The design of a testing profile should be guided by the operational profile of the program under testing to ensure that the heavily used operations receive the most testing. Musa [2] described a step-by-step procedure to develop a practical operational profile for testing purposes.

In some cases it is not easy to obtain or to follow the actual operational profile. Operational usage of software can be unpredictable, unknown, or different for different users. Even if a testing profile is given, testing may not be able to follow the given profile due to the non-determinism of input generation to, say, cover a certain program path. This problem has been addressed by some researchers through theoretical and experimental approaches to sensitivity study of operational profile errors and how to correct the errors [3, 4].

Reliability Growth with Error Corrections

Techniques discussed above do not include the effect of error corrections performed during testing. A program is considered as a new program and has to be re-tested and re-evaluated after each error correction. The techniques that take error correction history into account are called reliability growth models. The idea of these models is to measure the change of time intervals between failures when error corrections are performed

These kinds of models are much more complex than the models that do not deal with error corrections. Current reliability growth models cannot accurately predicate the reliability of software in its operation. Researchers and testing practitioners worldwide are working together to find an ultimate solution.

Reliability is an important aspect of software quality. A more comprehensive merit is called dependability, which includes reliability, availability, safety and security. The Programme for Highly Dependable Systems is a research group working on software and system dependability. We investigate all aspects that can improve the dependability of computer systems. We are working on formal methods for software specification and verification, software testing and dependability evaluation. Our current projects are related to building reliable and secure network software, including task allocation, traffic redirection and firewall systems..

Yinong Chen

References:

1. Goel85 A.L. Goel, Software reliability models: assumptions, limitations, and applicability, IEEE, Trans. Soft. Eng., SE-11, No. 12, Dec 1985, pp. 1411 - 1423.

2. J.D. Musa, Operational Profiles in Software Reliability Engineering, IEEE Software, Vol. 10, No. 2, March 1993.

3. Y. Chen, Modelling Software Operational Reliability via Input Domain-Based Reliability Growth Model, IEEE 28th Annual International Symposium of Fault-Tolerant Computing, Munich, June 1998, pp. 314 - 323.

4. A.N. Crespo, P. Matrella, A. Pasquini, Sensitivity of reliability growth models to operational profile error, in Proc 7th International Symposium on Software Reliability Engineering, IEEE Computer Society Press, October 1996, pp 35 - 44.

<< November 2000 January 2001 >>