Lecture 2 - The Purpose of Testing -> Quick Intro -> Exposing Misconceptions -> How To Make Use Of Stats For Post-Release Bugs
There are 2 misconceptions associated with the purpose of testing which you should be aware of, because they are widespread and harmful.
Misconception #1: “Testers must make sure that 100% of the software works fine.”
Here is Spec #1522:
1.0. Program froggy.py accepts user input.
1.1. Text of prompt for input: “What animal do you like more: frog or cat?”
1.2. If input is either “frog” or “cat”, the program must print on screen: “<user input> is a great animal”.
1.3. If user enters nothing and just presses “Enter” the program should print message: “You did not type anything”.
1.4. In all other cases the message should be “You did not enter the expected word”.
Here is the code of froggy.py, written in Python (the text after # is a comment for that line of code):
user_input = input('What animal do you like more: frog or cat?') # Display a prompt for text input.
animal_list = ['frog', 'cat'] # this is a list of 2 words, one of which is expected to be entered
if user_input in animal_list: # if the user entered a word that matches an element of animal_list
    print(user_input + ' is a great animal')
elif user_input == '': # if the user entered nothing and just pressed Enter
    print('You did not type anything')
else: # in all other cases
    print('You did not enter the expected word')
Let’s list the four possible inputs:
1. ‘frog’
2. ‘cat’
3. '' (Null input)
4. Any input different from input 1 to 3 inclusively
Brain Positioning
Input can mean different things depending on the context. We’ll use an incremental approach for a smooth introduction to this and many other concepts. For now, input is data passed to a system.
From a user’s perspective, the typical inputs are:
– Text, e.g., username entered during log in.
– File, e.g., image file uploaded to a photo-sharing Web site.
Brain Positioning
There are two kinds of input:
Valid input
Invalid input
We need to know the validity of the input because software must process valid and invalid inputs differently. In the above example, valid input can be determined by looking at the user prompt: “What animal do you like more: frog or cat?” The valid inputs are “frog” and “cat.” All other inputs are invalid.
If the spec doesn’t clarify which input is valid and which input is invalid, we can send an email to the product manager and/or file a bug against the spec. If there is no spec, we can use other sources for the expected result to find out about the validity of various inputs.
Depending on the situation, Null input can belong either to valid or invalid input.
In order to verify that 100% of the software “works,” we have to create four inputs and verify four actual results against four expected results.
Test Case #1:
Input: ‘frog’
Expected Result: frog is a great animal.
Test Case #2:
Input: ‘cat’
Expected Result: cat is a great animal.
Test Case #3:
Input: <nothing>
Expected Result: You did not type anything.
Test Case #4:
Input: ‘dragon’
Expected Result: You did not enter the expected word.
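The four test cases above can be turned into a small automated check. This is only a sketch: froggy_response is a hypothetical helper that repackages froggy.py's logic as a function, so the actual results can be compared against the expected results without typing input by hand.

```python
def froggy_response(user_input):
    """Hypothetical refactoring of froggy.py's logic into a testable function."""
    animal_list = ['frog', 'cat']
    if user_input in animal_list:
        return user_input + ' is a great animal'
    elif user_input == '':
        return 'You did not type anything'
    else:
        return 'You did not enter the expected word'

# Test Cases 1-4: compare actual results against expected results.
assert froggy_response('frog') == 'frog is a great animal'
assert froggy_response('cat') == 'cat is a great animal'
assert froggy_response('') == 'You did not type anything'
assert froggy_response('dragon') == 'You did not enter the expected word'
print('All 4 test cases passed')
```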
Wait a minute! Why are we so sure that ‘dragon’ covers the whole variety of possibilities called “Any input different from input 1 to 3 inclusively”?
Our confidence comes from the fact that, looking at the code of the extremely simple program froggy.py, we saw that all inputs (except “frog,” “cat,” and null) will be redirected by the “else” branch to the output “You did not enter the expected word.” But, as a rule, testers don’t look into the software code at all! Why? Because:
– First, the programmer and the tester usually receive the spec at the same time, so when the tester starts writing test cases, there is no code to look into yet.
– Second, real software usually has thousands of lines of code, and even programmers cannot be sure how it will actually handle each possible input.
So, let’s assume that we don’t look into the code. How can we verify that 100% of the program froggy.py “works fine”? We have to test all possible inputs in addition to inputs from Test Cases 1-3 inclusively.
BTW
If the language of our input is English, we can type
94 printable characters:
– Letters (a-z, A-Z)
– Numbers (0-9)
– Special characters (punctuation and symbols; e.g., “;” and “$”)
and 1 invisible graphic character
– Space (produced by the space bar on a keyboard)
There are also 33 other ASCII characters* (control characters), most of which are obsolete. Let’s not take them into consideration for the sake of the example below.
*”American Standard Code for Information Interchange (ASCII), is a character encoding based on the English alphabet.” (Definition from Wikipedia)
If we test the condition “Any input different from input 1 to 3 inclusively” with one character, we have 95 possible inputs. But what if a user enters two characters? In that case we have 9,025 possible inputs (95×95). What about three characters? We’ll have 857,375 possible inputs; i.e., 857,375 test cases. How about ten characters? (That would be 59,873,693,923,837,890,625 possible inputs). So, even in the case of a trivial task to test user input that’s different from
‘frog’
‘cat’
''
there is a humanly unthinkable number of possible inputs! But the real pain only starts here.
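The counts above are simple powers of 95, since each character position can independently hold any of the 95 characters. A quick sketch to reproduce the arithmetic:

```python
ALPHABET_SIZE = 95  # 94 printable English characters plus the space

for length in (1, 2, 3, 10):
    # With 95 choices per position, an input of this length has 95**length variants.
    print(f'{length} character(s): {ALPHABET_SIZE ** length:,} possible inputs')
```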
The basic concept of a computer program is: Input -> Processing -> Output.
We can look at this analogy:
When somebody calls your cell phone, it’s an Input ->
When you look at the caller ID and decide to take the call or not, this is Processing ->
And then, depending on your decision, you either pick up or you don’t, it’s an Output.
In order to test “Processing,” certain Input must be fed into the program, so we can verify the Output. To produce Output, Processing takes certain logical routes or paths (i.e., “If this is my wife, I’ll pick up. Otherwise I won’t.”). In our simple case, there are 3 paths:
Path 1: If the Input is ‘frog’ or ‘cat’, Processing produces the Output ‘<animal> is a great animal’.
Path 2: If the Input is '', Processing produces the Output ‘You did not type anything’.
Path 3: If the Input is not ‘frog’, ‘cat’, or '', Processing produces the Output ‘You did not enter the expected word’.
In the case of the worthless program froggy.py, each path is short, simple, and straightforward. In the case of real software, e.g., the software known as the “Google search engine,” Processing is much more complex.
Example
Imagine that you are hiking in the park. You’re walking along the trail, and you see a fork with two trails: “Montebello” on the left and “Skyline” on the right. You take the trail called “Skyline” and keep hiking. Half a mile later, you see another fork and make another choice of where to go, and so on. Now, if you took the trail called “Montebello,” it would have been a different route for you. To explore the whole park, you’d have to hike all the trails.
In the case of real software, processing can contain tens, hundreds, thousands, or even millions of unique, logical paths that must be tested if you want to test 100% of the software. What’s important here is:
– In the case of inputs, you usually can determine a piece of data that represents some class of values (e.g., ‘dragon’ represents the class “Any input different from input 1 to 3 inclusively“), so if the processing of the input ‘dragon’ produces the expected result, you can assume (without being 100% sure!) that the input “*&^98gB&T*” would produce the same result.
– In the case of logical paths, you usually cannot say that Path 1 works if Path 2 works! In order to say that Path 1 works, you have to actually test that path.
Therefore, on top of an enormous number of possible inputs, we have an enormous number of possible logical paths.
Let’s pour some more salt on the wound. In addition to inputs and logical paths, we have conditions, i.e., presets and/or environments that exist when the software is used (you can also say “under which the software is used”). For example, execution of the same test case can end up with different results if you execute it using different Web browsers, e.g., Internet Explorer 6.0 and Firefox 2.0 (the first condition is “Test case is executed using IE 6.0”; the second condition is “Test case is executed using Firefox 2.0”). So, in order to test 100% of the software, not only do we have to test all inputs and all logical paths, but we also must execute them in each version of each Web browser known to humans.
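The way conditions multiply the test load can be sketched as a Cartesian product. The browser names below come from the example in the text, and the four inputs are the representatives used earlier; this illustrates the counting only, not a real test framework.

```python
import itertools

browsers = ['IE 6.0', 'Firefox 2.0']    # conditions under which a test case runs
inputs = ['frog', 'cat', '', 'dragon']  # one representative per input class

# Every test case must be executed under every condition,
# so the test matrix is the Cartesian product of the two lists.
test_matrix = list(itertools.product(browsers, inputs))
print(len(test_matrix))  # 2 conditions x 4 inputs = 8 executions
```

Add one more browser or one more input class and the matrix grows multiplicatively, which is exactly why full coverage gets out of hand.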
Brain Positioning
Looking at the enormous mountain of possibilities produced by the combination of
– inputs,
– logical paths, and
– conditions,
we can clearly see that
it’s usually absolutely impossible to test 100% of the software. In other words, testing usually cannot cover 100% of the possibilities of how software can operate.
Therefore, we can
– EITHER set up a 200-year plan to test software that probably nobody will need in 10 years.
– OR be smart testers.
Brain Positioning
Whether we like it or not, there is always a probability that bugs will be missed by testers. But we, software testers, can substantially reduce that probability by working hard and, most importantly, smart to find and address bugs.
Misconception 2: “A tester’s effectiveness is reflected in the number of bugs found before the software is released to users.”
First, let’s introduce a new term: “release”. Depending on the context, a release can be defined as:
– Either the process of passing on some version of the product* to the consumer
– Or some version of the product
* In software terms, “product” is another term for “software.”
Here’s what happens: First we develop and test software, and then we release it (make it available) to our users.
During this process, bugs can be found:
– Before a release – these are called pre-release bugs
– After a release – these are called post-release bugs
Pre-release bugs are found and addressed by folks inside the company (testers, developers, etc.). Post-release bugs are found by anyone who uses the software (e.g., a Web site), but mostly by our users.
Here is the question: do users care how many pre-release bugs the testers found during testing? Let’s see. We all use Google search. Do we, as users, care how many bugs have been found by Google’s testers? Nope. The only thing we care about is whether Google search retrieves relevant results. Users are happy when the software works fine, and they are not happy when it doesn’t (because of bugs, servers being down, or whatever other reason). So there is no connection between user happiness (or, more formally, “customer satisfaction”) and internal company data about how many bugs have been found.
Now let’s look at this from a software development perspective. Every release is different. It’s different because of the complexity of the development and testing, the number of lines of code, the time given to develop and test it, and many other factors. Thus, not all releases are created equal, and the number of findable bugs can vary substantially from release to release.
Therefore, the number of pre-release bugs by itself means nothing, and a tester’s effectiveness cannot be judged by it. It’s just a number.
What really matters is the number of post-release bugs and the seriousness of those bugs. That’s why, once post-release bugs emerge, we have to analyze what, if anything, was done incorrectly and what can be done to prevent releasing similar bugs in the future.