Systematically establishing the cause of the bug
“But that’s not possible!” is a response I have heard from developers more than once when I have told them about a problem.
If you work in product development, you’ll often run into problems that at first seem like complete mysteries. They don’t seem to have a clear root cause, so you need to investigate them systematically. If you treat them like normal problems, they may not get solved effectively, or they turn into zombie problems: problems that keep resurfacing even though you thought you had already fixed them.
Another common mistake with mystery problems is to rush to your favorite explanation without establishing the facts first. Through disbelief, bubble-gum fixes and favorite explanations, the problem remains unsolved, and its actual cause unknown.
“Elementary, my dear Watson”
Product development staff should use the methods of systematic problem solving. This is quite like detective work, so the Sherlock Holmes comparison is appropriate! In leadership literature, systematic problem solving was first introduced by Charles Kepner and Ben Tregoe in their book The Rational Manager in the 1960s. The methods themselves are much older, though: their basics are familiar to us from Arthur Conan Doyle’s Sherlock Holmes stories.
Don’t jump to conclusions
“It is a capital mistake to theorize before you have all the evidence. It biases the judgment.”
– Sherlock Holmes, A Study in Scarlet
To accurately describe and investigate the problem, resist the urge to jump to any theories, favorite explanations, and other conclusions. Do the work.
Ensure an accurate problem title
All energy should be focused on collecting data on the problem. It is easier to investigate the problem if its title describes it as accurately as possible.
“The warehouse floor is slippery” is a bad title (not enough information).
“There is oil on the warehouse floor” is better.
“There is an oil leak under air vent number 1” is much better. People instantly know what you mean.
With today’s electronic bug management tools, it is easy to update the title of the problem. A wrong title misleads people who are new to the problem, and it causes confusion when the list of problems is long.
Use a clear and detailed description
“Always approach a case with an absolutely blank mind. It is always an advantage. Form no theories, just simply observe and draw inferences from your observations.”
– Sherlock Holmes, The Adventure of the Cardboard Box
A good way to outline a description is to ask yourself four questions: what, where, when, and how much.
Finding the answers helps you understand the problem.
- When was this bug first detected?
- What activities immediately preceded this time?
- In what environment is the problem repeatable?
- How often does it occur?
- How severe is it?
- Where does it occur?
After this, identify where the problem could occur but does not. This is a very important step, yet it is often overlooked.
For example, the same software may show the problem in one configuration or environment, but not in others.
- Does it only occur at a certain time of the day, for example, every day at 4 a.m., but never at other times?
- Does it only occur when the user logs in?
- Does it only occur when several users log in simultaneously?
This kind of information helps troubleshooters focus on what differs between the environments or setups where the problem occurs and those where it does not. Continue this phase until the investigators feel that they have enough information to move on to the next step.
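The comparison between "where it occurs" and "where it could occur but does not" can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the `Observation` class, the `distinguishing_attributes` function, and the example environments and attribute names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One environment or setup, and whether the bug was seen there (hypothetical)."""
    name: str
    attributes: dict[str, str]
    bug_seen: bool


def distinguishing_attributes(observations: list[Observation]) -> dict[str, set[str]]:
    """Return attribute values found in failing setups but in no working setup."""
    failing: dict[str, set[str]] = {}
    working: dict[str, set[str]] = {}
    for obs in observations:
        bucket = failing if obs.bug_seen else working
        for key, value in obs.attributes.items():
            bucket.setdefault(key, set()).add(value)

    result: dict[str, set[str]] = {}
    for key, values in failing.items():
        only_failing = values - working.get(key, set())
        if only_failing:
            result[key] = only_failing
    return result


# Made-up example data: two setups where the bug occurs, one where it does not.
observations = [
    Observation("staging", {"os": "linux", "db": "v12", "time": "04:00"}, True),
    Observation("production", {"os": "linux", "db": "v12", "time": "04:00"}, True),
    Observation("dev laptop", {"os": "linux", "db": "v11", "time": "14:00"}, False),
]

differences = distinguishing_attributes(observations)
print(differences)
```

Here the operating system drops out because it is the same everywhere, while the database version and the time of day remain as differences worth investigating. The differences are leads, not conclusions.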
Round up the suspects!
“There should be no combination of events for which the wit of man cannot conceive an explanation.”
– Sherlock Holmes, The Valley of Fear
Time to find the possible culprits. This step is much like brainstorming, and the best way to carry it out is in pairs or in a small group. Remember to remain disciplined and not prematurely move on to the next phase.
When enough potential causes have been found, you should consider their mechanisms. Which sequence of events is required so that this potential cause could explain the problem? How likely is that sequence of events?
If there are three potential causes, one of which is much more likely to be the actual cause compared to the other suspects, the next step is obvious.
Consider the probabilities and test the most probable cause first
“When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”
– Sherlock Holmes, The Sign of Four
Consider what kind of test could establish which of the suspects is the actual cause, and begin testing. Start with the most probable suspect and work down the list until one of the potential causes explains the problem. Testing is easier if the bug can be reproduced reliably.
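As a rough sketch of this ordering, the suspects can be kept in a list with an estimated likelihood and a test for each, and then tested from most to least probable. Everything here is an assumption made for illustration: the `Suspect` class, the `find_culprit` function, the likelihood numbers, and the example causes. In practice a "test" is usually an experiment carried out by a person, not a function call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suspect:
    """A potential cause (hypothetical structure for this example)."""
    description: str
    likelihood: float         # rough estimate between 0 and 1, agreed on by the group
    test: Callable[[], bool]  # returns True if the test confirms this cause


def find_culprit(suspects: list[Suspect]) -> Optional[Suspect]:
    """Test suspects from most to least likely; return the first confirmed one."""
    for suspect in sorted(suspects, key=lambda s: s.likelihood, reverse=True):
        if suspect.test():
            return suspect
    return None


tested: list[str] = []  # records the order in which the tests were run


def make_test(name: str, confirmed: bool) -> Callable[[], bool]:
    """Build a stand-in test that logs its name and returns a fixed result."""
    def run() -> bool:
        tested.append(name)
        return confirmed
    return run


# Made-up suspects: the most likely one turns out to be innocent.
suspects = [
    Suspect("clock drift on one server", 0.1, make_test("clock", False)),
    Suspect("nightly backup locks the database", 0.6, make_test("backup", False)),
    Suspect("session cache overflows", 0.3, make_test("cache", True)),
]

culprit = find_culprit(suspects)
print(tested)
print(culprit.description if culprit else "no suspect confirmed")
```

The least likely suspect is never tested in this run, which is the point of ordering by probability: on average, fewer tests are needed before the cause is found.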
When the culprit has been found, the last step is to fix the bug and carry out other necessary actions, which is usually quite a lot easier than finding the cause.
Systematic problem solving can be learned
With the methods of systematic problem solving, you can solve your mystery bugs more easily and faster.
These methods should be understood by:
- project managers
- Product Owners
- developers
- testers
- customer support staff
If they all understand these methods, bugs get good titles and descriptions right from the start. And by finding the possible causes and determining their probabilities quickly, you can find the root cause… even if the butler did it.
Published: Mar 15, 2022