Troubleshooting in Review

I know I’m a little late celebrating Robbie Burns Day, which was at the end of January, but a line from one of his famous poems is a good intro to this column: “The best-laid schemes o’ mice an’ men gang aft agley.”

What he meant was: “Things go wrong.”

It’s the nature of the work you do that sometimes systems don’t do what you want them to. When that happens, it’s your job to find out what’s wrong and correct it.

Troubleshooting skills are critical. That’s not to downplay the importance of skilled installation work, but fixing things requires a different mindset, and it’s important to be prepared. It’s virtually guaranteed that as an installation nears completion, something will need troubleshooting. Likewise, a system may function well for months or years before something goes wrong.

Regardless of when it happens, you need to follow a methodical process to make the best use of your time. A clear process is what makes you both effective and efficient.

Alternatively, you could just try a bunch of stuff at random. I’ve seen people do this a lot. It’s a popular option, but it takes a lot more time and is not nearly as effective.

As was drummed into me in training, and then in practice, effective troubleshooting follows precise steps, in order: diagnose, analyze, fix, test and prevent.

Diagnosing is when you describe the nature of the fault. Think in terms of what, when and where. The answers to those questions will lead you to the why and how. After that, it’s time to analyze and identify the source of the problem. Be methodical and work through the possibilities from the most likely to the least.
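For those who like to see a process written down, here is a minimal sketch in Python of working through the possibilities from the most likely to the least. Everything in it is an assumption made for illustration: the Candidate structure, the likelihood numbers and the example fault (a display with no picture) are hypothetical, and the checks are stand-ins for whatever test you would actually run on site.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    """One possible cause of the fault (hypothetical structure)."""
    name: str
    likelihood: float          # rough estimate, 0.0 to 1.0
    check: Callable[[], bool]  # returns True if this turns out to be the cause


def find_cause(candidates: list[Candidate]) -> Optional[str]:
    """Check candidates from most likely to least; stop at the first hit."""
    ranked = sorted(candidates, key=lambda c: c.likelihood, reverse=True)
    for candidate in ranked:
        if candidate.check():
            return candidate.name
    return None  # nothing confirmed; go back to the what, when and where


# Illustrative values only: a display showing no picture.
candidates = [
    Candidate("failed display", 0.1, lambda: False),
    Candidate("wrong input selected", 0.6, lambda: False),
    Candidate("loose cable", 0.3, lambda: True),
]
print(find_cause(candidates))
```

The point of the sort is simply that the cheap, probable causes get ruled out first, which is the opposite of trying a bunch of stuff at random.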

Once you’ve identified the problem, you know what needs fixing. This is where the rubber meets the road: the whole point of the previous two steps was to get here and fix the fault in a timely manner.

But it’s not enough to fix it; you have to be certain that your repair has worked. That means testing the equipment to make sure not only that you’ve fixed the fault, but also that you haven’t caused others, which happens more often than I’d like to admit. There’s an old maxim I stole from an engineer I used to know: “Every solution has two problems.” Long story short, test your repairs before you call it a day.

Finally, wherever possible, after you’ve identified and repaired what caused the problem, devise a solution that will ensure that it doesn’t happen again. Avoid Band-Aid solutions that lead to another service call later on. If something in the system needs to be re-engineered, take care of it and take that as a lesson learned.

Nobody enjoys spending their time bogged down in troubleshooting hell. Have a process, stick to it, be rigorous and methodical, and you’ll be finished sooner than if you hadn’t.