Want to Improve Your Investigation Results?
Keep an open mind well into the investigation. Resist the temptation to reject proposed scenarios until there is sufficient evidence to do so.
- By Jack Philley, CSP
- Jan 01, 2006
Small changes in investigation technique can sometimes produce large
improvements in incident investigation results. This article discusses several
common weaknesses found in investigation management systems. These weaknesses
could be viewed as low-hanging fruit that, if present and harvested, can
significantly enhance the effectiveness of incident investigations. These weaknesses
also can be viewed as avoidable and possibly deleterious mistakes that can
poison an otherwise well-developed investigation system.
Although there are other important aspects to incident investigation, these
six items in particular represent typical avoidable defects that are correctable
using internal resources. If any of these weaknesses is present in your
investigation management system or practices, you could consider it an
opportunity to further improve investigation results.
1. Inadequate Target. If the performance target for the investigation is
set too low, the root cause level may not be reached. An effective investigation
policy will clearly define expected objectives in language understandable to all
levels of the organization, accompanied by written examples (both good and bad).
Investigators need to know when they have reached the intended root cause level.
If the performance bar is set too low, investigations may stop before
identifying underlying causes, and this can create an opportunity for a repeat
incident. Identifying and repairing system defects has broad-reaching benefits.
Underlying system weaknesses that remain uncorrected have potential to
broadly affect other activities and may generate similar incidents in other
departments.
Once a clear target is established, investigators and managers need a
sufficient and consistent understanding of what is considered an
adequate level of performance in identifying root causes. Achieving adequate
understanding of targets is a function of the training, auditing, and quality
assurance aspects of the investigation management system.
2. Premature Stopping Point--Improper Screening of Information and
Evidence. It is a potential mistake to gather only information related to
the cause scenario that is felt to be most likely. If alternate scenarios are
rejected prematurely, evidence can be irrevocably lost and ultimate
identification of root causes may be incomplete or inaccurate. Seasoned
investigators try to keep an open mind well into the investigation. They resist
the temptation to reject proposed scenarios until there is sufficient evidence
to do so. Inexperienced investigators often decide on the root cause(s) before
starting the investigation and therefore selectively screen out information and
evidence that is ultimately found to be relevant and important.
In well-implemented investigation management systems, the concept of "delayed
selection decision of the most likely cause scenario" is incorporated into
investigator training, written protocol, and auditing/monitoring
activities.
3. Premature Stopping Point--Event Level. Events are results of
underlying root causes and conditions. Events themselves should not be viewed as
root causes. If a person experiences a flat tire and subsequently loses control
of the vehicle, the flat tire is an event. The investigation team should extend
the investigation activities to pursue the reasons and causes that resulted in
the tire becoming flat. If the investigation team mistakes an event for a root
cause, the investigation may not discover the underlying reasons for the
event.
4. Premature Stopping Point--"Failure to Follow Procedure." In most
instances, failure to follow established procedure is an event that is
the result of underlying root causes. If the investigation stops at the "failure
to follow procedure" level, underlying causes for the failure often will not be
identified and corrected.
Most of the time, there are correctable reasons why the procedure was not
followed. These reasons can be discovered, and preventive actions can be
implemented to minimize the occurrence of similar incidents in the future. In
many cases, the investigation uncovers the fact that this was not the first and only
time the procedure was not followed, although it may be the first time adverse
consequences resulted from the deviation.
5. Premature Stopping Point--Single Root Cause. In most instances, there
are multiple root causes and corresponding multiple opportunities in the
scenario where the accident sequence could have been arrested. If the
investigation finds only one root cause, it is likely that the other root
causes will remain in place (and uncorrected) and contribute to future incidents.
In virtually all major accidents, there are multiple adverse events in the
accident sequence. If any of these events were missing, the chain of events
would be broken and the actual outcome of the scenario could be significantly
different. Multiple failures of safeguards occurred in the Challenger Space
Shuttle disaster, the aborted Apollo 13 moon mission, the Three Mile Island
nuclear power plant incident, and the Bhopal, India, toxic chemical release (references 1, 2, 3). In the Apollo 13 incident, there were at
least four opportunities where safeguard features failed, including:
- Inadequate management-of-change activities when addressing voltage
specification changes from the original 28 volts to 65 volts resulted in failure
of a thermal protection temperature-limiting safety device. The failure of this
device is believed to have allowed internal temperature to reach excessive and
damaging levels during tank heating that occurred 17 days prior to launch.
- Improper rigging during assembly resulted in the tank being dropped and
damaged several years prior to launch.
- After the tank was dropped, inadequate testing and inspection failed to
detect the internal damage caused by the drop.
- An inadequate temporary procedure was developed and executed 17 days before
the mission when it was discovered that liquid oxygen could not be removed in
the normal manner following a dress rehearsal. This inadequate procedure,
coupled with the failed temperature-limiting device, resulted in generating
significant internal temperatures (on the order of 1,000 degrees F), which
damaged the internal wiring insulation. This damage provided the third and
missing fuel leg of the fire triangle when the internal tank-stirring device was
activated 56 hours into the mission.
In the absence of any of these four adverse events, the outcome of the incident
scenario probably would have been much different. The lesson is that multiple
root causes are present in almost every incident. It would be a mistake for the
investigation team to stop after identifying a single root cause.
6. Marginal Quality Witness Interviews. Witness interviews present a wide
range of opportunities for success or failure. High-quality incident
investigation management systems often address the interview competency skill
set for investigation team members and provide corresponding training and
written interview guidelines. It is easy for a witness's perception of the event
to be changed by interaction and discussion with others prior to the
interview. This can happen both consciously and subconsciously. It is a
recognized good practice to conduct the initial interview in a private location
as soon as practical. One important phase of the interview is the "uninterrupted
narrative," where the witness is asked to tell what happened from his or her
perspective. Experienced interviewers will allow this narrative to proceed
without interruption, despite the strong temptation to ask questions and clarify
points during the narrative.
The ultimate purpose of incident investigation is prevention of a repeat
event. Accurately identifying and correcting the multiple underlying root causes
are keys to success. Recognizing and managing these six critical aspects of
incident investigation can significantly affect the results.
This column appeared in the January 2006 issue of Occupational Health
& Safety.
References
1. Apollo 13--Godwin, Robert, Apollo 13--The NASA Mission Reports,
2000, Apogee Books, Burlington, Ontario, Canada, ISBN
1-896522-55-6.
2. Bhopal--Mannan, Sam, Lees' Loss Prevention in the Process
Industries, Third Edition, 2005, Elsevier Butterworth Publishing, NY, NY,
ISBN 0-7506-7589-3.
3. Three Mile Island--Center for Chemical Process Safety, Guidelines for
Investigating Chemical Process Incidents, Second Edition, 2003, American
Institute of Chemical Engineers, NY, NY, ISBN 0-8169-0897-4.