What Is Root Cause Failure Analysis?
Written by Robert J. Latino
AISE Expo - May 1998
     Root Cause Failure Analysis, or RCFA as it is sometimes called, tends to mean different things depending on whom you ask. Put simply, it is a method or series of actions taken to find out why a particular failure or problem exists and to correct its causes. It is similar to what detectives do when a crime occurs or what the NTSB does when an airplane crashes.
     Let's talk a little about why you would want to conduct a Root Cause Failure Analysis in the first place. It is a proven fact that most of the failures and problems that plague industry are what we would call CHRONIC. This means that they happen more than once for the same reason. Furthermore, of all the CHRONIC failures that you experience in a given year, 20% of those failures represent 80% of the loss. These are important facts to understand when you think about the benefits of Root Cause Failure Analysis. They mean that if you investigate the 20% of the failures representing 80% of your losses, you will reap quantum benefits in a short period of time. We call these few failures the "Significant Few" failures.
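     One way to make the "Significant Few" selection concrete is a simple ranking by annual loss. The following is a minimal sketch in Python, not part of the PROACT process as published: the function name `significant_few`, the failure names and the dollar figures are all hypothetical and chosen only to illustrate the 80/20 idea.

```python
def significant_few(annual_losses, loss_share=0.80):
    """Return the highest-loss failures whose combined loss first reaches
    `loss_share` of the total annual loss.

    `annual_losses` maps a failure description to its yearly loss.
    """
    total = sum(annual_losses.values())
    if total <= 0:
        return []
    # Rank failures from the largest annual loss to the smallest.
    ranked = sorted(annual_losses.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for name, loss in ranked:
        if running >= loss_share * total:
            break
        selected.append(name)
        running += loss
    return selected


# Hypothetical chronic failures and their annual losses in dollars.
losses = {
    "pump seal leak": 520_000,
    "conveyor jam": 300_000,
    "bearing failure": 50_000,
    "valve sticking": 35_000,
    "sensor drift": 30_000,
    "belt wear": 25_000,
    "filter plugging": 20_000,
    "coupling misalignment": 10_000,
    "gasket leak": 6_000,
    "fuse trip": 4_000,
}

few = significant_few(losses)
print(few)  # ['pump seal leak', 'conveyor jam']
```

     In this made-up data set, two of the ten failures account for 82% of the total loss, so they would be the first candidates for analysis.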
     Once we have identified the "Significant Few" failures, we must begin analyzing. While there are many ways to go about this, I would like to talk about a process that has been successfully field-tested over a 30-year period. This process is called PROACT®. PROACT® is an acronym for the following:
    PReserving Failure Data 
    Ordering the Analysis 
    Analyzing the Data 
    Communicating Findings & Recommendations 
    Tracking for Success
     Let's take a look at each of these steps in greater detail:
Preserving Failure Data
     With any failure or problem it is of the utmost importance to collect data with respect to that problem. Consider what an NTSB investigator does right after an airplane crash. They comb the area for data such as the flight data recorder ("black box"), broken airplane parts, instrument readings, etc. You must do the same when you are analyzing a problem. We have a data collection procedure called the 5P's. The 5P's are the categories of information you need to collect to begin analyzing a failure. The 5P's stand for People, Parts, Paper, Position and Paradigms. As an analyst, you must interview and brainstorm with all of the people involved in order to collect the other items in these five categories.
Ordering the Analysis
     It would be difficult to analyze any problem by yourself, and it would be unrealistic to think you could analyze a "Significant" problem without the assistance of a multidisciplinary group of individuals. Ordering the analysis consists of putting the right expertise on your team. You need a Principal Analyst to facilitate your Root Cause Failure Analysis project and a team of analysis experts. In addition to this team of experts, you need to define exactly what your team charter or objective is. This creates focus for the team. The team must also define exactly which rules or guidelines it will follow while analyzing this problem. We call these critical success factors, or CSFs for short.
Analyzing the Data
     Once you have assembled your team and collected the critical data, you must begin to analyze the problem. PROACT® uses a logic tree process to help the team members focus on the problem at hand. They break the problem down to its smallest components and then begin hypothesizing as to what the underlying causes might be. The logic tree is broken down into 5 basic steps:
  • Stating the Failure Event
  • Stating the Failure Modes
  • Hypothesizing
  • Verifying Hypotheses
  • Determining Underlying Causes (Physical, Human and Latent)
     The team asks a series of "How Can" questions to come up with its hypotheses. For instance, if our failure mode is a failed bearing, our hypotheses might be fatigue, overload, corrosion and erosion. We then have to use real field data to prove or disprove each of the analysis team's hypotheses. Once all hypotheses have been proven or disproved, we can assess which underlying causes are physical, human or latent. These are the three main categories of problem or failure causes. The logic tree is really a visual brainstorming tool to help you logically figure out the root cause of a problem.
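     The logic tree described above can be pictured as a small tree data structure. The sketch below is an illustration only and makes no claim about how the PROACT software represents its trees: the `Node` class, the helper functions, the "Pump outage" event and the verification outcomes are all hypothetical. Only the failed-bearing mode and its four hypotheses come from the example in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    level: str  # "event", "mode" or "hypothesis"
    # None = not yet checked against field data, True = proven, False = disproved
    verified: Optional[bool] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def open_hypotheses(node):
    """Return labels of hypotheses that still await verification."""
    found = []
    if node.level == "hypothesis" and node.verified is None:
        found.append(node.label)
    for child in node.children:
        found.extend(open_hypotheses(child))
    return found


def surviving_paths(node, path=()):
    """Yield root-to-leaf paths that contain no disproved node."""
    if node.verified is False:
        return
    path = path + (node.label,)
    if not node.children:
        yield path
    for child in node.children:
        yield from surviving_paths(child, path)


# Failure event (hypothetical) -> failure mode -> hypotheses from the text.
event = Node("Pump outage", "event")
mode = event.add(Node("Failed bearing", "mode"))
hyps = {h: mode.add(Node(h, "hypothesis"))
        for h in ("fatigue", "overload", "corrosion", "erosion")}

# Hypothetical verification results from field data.
hyps["fatigue"].verified = True
hyps["overload"].verified = False
hyps["corrosion"].verified = False

print(open_hypotheses(event))        # ['erosion']
print(list(surviving_paths(event)))
```

     In this sketch the analysis is not finished until `open_hypotheses` returns an empty list. Proven hypotheses would then be expanded with further "How Can" questions until physical, human and latent causes are reached.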
Communicating Findings and Recommendations
     Once you have successfully completed the search for causes with your logic tree, it is now time to communicate your findings and solutions to the decision-makers. You must provide a detailed report to help the decision-makers understand the effectiveness of your failure analysis so that your recommendations are given a fair assessment. You must create such a compelling case that it would seem foolish not to go ahead with your recommendations.
Tracking for Success
     Assuming that your communication with the decision-makers was successful, you must now track the effectiveness of your recommendations to make sure that you are getting the return on investment that you anticipated. You can do this with a number of measurements such as reduced maintenance costs, improved production rates, reduced failure rates, etc.
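     As a rough illustration of tracking return on investment, the sketch below uses the common definition of ROI as net gain divided by cost. This definition is an assumption, since the article does not say how returns are calculated, and the function name and dollar figures are hypothetical.

```python
def roi_percent(annual_savings, analysis_cost):
    """Return on investment as a percentage: (savings - cost) / cost * 100.

    Assumes `annual_savings` is the measured yearly benefit of the implemented
    recommendations and `analysis_cost` is the total cost of the analysis
    plus implementation.
    """
    if analysis_cost <= 0:
        raise ValueError("analysis_cost must be positive")
    return (annual_savings - analysis_cost) / analysis_cost * 100.0


# Hypothetical figures: a $50,000 effort that removes $500,000 of yearly losses.
print(roi_percent(500_000, 50_000))  # 900.0
```

     Comparing this figure with the return anticipated when the recommendations were approved shows whether the fixes are performing as expected.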
     What does all this mean to you? Reliability Center, Inc. clients who successfully apply the PROACT® methodology have captured returns in excess of 800% to 1000%. These numbers may sound unrealistic, but think about what failures cost the average manufacturing facility. In a typical oil refinery or other continuous process plant, downtime costs can be staggering. Add in the cost of the repair itself and we're talking hundreds of thousands to millions of dollars in a given year. Also remember that these failures are chronic, so if we do not eliminate them they WILL happen again.
     RCI offers a number of products and services to assist you in your endeavor to begin analyzing failures. We have training courses designed to teach engineers and technical representatives the methods necessary to analyze and eliminate the "Significant Few" failures. We also offer training that gives your operators and craftspeople the tools necessary to identify, analyze and eliminate failures that they come in contact with every day while still performing their normal job functions. 
     In addition to training, RCI has a proven track record of assisting clients with Root Cause Failure Analysis facilitations. This simply means that we work with your team for several days to get them off to a successful start. From there we consult with the team to help them eliminate any roadblocks that might prevent progress.
     RCI has recently released PROACT® software to greatly enhance the effectiveness and efficiency of all Root Cause Failure Analysis teams. PROACT® is a software tool that helps manage the Root Cause Failure Analysis process. It provides forms for data entry, a built-in logic tree with integrated verification logs, and a reporting system that makes communicating your results a cinch. Last but not least, a tracking module follows key metrics to make sure the implemented recommendations are providing the required return. Isn't it about time you started eliminating your "Significant Few" failures?
