Failure Modes & Effects Analysis "A Modified Approach"
Presented by: Kenneth C. Latino - National Petroleum Refiners Association (NPRA) Maintenance Conference, May 1996

Wouldn't it be great if you knew exactly what all of those costly chronic (repetitive) failures were actually costing your facility? There is a way. It is called Failure Modes & Effects Analysis, or FMEA for short. FMEA is a technique first used in the aerospace industry to find problems with an aircraft before it ever left the ground. In short, it is a way of looking into the future and determining where potential failures might be located. This sounds wonderful in theory, but it takes a tremendous amount of time and energy, sometimes as much as 100 man-years.

Realizing that we just do not have that much time or that many resources in our manufacturing plants, we had to devise a way to make the process less cumbersome. The modified approach makes one simple change to the process. Instead of looking into the future, we are going to take a look at our past failures. This changes the analysis time from 100 man-years to an average of a few man-weeks, which makes the process practical to use in our facilities.

The "MODIFIED" FMEA process is used to determine what failures are occurring in our facility and what their frequencies and impacts are. Think of it this way:

Failure Event | Failure Mode | Frequency | Impact | Total Loss
Failure of pump p-1002 | Bearing Failures | 12 failures / yr. | $2,000 / failure | $24,000 / year

Sample Failure Event

This simplified table demonstrates the power of this technique. Imagine performing the above calculation for every failure event in your facility. I can assure you that the results will be astounding. Of course, we do not want to work on every failure event, so we need to determine which failure events are the most significant. It just so happens that, typically, 20% or less of the failure events represent 80% of our losses. This means that we do not have to do Root Cause Failure Analysis (RCFA) on everything, just on the events that are "most" important.

Let's take a look at the steps involved in performing a "modified" FMEA:

# | Step | Description
1 | Perform preparatory work | Develop a failure definition, contact flow diagram, gap analysis, preliminary worksheet and interview schedule.
2 | Collect data | Interview facility personnel to determine what the failures are, their frequencies and their impacts.
3 | Summarize & encode results | Input into an electronic spreadsheet and determine any redundancies.
4 | Calculate loss | Multiply frequency x impact for every failure event in the analysis.
5 | Determine "Significant Few" | Determine the 20% or less of the failures that result in 80% of the losses.
6 | Validate results | Verify that the results are valid.
7 | Issue a report | Communicate results.

Steps to perform a FMEA

Step 1 - Perform Preparatory Work

Before beginning any analysis, it is important to do some preliminary prep work. This analysis is no different. The first thing that needs to be accomplished is to select a system to analyze. For instance, we may want to select a small subset of the facility, as opposed to selecting the entire facility, as our system.

Once we know what system we want to work on, we must DEFINE FAILURE. This may seem trivial, but it is an essential step in the analysis. If we were to ask 100 people to define failure, we would probably get 100 different definitions. This would make our analysis far too broad. We need to focus, not on everything, but on the things that are most important to our business at that point in time. For instance, if utilization is critical to our business today, we should center our definition around utilization; if our priority issue is quality, then our definition should center around quality.

Let's take a look at some examples of common failure definitions:

  • Failure is any loss that interrupts the continuity of production.
  • Failure is a loss of asset availability.
  • Failure is the unavailability of equipment.
  • Failure is a deviation from the status quo.
  • Failure is not meeting target expectations.
  • Failure is any secondary defect.

Please note that there are no perfect failure definitions. For instance, "Failure is any loss that interrupts the continuity of production" would have to include planned shutdowns, rate reductions for decreased sales, etc. It would also not pick up failures on equipment that is spared, since those failures do not interrupt the continuity of production.

A precise failure definition is important since it focuses the facility on the priority issues. It fosters good communications since everyone knows what is important and it also provides a basis for a common understanding of what the facility's needs are. Not to mention, it is an essential step in the development of a "Significant Few" failure list.

There are a few rules of thumb to consider when developing a failure definition. It must be concise and easily understandable. If it is not, it will leave too much room for interpretation. It should not have to be interpreted. It must address only one topic. This is important to maintain the focus of the analysis. If we include too many topics, our target becomes too large. Finally, it should be approved and signed by someone in authority so that everyone in the organization sees that it is a priority issue.

The next step in the preparation process is to develop a contact flow diagram. The contact flow diagram will allow you to break down your system into smaller, more manageable subsystems. The rule for this diagram is to map all of the process units that come into contact with the product. This diagram, as well as the failure definition, will be used when we begin to collect the data for the analysis.

The next thing we need to accomplish before we begin our FMEA is to perform a gap analysis. In other words, we need to uncover the disparity between what we are producing now and what we could potentially produce. This will give us some indication of the potential opportunity in our facility. For instance, suppose we produce widgets in our facility, and we currently produce 150,000 per year. However, our potential is 300,000 per year. We therefore have a gap of 150,000 widgets per year.
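As a minimal sketch, the gap calculation can be written out in Python using the widget figures from the example. The function name production_gap is an illustrative assumption, not part of the FMEA method itself.

```python
def production_gap(current_output, potential_output):
    """Return the gap between potential and current annual output."""
    return potential_output - current_output


# Figures from the widget example: 150,000 produced, 300,000 potential.
gap = production_gap(current_output=150_000, potential_output=300_000)
print(f"Gap: {gap:,} widgets per year")
```

The same figure is used again in Step 6, where the total of all failure events is compared against the gap.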

The final step in the preparation stage is to design a preliminary interview sheet and a schedule of people to interview. The interview sheet is the form that will assist you in collecting the data from your interviews.

To put this all into perspective, the following is a checklist of items to be covered prior to beginning a FMEA.

FMEA Preparatory Steps | Completed (Y/N)
Define the system to analyze |
Define failure |
Draw a contact diagram |
Calculate the gap |
Develop data worksheets |
Develop preliminary interview schedule |

FMEA preparation checklist

Step 2 - Collect the Data

There are a couple of ways of collecting the data for this analysis. You can rely on your computer data systems (e.g. a Maintenance Management System) or you can go to the people who are closest to the work and get their input. Although each has its advantages, interviewing is probably the best since the information will be coming straight from the source. If you have enough confidence in your data systems, then it will be useful to use that information to later validate your interviews.

At this point let's discuss how you would use interviews to collect the data for your analysis. The process is really quite simple. Let's look at a simple scenario:

You send out a message to all of the people that you would like to interview. You state the date and time and give a brief description of the FMEA process. Note: it is important to interview at least 2 or 3 people in each session so that the interviewees can bounce ideas off each other.

Once in the room, you will need to display a large copy of the contact flow diagram and the failure definition so that they are in clear view of the interviewees. Now you will begin the process of asking your questions. There really is only one initiating question that needs to be asked: "What events or conditions satisfy the definition of failure within each of the subsystems in the contact flow diagram?" At this point the interviewees will begin to brainstorm all of the failure events that they have experienced within each of the subsystems.

Once you have exhausted all of the possibilities, ask the interviewees what the frequency and impact of each failure event are. The frequency should be based on the number of occurrences per year. The interviewees, however, will give you the information in the measurement units that make the most sense to them. For instance, they may say it happens once per shift. It is your job to later translate that figure into the number of occurrences per year. The impact should include items such as manpower requirements, material costs and any downtime that might have been experienced. This is all there is to it!

When you begin the interview process, it is best to interview the people who are closest to the work (i.e. mechanics and operators). You should also talk with supervisors and possibly managers but certainly not to the extent that you would for mechanics and operators.

As the principal analyst, you will also need to be the principal interviewer. This means that you have to explain the process to the interviewees, ask the questions and capture the information on your log sheet. This can be a difficult job. If it is feasible, it would be advantageous to have an associate interviewer assist you by recording the information on the log sheets. This allows you to focus on the questions and the interviewees.

The job of interviewing can be quite an experience, particularly if you do not have a lot of experience in conducting interviews. It tends to be more of an art form than a science. Below is a listing of some tips that may be useful when you begin to conduct your FMEA interviews.

Interview Tips

  • Be very careful to ask the exact same lead questions to each of the interviewees. This will eliminate the possibility of having different answers depending on the interpretation of the question. Later you can expand on the questions, if further clarification is necessary.

  • Make sure that the participants know what a FMEA is as well as the purpose and structure of the interviews. If you are not careful, the process may begin to look more like an interrogation than an interview to the interviewees. You want the interviewees to be comfortable.

  • Allow the interviewees to see what you are writing. This will set them at ease since they can see that the information they are providing is being recorded correctly. NEVER use a tape recorder in a FMEA session because it tends to make people uncomfortable and less likely to share information.

  • Never argue with an interviewee. Even if you do not agree with the person, it is best to accept what they are saying at face value and double check it with the information from other interviews. The minute you become argumentative, it reduces the amount of information that you can get from that person.

  • Always be aware of interviewees' names. There is nothing sweeter to a person's ears than the sound of their own name. If you have trouble remembering, simply write the names down in front of you so that you can always refer to them.

  • It is important to develop a strategy to draw out quiet participants. There are many quiet people in our workforce who have a wealth of data to share but are not comfortable sharing it with others. We have to make sure that we draw out these quiet interviewees in a gentle and inquiring manner.

  • Be aware of the body language of interviewees. There is an entire science behind body language. It is not important that you become an expert in this area. However, it is important to know that a substantial portion of human communication is through body language. Let the body language talk to you.

  • In any set of interviews, there will be a number of people who are able to contribute more to the process than the others. It is important to make a note of the extraordinary contributors so that they can assist you later in the analysis. They will be extremely helpful if you need additional information, when you validate your finished FMEA, and when you begin your actual Root Cause Failure Analysis (RCFA).

  • Remember to use your failure definition and contact flow diagram to keep interviewees on track if they begin to wander off the subject.

Step 3 - Summarize & Encode

At this point we have conducted a series of separate interviews and we need to look through our data to reduce redundant entries. Then we convert frequencies from the interviewees' measurement units into occurrences per year (e.g. 2 per month would translate into 24 times per year).

The easiest way to summarize this information is to input it into an electronic spreadsheet. There are many products on the market that you could use. Microsoft Excel, Lotus 1-2-3 and Borland's Quattro Pro are just a few of the more popular spreadsheet programs you should consider. Once the information is input, you can use your spreadsheet to sort the raw data first by sub-system and then by failure event. This will give you a closer look at the events that are redundant. As for converting frequencies to occurrences per year, the more advanced spreadsheets can do many of these tasks for you. Consult your user's manual for creating lookup tables.
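For readers who prefer a script to a spreadsheet, the lookup-table idea can be sketched in Python as follows. The period names, the function name occurrences_per_year, and especially the assumption of three shifts a day for 365 days are illustrative only; substitute your own facility's operating schedule.

```python
# Lookup table: how many times each period occurs in a year.
# The shift figure assumes three shifts a day, every day of the year
# (an assumption for illustration; adjust to the actual schedule).
PERIODS_PER_YEAR = {
    "shift": 3 * 365,
    "day": 365,
    "week": 52,
    "month": 12,
    "year": 1,
}


def occurrences_per_year(count, period, every=1):
    """Convert 'count per <every> <period>(s)' into occurrences per year."""
    return count * PERIODS_PER_YEAR[period] / every


print(occurrences_per_year(2, "month"))           # 2 per month    -> 24.0
print(occurrences_per_year(1, "month", every=2))  # 1 per 2 months -> 6.0
```

Keeping the conversion factors in one table means a change in the operating schedule only has to be made in one place.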

The following example should give you an idea of what is meant by summarizing your data:

Sub-System | Failure Event | Failure Mode | Frequency | Impact
Recovery | Recirculation Pump Fails | Bearing Fails | 1 per month | 1 shift
Recovery | Recirculation Pump Fails | Oil Contamination | 1 per 2 months | 1 day
Recovery | Recirculation Pump Fails | Bearing Locks Up | 1 per month | 12 hours
Recovery | Recirculation Pump Fails | Shaft Fractures | 1 per year | 1 day

This data suggests that the first three items are the same since they each impact the bearings and have fairly consistent frequencies and impacts. The last item is also related to bearings but went one step beyond the others since we not only lost the bearings but we also suffered a fractured shaft. This would indicate a separate mode of failure. A summarization of this data might look something like this:

Sub-System | Failure Event | Failure Mode | Frequency | Impact
Recovery | Recirculation Pump Fails | Bearing Problems | 12 per year | 12 hours
Recovery | Recirculation Pump Fails | Shaft Fractures | 1 per year | 1 day

Completed FMEA failure event summarization

Step 4 - Calculate Loss

At this point, we want to do a simple calculation to generate our total loss for each event in the analysis. The calculation is as follows:

Frequency x Loss Per Occurrence (Impact) = Total Loss Per Year

Let's look at an example of just how to apply this:

Sub-System | Failure Event | Failure Mode | Frequency | Impact | Total Loss (hrs./yr.)
Recovery | Recirculation Pump Fails | Bearing Fails | 12 per year | 12 lost hrs. | 144 lost hrs. of prod.
Compressor | Seal Failure | Blown Seals | 4 per year | 24 lost hrs. | 96 lost hrs. of prod.
Mixers | Filter Switches | Filters Clogged | 26 per year | 2 lost hrs. | 52 lost hrs. of prod.
Vent Condensers | Pressure Gauge Leaks | Leaks Due To Corrosion | .33 per year | 24 lost hrs. | 8 lost hrs. of prod.

Completed Loss Calculation Example

What we need to do is multiply the frequency by the impact to get our total loss. In the first event, we have a failure occurring once per month, or 12 times per year. We lose a total of 12 hours of production every time this occurs. So we simply multiply 12 occurrences by 12 hours of lost production to get a total loss of 144 hours per year. If you decide to use an electronic spreadsheet, all of these calculations can be performed automatically by multiplying the frequency and impact columns. Refer to the section in your software's user manual that concerns multiplying columns.

It is important to make sure that total loss is communicated in the most appropriate units. For example, we used hours of downtime per year in the example above. Hours of downtime might not mean much to some people. So it might be more advantageous to convert that number from hours per year to dollars per year since everyone can relate to dollars. In other words, use the units that will get the most attention from everyone involved.
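The loss calculation and the conversion to dollars can be sketched in Python using the four rows from the example. The function name annual_loss is illustrative, and the dollar value of a lost production hour is a purely hypothetical figure chosen to show the conversion; it does not come from the example.

```python
# Rows from the loss calculation example:
# (sub-system, failure event, occurrences per year, lost hours per occurrence)
events = [
    ("Recovery", "Recirculation Pump Fails", 12, 12),
    ("Compressor", "Seal Failure", 4, 24),
    ("Mixers", "Filter Switches", 26, 2),
    ("Vent Condensers", "Pressure Gauge Leaks", 0.33, 24),
]

# Hypothetical value of one lost production hour, for illustration only.
DOLLARS_PER_LOST_HOUR = 5_000


def annual_loss(frequency_per_year, impact_per_occurrence):
    """Total loss per year = frequency x loss per occurrence (impact)."""
    return frequency_per_year * impact_per_occurrence


for sub_system, event, frequency, impact_hours in events:
    hours = annual_loss(frequency, impact_hours)
    dollars = hours * DOLLARS_PER_LOST_HOUR
    print(f"{sub_system} / {event}: {hours:.0f} lost hrs/yr, about ${dollars:,.0f}/yr")
```

Because the impact is just a number per occurrence, the same function works whether the impact is recorded in hours, dollars or units of product, as long as one unit is used consistently across all events.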

Step 5 - Determining the "Significant Few"

The concept of the "Significant Few" comes from a famous Italian economist named Vilfredo Pareto. Pareto said that "In any set or collection of objects, ideas, people and events, a FEW within the sets or collections are MORE SIGNIFICANT than the remaining majority". Consider these examples:

80% of a bank's assets come from 20% or less of its customers.

80% of the care given in a hospital is received by 20% or less of its patients.

Well, it is no different in industry. 80% of the losses in a manufacturing facility are represented by 20% or less of its failure events. This means that we only have to perform Root Cause Failure Analysis on 20% or less of our failure events to reduce or eliminate 80% of our facility's losses. Now that is significant!

In order to determine the significant few you must perform a few simple steps:

  1. Total the losses of all of the failure events in the analysis to create a global total loss.

  2. Sort the total loss column in descending order (i.e. highest to lowest).

  3. Multiply the global total loss by 80% or .80. This will give you the "Significant Few" loss figure that you will need to determine what the "Significant Few" failures are in your facility.

  4. Go to the top of the total loss column and begin adding the events from top to bottom. When the sum of these losses is equal to or greater than the "Significant Few" loss figure, then those events are your "Significant Few" failure events.

Let's take a look at how this applies to our discussion on FMEA.

Sub System | Failure Event | Failure Mode | Freq. | Impact | Total Loss
Sub System 3 | Failure Event 1 | Failure Mode 1 | 2000 | $850 | $1,700,000
Sub System 2 | Failure Event 2 | Failure Mode 2 | 1000 | $1,250 | $1,250,000
Sub System 4 | Failure Event 3 | Failure Mode 3 | 4 | $75,000 | $300,000
Sub System 2 | Failure Event 4 | Failure Mode 4 | 18 | $6,000 | $108,000
Sub System 3 | Failure Event 5 | Failure Mode 5 | 6 | $12,000 | $72,000
Sub System 2 | Failure Event 6 | Failure Mode 6 | 52 | $1,000 | $52,000
Sub System 3 | Failure Event 7 | Failure Mode 7 | 80 | $500 | $40,000
Sub System 3 | Failure Event 8 | Failure Mode 8 | 12 | $3,000 | $36,000
Sub System 4 | Failure Event 9 | Failure Mode 9 | 365 | $75 | $27,375
Sub System 3 | Failure Event 10 | Failure Mode 10 | 24 | $1,000 | $24,000
Sub System 1 | Failure Event 11 | Failure Mode 11 | 12 | $1,300 | $15,600
Sub System 2 | Failure Event 12 | Failure Mode 12 | 40 | $300 | $12,000
Sub System 1 | Failure Event 13 | Failure Mode 13 | 12 | $1,000 | $12,000
Sub System 2 | Failure Event 14 | Failure Mode 14 | 10 | $1,000 | $10,000
Sub System 1 | Failure Event 15 | Failure Mode 15 | 48 | $200 | $9,600
Sub System 3 | Failure Event 16 | Failure Mode 16 | 3 | $2,000 | $6,000
Sub System 2 | Failure Event 17 | Failure Mode 17 | 6 | $1,000 | $6,000
Total Global Loss | | | | | $3,680,575
Significant Few Losses | | | | | $2,944,460

In the example above, we have totaled the loss column and have a total global loss of $3,680,575. The total loss column has been sorted in descending order so that it is easy to identify the "Significant" failure events. The "Significant Few" loss figure that we are looking for is $2,944,460 ($3,680,575 x .80). Now all we have to do is go to the top of the total loss column and begin adding from top to bottom until we reach the "Significant Few" loss figure of $2,944,460. It turns out that the first 2 failure events represent approximately 80% of our losses ($2,950,000), so they make up our "Significant Few" failure list. Now, instead of doing Root Cause Failure Analysis on everything, we are only going to do it on the events in our "Significant Few" failure list.
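The selection procedure can be sketched in Python using the Total Loss column from the example table. The function name significant_few and the dictionary layout are assumptions made for illustration; a spreadsheet sort and running total does the same job.

```python
# Total loss per year for each failure event, from the example table.
losses = {
    "Failure Event 1": 1_700_000, "Failure Event 2": 1_250_000,
    "Failure Event 3": 300_000, "Failure Event 4": 108_000,
    "Failure Event 5": 72_000, "Failure Event 6": 52_000,
    "Failure Event 7": 40_000, "Failure Event 8": 36_000,
    "Failure Event 9": 27_375, "Failure Event 10": 24_000,
    "Failure Event 11": 15_600, "Failure Event 12": 12_000,
    "Failure Event 13": 12_000, "Failure Event 14": 10_000,
    "Failure Event 15": 9_600, "Failure Event 16": 6_000,
    "Failure Event 17": 6_000,
}


def significant_few(losses, threshold=0.80):
    """Return (selected events, global total loss, target loss figure)."""
    global_total = sum(losses.values())          # step 1: global total loss
    ranked = sorted(losses.items(),              # step 2: sort descending
                    key=lambda item: item[1], reverse=True)
    target = global_total * threshold            # step 3: 80% loss figure
    selected, running_total = [], 0
    for name, loss in ranked:                    # step 4: add from the top
        if running_total >= target:
            break
        selected.append(name)
        running_total += loss
    return selected, global_total, target


selected, global_total, target = significant_few(losses)
print(f"Global total loss: ${global_total:,}")
print(f"Significant Few loss figure: ${target:,.0f}")
print("Significant Few events:", selected)
```

With the example data, the running total passes the 80% figure after the first two events, which matches the result worked out by hand above.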

Step 6 - Validate Your Results

There are a few validations that should be performed to make sure that our analysis is correct. You can use the gap analysis to make sure that the total of all the events comes to within +/- 10% of the gap. If the total is less than that, you have probably left some important failure events off the list. If the total is more than the gap, then you probably have not summarized your results well enough, and there may be some redundancies in your list.
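The gap check can be sketched in Python as shown here. The function name check_against_gap and the sample totals are hypothetical; the sketch also assumes the total loss has been expressed in the same units as the gap (for example, widgets per year).

```python
def check_against_gap(total_loss, gap, tolerance=0.10):
    """Compare the FMEA total loss with the gap, allowing +/- tolerance."""
    lower = gap * (1 - tolerance)
    upper = gap * (1 + tolerance)
    if total_loss < lower:
        return "below gap: some failure events may be missing"
    if total_loss > upper:
        return "above gap: the list may still contain redundant events"
    return "within tolerance"


# Hypothetical totals compared against the 150,000-widget gap example.
for total in (120_000, 140_000, 180_000):
    print(f"{total:,}: {check_against_gap(total, gap=150_000)}")
```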

A second validation that you can use is having a group of experienced people from your facility review your findings. This will help ensure that you are not too far off base. A third, and final, validation would be to use your computerized data systems to see if the events closely match the data in your maintenance management system (MMS). This will give you further confidence in your analysis. Do not worry if your list varies from your MMS, since you will pick up a lot of events that are never even recorded in the work order system (i.e. those events that may take only a few minutes to repair).

Step 7 - Issue a Report

As with any analysis, it is important to communicate your findings to all interested parties. Your report should include the following items:

  • An explanation of the analysis technique.

  • The failure definition that was utilized.

  • The contact flow diagram that was utilized.

  • The results displayed graphically as well as the supporting spreadsheet lists.

  • Recommendations of which failures are candidates for Root Cause Failure Analysis.

  • A listing of everyone involved in the analysis including all of the interviewees.

Last but not least, make sure that you communicate the results of the analysis back to the interviewees who participated, so that everyone can feel a sense of accomplishment and ownership.

In summary, FMEA is a fantastic tool for limiting your analysis work to only those things that are of significant importance to the facility. You cannot perform Root Cause Failure Analysis on everything. However, you can use this tool to help narrow your focus to what is "most" important.

Ken Latino has a Bachelor of Science degree in Computerized Information Systems. He began his career developing and maintaining maintenance software applications for the continuous process industries. After working with clients to help them become more proactive in their maintenance activities, he began instructing industrial plants on reliability methods and technologies to help improve the reliability of their facilities. He has co-authored two failure analysis training seminars for engineers and hourly craftspeople. He can be contacted at 804/458-0645 or klatino@reliability.com

This page, and all contents, are Copyright © 1994-2002 by Reliability Center, Inc.
804-458-0645 | 804-452-2119 (Fax) | info@reliability.com | www.reliability.com
