Failure Modes & Effects Analysis
"A Modified Approach"

Presented by: Kenneth C. Latino
National Petroleum Refineries Association (NPRA) Maintenance Conference, May 1996

Wouldn't it be great if you knew exactly what all of those costly chronic (repetitive) failures were actually costing your facility? There is a way. It is called Failure Modes & Effects Analysis, or FMEA for short. FMEA is a technique that was used in the aerospace industry to find problems with an aircraft before it ever left the ground. In short, it is a way of looking into the future and determining where potential failures might occur. This sounds wonderful in theory, but it takes a tremendous amount of time and energy to do, sometimes as much as 100 man-years.

Realizing that we just do not have that much time or that many resources in our manufacturing plants, we had to devise a way to make the process less cumbersome. The modified approach makes one simple change to the process: instead of looking into the future, we look at our past failures. This reduces the analysis time from 100 man-years to an average of a few man-weeks, which makes the process practical to use in our facilities.

The "MODIFIED" FMEA process is used to determine what failures are occurring in our facility, how often they occur, and what their impact is. Think of it this way:

| Failure Event | Failure Mode | Frequency | Impact | Total Loss |
|---|---|---|---|---|
| Failure of pump p-1002 | Bearing failures | 12 failures/yr | $2,000/failure | $24,000/yr |

Sample Failure Event

This simplified table demonstrates the power of this technique. Imagine performing the above calculation for every failure event in your facility. I can assure you that the results will be astounding. Of course, we do not want to work on every failure event, so we need to determine which failure events are the most significant. Typically, 20% or less of the failure events represent 80% of our losses. This means that we do not have to do Root Cause Failure Analysis (RCFA) on everything, just on the events that are "most" important.

Let's take a look at the steps involved in performing a "modified" FMEA:

| # | Step | Description |
|---|---|---|
| 1 | Perform preparatory work | Develop a failure definition, contact flow diagram, gap analysis, preliminary worksheet, and interview schedule. |
| 2 | Collect data | Interview facility personnel to determine what the failures are, their frequencies, and their impacts. |
| 3 | Summarize & encode results | Enter the data into an electronic spreadsheet and eliminate redundant entries. |
| 4 | Calculate loss | Multiply frequency x impact for every failure event in the analysis. |
| 5 | Determine the "Significant Few" | Determine the 20% or less of the failures that result in 80% of the losses. |
| 6 | Validate results | Check the results against the gap analysis, a review by experienced personnel, and maintenance records. |
| 7 | Issue a report | Communicate the results to all interested parties. |

Steps to perform a FMEA

Step 1 - Perform Preparatory Work

Before beginning any analysis, it is important to do some preliminary preparation, and this analysis is no different. The first thing that needs to be accomplished is to select a system to analyze. For instance, we may want to select a small subset of the facility, rather than the entire facility, as our system.

Once we know which system we want to work on, we must DEFINE FAILURE. This may seem trivial, but it is an essential step in the analysis. If we were to ask 100 people to define failure, we would probably get 100 different definitions, which would make our analysis far too broad. We need to focus not on everything, but on the things that are most important to our business at that point in time. For instance, if utilization is critical to our business today, we should center our definition on utilization; if our priority issue is quality, then our definition should center on quality.

Let's take a look at some examples of common failure definitions:

- Failure is any loss that interrupts the continuity of production.
- Failure is a loss of asset availability.
- Failure is the unavailability of equipment.
- Failure is a deviation from the status quo.
- Failure is not meeting target expectations.
- Failure is any secondary defect.

The definitions above are some common industrial failure definitions. Please note that there is no perfect failure definition. For instance, "Failure is any loss that interrupts the continuity of production" would have to include planned shutdowns, rate reductions due to decreased sales, and so on. At the same time, it would not pick up failures of spared equipment, since a failure on a spared item does not interrupt the continuity of production.

A precise failure definition is important because it focuses the facility on the priority issues. It fosters good communication, since everyone knows what is important, and it provides a basis for a common understanding of the facility's needs. Not least, it is an essential step in developing a "Significant Few" failure list.

There are a few rules of thumb to consider when developing a failure definition. It must be concise and easily understood; if it is not, it will leave too much room for interpretation. It should not have to be interpreted. It must address only one topic, which is important to maintain the focus of the analysis; if we include too many topics, our target becomes too large. Finally, it should be approved and signed by someone in authority so that everyone in the organization sees that it is a priority issue.

The next step in the preparation process is to develop a contact flow diagram. The contact flow diagram allows you to break down your system into smaller, more manageable subsystems. The rule for this diagram is to map all of the process units that come into contact with the product. This diagram, along with the failure definition, will be used when we begin to collect the data for the analysis.

The next thing we need to accomplish before we begin our FMEA is a gap analysis. In other words, we need to uncover the disparity between what we are producing now and what we could potentially produce. This gives us some indication of the potential opportunity in our facility. For instance, suppose our facility produces widgets, currently 150,000 per year, but its potential is 300,000 per year. That leaves a gap of 150,000 widgets per year.

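The gap arithmetic is simple, but writing it down as a function makes it easy to reuse when the results are validated against the gap later on. A minimal sketch in Python using the widget figures from the example; the function name `production_gap` is hypothetical:

```python
def production_gap(current_per_year: float, potential_per_year: float) -> float:
    """Return the gap between potential and current annual production.

    Hypothetical helper: the gap is potential minus current output,
    expressed in the same units as the inputs (here, widgets per year).
    """
    if potential_per_year < current_per_year:
        raise ValueError("potential should not be lower than current output")
    return potential_per_year - current_per_year


current = 150_000    # widgets produced per year today
potential = 300_000  # widgets the facility could produce per year
gap = production_gap(current, potential)
print(f"Gap: {gap:,.0f} widgets/yr ({gap / potential:.0%} of potential)")
```

This prints `Gap: 150,000 widgets/yr (50% of potential)`.
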
The final step in the preparation stage is to design a preliminary interview worksheet and a schedule of the people to interview to collect the data. The worksheet is the form that will help you collect the data during your interviews.

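One way to keep interview data consistent is to give every worksheet row the same fields. The sketch below assumes the columns used in this paper's tables (sub-system, failure event, failure mode, frequency, impact); the `WorksheetRow` class and its `source_session` field are hypothetical additions for illustration, not part of the original method:

```python
from dataclasses import dataclass


@dataclass
class WorksheetRow:
    """One line of a hypothetical FMEA interview worksheet."""
    sub_system: str        # subsystem from the contact flow diagram
    failure_event: str     # what happened, e.g. "Recirculation Pump Fails"
    failure_mode: str      # how it happened, e.g. "Bearing Fails"
    frequency: str         # as stated by interviewees, e.g. "1 per month"
    impact: str            # as stated by interviewees, e.g. "1 shift"
    source_session: str = ""  # hypothetical: which interview session reported it


# Rows are captured in the interviewees' own units and converted later.
worksheet = [
    WorksheetRow("Recovery", "Recirculation Pump Fails", "Bearing Fails",
                 "1 per month", "1 shift", "Session A"),
    WorksheetRow("Recovery", "Recirculation Pump Fails", "Shaft Fractures",
                 "1 per year", "1 day", "Session B"),
]
for row in worksheet:
    print(row.sub_system, "|", row.failure_event, "|", row.failure_mode)
```

Keeping the interviewees' original wording in the `frequency` and `impact` fields preserves the raw data, so conversions can be rechecked later.
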
To put this all into perspective, the following is a checklist of items to be covered before beginning a FMEA:

| FMEA Preparatory Steps | Completed (Y/N) |
|---|---|
| Define the system to analyze | |
| Define failure | |
| Draw a contact flow diagram | |
| Calculate the gap | |
| Develop data worksheets | |
| Develop a preliminary interview schedule | |

FMEA preparation checklist

Step 2 - Collect the Data

There are a couple of ways of collecting the data for this analysis. You can rely on your computer data systems (e.g., a Maintenance Management System), or you can go to the people who are closest to the work and get their input. Although each has its advantages, interviewing is probably the better choice, since the information comes straight from the source. If you have enough confidence in your data systems, you can use that information later to validate your interviews.

At this point, let's discuss how you would use interviews to collect the data for your analysis. The process is really quite simple. Let's look at a simple scenario.

You send out a message to all of the people you would like to interview, stating the date, the time, and a brief description of the FMEA process. Note: it is important to interview at least 2 or 3 people in each session so that the interviewees can bounce ideas off each other.

Once in the room, display a large copy of the contact flow diagram and the failure definition so that they are in clear view of the interviewees. Now you begin asking your questions. There really is only one initiating question that needs to be asked: "What events or conditions satisfy the definition of failure within each of the subsystems in the contact flow diagram?" At this point, the interviewees will begin to brainstorm all of the failure events that they have experienced within each of the subsystems.

Once you have exhausted all of the possibilities, ask the interviewees for the frequency and impact of each failure event. The frequency should be expressed as the number of occurrences per year. The interviewees, however, will give you the information in the units that make the most sense to them; for instance, they may say it happens once per shift. It is your job to later translate that figure into the number of occurrences per year. The impact should include items such as manpower requirements, material costs, and any downtime that was experienced. That is all there is to it!

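The impact of one occurrence combines several cost elements. As a sketch, assuming each element is converted to dollars with facility-specific rates (the rates and the function name `impact_per_occurrence` below are hypothetical placeholders, not figures from this paper):

```python
def impact_per_occurrence(labor_hours: float, labor_rate: float,
                          material_cost: float, downtime_hours: float,
                          downtime_cost_per_hour: float) -> float:
    """Return the dollar impact of a single failure occurrence.

    Sums manpower, materials, and lost production. All rates are
    hypothetical inputs that each facility would supply.
    """
    manpower = labor_hours * labor_rate
    downtime = downtime_hours * downtime_cost_per_hour
    return manpower + material_cost + downtime


# Hypothetical example: 2 mechanics x 4 hours at $40/hr, $500 of parts,
# and 12 hours of lost production valued at $100/hr.
cost = impact_per_occurrence(labor_hours=8, labor_rate=40.0,
                             material_cost=500.0, downtime_hours=12,
                             downtime_cost_per_hour=100.0)
print(f"${cost:,.2f} per occurrence")
```

With these placeholder inputs the result is `$2,020.00 per occurrence` ($320 manpower + $500 material + $1,200 downtime).
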
When you begin the interview process, it is best to interview the people who are closest to the work (i.e., mechanics and operators). You should also talk with supervisors and possibly managers, but certainly not to the same extent as mechanics and operators.

As the principal analyst, you will also need to be the principal interviewer. This means that you have to explain the process to the interviewees, ask the questions, and capture the information on your log sheet. This can be a difficult job. If it is feasible, it is advantageous to have an associate interviewer assist you by recording the information on the log sheets, which allows you to focus on the questions and the interviewees.

Interviewing can be quite an experience, particularly if you have not conducted many interviews before. It tends to be more of an art than a science. Below are some tips that may be useful when you begin to conduct your FMEA interviews.

Interview Tips

Be very careful to ask exactly the same lead questions of each of the interviewees. This reduces the chance of getting different answers because of different interpretations of the question. Later, you can expand on the questions if further clarification is necessary.

Make sure that the participants know what a FMEA is, as well as the purpose and structure of the interviews. If you are not careful, the process may begin to look more like an interrogation than an interview to the interviewees. You want the interviewees to be comfortable.

Allow the interviewees to see what you are writing. This will set them at ease, since they can see that the information they are providing is being recorded correctly. NEVER use a tape recorder in a FMEA session, because it tends to make people uncomfortable and less likely to share information.

Never argue with an interviewee. Even if you do not agree with the person, it is best to accept what they are saying at face value and double-check it against the information from other interviews. The minute you become argumentative, you reduce the amount of information you can get from that person.

Always be aware of the interviewees' names. There is nothing sweeter to a person's ears than the sound of their own name. If you have trouble remembering, simply write the names down in front of you so that you can always refer to them.

It is important to develop a strategy to draw out quiet participants. There are many quiet people in our workforce who have a wealth of data to share but are not comfortable sharing it with others. We have to draw out these quiet interviewees in a gentle and inquiring manner.

Be aware of the body language of interviewees. There is an entire science behind body language, and it is not important that you become an expert in this area. However, it is important to know that a substantial portion of human communication happens through body language. Let the body language talk to you.

In any set of interviews, there will be a number of people who are able to contribute more to the process than the others. It is important to make a note of these extraordinary contributors so that they can assist you later in the analysis. They will be extremely helpful if you need additional information, for validating your finished FMEA, and when you begin your actual Root Cause Failure Analysis (RCFA).

Remember to use your failure definition and contact flow diagram to keep interviewees on track if they begin to wander off the subject.

Step 3 - Summarize & Encode

At this point, we have conducted a series of separate interviews, and we need to look through our data to reduce redundant entries. Then we convert frequencies from the interviewees' measurement units into occurrences per year (e.g., 2 per month translates into 24 times per year).

The easiest way to summarize this information is to enter it into an electronic spreadsheet. There are many products on the market that you could use; Microsoft Excel, Lotus 1-2-3, and Borland's Quattro Pro are just a few of the more popular spreadsheet programs you should consider. Once the information is entered, you can use your spreadsheet to sort the raw data first by sub-system and then by failure event, which gives you a closer look at the events that are redundant. As for converting frequencies to numbers of times per year, the more advanced spreadsheets can do many of these tasks for you; consult your user's manual on creating lookup tables.

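The same lookup-table idea can be sketched in Python. The conversion factors are assumptions: in particular, "per shift" is taken as 3 shifts per day for 365 days, which a facility would replace with its own operating calendar. The names `PER_YEAR` and `to_per_year` are hypothetical:

```python
# Hypothetical lookup table: occurrences per year for one occurrence per unit.
# The shift figure assumes 3 shifts/day x 365 days; adjust to your schedule.
PER_YEAR = {
    "shift": 3 * 365,
    "day": 365,
    "week": 52,
    "month": 12,
    "quarter": 4,
    "year": 1,
}


def to_per_year(count: float, every: float, unit: str) -> float:
    """Convert 'count per `every` units' (e.g. 1 per 2 months) to occurrences/yr."""
    return count * PER_YEAR[unit] / every


# Raw rows: (sub-system, failure event, failure mode, count, every, unit)
raw = [
    ("Recovery", "Recirculation Pump Fails", "Shaft Fractures", 1, 1, "year"),
    ("Recovery", "Recirculation Pump Fails", "Oil Contamination", 1, 2, "month"),
    ("Recovery", "Recirculation Pump Fails", "Bearing Fails", 1, 1, "month"),
]

# Sort by sub-system, then failure event, so likely redundancies sit together.
for sub, event, mode, count, every, unit in sorted(raw, key=lambda r: (r[0], r[1])):
    print(f"{sub:10} {event:26} {mode:18} {to_per_year(count, every, unit):g}/yr")
```

For instance, `to_per_year(2, 1, "month")` gives 24, matching the 2-per-month example above.
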
The following example should give you an idea of what is meant by summarizing your data:

| Sub-System | Failure Event | Failure Mode | Frequency | Impact |
|---|---|---|---|---|
| Recovery | Recirculation Pump Fails | Bearing Fails | 1 per month | 1 shift |
| Recovery | Recirculation Pump Fails | Oil Contamination | 1 per 2 months | 1 day |
| Recovery | Recirculation Pump Fails | Bearing Locks Up | 1 per month | 12 hours |
| Recovery | Recirculation Pump Fails | Shaft Fractures | 1 per year | 1 day |

Because these rows come from separate interviews, the data suggest that the first three items describe the same event: each involves the bearings, and their frequencies and impacts are fairly consistent. The last item is also related to the bearings but goes one step beyond the others, since we not only lost the bearings but also suffered a fractured shaft. This indicates a separate mode of failure. A summarization of this data might look something like this:

| Sub-System | Failure Event | Failure Mode | Frequency | Impact |
|---|---|---|---|---|
| Recovery | Recirculation Pump Fails | Bearing Problems | 12 per year | 12 hours |
| Recovery | Recirculation Pump Fails | Shaft Fractures | 1 per year | 1 day |

Completed FMEA failure event summarization

Step 4 - Calculate Loss

At this point, we want to do a simple calculation to generate the total loss for each event in the analysis. The calculation is as follows:

Frequency x Loss per Occurrence (Impact) = Total Loss per Year

Let's look at an example of how to apply this:

| Sub-System | Failure Event | Failure Mode | Frequency | Impact | Total Loss (hrs/yr) |
|---|---|---|---|---|---|
| Recovery | Recirculation Pump Fails | Bearing Fails | 12 per year | 12 lost hrs | 144 lost hrs of production |
| Compressor | Seal Failure | Blown Seals | 4 per year | 24 lost hrs | 96 lost hrs of production |
| Mixers | Filter Switches | Filters Clogged | 26 per year | 2 lost hrs | 52 lost hrs of production |
| Vent Condensers | Pressure Gauge Leaks | Leaks Due To Corrosion | 0.33 per year | 24 lost hrs | 8 lost hrs of production |

Completed Loss Calculation Example

To get the total loss, we multiply the frequency by the impact. In the first event, a failure occurs once per month, or 12 times per year, and we lose 12 hours of production every time it occurs. So we simply multiply 12 occurrences by 12 hours of lost production to get a total loss of 144 hours per year. If you use an electronic spreadsheet, all of these calculations can be performed automatically by multiplying the frequency and impact columns; refer to the section of your software's user manual on multiplying columns.

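The frequency-times-impact multiplication maps directly onto a few lines of Python. This sketch reproduces the hours-based example table above; the list-of-tuples layout and the `total_loss` name are illustrative choices, not part of the original method:

```python
# (sub-system, failure event, failure mode, frequency per year, impact in lost hours)
events = [
    ("Recovery", "Recirculation Pump Fails", "Bearing Fails", 12, 12),
    ("Compressor", "Seal Failure", "Blown Seals", 4, 24),
    ("Mixers", "Filter Switches", "Filters Clogged", 26, 2),
    ("Vent Condensers", "Pressure Gauge Leaks", "Leaks Due To Corrosion", 0.33, 24),
]


def total_loss(frequency_per_year: float, impact: float) -> float:
    """Total loss per year = frequency x loss per occurrence (impact)."""
    return frequency_per_year * impact


for sub, event, mode, freq, impact in events:
    print(f"{sub:16} {mode:24} {total_loss(freq, impact):7.1f} lost hrs/yr")
```

The last row prints 7.9 lost hours per year (0.33 x 24 = 7.92), which the table rounds to 8.
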
It is important to communicate total loss in the most appropriate units. For example, we used hours of downtime per year in the example above, but hours of downtime might not mean much to some people. It might be more effective to convert that number from hours per year to dollars per year, since everyone can relate to dollars. In other words, use the units that will get the most attention from everyone involved.

Step 5 - Determining the "Significant Few"

The concept of the "Significant Few" is derived from the famous Italian economist Vilfredo Pareto. Pareto said that "In any set or collection of objects, ideas, people and events, a FEW within the sets or collections are MORE SIGNIFICANT than the remaining majority". Consider these examples:

- 80% of a bank's assets belong to 20% or less of its customers.
- 80% of the care given in a hospital is received by 20% or less of its patients.

It is no different in industry: 80% of the losses in a manufacturing facility are typically represented by 20% or less of its failure events. This means that we only have to perform Root Cause Failure Analysis on 20% or less of our failure events to reduce or eliminate 80% of our facility's losses. Now that is significant!

In order to determine the significant few, you must perform a few simple steps:

1. Total the losses of all of the failure events in the analysis to create a global total loss.
2. Sort the total loss column in descending order (i.e., highest to lowest).
3. Multiply the global total loss by 80% (0.80). This gives you the "Significant Few" loss figure that you will need to determine what the "Significant Few" failures are in your facility.
4. Go to the top of the total loss column and begin adding the events from top to bottom. When the sum of these losses is equal to or greater than the "Significant Few" loss figure, those events are your "Significant Few" failure events.

Let's take a look at how this applies to our discussion on FMEA.

| Sub-System | Failure Event | Failure Mode | Freq. (per yr) | Impact (per occurrence) | Total Loss |
|---|---|---|---|---|---|
| Sub-System 3 | Failure Event 1 | Failure Mode 1 | 2000 | $850 | $1,700,000 |
| Sub-System 2 | Failure Event 2 | Failure Mode 2 | 1000 | $1,250 | $1,250,000 |
| Sub-System 4 | Failure Event 3 | Failure Mode 3 | 4 | $75,000 | $300,000 |
| Sub-System 2 | Failure Event 4 | Failure Mode 4 | 18 | $6,000 | $108,000 |
| Sub-System 3 | Failure Event 5 | Failure Mode 5 | 6 | $12,000 | $72,000 |
| Sub-System 2 | Failure Event 6 | Failure Mode 6 | 52 | $1,000 | $52,000 |
| Sub-System 3 | Failure Event 7 | Failure Mode 7 | 80 | $500 | $40,000 |
| Sub-System 3 | Failure Event 8 | Failure Mode 8 | 12 | $3,000 | $36,000 |
| Sub-System 4 | Failure Event 9 | Failure Mode 9 | 365 | $75 | $27,375 |
| Sub-System 3 | Failure Event 10 | Failure Mode 10 | 24 | $1,000 | $24,000 |
| Sub-System 1 | Failure Event 11 | Failure Mode 11 | 12 | $1,300 | $15,600 |
| Sub-System 2 | Failure Event 12 | Failure Mode 12 | 40 | $300 | $12,000 |
| Sub-System 1 | Failure Event 13 | Failure Mode 13 | 12 | $1,000 | $12,000 |
| Sub-System 2 | Failure Event 14 | Failure Mode 14 | 10 | $1,000 | $10,000 |
| Sub-System 1 | Failure Event 15 | Failure Mode 15 | 48 | $200 | $9,600 |
| Sub-System 3 | Failure Event 16 | Failure Mode 16 | 3 | $2,000 | $6,000 |
| Sub-System 2 | Failure Event 17 | Failure Mode 17 | 6 | $1,000 | $6,000 |
| Total Global Loss | | | | | $3,680,575 |
| Significant Few Losses | | | | | $2,944,460 |

In the example above, we have totaled the loss column for a total global loss of $3,680,575. The total loss column has been sorted in descending order so that it is easy to identify the significant failure events. The "Significant Few" loss figure we are looking for is $2,944,460 ($3,680,575 x 0.80). Now all we have to do is go to the top of the total loss column and begin adding from top to bottom until we reach the "Significant Few" loss figure of $2,944,460. It turns out that the first 2 failure events (2 of 17, or about 12% of the events) account for $2,950,000, approximately 80% of our losses; they are our "Significant Few" failure list. Now, instead of doing Root Cause Failure Analysis on everything, we are only going to do it on the events in our "Significant Few" failure list.

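The four steps listed earlier can be sketched in Python and checked against the example table. The function name `significant_few` is hypothetical; the data are the frequency and impact columns from the table, so the results can be compared with the figures in the text:

```python
def significant_few(losses: dict[str, float], share: float = 0.80):
    """Return (global total, threshold, list of significant-few event names).

    Totals all losses, sorts them in descending order, computes
    share x total, then accumulates from the top until the running
    sum reaches or exceeds that threshold.
    """
    global_total = sum(losses.values())
    threshold = global_total * share
    selected, running = [], 0.0
    for name, loss in sorted(losses.items(), key=lambda kv: kv[1], reverse=True):
        if running >= threshold:
            break
        selected.append(name)
        running += loss
    return global_total, threshold, selected


# (event, frequency per year, impact in $ per occurrence) from the example table
table = [
    ("Failure Event 1", 2000, 850), ("Failure Event 2", 1000, 1250),
    ("Failure Event 3", 4, 75000), ("Failure Event 4", 18, 6000),
    ("Failure Event 5", 6, 12000), ("Failure Event 6", 52, 1000),
    ("Failure Event 7", 80, 500), ("Failure Event 8", 12, 3000),
    ("Failure Event 9", 365, 75), ("Failure Event 10", 24, 1000),
    ("Failure Event 11", 12, 1300), ("Failure Event 12", 40, 300),
    ("Failure Event 13", 12, 1000), ("Failure Event 14", 10, 1000),
    ("Failure Event 15", 48, 200), ("Failure Event 16", 3, 2000),
    ("Failure Event 17", 6, 1000),
]
losses = {name: freq * impact for name, freq, impact in table}
total, threshold, few = significant_few(losses)
print(f"Global total loss:   ${total:,.0f}")
print(f"Significant few cut: ${threshold:,.0f}")
print("Significant few:", few)
```

This reports a global total of $3,680,575, a cut-off of $2,944,460, and selects Failure Events 1 and 2, matching the worked example. The `share` parameter is left adjustable in case a facility prefers a different cut-off than 80%.
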
Step 6 - Validate Your Results

There are a few validations that should be performed to make sure that our analysis is correct. First, you can use the gap analysis: the losses of all of the events, expressed in the same units as the gap, should add up to within +/- 10% of the gap. If the total is less, you have probably left some important failure events off the list. If it is more than the gap, you probably have not summarized your results well enough, and there may be some redundancies in your list.

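The gap check can be sketched as a simple tolerance test. This assumes the summed event losses and the gap are already expressed in the same units (for example, widgets per year); the function name `check_against_gap` and its return labels are hypothetical:

```python
def check_against_gap(total_event_loss: float, gap: float,
                      tolerance: float = 0.10) -> str:
    """Compare summed FMEA losses with the gap from the gap analysis.

    Returns "low" (events probably missing), "high" (probably redundant
    entries), or "ok" when the total is within +/- tolerance of the gap.
    Both arguments must be in the same units.
    """
    lower, upper = gap * (1 - tolerance), gap * (1 + tolerance)
    if total_event_loss < lower:
        return "low"
    if total_event_loss > upper:
        return "high"
    return "ok"


# Hypothetical figures: a 150,000 widget/yr gap versus event losses
# that have been converted to widgets per year.
for summed in (120_000, 142_000, 170_000):
    print(summed, check_against_gap(summed, 150_000))
```

With a 150,000 widget gap, any total between 135,000 and 165,000 passes; the three sample totals come out "low", "ok", and "high" respectively.
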
A second validation is to have a group of experienced people from your facility review your findings. This will help ensure that you are not too far off base. A third and final validation is to use your computerized data systems to see whether the events closely match the data in your maintenance management system (MMS). This will give you further confidence in your analysis. Do not worry if your list varies from your MMS, since you will pick up many events that are never even recorded in the work order system (e.g., events that take only a few minutes to repair).

Step 7 - Issue a Report

As with any analysis, it is important to communicate your findings to all interested parties. Your report should include the following items:

- An explanation of the analysis technique.
- The failure definition that was used.
- The contact flow diagram that was used.
- The results displayed graphically, along with the supporting spreadsheet lists.
- Recommendations of which failures are candidates for Root Cause Failure Analysis.
- A list of everyone involved in the analysis, including all of the interviewees.

Last but not least, make sure that you communicate the results of the analysis back to the interviewees who participated, so that everyone can feel a sense of accomplishment and ownership.

In summary, FMEA is a fantastic tool for limiting your analysis work to only those things that are of significant importance to the facility. You cannot perform Root Cause Failure Analysis on everything; however, you can use this tool to help narrow your focus to what is "most" important.

RCI offers the full range of Reliability Consulting Services and Training Programs for Industry. We conduct facilitations, reliability assessments, and FMEA & Root Cause Failure Analysis training (public and on-site).

For more information, contact:
Reliability Center, Inc.
P.O. Box 1421
Hopewell, Virginia 23860
Phone: (804) 458-0645
Fax: (804) 452-2119
Website: http://www.reliability.com