|
Wouldn't it
be great if you knew exactly what all of those costly chronic
(repetitive) failures were actually costing your facility? There
is a way. It is called Failure Modes & Effects Analysis or
FMEA for short. FMEA is a technique first utilized in the aerospace
industry to find problems with an aircraft before it ever left
the ground. In short, it is a way of looking into the future and
determining where potential failures might be located. This sounds
wonderful in theory but it takes a tremendous amount of time and
energy to do this. Sometimes as much as 100 man years.
Realizing
that we just do not have that much time or resources in our manufacturing
plants, we had to devise a way to make the process less cumbersome.
The modified approach makes one simple change to the process.
Instead of looking into the future, we are going to take a look
at our past failures. This changes the analysis time from 100
man years to an average of a few man weeks. This makes the process
practical to use in our facilities.
The "MODIFIED"
FMEA process is used to determine what failures are occurring,
in our facility, and what their impact and frequencies are. Think
of it this way:
| Failure Event | Failure Mode | Frequency | Impact | Total Loss |
| Failure of pump p-1002 | Bearing Failures | 12 failures / yr. | $2,000 / failure | $24,000 / year |
Sample Failure Event
This
simplified table demonstrates the power of this technique.
Imagine performing the above calculation for every failure
event in your facility. I can assure you that the results
will be astounding. Of course, we do not want to work on every
failure event, so we need to determine which failure events
are the most significant. It just so happens that, typically,
20% or less of the failure events represent 80% of our losses.
This means that we do not have to do Root Cause Failure Analysis
(RCFA) on everything, just on the failure events that are "most"
important.
Let's
take a look at the steps involved in performing a "modified"
FMEA:
| # | Steps | Description |
| 1 | Perform preparatory work | Develop a failure definition, contact flow diagram, gap analysis and preliminary worksheet and interview schedule. |
| 2 | Collect data | Interview facility personnel to determine what the failures are, their frequencies and their impacts. |
| 3 | Summarize & encode results | Input into an electronic spreadsheet and determine any redundancies. |
| 4 | Calculate loss | Multiply frequency X impact for every failure event in the analysis. |
| 5 | Determine "Significant Few" | Determine the 20% or less of the failures that result in 80% of the losses. |
| 6 | Validate results | Verify that the results are valid. |
| 7 | Issue a report | Communicate results. |
Steps to perform a FMEA
Step 1 - Perform Preparatory Work
Before beginning
any analysis, it is important to do some preliminary prep work.
This analysis is no different. The first thing that needs to be
accomplished is to select a system to analyze. For instance, we
may want to select a small subset of the facility, as opposed
to selecting the entire facility, as our system.
Once we know
what system we want to work on, we must DEFINE FAILURE. This may
seem trivial, but it is an essential step in the analysis. If
we were to ask 100 people to define failure, we would probably
get 100 different definitions. This would make our analysis far
too broad. We need to focus, not on everything, but on the things
that are most important to our business at that point in time.
For instance, if utilization is critical to our business today,
we should center our definition around utilization; if our priority
issue is quality, then our definition should center around quality.
Let's
take a look at some examples of common failure definitions:
- Failure is any loss that interrupts the continuity of production.
- Failure is a loss of asset availability.
- Failure is the unavailability of equipment.
- Failure is a deviation from the status quo.
- Failure is not meeting target expectations.
- Failure is any secondary defect.
The definitions
above are some common industrial failure definitions. Please note
that there are no perfect failure definitions. For instance, "Failure
is any loss that interrupts the continuity of production"
has to include planned shutdowns, rate reductions for decreased
sales, etc. It would not pick up failures on equipment that is
spared since it does not interrupt the continuity of production.
A precise
failure definition is important since it focuses the facility
on the priority issues. It fosters good communications since everyone
knows what is important and it also provides a basis for a common
understanding of what the facility's needs are. Not to mention,
it is an essential step in the development of a "Significant
Few" failure list.
There are
a few rules of thumb to consider when developing a failure definition.
It must be concise and easily understandable. If it is not, it
will leave too much room for interpretation. It should not have
to be interpreted. It must only address one topic. This is important
to maintain the focus of the analysis. If we include too many
topics our target becomes too large. Finally, it should be approved
and signed by someone in authority so that everyone in the organization
sees that it is a priority issue.
The next step
in the preparation process is to develop a contact flow diagram.
The contact flow diagram will allow you to break down your system
into smaller, more manageable subsystems. The rule for this diagram
is to map all of the process units that come into contact with
the product. This diagram, as well as the failure definition,
will be used when we begin to collect the data for the analysis.
The next thing
we need to accomplish before we begin our FMEA is to perform a
gap analysis. In other words, we need to uncover the disparity
between what we are producing now and what our potential is. This
will give us some indication as to the potential opportunity in
our facility. For instance, suppose we produce widgets in our facility,
and we currently produce 150,000 per year. However, our potential
is 300,000 per year. We therefore have a gap of 150,000 widgets per
year.
The final
step in the preparation stage is to design a preliminary interview
sheet and a schedule of people to interview to collect the data.
This will be the form to assist you in collecting the data from
your interviews.
To put this
all into perspective, the following is a checklist of items to
be covered prior to beginning a FMEA.
| FMEA Preparatory Steps | Completed (Y/N) |
| Define the system to analyze | |
| Define failure | |
| Draw a contact diagram | |
| Calculate the gap | |
| Develop data worksheets | |
| Develop preliminary interview schedule | |
FMEA preparation checklist
Step 2 - Collect the Data
There
are a couple of ways of collecting the data for this analysis.
You can rely on your computer data systems (i.e. Maintenance
Management System) or you can go to the people who are closest
to the work and get their input. Although each has its advantages,
interviewing is probably the best since the information will
be coming straight from the source. If you have enough confidence
in your data systems, then it will be useful to use that information
to later validate your interviews.
At this point let's discuss how you would use interviews to
collect the data for your analysis. The process is really
quite simple. Let's look at a simple scenario ....
You
send out a message to all of the people that you would like
to interview. You state the date, time and a brief description
of the FMEA process for the interviewees. Note: it is important
to interview at least 2 or 3 people in each session so that
the interviewees can bounce ideas off of each other. Once
in the room, you will need to display a large copy of the
contact flow diagram and the failure definition so that they
are in clear view of the interviewees. Now you will begin
the process of asking your questions. There really is only
one initiating question that needs to be asked: "What
events or conditions satisfy the definition of failure within
each of the subsystems in the contact flow diagram?"
At this point the interviewees will begin to brainstorm all
of the failure events that they have experienced within each
of the subsystems. Once you have exhausted all of the possibilities,
ask the interviewees what the frequency and impact are for
each of the failure events. The frequency should be based
on the number of occurrences per year. The interviewees, however,
will give you the information in the measurement units that
make most sense to them. For instance, they may say it happens
once per shift. It is your job to later translate that figure
into the number of occurrences per year. The impact should
include items such as manpower requirements, material costs
and any downtime that might have been experienced. This is
all there is to it!
When
you begin the interview process, it is best to interview the
people who are closest to the work (i.e. mechanics and operators).
You should also talk with supervisors and possibly managers
but certainly not to the extent that you would for mechanics
and operators.
As
the principal analyst, you will also need to be the principal
interviewer. This means that you have to explain the process
to the interviewees, ask the questions and capture the information
on your log sheet. This can be a difficult job. If it is feasible,
it would be advantageous to have an associate interviewer
to assist you by recording the information on the log sheets.
This allows you to focus on the questions and the interviewees.
The
job of interviewing can be quite an experience, particularly
if you do not have a lot of experience in conducting them.
It tends to be more of an art form than a science. Below is
a listing of some tips that may be useful when you begin to
conduct your FMEA interviews.
Interview Tips
- Be very careful to ask the exact same lead questions to each of the interviewees. This will eliminate the possibility of having different answers depending on the interpretation of the question. Later you can expand on the questions, if further clarification is necessary.
- Make sure that the participants know what a FMEA is as well as the purpose and structure of the interviews. If you are not careful, the process may begin to look more like an interrogation than an interview to the interviewees. You want the interviewees to be comfortable.
- Allow the interviewees to see what you are writing. This will set them at ease since they can see that the information they are providing is being recorded correctly. NEVER use a tape recorder in a FMEA session because it tends to make people uncomfortable and less likely to share information.
- Never argue with an interviewee. Even if you do not agree with the person, it is best to accept what they are saying at face value and double check it with the information from other interviews. The minute you become argumentative, it reduces the amount of information that you can get from that person.
- Always be aware of interviewees' names. There is nothing sweeter to a person's ears than the sound of their own name. If you have trouble remembering, simply write the names down in front of you so that you can always refer to them.
- It is important to develop a strategy to draw out quiet participants. There are many quiet people in our workforce who have a wealth of data to share but are not comfortable sharing it with others. We have to make sure that we draw out these quiet interviewees in a gentle and inquiring manner.
- Be aware of the body language of interviewees. There is an entire science behind body language. It is not important that you become an expert in this area. However, it is important to know that a substantial portion of human communication is through body language. Let the body language talk to you.
- In any set of interviews, there will be a number of people who are able to contribute more to the process than the others. It is important to make a note of the extraordinary contributors so that they can assist you later in the analysis. They will be extremely helpful if you need additional information, for validating your finished FMEA, as well as assisting you when you begin your actual Root Cause Failure Analysis (RCFA).
- Remember to use your failure definition and contact flow diagram to keep interviewees on track if they begin to wander off of the subject.
Step 3 - Summarize & Encode
At
this point we have conducted a series of separate interviews
and we need to look through our data to reduce redundant entries.
Then we convert frequencies from the interviewees' measurement
units into occurrences per year (e.g. 2 per month would translate
into 24 times per year).
The
easiest way to summarize this information is to input the
information into an electronic spreadsheet. There are many
products on the market that you could use. Microsoft Excel,
Lotus 123 or Borland's Quattro Pro are just a few of the more
popular spreadsheet programs you should consider. Once the
information is input, you can use your spreadsheet to sort
the raw data first by sub-system and then by failure event.
This will give you a closer look at the events that are redundant.
As far as making the conversions to numbers of times per year,
most spreadsheets can do many of these tasks
for you. Consult your user's manual for creating lookup tables.
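For readers who prefer a script to a spreadsheet, the conversion and sort described above might be sketched as follows. This is only an illustration: the operating calendar (3 shifts per day, 365 days per year), the `PERIODS_PER_YEAR` lookup table and the `occurrences_per_year` helper are assumptions made for the example, not part of the FMEA method itself.

```python
# Hypothetical operating calendar; adjust these to your own facility.
SHIFTS_PER_DAY = 3
DAYS_PER_YEAR = 365

# Lookup table: interview unit -> number of such periods in one year.
PERIODS_PER_YEAR = {
    "shift": SHIFTS_PER_DAY * DAYS_PER_YEAR,
    "day": DAYS_PER_YEAR,
    "week": 52,
    "month": 12,
    "year": 1,
}


def occurrences_per_year(count, every, unit):
    """Convert 'count failures every `every` units' into occurrences per year."""
    return count * PERIODS_PER_YEAR[unit] / every


# Raw interview rows:
# (sub-system, failure event, failure mode, count, every, unit)
raw_rows = [
    ("Recovery", "Recirculation Pump Fails", "Shaft Fractures", 1, 1, "year"),
    ("Recovery", "Recirculation Pump Fails", "Bearing Fails", 1, 1, "month"),
    ("Recovery", "Recirculation Pump Fails", "Oil Contamination", 1, 2, "month"),
    ("Recovery", "Recirculation Pump Fails", "Bearing Locks Up", 1, 1, "month"),
]

# Encode the frequencies, then sort by sub-system, failure event and mode
# so that possibly redundant entries end up next to each other.
encoded = sorted(
    (sub, event, mode, occurrences_per_year(count, every, unit))
    for sub, event, mode, count, every, unit in raw_rows
)

for row in encoded:
    print(row)
```

The sort only places similar entries side by side. Deciding which of them are truly redundant still takes the analyst's judgment.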
The
following example should give you an idea of what is meant
by summarizing your data:
| Sub-System | Failure Event | Failure Mode | Frequency | Impact |
| Recovery | Recirculation Pump Fails | Bearing Fails | 1 per month | 1 shift |
| Recovery | Recirculation Pump Fails | Oil Contamination | 1 per 2 months | 1 day |
| Recovery | Recirculation Pump Fails | Bearing Locks Up | 1 per month | 12 hours |
| Recovery | Recirculation Pump Fails | Shaft Fractures | 1 per year | 1 day |
This
data suggests that the first three items are the same since
they each impact the bearings and have fairly consistent frequencies
and impacts. The last item is also related to bearings but
went one step beyond the others since we not only lost the
bearings but we also suffered a fractured shaft. This would
indicate a separate mode of failure. A summarization of this
data might look something like this:
| Sub-System | Failure Event | Failure Mode | Frequency | Impact |
| Recovery | Recirculation Pump Fails | Bearing Problems | 12 per year | 12 hours |
| Recovery | Recirculation Pump Fails | Shaft Fractures | 1 per year | 1 day |
Completed FMEA failure event summarization
Step 4 - Calculate Loss
At
this point, we want to do a simple calculation to generate
our total loss for each event in the analysis. The calculation
is as follows:
Frequency x Loss Per Occurrence (Impact) = Total Loss Per Year
Let's
look at an example of just how to apply this:
| Sub-System | Failure Event | Failure Mode | Frequency | Impact | Total Loss (hrs./yr.) |
| Recovery | Recirculation Pump Fails | Bearing Fails | 12 per year | 12 lost hrs. | 144 lost hrs. of prod. |
| Compressor | Seal Failure | Blown Seals | 4 per year | 24 lost hrs. | 96 lost hrs. of prod. |
| Mixers | Filter Switches | Filters Clogged | 26 per year | 2 lost hrs. | 52 lost hrs. of prod. |
| Vent Condensers | Pressure Gauge Leaks | Leaks Due To Corrosion | .33 per year | 24 lost hrs. | 8 lost hrs. of prod. |
Completed Loss Calculation Example
What
we need to do is multiply the frequency times the impact
to get our total loss. In the first event, we have a failure
occurring once per month or 12 times per year. We lose
a total of 12 hours production every time this occurs.
So we simply multiply 12 occurrences times 12 hours of
lost production to get a total loss of 144 hours per year.
If you decide to use an electronic spreadsheet all of
these calculations can be performed automatically by multiplying
the frequency and impact columns. Refer to the section
in your software's user manual that concerns multiplying
columns.
It
is important to make sure that total loss is communicated
in the most appropriate units. For example, we used hours
of downtime per year in the example above. Hours of downtime
might not mean much to some people. So it might be more
advantageous to convert that number from hours per year
to dollars per year since everyone can relate to dollars.
In other words, use the units that will get the most attention
from everyone involved.
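As a rough sketch of the loss calculation and the unit conversion discussed above, the snippet below multiplies frequency by impact for the four example events and then expresses the result in dollars. The `COST_PER_LOST_HOUR` figure is purely hypothetical and is there only to show the conversion. Substitute whatever value your facility actually uses.

```python
# (sub-system, failure event, occurrences per year, lost hours per occurrence)
events = [
    ("Recovery", "Recirculation Pump Fails", 12, 12),
    ("Compressor", "Seal Failure", 4, 24),
    ("Mixers", "Filter Switches", 26, 2),
    ("Vent Condensers", "Pressure Gauge Leaks", 0.33, 24),
]

# Hypothetical value of one lost production hour, in dollars.
COST_PER_LOST_HOUR = 5_000


def total_loss_hours(frequency, impact_hours):
    """Frequency x Loss Per Occurrence (Impact) = Total Loss Per Year."""
    return frequency * impact_hours


for sub_system, event, frequency, impact in events:
    hours = total_loss_hours(frequency, impact)
    dollars = hours * COST_PER_LOST_HOUR
    print(f"{sub_system:16} {event:26} {hours:7.1f} hrs/yr  ${dollars:,.0f}/yr")
```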
Step 5 - Determining the "Significant Few"
The
concept of the "Significant Few" is derived
from the work of the famous Italian economist Vilfredo Pareto.
Pareto said that "In any set or collection of objects,
ideas, people and events, a FEW within the sets or collections
are MORE SIGNIFICANT than the remaining majority".
Consider these examples:
- 80% of a bank's assets are represented by 20% or less of its customers.
- 80% of the care given in a hospital is received by 20% or less of its patients.
Well,
it is no different in industry. Typically, 80% of the losses in a
manufacturing facility are represented by 20% or less
of its failure events. This means that we only have to
perform root cause failure analysis on 20% or less of
our failure events to reduce or eliminate 80% of our facility's
losses. Now that is significant!
In
order to determine the significant few you must perform
a few simple steps:
- Total the losses of all of the failure events in the analysis to create a global total loss.
- Sort the total loss column in descending order (i.e. highest to lowest).
- Multiply the global total loss by 80% or .80. This will give you the "Significant Few" loss figure that you will need to determine what the "Significant Few" failures are in your facility.
- Go to the top of the total loss column and begin adding the events from top to bottom. When the sum of these losses is equal to or greater than the "Significant Few" loss figure, then those events are your "Significant Few" failure events.
Let's take
a look at how this applies to our discussion on FMEA.
| Sub System | Failure Event | Failure Mode | Freq. | Impact | Total Loss |
| Sub System 3 | Failure Event 1 | Failure Mode 1 | 2000 | $850 | $1,700,000 |
| Sub System 2 | Failure Event 2 | Failure Mode 2 | 1000 | $1,250 | $1,250,000 |
| Sub System 4 | Failure Event 3 | Failure Mode 3 | 4 | $75,000 | $300,000 |
| Sub System 2 | Failure Event 4 | Failure Mode 4 | 18 | $6,000 | $108,000 |
| Sub System 3 | Failure Event 5 | Failure Mode 5 | 6 | $12,000 | $72,000 |
| Sub System 2 | Failure Event 6 | Failure Mode 6 | 52 | $1,000 | $52,000 |
| Sub System 3 | Failure Event 7 | Failure Mode 7 | 80 | $500 | $40,000 |
| Sub System 3 | Failure Event 8 | Failure Mode 8 | 12 | $3,000 | $36,000 |
| Sub System 4 | Failure Event 9 | Failure Mode 9 | 365 | $75 | $27,375 |
| Sub System 3 | Failure Event 10 | Failure Mode 10 | 24 | $1,000 | $24,000 |
| Sub System 1 | Failure Event 11 | Failure Mode 11 | 12 | $1,300 | $15,600 |
| Sub System 2 | Failure Event 12 | Failure Mode 12 | 40 | $300 | $12,000 |
| Sub System 1 | Failure Event 13 | Failure Mode 13 | 12 | $1,000 | $12,000 |
| Sub System 2 | Failure Event 14 | Failure Mode 14 | 10 | $1,000 | $10,000 |
| Sub System 1 | Failure Event 15 | Failure Mode 15 | 48 | $200 | $9,600 |
| Sub System 3 | Failure Event 16 | Failure Mode 16 | 3 | $2,000 | $6,000 |
| Sub System 2 | Failure Event 17 | Failure Mode 17 | 6 | $1,000 | $6,000 |
| Total Global Loss | | | | | $3,680,575 |
| Significant Few Losses | | | | | $2,944,460 |
In
the example above, we have totaled the loss column
and have a total global loss of $3,680,575. The total
loss column has been sorted in descending order so
that it is easy to identify the "Significant"
failure events. Our "Significant Few" loss
figure that we are looking for is $2,944,460 ($3,680,575
x .80). Now all we have to do is simply go to the
top of the total loss column and begin adding from
top to bottom until we reach the "Significant
Few" loss figure of $2,944,460. It turns out
that the first 2 failure events represent approximately
80% of our losses ($2,950,000), so they make up our "Significant
Few" failure list. Now, instead of doing Root
Cause Failure Analysis on everything, we are only
going to do it on the ones in our "Significant
Few" failure list.
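The four steps above can also be sketched in a few lines of code. The example below reuses the frequencies and impacts from the table. The function name `significant_few` and its `share` parameter are hypothetical names chosen for this illustration.

```python
# (failure event, occurrences per year, dollars lost per occurrence)
events = [
    ("Failure Event 1", 2000, 850), ("Failure Event 2", 1000, 1250),
    ("Failure Event 3", 4, 75000), ("Failure Event 4", 18, 6000),
    ("Failure Event 5", 6, 12000), ("Failure Event 6", 52, 1000),
    ("Failure Event 7", 80, 500), ("Failure Event 8", 12, 3000),
    ("Failure Event 9", 365, 75), ("Failure Event 10", 24, 1000),
    ("Failure Event 11", 12, 1300), ("Failure Event 12", 40, 300),
    ("Failure Event 13", 12, 1000), ("Failure Event 14", 10, 1000),
    ("Failure Event 15", 48, 200), ("Failure Event 16", 3, 2000),
    ("Failure Event 17", 6, 1000),
]


def significant_few(losses, share=0.80):
    """Return (global total, loss figure, events that reach the loss figure)."""
    # Sort the total loss column in descending order.
    ranked = sorted(losses, key=lambda item: item[1], reverse=True)
    # Total all events, then take the requested share of that total.
    total = sum(loss for _, loss in ranked)
    threshold = total * share
    # Add from the top until the running sum reaches the loss figure.
    selected, running = [], 0
    for name, loss in ranked:
        selected.append((name, loss))
        running += loss
        if running >= threshold:
            break
    return total, threshold, selected


losses = [(name, freq * impact) for name, freq, impact in events]
total, threshold, selected = significant_few(losses)
print(f"Total global loss:      ${total:,.0f}")
print(f"Significant Few figure: ${threshold:,.0f}")
for name, loss in selected:
    print(f"  {name}: ${loss:,.0f}")
```

Run against the table's data, this selects the first two failure events, matching the hand calculation.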
Step 6 - Validate Your Results
There
are a few validations that should be performed to
make sure that our analysis is correct. You can use
the gap analysis to make sure that the losses from all of the events
add up to within +/- 10% of the gap. If the total ends up being
less, you have probably left some important failure
events off the listing. If you have more than the
gap then you probably have not summarized your results
well enough. There may be some redundancies in your
list.
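A minimal sketch of the gap check, assuming the total loss has already been converted into the same units as the gap (widgets per year in the earlier example), could look like this. The function name `validate_against_gap` and the sample totals are hypothetical.

```python
def validate_against_gap(total_loss, gap, tolerance=0.10):
    """Compare the summed losses with the gap, allowing +/- `tolerance`."""
    lower = gap * (1 - tolerance)
    upper = gap * (1 + tolerance)
    if total_loss < lower:
        return "under: some failure events are probably missing"
    if total_loss > upper:
        return "over: the list probably still contains redundancies"
    return "ok"


# Gap from the widget example: potential minus current production.
gap = 300_000 - 150_000

# Hypothetical totals, expressed in lost widgets per year.
for total in (120_000, 140_000, 170_000):
    print(total, "->", validate_against_gap(total, gap))
```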
A
second validation that you can use is having a group
of experienced people from your facility review your
findings. This will help ensure that you are not too
far off base. A third, and final, validation would
be to use your computerized data systems to see if
the events closely match the data in your maintenance
management system. This will give you further confidence
in your analysis. Do not worry if your list varies
from your maintenance management system (MMS), since
you will pick up a lot of events that are never even
recorded in the work order system (i.e. those events
that may take only a few minutes to repair).
Step 7 - Issue a Report
As
with any analysis, it is important to communicate
your findings to all interested parties. Your report
should include the following items:
- An explanation of the analysis technique.
- The failure definition that was utilized.
- The contact flow diagram that was utilized.
- The results displayed graphically as well as the supporting spreadsheet lists.
- Recommendations of which failures are candidates for Root Cause Failure Analysis.
- A listing of everyone involved in the analysis including all of the interviewees.
Last
but not least, make sure that you communicate the
results of the analysis back to the interviewees who
participated, so that everyone can feel a sense of
accomplishment and ownership.
In
summary, FMEA is a fantastic tool for limiting your
analysis work to only those things that are of significant
importance to the facility. You cannot perform Root
Cause Failure Analysis on everything. However, you
can use this tool to help narrow your focus to what
is "most" important.
|