Reliability Magazine, November 1994 |
| It is amazing how quickly failure data can disappear after a sporadic failure has occurred. Following such an incident there is sometimes a lot of confusion. Most sporadic incidents occur on off-shifts or on weekends, which can add to the confusion. (Statistically, for a continuous manufacturing organization, there are more night and weekend hours than normal operating hours.) Nobody is sure what to do or whom to call to preserve failure data. All everyone knows is, "We better get this process or piece of equipment back on line." This paradigm drives the scene. Oftentimes in the chaotic activities that follow, a wealth of failure data can be destroyed or altered. Failed parts are marred, discarded or taken to a shop and forgotten; lubricants and other fluids are mopped up; valve and instrument positions are changed in preparation for startup; distributed control systems start to average live data; shifts change, and operation and maintenance personnel are replaced with people who were not present at the time of failure. Along with the disappearance of all this data go the chances of uncovering the true root causes of the incident. |
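To put a rough number on the parenthetical claim above, here is a minimal sketch. It assumes that "normal operating hours" means a single 8-hour day shift, Monday through Friday; the article does not define the term, so treat that figure as an assumption.

```python
# Assumption: "normal operating hours" = one 8-hour day shift, Monday-Friday.
# The article does not define the term; adjust NORMAL_HOURS to suit your plant.
HOURS_PER_WEEK = 7 * 24   # 168 hours in a week
NORMAL_HOURS = 5 * 8      # 40 weekday day-shift hours


def off_shift_fraction(normal_hours: int = NORMAL_HOURS,
                       total_hours: int = HOURS_PER_WEEK) -> float:
    """Fraction of a continuously running plant's week outside normal hours."""
    return (total_hours - normal_hours) / total_hours


if __name__ == "__main__":
    print(f"Off-shift share of the week: {off_shift_fraction():.0%}")
```

Under that assumption, 128 of the 168 hours in a week (about 76%) fall on nights and weekends, so a failure that strikes at a random moment is roughly three times as likely to happen off-shift.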
| I often ask RCFA students, "Would you expect a homicide detective to be able to solve a murder without any clues?" The response is typically, "Of course not." Then I ask them, "How can you expect to uncover the true root causes of an incident without any failure data?" Drawing the parallel between being a homicide detective and a failure analyst is an effective analogy. Consider that the investigating officer's sole responsibility at the scene of a homicide is to "FREEZE" the scene and collect as much data as possible for later analysis. How does he or she "FREEZE" the scene? They: |
|
|
|
|
|
|
| As failure analysts we can learn a lot from homicide detectives. When a failure occurs, we should treat it as if a homicide has occurred and develop the appropriate strategies to "PRESERVE THE FAILURE DATA." If the failure analyst is to be successful, he or she must collect data from each of the 5 P's. The 5 P's are simply memory joggers that stand for: |
| Parts |
| Position |
| Paper |
| People |
| Paradigms |
| Failure data from each of these categories must be collected to ensure a successful Root Cause Failure Investigation. Let's briefly review each classification. |
| PARTS |
| Any failed components such as bearings, seals, shafts, valves, nozzles, lubricants, chemicals from spills, and gases from leaks. |
| POSITION |
| Where were things at the time of failure? Was the valve open or closed? What were the instrument settings? What were the positions of the parts? |
| PAPER |
| Operating conditions prior to, during, and after the incident (temperatures, pressures, levels, etc.), vibration monitoring results, equipment histories, operating procedures, manufacturing procedures and equipment specifications. |
| PEOPLE |
| [...] they, and what did they see, hear, feel or smell prior to, during, and after the incident? Was anything unusual being done around the time of failure? What was their perception of the sequence of events? |
| PARADIGMS |
| What are the cultural norms of the organization? What do people accept as a way of doing business, such as communication between units or shifts? What repetitive remarks were made during the interview that indicate beliefs, values or deep-seated convictions? |
| Data needs to be collected from each of the five P's as quickly as possible following the failure. Obviously, the principal failure analyst cannot be at the manufacturing facility 24 hours a day; therefore, provisions have to be made to train several people on each shift to be failure data collectors. These people should function much like the fire brigade at a plant. Each member is assigned a certain task and, when called into action, should perform that task until the failure analyst or analysts can arrive and direct the effort in more detail. |
| Getting to the failure data before it becomes corrupted is a key to effective Root Cause Failure Analysis. In addition, there is a definite pecking order for collecting the 5 P's because some data is more fragile than other data. The most fragile data (data that becomes distorted the fastest and most easily) are Position and People data. The fact that Position data is fragile makes sense to most people. In order to "FREEZE" the failure scene, I often suggest to students that the failure response team be equipped with brightly colored boundary tape so that they can "rope off" the area, and a video camera to make a photographic account of the scene. (Note: Use the video camera only after the area is cleared of flammable materials.) This photographic data can be invaluable as the failure analyst tries to understand what caused the failure. |
| The fact that People data is extremely fragile often surprises would-be investigators. Because its fragility is underestimated, valuable failure data is lost. The problem is that as time passes following an incident, the raw sensory data that was taken in by people who were at or around the failure scene starts to become distorted. People start to evaluate what they heard, saw, smelled or felt and draw conclusions based upon this input. If something they sensed doesn't fit their mental models of what the scene should contain, they may discount it and only inform the failure analysts of their conclusions about what happened, as opposed to providing the failure analysts with the raw data. It is imperative that the people who were at the scene be debriefed prior to their leaving the facility. At the very least, they should fill out a generic failure data collection sheet documenting what they sensed at the time of failure and anything unusual that was being done at the time of the incident. Preferably, each person should spend 15-20 minutes being debriefed by a failure analyst. This provides the failure analyst with much more meaningful data because he or she gathers it firsthand. |
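One way to picture the generic failure data collection sheet described above is as a small record that keeps raw sensory data apart from the witness's own conclusions. The sketch below is only an illustration: the class name, every field name and the sample entries are hypothetical, not a form taken from the article.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class WitnessDebrief:
    """One person's account, recorded before he or she leaves the facility.

    All names here are hypothetical; the article only asks for what was
    sensed and for anything unusual being done at the time of failure.
    """
    name: str
    location_at_failure: str
    recorded_at: datetime
    saw: list[str] = field(default_factory=list)
    heard: list[str] = field(default_factory=list)
    felt: list[str] = field(default_factory=list)
    smelled: list[str] = field(default_factory=list)
    unusual_activity: list[str] = field(default_factory=list)
    # Conclusions live in their own field so they never replace raw data.
    own_conclusions: list[str] = field(default_factory=list)

    def raw_observations(self) -> dict[str, list[str]]:
        """Return only the sensory data, grouped by sense."""
        return {"saw": self.saw, "heard": self.heard,
                "felt": self.felt, "smelled": self.smelled}

    def is_blank(self) -> bool:
        """True if no sensory data or unusual activity was recorded."""
        return (not any(self.raw_observations().values())
                and not self.unusual_activity)


# Illustrative entries only (made-up sample data).
sheet = WitnessDebrief(name="Operator A",
                       location_at_failure="control room",
                       recorded_at=datetime(1994, 11, 1, 2, 30))
sheet.own_conclusions.append("the bearing failed")
only_conclusions = sheet.is_blank()   # still blank: no raw data yet
sheet.heard.append("high-pitched squeal from the pump area")
```

The design choice worth noting is that a sheet holding only conclusions still counts as blank, which mirrors the article's point that the analyst needs the raw data, not the witness's interpretation of it.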
| Following Position and People, Parts should be bagged and tagged and taken to a secured area for later analysis. Paper data (which includes electronic data that might be distorted or disappear), such as Distributed Control System data, should be gathered and stored for later review. Finally, Paradigm data is the least fragile of the 5 P's. The fact of the matter is that paradigms are deeply ingrained within the organization; they surface as people are interviewed and later through interaction with the failure analysis team. Failure analysts should always be on the lookout for restraining paradigms that may have contributed to the failure. These restraining paradigms are considered latent root causes. |
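The pecking order described above can be sketched as a simple ranking that a failure response team might use to sort its to-do list. This is a minimal illustration under stated assumptions: the names `FiveP` and `collection_order` are hypothetical, and because the article treats Position and People as jointly the most fragile, the order between those two is an arbitrary tie-break.

```python
from enum import IntEnum


class FiveP(IntEnum):
    """The 5 P's ranked from most fragile (1) to least fragile (5).

    Position and People are jointly most fragile in the article; ranking
    Position ahead of People here is an arbitrary tie-break.
    """
    POSITION = 1
    PEOPLE = 2
    PARTS = 3
    PAPER = 4
    PARADIGMS = 5


def collection_order(
        tasks: list[tuple[FiveP, str]]) -> list[tuple[FiveP, str]]:
    """Sort (category, description) pairs so the most fragile data comes first.

    sorted() is stable, so tasks within one category keep their input order.
    """
    return sorted(tasks, key=lambda task: task[0])


# Example tasks paraphrased from the article, listed in arbitrary order.
tasks = [
    (FiveP.PAPER, "gather and store Distributed Control System data"),
    (FiveP.PARTS, "bag and tag failed parts"),
    (FiveP.PEOPLE, "debrief everyone who was at the scene"),
    (FiveP.POSITION, "rope off the area and record valve positions"),
]
ordered = collection_order(tasks)
```

After sorting, the Position task comes first and the Paper task last, matching the sequence the article recommends.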
| Doing a good job of "PRESERVING FAILURE DATA" is a key step in conducting Root Cause Failure Analysis. Unfortunately, it is also the step that usually takes second priority to getting the process or the piece of equipment back on line as quickly as possible. There is no better way to avoid future incidents than to learn from past mistakes. That is why Root Cause Failure Analysis is such a powerful tool. However, effective Root Cause Failure Analysis cannot be conducted without DATA, and to get the DATA, manufacturing facilities must do a good job of PRESERVING it. |
For more information contact: Reliability Center, Inc. P.O. Box 1421 Hopewell, Virginia 23860 Phone: (804) 458-0645 Fax: (804) 452-2119 Website: http://www.reliability.com |