ABSTRACT
Condition monitoring is a powerful technique used to detect incipient failures in rotating machinery and other plant assets. By accurately diagnosing fault conditions at an early stage, the risk of failure can be reduced, and the cost of corrective action can be minimized. However, in the majority of cases, condition monitoring techniques are used to detect fault conditions that should not exist – that have arisen due to poor procurement, storage, work management, installation, maintenance, and operating practices. This paper explores the idea that, while condition monitoring is vitally important, more must be done in order to maximize plant utilization through an active defect elimination programme.
Introduction
For many years, techniques such as vibration analysis, infrared thermography, ultrasound and oil analysis have been used to monitor rotating machinery (and other plant assets) in order to detect incipient fault conditions, diagnose the nature and severity of each fault, and report to the maintenance department which assets require repair or replacement. While these techniques are not perfect, and they require training to attain a sufficient level of competence, condition monitoring has been used to reduce the occurrence of unexpected failures in most industries around the world.
Over the past thirty years the author has witnessed a large number of plants initiate a condition monitoring programme only to see it scaled back or cancelled altogether. In some cases the programmes were poorly organized or the technicians were poorly trained. In other cases the programmes were, in effect, too successful: they reduced the number of unexpected failures, but the team failed to inform upper management of that success and of the programme's ongoing importance, so management saw an opportunity to save money by cutting condition monitoring staff.
Regardless of the outcome of the condition monitoring programme, it is rare to find companies that have successfully gone beyond condition monitoring to eliminate the root causes of the defects. While it is not possible to eliminate failures entirely, it is possible to greatly reduce the sources of the defects that are ultimately detected by the condition monitoring group.
Condition based maintenance
When condition monitoring is used to determine the health of equipment, and the corrective maintenance required, it enables a plant to move away from pure ‘reactive’ maintenance. We will explore this point further.
Reactive maintenance
When equipment fails unexpectedly, thus requiring Maintenance to react immediately in order to bring the machine back on line, the plant is said to be conducting reactive maintenance, which is expensive for many reasons, including:
- The downtime can result in lost production which, in many industries, can never be recovered.
- The labour costs of the maintenance action are typically higher, especially if the failure occurs at night or on a weekend.
- The repair costs will be higher due to the severity of the fault and the likelihood of secondary damage.
In addition, reactive maintenance can result in an increase in safety and environmental incidents.
Preventive maintenance
When reactive maintenance is common, an organization often turns to a ‘preventive maintenance’ strategy in which maintenance work is scheduled by age, typically elapsed time or running hours. This assumes that failures are age-related. A landmark study (Nowlan and Heap, 1978), repeated by others with similar results, showed that just 4% of assets followed the classic Type A ‘bathtub’ failure pattern (infant mortality, then a period of constant failure rate, followed by age-related wear-out), as shown in Figure 1.
In total, only 11% of assets demonstrated age-related failure patterns. The remaining 89% showed no age-related wear-out, so their likelihood of failure could not be predicted from age, and a purely age-based preventive maintenance strategy is a flawed approach for rotating machinery and many other types of assets. As a result, some assets will fail before they are scheduled to be maintained, while many others will have perfectly good parts replaced during the maintenance shutdown. The cost of the shutdown, together with the high likelihood that new faults will be introduced during the maintenance work, results in a substantial waste of resources.
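The argument against purely age-based replacement can be illustrated with the conditional probability that a part survives a further interval, given its current age. The sketch below is not derived from the Nowlan and Heap data; it assumes two illustrative Weibull life models with an arbitrary characteristic life of 20,000 running hours: shape β = 1 (a constant failure rate, i.e. random failure) and β = 3 (age-related wear-out).

```python
import math

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability of surviving to time t under a Weibull life model.

    beta = 1 is the exponential case (constant hazard, random failure);
    beta > 1 gives an increasing hazard, i.e. age-related wear-out.
    """
    return math.exp(-((t / eta) ** beta))

def conditional_survival(age: float, interval: float, beta: float, eta: float) -> float:
    """Probability that a part which has already survived `age` hours
    also survives the next `interval` hours: R(age + interval) / R(age)."""
    return (weibull_reliability(age + interval, beta, eta)
            / weibull_reliability(age, beta, eta))

# Assumed characteristic life, for illustration only.
ETA = 20_000.0

for label, beta in [("random (beta=1)", 1.0), ("wear-out (beta=3)", 3.0)]:
    new_part = conditional_survival(0.0, 5_000.0, beta, ETA)
    old_part = conditional_survival(15_000.0, 5_000.0, beta, ETA)
    print(f"{label}: new part {new_part:.3f}, 15,000 h old part {old_part:.3f}")
```

With β = 1 a part that has already run 15,000 hours is exactly as likely to survive the next 5,000 hours as a new one, so replacing it on a schedule buys no reduction in risk; only in the wear-out case is the older part markedly less likely to survive.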
Predictive maintenance
Condition based maintenance (CBM), also known as predictive maintenance (PdM), recognizes that the majority of failures are not age-related. CBM makes it possible to perform maintenance only when the condition of an asset shows that maintenance is needed.
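At its simplest, a condition based trigger replaces the calendar with a measured condition. In the sketch below, the `Reading` type, the asset tags and both alarm levels are hypothetical placeholders; real alarm limits would come from the plant's own baselines, the OEM or an applicable vibration severity standard.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    asset_id: str
    velocity_mm_s: float  # overall vibration velocity, RMS

# Hypothetical alarm levels, for illustration only.
ALERT_MM_S = 4.5
DANGER_MM_S = 7.1

def maintenance_action(reading: Reading) -> str:
    """Return the action implied by the asset's measured condition,
    rather than by its age or the date of its last overhaul."""
    if reading.velocity_mm_s >= DANGER_MM_S:
        return "schedule corrective work urgently"
    if reading.velocity_mm_s >= ALERT_MM_S:
        return "plan corrective work"
    return "no action; keep monitoring"

readings = [Reading("P-101", 2.1), Reading("P-102", 5.0), Reading("F-201", 8.3)]
for r in readings:
    print(r.asset_id, "->", maintenance_action(r))
```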
Does condition monitoring improve reliability?
Many people believe that a condition based maintenance strategy results in improved reliability. If your definition of reliability is a measure of the frequency of unexpected failures, then you could conclude that it does. However, if your definition of reliability correctly includes a measure of how frequently equipment requires maintenance, then condition monitoring does not improve reliability. For the most part, condition monitoring technologies simply alert you to problems that should not exist.
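The difference between the two definitions can be shown with a hypothetical maintenance log; the events, field names and observation period below are invented for illustration. A mean time between unexpected failures counts only breakdowns, while a mean time between maintenance actions counts every corrective intervention, including those triggered by condition monitoring.

```python
from dataclasses import dataclass

@dataclass
class Event:
    hour: float       # operating hour at which the intervention occurred
    unexpected: bool  # True for a breakdown, False for a planned repair

def mean_time_between(events: list[Event], period_h: float, unexpected_only: bool) -> float:
    """Observation period divided by the number of counted events."""
    counted = [e for e in events if e.unexpected or not unexpected_only]
    return period_h / len(counted) if counted else float("inf")

PERIOD_H = 8_760.0  # one year of continuous operation

# The same six defects in both cases; with condition monitoring, all but one
# are caught early and repaired as planned work instead of running to failure.
HOURS = (900, 2400, 3900, 5300, 7000, 8100)
without_cm = [Event(h, True) for h in HOURS]
with_cm = [Event(h, h == 5300) for h in HOURS]

for name, log in [("without CM", without_cm), ("with CM", with_cm)]:
    mtbf = mean_time_between(log, PERIOD_H, unexpected_only=True)
    mtbm = mean_time_between(log, PERIOD_H, unexpected_only=False)
    print(f"{name}: time between unexpected failures {mtbf:.0f} h, "
          f"time between maintenance actions {mtbm:.0f} h")
```

Catching defects early converts breakdowns into planned repairs, which improves the first figure but leaves the second unchanged; only removing the defects themselves improves both.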
For the remainder of this paper we will explore how to improve reliability through defect elimination, and discuss the role of the condition monitoring technicians and analysts in this process.
Defect elimination
To improve reliability, it is necessary to understand the root causes of the reliability problems. There are two ways to approach this:
- Perform a detailed Reliability Centred Maintenance (RCM) study in order to identify the failure modes, and then examine the root causes of those failures.
- Learn from the experience of thousands of industrial plants, and address the common sources of defects that result in poor reliability.
The first option can take a very long time and consume substantial resources. Many such programmes have failed. Instead, we will take a closer look at the second alternative.
Common solutions to common problems
Many industrial organizations suffer from the same problems. If you eliminate their root causes, you will go a long way towards solving the reliability problems at your plant. The following is a summary of the most common sources of defects (also see Figure 2).
Design
Reliability problems typically begin with the design. The system must be designed for reliability and maintainability. The lifecycle cost must take a higher priority than the purchase cost.
Procurement
Once again, reliability and maintainability must be prioritized over the purchase price. The right design makes it harder to buy ‘cheap’, unreliable equipment, but the incentive to purchase the option with the lowest up-front cost must also be replaced with an incentive to purchase the items with the lowest lifecycle cost.
External overhaul and service work
The same emphasis on reliability and maintainability must apply when selecting companies that provide services such as balancing and motor rewinds, or that supply lubricants, bearings and other parts. Specifications must be provided to vendors; for example, rotors should be balanced to quality grade G 1.0 as defined in ISO 1940-1 (now ISO 21940-11).
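A balance specification such as G 1.0 translates into a concrete tolerance that the vendor can be held to. The balance grade G (in mm/s) is the product of the permissible specific unbalance e_per and the angular velocity ω, so the permissible residual unbalance is U_per = m × G / ω. The sketch below computes this for a hypothetical 100 kg rotor running at 3000 rpm; the rotor data are assumptions for illustration.

```python
import math

def permissible_residual_unbalance(grade_mm_s: float, rotor_mass_kg: float,
                                   speed_rpm: float) -> float:
    """Permissible residual unbalance U_per in g.mm for a balance grade G.

    G = e_per * omega, so e_per = G / omega. With G in mm/s and omega in
    rad/s, e_per is in mm; multiplying by 1000 converts it to g.mm per kg.
    """
    omega_rad_s = 2.0 * math.pi * speed_rpm / 60.0
    e_per_g_mm_per_kg = 1000.0 * grade_mm_s / omega_rad_s
    return e_per_g_mm_per_kg * rotor_mass_kg

# Hypothetical rotor: 100 kg at 3000 rpm, specified to G 1.0.
u_per = permissible_residual_unbalance(1.0, 100.0, 3000.0)
print(f"U_per = {u_per:.0f} g.mm in total")
```

For this rotor the total is about 318 g·mm; how it is divided between the correction planes depends on the rotor geometry.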
Transportation
Equipment can be damaged during transport. The vendor may produce equipment that meets requirements, but if it is damaged during transport then you will still experience poor reliability.
Acceptance testing
As part of the design, procurement, overhaul, and transportation process it is essential to implement acceptance testing. The vendor must be told that if the equipment does not meet minimum requirements then it will not be accepted. An acceptance testing specification may put limits on vibration, balance tolerance, resonance, oil cleanliness, and other parameters.
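An acceptance specification is most useful when it is written so that the pass or fail decision is unambiguous. The sketch below checks measured values against a list of limits; the parameter names and every numerical limit are hypothetical, and a real specification would set them per machine type, size and speed. Oil cleanliness and other parameters could be added in the same way.

```python
from dataclasses import dataclass

@dataclass
class Limit:
    parameter: str
    bound: float
    unit: str
    is_maximum: bool = True  # False: the measured value must be at least `bound`

# Hypothetical acceptance limits, for illustration only.
SPEC = [
    Limit("overall vibration velocity", 2.8, "mm/s RMS"),
    Limit("residual unbalance", 300.0, "g.mm"),
    Limit("natural frequency separation from running speed", 15.0, "%", is_maximum=False),
]

def acceptance_failures(measured: dict[str, float], spec: list[Limit]) -> list[str]:
    """Return every specified parameter that is missing or out of limits.

    A missing measurement counts as a failure: equipment is not accepted
    on the strength of tests that were never performed.
    """
    failures = []
    for limit in spec:
        value = measured.get(limit.parameter)
        if value is None:
            failures.append(f"{limit.parameter}: not measured")
        elif limit.is_maximum and value > limit.bound:
            failures.append(f"{limit.parameter}: {value} {limit.unit} exceeds {limit.bound}")
        elif not limit.is_maximum and value < limit.bound:
            failures.append(f"{limit.parameter}: {value} {limit.unit} below {limit.bound}")
    return failures

results = {"overall vibration velocity": 3.4, "residual unbalance": 250.0}
problems = acceptance_failures(results, SPEC)
print("ACCEPTED" if not problems else "REJECTED: " + "; ".join(problems))
```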
Storage and inventory management
Efficient inventory management has a considerable financial impact on an organization, and an organized stores system is essential if maintenance planning and scheduling are to function correctly. However, the way in which spares are stored can also introduce defects. Bearings can be damaged while they sit on shelves that vibrate, and lubricants, and parts such as bearings, become contaminated if they are not stored correctly.
Planning and scheduling
It is not possible to improve reliability until the planning and scheduling function is working correctly. The condition monitoring group will feed information to this group. However, corrective work cannot be performed efficiently unless the job can be planned, kitted, and scheduled with the right people, parts and tools.
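The requirement that a job be planned and kitted before it is scheduled can be expressed as a simple readiness check. In the sketch below, the `WorkOrder` structure, the craft names, the part number and the tool names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WorkOrder:
    job_id: str
    crafts: set[str]       # skills required, e.g. {"millwright"}
    parts: dict[str, int]  # part number -> quantity required
    tools: set[str]

def missing_items(order: WorkOrder, crafts_on_shift: set[str],
                  stock: dict[str, int], tool_crib: set[str]) -> list[str]:
    """List everything preventing the job from being scheduled.
    An empty list means the right people, parts and tools are all in place."""
    missing = [f"craft: {c}" for c in sorted(order.crafts - crafts_on_shift)]
    missing += [f"part: {p} (need {q}, have {stock.get(p, 0)})"
                for p, q in sorted(order.parts.items()) if stock.get(p, 0) < q]
    missing += [f"tool: {t}" for t in sorted(order.tools - tool_crib)]
    return missing

job = WorkOrder("WO-1042", {"millwright"}, {"6310-2Z": 2}, {"laser alignment kit"})
print(missing_items(job, {"millwright", "electrician"}, {"6310-2Z": 1},
                    {"laser alignment kit"}))
```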
Repair, overhaul and installation
When assets are repaired or overhauled, and when new, overhauled or repaired equipment is installed, it is essential that the work is performed with precision. Bearings must be installed correctly. Machines must be precision aligned and balanced. Soft foot must be eliminated. Resonance must be eliminated. Bolts must be correctly tightened. If these things are not done, the life of the machine will be reduced. The extra time required to perform these tasks correctly will pay for itself many times over through increased life and, in many cases, improved product quality, reduced energy consumption and greater throughput.
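Soft foot checks lend themselves to a simple rule: loosen each hold-down bolt in turn, measure how far that foot rises, and correct any foot that exceeds the tolerance before aligning the machine. The 0.05 mm tolerance and the readings in the sketch below are assumptions for illustration; the plant's own precision maintenance standard should govern.

```python
# Assumed soft foot tolerance, for illustration only.
SOFT_FOOT_LIMIT_MM = 0.05

def soft_feet(readings_mm: dict[str, float],
              limit_mm: float = SOFT_FOOT_LIMIT_MM) -> dict[str, float]:
    """Return the feet whose rise, measured as each hold-down bolt is
    loosened, exceeds the tolerance and must be corrected."""
    return {foot: rise for foot, rise in readings_mm.items() if rise > limit_mm}

# Hypothetical dial indicator readings for a motor, in mm.
motor_feet = {"front-left": 0.02, "front-right": 0.09,
              "rear-left": 0.01, "rear-right": 0.03}
print(soft_feet(motor_feet))
```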
Equipment operation
The way equipment is operated has a huge impact on its reliability. Operators, production supervisors and managers must appreciate the importance of operating equipment within specification. For example, if pumps are not operated at or near the best efficiency point (BEP), the life of the pump will be reduced by cavitation (which damages the impeller), excessive bearing wear, seal damage and more, as illustrated in Figure 3.
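Operating guidance for a pump can be framed in terms of flow as a fraction of BEP flow. The band limits in the sketch below (80 to 110% of BEP flow as a preferred band, 70 to 120% as the outer limit) are assumptions for illustration; the pump vendor's curves and the applicable standard should govern.

```python
def bep_region(flow_m3_h: float, bep_flow_m3_h: float,
               preferred: tuple[float, float] = (0.8, 1.1),
               allowable: tuple[float, float] = (0.7, 1.2)) -> str:
    """Classify an operating point by its flow as a fraction of BEP flow.
    The default band limits are assumed values, not a standard."""
    ratio = flow_m3_h / bep_flow_m3_h
    if preferred[0] <= ratio <= preferred[1]:
        return "preferred"
    if allowable[0] <= ratio <= allowable[1]:
        return "allowable: expect some reduction in life"
    side = "below" if ratio < allowable[0] else "above"
    return f"outside limits ({side} BEP): high risk of damage"

# Hypothetical pump with a BEP flow of 100 m3/h.
for q in (100.0, 75.0, 40.0, 140.0):
    print(q, "m3/h ->", bep_region(q, bep_flow_m3_h=100.0))
```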
Lubrication
One of the most important steps you can take in order to extend the life of rotating machinery is to lubricate bearings and gears with the correct volume of contaminant-free lubricant that has the correct specification (e.g. viscosity and additive pack). Lubricants can be contaminated during transportation, storage, dispensing, and while inside the machine. Each source of contamination must be identified and eliminated.
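The correct volume of grease is often estimated from the bearing's dimensions. The sketch below uses one widely published bearing-manufacturer rule of thumb, G_p = 0.005 × D × B grams (D = outside diameter, B = width, both in mm), for grease added from the side of the bearing. Treat the factor as an assumption: the bearing and machine manufacturers' recommendations should take precedence, and the relubrication interval must be determined separately.

```python
def regrease_quantity_g(outside_diameter_mm: float, width_mm: float,
                        factor: float = 0.005) -> float:
    """Approximate grease replenishment quantity in grams: G_p = factor * D * B.

    The default factor is an assumed rule of thumb for grease added from
    the side of the bearing, not a universal value.
    """
    return factor * outside_diameter_mm * width_mm

# Hypothetical bearing with D = 110 mm and B = 27 mm.
print(f"{regrease_quantity_g(110.0, 27.0):.1f} g per relubrication")
```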
Solving the plant’s unique reliability problems
If all of the foregoing issues are addressed, the plant will achieve significantly improved financial results and there will be fewer safety and environmental incidents. However, every plant also has its own unique issues, and a two-pronged approach to identifying and resolving them is recommended.
Plant walk-through
Performing a plant walk-through, and inviting mechanics and operators to point out the common problems experienced on the ‘plant floor’ will achieve two goals:
- By listening to the mechanics and operators you will learn about problems that an RCM team may never identify – and you will do it quickly and effectively.
- Taking action on the identified problems will generate a lot of goodwill between the reliability group and the ‘plant floor’ staff, and further suggestions will follow. This will accelerate the process of culture change.
Perform an RCM
There will be processes and equipment with poor reliability whose root causes are more difficult to identify. For these, it is recommended that you carry out an RCM, a Failure Modes and Effects Analysis (FMEA) or a Root Cause Failure Analysis (RCFA). You may need to involve consultants, the OEM or people from a sister plant to get to the bottom of the problems. It may be necessary to replace equipment, redesign a process, or install monitoring and/or control systems. Either way, these analyses can be performed in parallel with the defect elimination programme.
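In an FMEA, failure modes are commonly ranked by a Risk Priority Number, RPN = severity × occurrence × detection, with each factor rated on a 1 to 10 scale. The sketch below shows the calculation; the failure modes and ratings are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) to 10 (hazardous)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost certain to be detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number used to rank failure modes."""
        return self.severity * self.occurrence * self.detection

# Hypothetical failure modes and ratings.
modes = [
    FailureMode("pump bearing fails from contaminated oil", 7, 6, 3),
    FailureMode("mechanical seal leaks at low flow", 6, 5, 4),
    FailureMode("coupling fails from misalignment", 8, 3, 5),
]
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.rpn:4d}  {m.description}")
```

The two modes tied at 120 differ in severity, which is why many practitioners also review high-severity modes on their own rather than relying on the RPN alone.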
Utilizing condition monitoring skills to improve reliability
The condition monitoring technicians and analysts can play a role beyond detecting fault conditions and advising the maintenance group.
Acceptance testing (QA/QC)
As described earlier, it is important that acceptance testing is performed on new and overhauled equipment. The condition monitoring group can help to define the standard and conduct the tests. A form of acceptance testing can be performed when new, repaired or overhauled equipment is installed. Vibration and other checks should be performed to ensure that the equipment is fit to provide long, reliable life.
Detecting conditions that will lead to reduced life
Too many vibration analysis programmes focus on the detection of bearing defects and pay less attention to their prevention. Conditions such as unbalance, misalignment, bent shaft, run-out, looseness, resonance, soft foot, cavitation, cocked bearing and others will result in excessive load and reduced life. In many condition monitoring programmes, if these conditions are detected, they are typically not reported until the condition appears to be severe. The same is true for a wide range of fault conditions (lubricant contamination, under-lubricated bearings, over-lubricated bearings, electrical unbalance, poor performance characteristics, etc.) detected via other technologies.
The fact is that all of these conditions result in reduced life. All of the rotating components, especially the bearings, will develop faults far more quickly when any of these conditions exist. Therefore, although the vibration amplitude may not indicate that the unbalance is severe, it must be understood that the life of the bearings will be reduced.
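The basic rating life equation for rolling element bearings (ISO 281), L10 = (C/P)^p, where C is the basic dynamic load rating, P the equivalent dynamic load and p = 3 for ball bearings or 10/3 for roller bearings, shows why even modest additional load matters: a 20% increase in load reduces a ball bearing's basic rating life to about 58% of its original value. The sketch below computes the ratio; the load increases are illustrative, and in practice the extra load caused by a given level of unbalance or misalignment is difficult to quantify.

```python
def life_ratio(load_increase_factor: float, bearing_type: str = "ball") -> float:
    """Fraction of basic rating life remaining when the equivalent dynamic
    load P is multiplied by `load_increase_factor`, from L10 = (C/P)**p."""
    exponents = {"ball": 3.0, "roller": 10.0 / 3.0}
    p = exponents[bearing_type]  # raises KeyError for an unknown type
    return (1.0 / load_increase_factor) ** p

for extra in (1.1, 1.2, 1.5):
    print(f"+{(extra - 1) * 100:.0f}% load -> "
          f"{life_ratio(extra):.0%} of ball bearing life, "
          f"{life_ratio(extra, 'roller'):.0%} of roller bearing life")
```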
Root cause failure analysis (RCFA)
The condition monitoring team holds important evidence in its database that can explain why a machine failed. For example, it may be necessary to look at data that was collected before the bearing defect was detected, where the analyst may see signs of unbalance, misalignment or some other condition. Whatever the root cause, actions must then be taken to reduce the likelihood of it occurring again.
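Looking back through the database for precursors can be partly automated. The sketch below scans readings taken before the date on which a bearing defect was detected and flags those in which the 1x or 2x running-speed amplitude had risen to twice its baseline. The `Spectrum` record, the doubling threshold and the data are hypothetical; elevated 1x vibration is often associated with unbalance and elevated 2x with misalignment, but confirming either requires further analysis such as phase and axial measurements.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Spectrum:
    taken: date
    amp_1x_mm_s: float  # vibration amplitude at 1x running speed
    amp_2x_mm_s: float  # vibration amplitude at 2x running speed

def precursor_readings(history: list[Spectrum], detected_on: date,
                       baseline_1x: float, baseline_2x: float,
                       factor: float = 2.0) -> list[tuple[date, str]]:
    """Flag readings taken before `detected_on` in which the 1x or 2x
    amplitude had reached `factor` times its baseline."""
    findings = []
    for s in sorted(history, key=lambda s: s.taken):
        if s.taken >= detected_on:
            break  # only evidence gathered before the defect was found
        if s.amp_1x_mm_s >= factor * baseline_1x:
            findings.append((s.taken, "elevated 1x (possible unbalance)"))
        if s.amp_2x_mm_s >= factor * baseline_2x:
            findings.append((s.taken, "elevated 2x (possible misalignment)"))
    return findings

# Hypothetical history; the bearing defect was first reported on 18 July.
history = [
    Spectrum(date(2023, 1, 10), 1.0, 0.4),
    Spectrum(date(2023, 3, 14), 1.1, 0.9),
    Spectrum(date(2023, 5, 16), 1.2, 1.3),
    Spectrum(date(2023, 7, 18), 1.4, 1.6),
]
for when, note in precursor_readings(history, date(2023, 7, 18), 1.0, 0.4):
    print(when.isoformat(), note)
```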
Improving reliability – the missing ingredients
More needs to be said about implementing a reliability improvement programme. Everything discussed in this paper is common sense, and it has been tried in many plants. However, a large percentage of reliability improvement initiatives have failed. Some make limited progress, while many either get started and then peter out, or make more substantial progress that proves to be unsustainable. There are five common shortcomings in these programmes:
- They do not have commitment from senior management. Leadership from the top is essential.
- The plant does not have a clear understanding of asset criticality, and it does not have a maintenance and reliability strategy. In the author’s opinion it is often not necessary to perform a full RCM analysis. However, a more efficient process should be undertaken in order to form that strategy.
- They try to do too much before a proper plan is in place. For example, a plant needs proper maintenance and production planning before it embarks on RCM or CBM programmes.
- They do not take change management issues into account. Any plant can change if small, strategic steps are taken. Larger steps will be resisted.
- Everyone within the plant contributes to the reliability problem, so everyone needs training, from basic awareness training to detailed skill-building. People who do not receive training will feel left out and will act as anchors on the programme.
Conclusions
Condition monitoring provides a great service to an organization, reducing unexpected breakdowns and thus reducing maintenance costs, downtime, safety incidents and environmental incidents. The condition monitoring group should also work to improve reliability by assisting in the acceptance testing process, identifying conditions that will lead to reduced reliability, and assisting in root cause failure analysis when equipment does eventually fail. All of this work, however, should be part of a properly planned and orchestrated reliability improvement programme that involves defect elimination and process optimization.
Reference
Nowlan, F. S. and Heap, H. F., Reliability-Centered Maintenance, Department of Defense, Washington, D.C., Report No. AD-A066579, 1978.
The author, Jason Tranter, CMRP, CRL, may be contacted at jason@mobiusinstitute.com