How to Decrease Equipment Failures by Following a Few Simple Tips

Duration: 49 minutes
Ricky Smith
Published on November 8, 2021
Hosted By
Ricky Smith
Maintenance Expert in Residence

In this webinar, Ricky Smith, CMRP, will discuss tips to decrease equipment failures. Join the session to learn about reactive maintenance, primary causes of equipment failures, and tips to reduce equipment failures.

Brought to you by The Maintenance Community Slack Group.


0:00:00.0 Speaker 1: Wow, we got people from the Congo, all the way to Australia with us tonight. Unbelievable. So how do you decrease equipment failures by following a few simple steps? I've contemplated this thought for quite a while, trying to think about how can I put it in simple terms so people can go back and do something different, and that's what I'm trying to do today. So how do you know you're reactive maintenance? I guess it's a good question, how do you know if you're reactive? And you hear things like, "We need to write a new PM. The equipment keeps failing, we need to write a new PM. Who worked on the equipment last? Nobody knows, it's a mystery. I think it was Jimmy. We never have parts in the storeroom. We never have parts in the store. We never had them to begin with. Or maybe that we did but they were gone last week and we had new ones in and then they left pretty quick." We know we're reactive maintenance when we do that.

0:01:03.0 Speaker 1: When we do not have time for training, we've got too many problems. I hear that too often. I can't send my people to training, we've got too many problems. Really? Maybe training might help you solve some of those problems. We have a problem with technician morale. I was in a plant recently, technician morale was so bad, and here I was doing training with the technicians, so I had to turn it around to give them a positive look at it. Technicians are really the foundation for our reliability program. We have to keep their morale up. Not by giving them more money or anything else, just thanking them for the job they do.

0:01:44.9 S1: We cannot afford to hire a maintenance planner. Heard that one last week, too. Think about this, I've got maintenance technicians that are running around like crazy fighting fires and I have no planner, so I... You'll never get out of reactive maintenance until you have a maintenance planner. Why don't you take your best maintenance technician and make him a planner? I've seen places do that and it really turns them around fast. Our storeroom never has parts. So we never have parts in a storeroom but we keep going back again and again. We just don't see it. And we do not need a maintenance planner, we just need our people to step up. I don't know how many times I've heard that. What does that mean, step up? Maybe this lack of morale has something to do with reactive maintenance. But I guess when we have a failure, we have to know what constitutes a failure.

0:02:39.4 S1: So equipment failure refers to any event in which an equipment, any equipment, cannot accomplish its intended purpose or task. It also means that... That may also mean that the equipment stopped working, it was not performing as desired or it's not meeting target expectations. Two types of equipment failures I see: Total functional failure and partial functional failure. Okay, I see we've got a couple of reliability engineers on. What would you consider a type of failure? Would you consider a total functional failure as what your plants normally look at or do they also capture partial functional failures? 


0:03:32.0 S1: I know you guys are sitting quiet in the audience, I know you're there. Put it in the chat, if you don't mind.


0:03:43.3 Speaker 2: Can you restate the question, Ricky? 

0:03:46.2 S1: The question is, how do you measure failure? So you measure them by total functional failure or partial functional failure? How do you measure it? Got one said we just start measuring failures, capturing failures. Yeah, yeah. Both. Okay, good Jeremy. Thank you. Kevin, total. Yeah. So we have different ways of looking at it but both depend, how do we detect the failure? Absolutely, absolutely. Okay good, thank you.

0:04:29.9 S1: So what are primary causes of equipment failure? Let's get to the root of it. A lot of equipment failures have to do with maintenance being reactive. It's like an alcoholic, you gotta admit that you drink alcohol. So it requires admitting we have a problem. So if we're reactive, if we're gonna stop the failures, we have to get out of the reactivity we're in right now. Improper operation. Most failures, I'm talking about partial functional failures and total functional failures, I think, many times, are created by operators, operator error, and I don't understand it 'cause their job's not that difficult.

0:05:10.8 S1: Failure to perform corrective or preventive maintenance to specifications. When I talk about performing corrective or preventive maintenance to specifications, I get the deer-in-the-headlights look from a lot of people. It's like, "Specifications?" And the reason they give me that is 'cause they don't even know what the specifications are. So when they install a pump, they install a V-belt, install an electrical device, they don't even know what good looks like and that's sad. It really is.

0:05:40.0 S1: Doing too much preventive maintenance. I don't know how many places I've seen, they've got so many PMs, they can't get to 'em, and when you really dive down into it, you find that every time you have... They've had in the past, over the last 20, 30 years, they've had serious problems. Somebody writes a new PM and it adds into the system, and it doesn't work. You can't do that. We need preventive maintenance but we gotta understand what is preventive maintenance for and how do we do it effectively and correctly. That's a big deal.

0:06:12.0 S1: Ineffective or no predictive maintenance application to critical assets. I had one company today on some training with me and they don't do any predictive maintenance but all the rest of them were doing a lot of predictive maintenance. Just do it on critical assets if you're not doing it at all. Just start there. You don't have to do it in-house. Hire a contractor outside to start with. I don't want it in-house until I can prove the concept, then I wanna bring it in-house. We do it ourselves 'cause predictive maintenance is a great tool to mitigate failures over the long run. Maintenance planning and the scheduling is dysfunctional. Wrench time is low. Absolutely, no doubt. It's all a reason why we have it. If you don't have planning and scheduling...

0:07:02.1 S1: Here's the deal. So when you're talking about no planning and scheduling, that means you're reactive. You're just going out when somebody gives a trouble call, you send someone out and they go out and try to fix the problem, whatever that means. So they patch it, a few days later that same problem comes up again. That's not what we want. What we want, through preventive maintenance and predictive maintenance, the information that comes from them, we plan it, we schedule it, we execute it to specifications. And we have specifications on the procedures that we follow, and our wrench time should go up, but in order to do that, to make it effective, we need to have a maintenance planner. Take your best maintenance technician and make him a planner, one that can use a computer.

0:07:54.3 S1: No scorecards or dashboards to ensure everyone knows the score and their position. We gotta know how we're gonna play this game. If any of you watch football or soccer or rugby, whatever, with no scoreboard, the players won't play. That's the same thing in maintenance. We need to have scoreboards. We need to have dashboards. We need to have those scorecards up for people to see them. How well are we doing, good or bad? If it's bad then we need to talk about it. When I was a maintenance manager, one of the things I did, I had... Every morning before we started, all the crews, we looked at that dashboard. We knew our score in the game. We knew where we needed to focus and it didn't... It wasn't impacting everybody right away, but ultimately everybody felt like they were part of that team support, making sure we got to where we needed to go.

0:08:43.8 S1: The roles and responsibilities aren't clearly defined. How do you know? When you hear statements like, "That's not my job," or, "Hey, don't call me when you got a problem with that." That's not acceptable. Roles and responsibility... I'd be willing to bet you reliability engineers here, you're used for everything but reliability engineers in a lot of cases. I know Honda, I know you guys do it pretty dadgum good but a lot of them, they have reliability engineers, but they just get sucked into this reactivity and we gotta stop it. Wrench time's low. Hands-on tool time is low. So what is wrench time? It's the percentage of time a maintenance person has their hands on a tool, and it's called wrench time. So we look at world-class wrench times, 55% to 65%. It can't be 100%. Some people say, "My guy's a 100%." Yeah, but that's not what I'm talking about. Typical wrench time's between 15% and 25%. The worst in class wrench time is 5% to 10%.

0:09:49.7 S1: That's where a lot of companies are right there. You have to admit you got a problem in order to solve the problem. I'd say with world-class wrench time, what I'm talking about... And somebody said, 100%, think about breaks, think about travel time to and from job site, think about looking for parts, looking for tools. There's a lot of things that take away from wrench time, and so if we wanna get better, we gotta know where we are. Wrench time is a big deal, it's a big deal.
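The wrench-time arithmetic above can be sketched in a few lines. This is a minimal illustration, not a tool from the talk: the function names and the 480-minute shift with 96 hands-on minutes are assumptions, and only the three percentage bands come from the speaker.

```python
def wrench_time_pct(hands_on_minutes: float, shift_minutes: float) -> float:
    """Share of the shift spent with hands on a tool, as a percentage."""
    if shift_minutes <= 0:
        raise ValueError("shift_minutes must be positive")
    return 100.0 * hands_on_minutes / shift_minutes


def wrench_time_band(pct: float) -> str:
    """Map a percentage to the bands quoted in the talk."""
    if 55 <= pct <= 65:
        return "world class"
    if 15 <= pct <= 25:
        return "typical"
    if 5 <= pct <= 10:
        return "worst in class"
    # The talk gives no label for values between the quoted bands.
    return "between quoted bands"


# Hypothetical 8-hour shift (480 min) with 96 min of hands-on tool time.
pct = wrench_time_pct(96, 480)
print(f"{pct:.0f}% -> {wrench_time_band(pct)}")
```

Breaks, travel, and looking for parts or tools all reduce the numerator, which is why 100% is not a realistic reading.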

0:10:20.3 S1: Root causes of maintenance being in a reactive state. Lack of a proactive maintenance knowledge or experience. We don't know what we don't know. We may have smart people that know some of it, but the whole organisation needs to understand proactive maintenance, they need to understand it. That's the whole reason I created Toolbox Talks, to help people understand, to gain the knowledge so they can make changes then. Toolbox Talk is just a single-point lesson, in case you don't know what I'm talking about. It's the single-point lessons that I've created over the years. I just get frustrated when I go to a place and I see a problem, I create a single-point lesson and it's there for one reason only and that's to give people knowledge. So hopefully they could see hope. Lack of knowledgeable leadership and maintenance best practices. It's not just overall, but our lack of knowledge to leadership. In this one place I was at not too long ago, the only one that understood proactive maintenance was the president of the company. I couldn't get it. How does he know it and nobody else knows it? It must be a secret. That must be what it was. That's right.

0:11:30.0 S1: So production and maintenance. I want leadership on both sides to understand what maintenance best practices are. The environment and the organisation are reactive, everybody knows it, however, no one knows how to move out of it. And it's like any journey, you gotta take one step, but you need to take the step in the right direction. You better have a map that says, "Here's where I'm at right now." I gotta know where I am on earth, take that map, look at it, take my GPS out, which is my CMMS, and start navigating through it. Everyone knows maintenance cost is out of control, but they're not certain what to do about it. How do you solve that? You can't cut your way to it. A lot of companies, they just start cutting costs. Cutting costs doesn't do any good. Why don't we reallocate the money to the right place? That's what we need to do. The metric is maintenance cost as a percent of replacement asset value. This came from Alcoa Mt Holly, where I worked many years ago. Maintenance cost as a percent of replacement asset value: typical, 3.5% to 9%; world class, 2% to 3%. It's all about the money. It isn't by driving cost down, it's by doing the right thing. It's by doing things the right way, that's when you bring costs down. And this is all costs. That's the labor, materials, contract maintenance, everything. Then the maintenance materials cost as a percent of replacement asset value. So you notice typical of 1% to 3.5%. Look at this for world class, 0.25% to 0.75%.
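A minimal sketch of the two ratios just described, cost as a percent of replacement asset value (RAV). The dollar figures and function names are hypothetical; only the typical and world-class ranges are taken from the talk.

```python
def pct_of_rav(annual_cost: float, replacement_asset_value: float) -> float:
    """Annual cost expressed as a percentage of replacement asset value (RAV)."""
    if replacement_asset_value <= 0:
        raise ValueError("replacement_asset_value must be positive")
    return 100.0 * annual_cost / replacement_asset_value


def band(pct: float, typical: tuple, world_class: tuple) -> str:
    """Label a percentage using the (low, high) ranges quoted in the talk."""
    if world_class[0] <= pct <= world_class[1]:
        return "world class"
    if typical[0] <= pct <= typical[1]:
        return "typical"
    return "outside quoted ranges"


# Hypothetical figures, for illustration only.
rav = 200_000_000
total_maintenance_cost = 9_000_000  # labor, materials, contract maintenance
materials_cost = 3_000_000

total_pct = pct_of_rav(total_maintenance_cost, rav)
materials_pct = pct_of_rav(materials_cost, rav)
print(f"total: {total_pct:.2f}% ({band(total_pct, (3.5, 9.0), (2.0, 3.0))})")
print(f"materials: {materials_pct:.2f}% ({band(materials_pct, (1.0, 3.5), (0.25, 0.75))})")
```

The point of the calculation is a baseline to trend against, so whichever RAV figure is used should stay the same from one period to the next.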

0:13:15.3 S1: Wow. So replacement asset value, what SMRP says, is the amount of money it would take to replace the facility brand new. Everything like new. I'm just being generic about it. So if I were to destroy the plant and build a brand new one, right on top of the same site, what would it cost? So what would that replacement cost be in today's dollars? Well, it's not realistic. The number I'd go for with companies that wanna know where they are, set a baseline of where they are, is we go after what is the insured value? We just start with that 'cause that's a good number. We're not trying to figure out how we're better than somebody else, all we wanna do is just set a baseline, and then from that baseline determine how well we're doing and whether we're getting better or not. So number two cause, improper operation, operator error. Some of the root causes of operator error: lack of leadership, lack of discipline. Everybody's doing their own thing. Lack of training. Believe it or not, operators, many times, they're hired off the street, they're put into a role, but they have not been trained. It doesn't make sense.

0:14:28.3 S1: The best that you can do, they need to go do some written training, some classroom training, and then they need to go through training with coaching, somebody coaching them, that should be their supervisor. New operator care program integration with the maintenance PM program. Why don't we integrate? Why don't we work together? Why is a maintenance person going out looking at something when the operator's there anyway? I like putting targets up. So a technician, like I used to do with my technicians, they had... Every technician had a route every day, in the morning, first thing, for the first hour, and they walked through certain areas of the plant and we had numbers up, one, two, three. You look at your chart on your clipboard, and you look at it, bump, bump, bump, and it tells you what to write down. What do you see? And it's very simple. Lack of a scorecard as seen inaccurate. So I just put together operator care leading and lagging KPIs. What are the number of abnormalities identified? What's the number of less-than-six-minute stops? Sounds like OEE. Percentage of time the line is running to rate. What's the percent of first-pass quality? Then operator care, what's their backlog? Do they have a backlog? 

0:15:41.4 S1: And in some cases, there's some companies that do have backlog for operators to do things. And then, so they have total backlog, ready backlog, ready to schedule, and that's... What they do is they send that... So this is categorised in maintenance to have it ready to schedule and all that comes down to OE. So failure to perform corrective and preventative maintenance to specifications, number three. Root causes of failure, a failure to perform corrective, preventative maintenance to specifications, lack of knowledge of what best practices are and how to manage as a proactive organization, not a reactive one. Lack of resources. We don't have enough people. How do you get enough people in a maintenance organisation? Some of you guys... Somebody tell me that. Chat it in. If we don't have enough maintenance people, how do we get enough maintenance people? Or how do we increase wrench time? Both the same question. Never have enough time. Yeah. How do you have enough people? One is maintenance planning and scheduling, start allocating your resources appropriately.

0:17:13.2 S1: I always had my people set up that we knew and they knew who was gonna be pulled off the job first. We had react... Everybody was scheduled by day, by hour, but we had certain people, had certain people assigned, "Okay Jimmy, you're the first one. You get the first call, you take care of it. Bobby, you got the second one. Sam, you got the third one," and so on. But all I want them to know is know that, but that doesn't mean people come to them, they come to me first, and then I'll allocate the people for resources. So lack of resources, money, the right people, leadership from top to bottom. We gotta have leadership, not just leadership at the top, but we need to have leaders as technicians too. Lack of training dollars. It's amazing to me how much money we spend... I spoke to this president of this company, we were talking about the money. And he says, "You know Ricky, I got plenty of money for training, the problem is I gotta spend it wisely because I'm not gonna spend money on training and not get something out of it." And that's a fact. That's a fact.

0:18:21.1 S1: Don't send someone to training unless clear expectations are set forth, so they understand what those are. Doing too much preventative maintenance, mentioned that earlier. Root causes of why you're not... Why you're doing too much preventative maintenance. What are they? No training in preventative maintenance. All knowledge has been passed down over the years, so tribal knowledge. It's just been passed down. So over time, we've learned from the past, but really what we've done is we've watered things down from what they were probably doing great at one time. And if you've ever been in a company that says, "Yeah, we used to do that. Yeah, we used to do that. Yeah, we used to do that." Yeah, I know, we need to get back to that. Not knowing how to perform preventative maintenance as a controlled experiment. I say preventative maintenance should be performed as a controlled experiment. So what that means is you put equipment in a controlled state, a maintainable state, and do certain things to it to give you the desired outcome. Whether that's lubrication, whether that's time-based replacements, whatever that is. We do certain things to it to give us a certain outcome. Equipment's not in a maintainable condition. If the equipment's not maintainable...

0:19:33.0 S1: You gotta restore it. That doesn't mean you gotta full-blown buy a new one, or rebuild it totally from top to bottom, but we need to get it functioning. Remember those functional failures? We need to mitigate or eliminate those functional failures that happen with equipment. We're measuring the wrong thing. PM compliance. Okay, I got that, but to me, that just says, "We did it," that's all, but my question is, "Did we do it the right way? Are we finding something from it?" When I worked at Alcoa, one of the things we said, we... Our manager, John Day, he expected to see a certain amount of work coming from preventive maintenance. If he didn't see it, then he's gonna call somebody into his office and you never... I only went to his office one time, no one wants to go to his office, 'cause it doesn't take long for the discussion to end. But why don't we find stuff with it? We need to know that.

0:20:27.8 S1: But to me, if I wanna know if preventive maintenance is effective or not, I look at emergency labour hours, that's what I look at. And you notice this one, this chart I've got PM labour hours are steady, emergency labour hours decreasing. That's what I wanna see. That's what I wanna see. Then no one understands the PF curve, where does preventive maintenance fit on the PF curve? So what is the PF curve? It's a graphical representation how something fails, it's fairly simple. Preventive maintenance starts way before point P, it starts way back. If we don't want it to get to the functional failure, partial or total functional failure, then we need to do preventative maintenance correctly. So ineffective or no predictive maintenance application, what is that? So no one understands how to detect failure modes early enough? Maybe we're not... Maybe one of the things we're not using predictive maintenance right. Maybe we're only using it to satisfy the insurance company regulations. What do they require? 

0:21:33.2 S1: And when we do that, we focus on the insurance company without regard to how much money we lose on a daily basis by not using predictive maintenance. We need to make predictive maintenance an integral part of our strategy to maintain the site. No one has ever heard of the PF curve. And I know most of the people on today, I know you know what the PF curve is. I think it... To me, it needs to be a thing that you have a talk with your people about what it is and how it works in organisations. But if we can detect a defect or abnormality early enough, we can plan it and schedule it before it fails. So if we have a defect on an outboard bearing on a large horsepower motor, and it's a critical motor, I'm gonna go ahead and plan that job. I may not schedule it yet, but I'm gonna plan it 'cause I know it's gonna fail 'cause bearings fail randomly.

0:22:26.4 S1: So what I'm gonna do is I'm gonna identify that fault, and I'm gonna go ahead and plan that repair, and then when I have... If I can, I'm gonna go ahead and schedule it. If I don't schedule it, here's the worst thing that can happen: if I've already planned it and I got the spare parts, I got the procedure on the spare parts, how to do it, then the worst thing that can happen if we do have a premature failure, no matter when it is, we got the procedure, we got the spare parts and they're secured. So all we gotta do is pull them out, wherever we've got them locked up at, and we can go out and do the work to specifications. So when production says, "How long is it gonna take?" I probably have a good idea how long it's gonna take. So maintenance planning and scheduling is dysfunctional. Root cause of dysfunctional maintenance planning and scheduling.

0:23:15.8 S1: Maintenance leadership has never been trained in maintenance planning and scheduling. When you talk to maintenance leadership, a lot of times they say, "Yeah, I know that stuff. Yeah, I know that." And they don't really understand it 'cause they've never seen true proactive planning and scheduling. No one knows how bad the organisation's wrench time is. If we're low, we probably need to get it up higher and sometimes any direction we go we might get something, but it's only luck. We don't want luck, we want facts and we wanna have direction. Maintenance planners are trained, however no one's allowed to plan and schedule the right way. I've seen planners that go to training and come back and they're not allowed to plan and schedule. They still hear, "I know what you learned in the class. I know what you learned in the class, but we don't do it that way here." "But then why'd you send me to training? Don't build expectations up if you're gonna send me to training." Maintenance is reactive. The maintenance planner's chasing parts.

0:24:11.2 S1: My gosh. [chuckle] One of the things I used to... I always say, "How do you know if you got a proactive planner? Call them on the phone and say, 'Hey, I need a part right now.' The next thing you should hear, if he's a proactive planner, is the dial tone 'cause evidently you called the wrong number." So some of the guiding principles that I see on this: maintenance planners focus on future work only. All today's issues are handled by the maintenance supervisor or lead person. All scheduled work which requires parts and materials is kitted in a secure area. All planned and scheduled work is tracked through status codes. See the status codes below. RTS, ready to schedule. The parts are kitted, they're in a locked secured area and it's ready to be scheduled.

0:24:55.8 S1: So the worst thing that could happen, if we have a failure and it's not on the schedule yet to do that repair, we already have the parts, we have the procedures, we don't have to worry about it. Another one, waiting on parts, we need to know that. Awaiting production, they need to know that. We go into a production meeting, they need to know how much work is backed up, how many labour hours of maintenance work is backed up. All work scheduled one week in advance, typically scheduling meetings on a Thursday, for the following week where production, maintenance and others that are required, it could be contractors, safety, or whatever. I want, again, leading and lagging KPI for planning, scheduling, work execution process. If we don't measure it, we can't manage it, and that causes a lot of problems.
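Status codes like these make it straightforward to report backlog in labour hours at the production meeting. The sketch below is an assumption-laden illustration: RTS is the speaker's code, while the `WOP` and `AWP` abbreviations, the function name, and the sample work orders are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Status(Enum):
    RTS = "ready to schedule: planned, parts kitted and secured"
    WOP = "waiting on parts"      # abbreviation is hypothetical
    AWP = "awaiting production"   # abbreviation is hypothetical


def backlog_hours_by_status(work_orders):
    """Sum estimated labour hours per status code."""
    totals = defaultdict(float)
    for status, estimated_hours in work_orders:
        totals[status] += estimated_hours
    return dict(totals)


# Hypothetical backlog: (status, estimated labour hours).
backlog = [
    (Status.RTS, 6.0),
    (Status.WOP, 4.0),
    (Status.RTS, 2.5),
    (Status.AWP, 8.0),
]
for status, hours in backlog_hours_by_status(backlog).items():
    print(f"{status.name}: {hours} h")
```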

0:25:41.3 S1: Next one, no dashboards or scorecards to ensure everyone knows the score and their position. Root cause of no scorecards or dashboards: leaders don't have time to identify how to create or make a scorecard or dashboard. No one knows how to create one. If leadership doesn't know it, no one else knows how to create it, we've got a problem. Maintenance is data rich but no one knows how to assimilate and disseminate the data to help everyone know the score and the position. So we need to assimilate the data, disseminate the data, and then share it with each position. So here's a scorecard: PM compliance 100%, planned work 40%, schedule compliance 100%, work orders closed out accurately 47%, rework 78%, and maintenance cost 12%. Now, to me, PM compliance is 100%, schedule compliance is 100%, but work orders aren't closed out accurately and we got a lot of rework and our maintenance cost is going up. I think we probably need to do a full-blown root cause analysis on that for sure.

0:26:57.3 S1: Roles and responsibilities aren't defined. Root cause of unclear roles and responsibilities. We gotta have... How do we manage what we're doing? Reactive plant culture, our roles and responsibilities are just, we come in and we do whatever we do and the roles and responsibilities, they just happen, whatever that is. But every day, roles and responsibilities could change. I like to have a RACI chart, you see? We've got the positions across the top, like plant manager through CMMS administrator. Down the left side, we've got the tasks, CMMS management. On each line, you can only have one A. You can have more than one R, more than one C, more than one I, but A is the buck stops here. So here, if CMMS is not managed properly, then the CMMS administrator is where the buck stops 'cause they control it.

0:28:00.1 S1: Okay, let's go all the way down to maintenance rework. Who's accountable? The maintenance manager. Let's see. I'm sorry, I didn't put any reliability engineers on this one but you see it. What this does, if you bring all the stakeholders in the room and you identify the tasks that we have to work together on and we identify roles and responsibilities, we can clearly determine how we're gonna work together more effectively and efficiently.
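The "only one A per line" rule is easy to check mechanically. In this minimal sketch the accountable positions for CMMS management and maintenance rework follow the talk; every other letter, the third row, and the function name are hypothetical fill-ins.

```python
def tasks_without_single_a(raci):
    """Return the tasks whose row does not contain exactly one 'A'."""
    flagged = []
    for task, row in raci.items():
        accountable = [position for position, letter in row.items() if letter == "A"]
        if len(accountable) != 1:
            flagged.append(task)
    return flagged


# Rows are tasks, keys inside each row are positions (the chart's columns).
raci = {
    "CMMS management": {
        "plant manager": "I",
        "maintenance manager": "C",
        "CMMS administrator": "A",
    },
    "maintenance rework": {
        "plant manager": "I",
        "maintenance manager": "A",
        "CMMS administrator": "I",
    },
    # Hypothetical bad row: two positions both marked accountable.
    "storeroom inventory": {
        "plant manager": "A",
        "maintenance manager": "A",
        "CMMS administrator": "I",
    },
}
print(tasks_without_single_a(raci))
```

A row with no A at all is flagged the same way, since in both cases nobody can say where the buck stops.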

0:28:27.3 S1: So simple tips that result in a reduction of equipment failures. Define what constitutes a partial functional and total functional failure. Hire someone to provide simple RCA training for the plant production, maintenance leadership, maintenance technicians, operators. If you've got maintenance engineers or reliability engineers, they're great ones to teach it 'cause they know it and they know it well. Write work orders for all failures and ensure the following is identified: asset number, problem or work required, parts used, labor type, hours, number of techs, root cause of the failure. Root cause of the failure. So if we're writing a work order, if we don't know the root cause, how are we correcting the failure? Maybe we need to do an RCA first, right? We gotta know that. So LS, lack of specifications, OE, operator error, these are some of the root causes of failure. Not enough time to repair to specifications. We had to patch it. We gotta go, hurry up. How much longer is it gonna take? No repeatable procedure and whatever else you wanna add to it. I think that it's important to do that.
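One way to enforce that list is to treat a failure work order as a record and refuse to close it while any field is blank. This is a sketch under assumptions: the field names and the `NT` and `NP` codes are hypothetical, while `LS` and `OE` are the codes the speaker names.

```python
from dataclasses import dataclass, fields
from typing import Optional

ROOT_CAUSE_CODES = {
    "LS": "lack of specifications",
    "OE": "operator error",
    "NT": "not enough time to repair to specifications",  # code is hypothetical
    "NP": "no repeatable procedure",                      # code is hypothetical
}


@dataclass
class FailureWorkOrder:
    asset_number: Optional[str] = None
    problem: Optional[str] = None
    parts_used: Optional[str] = None
    labor_type: Optional[str] = None
    hours: Optional[float] = None
    number_of_techs: Optional[int] = None
    root_cause: Optional[str] = None


def closeout_issues(work_order: FailureWorkOrder) -> list:
    """List fields left blank, plus a root cause code that is not recognised."""
    issues = [f.name for f in fields(work_order) if getattr(work_order, f.name) is None]
    if work_order.root_cause is not None and work_order.root_cause not in ROOT_CAUSE_CODES:
        issues.append("root_cause (unknown code)")
    return issues


# Hypothetical work order closed without a root cause.
wo = FailureWorkOrder(
    asset_number="P-101",
    problem="seal leaking",
    parts_used="mechanical seal",
    labor_type="mechanical",
    hours=2.0,
    number_of_techs=1,
)
print(closeout_issues(wo))
```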

0:29:45.3 S1: Okay, condition as found. Alright, bear with me a second here. I'm sorry but I had to move my screens around. Condition as found, condition as left, recommended changes to the procedure. That's one of the things we need to have. You see this chart I've got in front of me right here in front of you? So we've got, at the top, job description. So there's a lubricate bearings, the frequency is the asset numbers there, line one. Frequency, estimated labor hours is one person times one hour. Estimated production downtime is zero.

0:30:23.1 S1: And we go next to the originator of it, the owner. The cautions. Cautions go in yellow. Cautions have to do with equipment damage, so yellow. Failure to follow PM requirements could result in equipment failures. We got parts if we need it. Here we've got synthetic lube so in case... What this is for is when the planner is planning a job, if he needs something out of the storeroom, it's already on the procedure. All they gotta do is say, "Okay, I need to check that out, put it in the kitted area so the maintenance people, when they get ready to do the work, it's already in the kitted area." Any consumables needed, lint-free towels, whatever that is. Special tools required, single pump grease gun with type 237 synthetic grease gun. So we want synthetic grease. Any mobile equipment, none on this case, required departmental coordination production lead will be notified before execution of lubrication.

0:31:27.1 S1: And then here we have step-by-step. Now we have the craft type, we have the number of craft hours, but I want... In each step on this PM, I want initials. And it doesn't matter if it was a PM or a corrective maintenance procedure, I want initials beside each step. That way, we don't skip a step, 'cause a lot of times people say, "I've been doing this for 20 years, I know how to do it." Right, and that's why we got problems too. We need to start focusing in on putting initials by each step. That way, if we go back and we have a failure, I like to pull the work orders out that we've completed and look at those work orders, doing a root cause analysis, say what's going on. This tells us a lot.

0:32:14.1 S1: Condition as found, so how did they find the asset? Leak coming from number one gearbox, we need to know that. Condition as left: clean up oil, notify production leader and keep area clean of oil. And we're probably gonna have to write a new work... Write a work order to resolve that problem. So any comments, none, craft feedback, all good. I want craft feedback on procedures. So the first time you write good procedures, I wanna see something back saying, "It doesn't make sense." So if I put equipment history with this and I send it out with the procedure, then when technicians look at it and they look at the failure history for the last 30, 60, 90 days, they can match up whether this procedure makes sense or not. Is it working? 
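Requiring initials beside each step lends itself to a simple completeness check when completed work orders are pulled for a root cause analysis. A minimal sketch, in which the step wording, the initials, and the function name are all hypothetical:

```python
def steps_missing_initials(steps):
    """Return the 1-based numbers of steps that nobody initialed."""
    return [
        number
        for number, (_description, initials) in enumerate(steps, start=1)
        if not initials.strip()
    ]


# Hypothetical completed lubrication PM: (step description, technician initials).
completed_pm = [
    ("Notify production lead before starting", "AB"),
    ("Wipe grease fitting with lint-free towel", "AB"),
    ("Apply synthetic grease with single-pump grease gun", ""),
    ("Record condition as found and condition as left", "AB"),
]
print(steps_missing_initials(completed_pm))
```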

0:33:00.5 S1: So more simple tips. Number four, track and manage failures. Cost of failures, lost production. I like that, it's about the money. Parts, expedited cost. How much does it cost to airfreight something in overnight delivery, or I call a vendor in town and I say, "Hi, I need a part real quick." "So how bad do you really need it?" "I need it real bad." Okay, so when they look at the... When they punch the number in what it costs, it's not gonna be the normal value, it's gonna be up. So you pay more money for the part. And then parts and material cost over all, what's going on there? Create an A3 report to share failures with everyone. I know this is a crazy thought. This is just one of my crazy ideas, but I was at a plant and it just so happens they had a gearbox failure, and production lost 330 units of production.

0:34:04.9 S1: Asset criticality is high and they said it caught the defect severity on it. So they lost 330 units, which four hours of downtime cost them $7050. I like pictures that show the failures. So the gearbox failed, I put pictures of the failed bearings on there. So we got the problem, so what was the root cause? Could be many factors. It could've been a perfect storm. Known gearbox noise reported on daily checklist for two weeks, production needed to run, could not take downtime to replace gearbox. That probably may or may not be the reason why it failed but it could be. Resolution, replace gearbox to specification. I sent the gearbox out for rebuild and forensics. When I send a gearbox to some company that rebuilds the gear boxes, or if I send it in-house, what I want is a few things. I wanna know forensics, I want them to tell me what they think the root cause of the failure was. Why it failed? I want the parts that were replaced in the gearbox, I want it come back to me so I could lay it out for all the technicians to see. And what I used to do when I had a failure like that, I'd lay the parts out.

0:35:25.4 S1: I gave everybody one chance to come up with what they think the root cause of the problem was, and the one that got it right or close to it, I'd buy them and their spouse dinner at whatever restaurant in town they want. We lived in a small town, so it wasn't like there was a lot of fancy restaurants, but still it really got them moving. We wanna review all PM frequencies on the gearbox, review past oil sample results. So the cost of the gearbox was $200, the gearbox was $800. So let me think $1000 compared to $7500. That's a lot of difference. So how are we gonna measure sustainment on this gearbox? PM compliance, 10% of the time frequency, and I wanna do it on all critical assets. So in other words, if it's a monthly PM, you got three days to do it and then you're out of compliance. Oil sample: time from the sample taken to results received and reviewed, measured. If re-sample required, three days of re-sample. If out of specs, corrective maintenance work order written, replacement planned and scheduled. Very simple. Create a dashboard to measure reduction or increase in failures by asset, area, overall.
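
A minimal sketch of the 10% rule described above, assuming the grace period is counted forward from the PM due date (the talk only says a monthly PM gives you three days). The function names and the 30-day month are illustrative assumptions, not something shown in the webinar.

```python
from datetime import date, timedelta


def pm_window_days(frequency_days: int, tolerance_pct: int = 10) -> int:
    """Grace period in days: a percentage of the PM interval (10% by default)."""
    return round(frequency_days * tolerance_pct / 100)


def is_pm_compliant(due: date, completed: date, frequency_days: int) -> bool:
    """True if the PM was completed no later than the due date plus the window.

    Assumption: the window is counted forward from the due date.
    """
    return completed <= due + timedelta(days=pm_window_days(frequency_days))


if __name__ == "__main__":
    due = date(2021, 11, 1)
    print(pm_window_days(30))  # monthly PM, taken here as 30 days
    print(is_pm_compliant(due, date(2021, 11, 4), 30))  # done on day three
    print(is_pm_compliant(due, date(2021, 11, 5), 30))  # done on day four
```

With these assumptions a monthly PM gets a three-day window, so a job closed on day four counts against compliance.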

0:36:37.8 S1: PM compliance, 98%, my goal is 100%. Rework, my goal is zero, I don't wanna have rework. Got seven reworks. We just gotta know where we are in the game. If any of you watch football or soccer, the score in the game is a big deal. So MTBF of critical assets 11.4 hours, goal 342. We got a long way to go. The maintenance cost continues to go up. I like to trend it. I want people to see a line graph on it. It's the best way to do it. And the last one, emergency labour hours, what is it doing? It's going up. It's trending up. We got a problem. Final thoughts. Keep your maintenance storeroom locked and secure 24 hours a day, seven days a week. Ensure 90% plus of maintenance work is planned and scheduled. Two different functions can be done by the same person, it should be done by the same person, unless you've got a different type of organisation, a very large one. Ensure 100% of scheduled work has repeatable procedures.
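
The MTBF line on the scorecard above can be reproduced with the usual definition, operating hours divided by the number of failures. This is a sketch under that assumption; the talk does not say how the 11.4 figure was calculated, and the operating hours and failure count in the example are made up purely to land on that number.

```python
from typing import Optional


def mtbf_hours(operating_hours: float, failure_count: int) -> Optional[float]:
    """Mean time between failures = operating hours / number of failures.

    Returns None when there were no failures in the period, because the
    ratio is undefined (an assumption about how to report that case).
    """
    if failure_count == 0:
        return None
    return operating_hours / failure_count


def gap_to_goal(current: float, goal: float) -> float:
    """Hours of MTBF still to gain before the goal is met."""
    return goal - current


if __name__ == "__main__":
    # Hypothetical inputs: 342 operating hours and 30 failures.
    current = mtbf_hours(342, 30)
    print(current)
    print(gap_to_goal(current, 342))
```

Returning `None` for a failure-free period keeps a week with no breakdowns from showing up as a zero or an error on the dashboard.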

0:37:50.3 S1: If we're gonna schedule a job, I want procedures that are repeatable, so if we have a failure, we take that procedure and we may have to change it to make that next step change to get out of those failures. State equipment failure... Make the statement, equipment failures are unacceptable and must be eliminated unless run-to-failure is the maintenance strategy for a specific asset or has a... Run-to-failure is a maintenance strategy, but most companies that apply run-to-failure don't even know it's a maintenance strategy, it's just the way they run it.

0:38:23.0 S1: Train maintenance leadership in maintenance best practices, it's a big deal. We know that. You don't get a lot of time with them. I just say, what you need to do is just sit down with them, just make sure you've got your plan together, you've got your thoughts together, put together a PowerPoint presentation with leadership, so all I need is just 30 minutes. 'Cause you're not gonna get much more than that, maybe an hour, maybe, with a plant leadership team. I'll tell you what, last week, I had a plant, I had the maintenance technicians in the training and it so happens the senior leadership of the company, I'm talking about the president, he showed up in the room and I'll tell you what, he didn't leave, he loved it.

0:39:03.7 S1: Because he felt like maintenance leadership needs to hear this stuff, they need to understand. Train all maintenance planners plus one technician in formal maintenance planning and scheduling. Plus one technician 'cause in case the planner's out, I need someone that can slide in, and not be sitting there going, "What do I do?" I want them formally trained and they know how to do it. Provide formal training to two maintenance technicians in root cause analysis. If you gotta send them off to training, do it. If you're gonna do root cause analysis in-house, bring a vendor in, I recommend you identify some of the problems you have, so the class can solve the root cause. They can identify the root cause of the problems and solve the problems that you have, in the training. That way you're paying for the training.

0:39:53.5 S1: Create a new position. Alright, you reliability engineers have been waiting on this one, I know. I like maintenance engineering technician 'cause it doesn't threaten other technicians. But a maintenance engineering technician is just your best maintenance tech. And they focus on failure elimination and mitigation. That's all they're focused on. They're not chasing fires or anything else. They're out there to help me. If I'm a maintenance engineer or reliability engineer, they're my right-hand person that's gonna help me get out of this ditch that I'm in right now. Then post the scorecard to provide all levels of plant personnel the status of the maintenance process. I want people to know how well we're doing. If we're not doing good, that's okay. At least we know where we are. But we don't need to hide stuff. We need to make sure we share appropriately.

0:40:42.5 S1: Alright. So questions? Here's my planning and scheduling class coming up in January, 19 to 21. It's in-person at Southern Wesleyan University outside of Clemson, South Carolina, really central South Carolina. Most people are gonna be on Zoom. So you get a Zoom link just like this. You just go to it. So for three days, it's on Zoom or live, whichever you wanna do. So what are your questions now? How to measure schedule compliance during the week? Okay, so one of the questions was, how do you measure schedule compliance during the week? So every maintenance technician should be scheduled. If they're supposed to work 40 hours, they should be scheduled for 40 hours of work. And they should be scheduled by day, by hour. So Monday morning, they don't come out of the shop and say, "Okay, I gotta look through these work orders and decide what to do."

0:41:41.2 S1: No. If it's 7 o'clock, the equipment, a line is expecting someone to come to replace a gearbox or something, then it should happen. That maintenance technician should be there at 7 o'clock. Even if he comes in at 7 o'clock. What I had to do when I first started this getting planning and scheduling really going with my crew as maintenance supervisor, I had to bring my technicians in early. No one knew it but me. Okay, so I'd bring them in early, so we'd get everything ready, so at 7 o'clock, the technician was standing on the line. So we're waiting on them to shut down this but, we agreed in the scheduling meeting last week, that on Monday morning, this equipment shut down at 7 o'clock. Cool down, cleaned up, ready to be worked on. If we don't then that breaks... That's what they call a break to the schedule, which is a metric. How many breaks to the schedule? 
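
One way to put numbers on the answer above is sketched here. It assumes schedule compliance is the share of scheduled hours that were completed, and it treats any job that did not start at its scheduled time as a break to the schedule. Both definitions, and the `ScheduledJob` record, are assumptions for illustration; the talk names the metrics but gives no formulas.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ScheduledJob:
    technician: str
    scheduled_start: datetime
    actual_start: Optional[datetime]  # None means the job never started
    scheduled_hours: float
    completed: bool


def schedule_compliance_pct(jobs: List[ScheduledJob]) -> float:
    """Completed scheduled hours as a percentage of all scheduled hours."""
    total = sum(j.scheduled_hours for j in jobs)
    if total == 0:
        return 0.0
    done = sum(j.scheduled_hours for j in jobs if j.completed)
    return 100 * done / total


def schedule_breaks(jobs: List[ScheduledJob]) -> List[ScheduledJob]:
    """Jobs that did not start at the agreed time (late or never started)."""
    return [
        j for j in jobs
        if j.actual_start is None or j.actual_start > j.scheduled_start
    ]
```

Counting breaks separately from compliance shows whether missed hours come from work that ran long or from equipment that was not handed over at the agreed time.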

0:42:28.0 S1: So you schedule people, each, all maintenance people based on priority of work by day, by hour, it should happen. If you got people seven days a week, all of them should be scheduled. Even... Think about it. I mean, I was a shift mechanic, I worked at Exxon. I was a shift mechanic. What my job was was dealing with problems production may have and deal with things that maybe make it through the next day. But when I didn't have that, I had PMs to do. So I always had PMs that weren't as critical as what day shift PMs were but certain ones I could do on the night shift or an off shift.

0:43:07.3 S1: So how do you present your MTBF for critical equipment during the week, month, if there's no breakdown. If no breakdown, there's no failure. A breakdown is a failure then it's not. What else? Caitlyn, do you got the questions from anybody? Can you see them? 

0:43:32.1 Speaker 2: You've done a great job asking them or reading them. How do you effectively measure wrench time on the day-to-day? 

0:43:46.6 S1: Really don't. The biggest thing to do is look at schedule compliance. If you're scheduling by hour, by day, then you could pretty well say that your wrench time is gonna be high. I don't really care what the number is. I just wanna know, I wanna increase wrench time. And the only way you're gonna increase wrench time is by planning and scheduling correctly. Planning and scheduling is the only way you're gonna increase wrench time. Telling people to work harder, work faster, work smarter, it's not logical.

0:44:19.2 S2: Got it. Is a partial failure considered a machine breakdown? 

0:44:23.7 S1: And so to me, a machine breakdown is a total functional failure. The machine functionally failed, then no functions are working, so it's a total functional failure. Partial functional failure could be that you've got a bearing failing and you can hear it. It's reducing the speed on the line, but it could be a partial functional failure.

0:44:47.8 S2: Perfect.

0:44:49.8 S1: What else? 

0:44:49.8 S2: Are there any other questions for Ricky? 

0:44:52.1 S1: Come on now for Honda. I wanna hear a Honda question.

0:45:00.9 S2: Okay. I don't know if I understand the question, but maybe you will Ricky. How do we measure stockout material 12 from the store? 

0:45:12.2 S1: Stockout is... It's very simple, it's the SMRP metric. When you go to the storeroom, the store... The computer says it's in stock, and you go to it and it's not there. It's a stockout. Not there. I got some bad news for you on that. If your plant's reactive, you're gonna have a lot of stockouts. So managing stockouts isn't the root cause of the problem. The root cause is planning and scheduling because you're so reactive. And when I say if you don't have planning and scheduling working well, you start off with a small area and small part of the crew, plan and schedule for them. Just start off with two technicians planning and scheduling, and then you start to grow it and grow it. You can't start out and just say, "I'm gonna do it all at one time." Alright, what else?
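
A small sketch of the stockout count as described above: the system shows stock, the shelf is empty. The rate is taken here as stockouts divided by total part requests, which is an assumption for illustration; check the published SMRP metric for the exact wording. The `PartRequest` record and part numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartRequest:
    part_number: str
    system_qty: int        # quantity the computer says is on hand
    found_on_shelf: bool   # whether the technician actually found the part


def is_stockout(req: PartRequest) -> bool:
    """The computer says it's in stock, but it's not there."""
    return req.system_qty > 0 and not req.found_on_shelf


def stockout_rate_pct(requests: List[PartRequest]) -> float:
    """Stockouts as a percentage of all part requests (0.0 if no requests)."""
    if not requests:
        return 0.0
    return 100 * sum(is_stockout(r) for r in requests) / len(requests)
```

A request for a part the system already shows at zero is not counted here, since the description above is specifically about the record and the shelf disagreeing.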

0:46:02.2 S2: What would you recommend if the storeroom is not a closed storeroom and there is no place to locate kitted parts in a secured location? 

0:46:14.6 S1: [chuckle] So the question is, back to the person, do they have an area where all the parts are right now, in one area? 

0:46:27.7 S2: Armon, that's for you. Yes.

0:46:31.4 S1: Okay. Just put a cage up. Put a cage up around... A fence around it high enough that it makes it difficult to climb over. You can put concertina wire on top, too. That works pretty good. I'm an ex-Army guy, I like that concertina wire. So you can do that, and then I would put... You gotta lock it. Yeah, you're gonna have to lock it. We got technology now. We got these things called card readers, so we can... When we go to the storeroom, we scan the badge, and then when we open the door an alarm goes off, "Wonk, wonk, wonk!" Why do we have an alarm? So you get in the door and close it behind you. Now, I wanna have a system that tracks... You can do it... You can use your PLC to track this, how long is a person in there? All of you got some type of PLC, so why not track how long they've been in there, and then the next morning when you come in, you get a report saying Jimmy and Billy were in there. They didn't check out anything. So if that was a night shift, I'd call them at home and say, "Hey," when they're sleeping, "Hey, Billy, Jimmy, what's going on, man? You were in the storeroom last night. I see you didn't check out any parts." "Oh, yeah, we forgot to." Yeah. "This time it's on me, next time it's on you, okay?" We wanna have the parts secure. That's a lot of money in a company, not just in parts, but the parts that we need to put to use in that plant. Alright, what's next?
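
The morning report described above could be built from badge-reader records along these lines. This is only a sketch: the `StoreroomVisit` record is hypothetical, and it assumes the access system (PLC or otherwise) can export entry time, exit time, and the number of parts checked out per visit.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class StoreroomVisit:
    badge_name: str
    entered: datetime
    exited: datetime
    parts_checked_out: int


def visits_without_checkout(
    visits: List[StoreroomVisit],
) -> List[Tuple[str, float]]:
    """Return (name, minutes inside) for every visit with no parts checked out."""
    return [
        (v.badge_name, (v.exited - v.entered).total_seconds() / 60)
        for v in visits
        if v.parts_checked_out == 0
    ]


if __name__ == "__main__":
    night_shift = [
        StoreroomVisit("Jimmy", datetime(2021, 11, 8, 22, 0),
                       datetime(2021, 11, 8, 22, 25), 0),
        StoreroomVisit("Billy", datetime(2021, 11, 8, 23, 10),
                       datetime(2021, 11, 8, 23, 15), 2),
    ]
    for name, minutes in visits_without_checkout(night_shift):
        print(f"{name} was in the storeroom {minutes:.0f} min, nothing checked out")
```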

0:48:06.8 S2: That's it. That's all we've got. Thank you so much, Ricky.

0:48:09.9 S1: Thank you.

0:48:12.1 S2: Awesome. So if you would like to get a little more involved, please make sure to sign up for Maintenance Planning and Scheduling with Ricky. His email is right there, [email protected]. It's a three-day workshop in January. Ricky is also available through our UpKeep Connect platform, so that's to be a one-on-one mentor for you and your maintenance organisation. So if this workshop isn't something you're able to commit to but you would like to be able to call on him and ask for one-on-one advice, check out and see if regular mentoring is something you're interested in. Have a great rest of your day and thank you, Ricky.

0:48:56.2 S1: Thank you, everybody. Have a great evening or a great day wherever you're at. Bye.
