Mean Time to Failure

Discussion in 'Amps and Cabs [BG]' started by IamGroot, Oct 25, 2021.


  1. IamGroot

    IamGroot

    Jan 18, 2018
    Mean Time to Failure means how long a component or system can be expected to run before failing. One of my college roommates back in the mid 70's (PhD in EE) had a consulting business based on MTF of electronic systems.

    Any WAG for the typical MTF for a solid state amp? I am guessing it depends on % of rated load, so any info in that area would be good.
     
    BOOG, Iristone and EatS1stBassist like this.
  2. bholder

    bholder Affable Sociopath Supporting Member

    Sep 2, 2001
    Vestal, NY
    Received a gift from Sire* (see sig)
    Nice Time To Success?
     
  3. bholder

    bholder Affable Sociopath Supporting Member

    Sep 2, 2001
    Vestal, NY
    Received a gift from Sire* (see sig)
    We use "MTBF" a lot in the computer OS world. ("Mean Time Between Failures")
     
    Patrice B, dbase, Goatrope and 5 others like this.
  4. matti777

    matti777 Supporting Member

    Dec 13, 2007
    Edmonton, Canada
    You can look up reliability data such as MTBF on electronic components. The military has done a lot of work in this area using RAM models and simulations to predict overall system reliability and availability. I would bet the expected MTBF would be very long.
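
    As a rough illustration of how component MTBF figures roll up into a system figure, here is a minimal Python sketch. It assumes a simple series model (any one part failing takes the amp down) with constant failure rates, which is only one of many possible models. The part names, MTBF values and repair time are made up for the example.

```python
def system_mtbf(component_mtbfs_hours):
    """MTBF of a series system, assuming constant failure rates.

    Each part's failure rate is 1/MTBF, and in a series system the rates add.
    """
    total_rate = sum(1.0 / m for m in component_mtbfs_hours)
    return 1.0 / total_rate


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is working."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component MTBFs in hours (not real datasheet values).
parts = {
    "power supply": 500_000,
    "output stage": 250_000,
    "preamp": 1_000_000,
    "fan": 100_000,
}

amp_mtbf = system_mtbf(parts.values())
print(f"System MTBF: {amp_mtbf:,.0f} hours")
# Hypothetical 48-hour mean time to repair.
print(f"Availability: {availability(amp_mtbf, 48):.5f}")
```

    Note that the system MTBF comes out lower than that of the weakest part, since every part adds its own failure rate.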
     
    Eli_Kyiv and HolmeBass like this.
  5. arbiterusa

    arbiterusa

    Sep 24, 2015
    SoCal
    It depends a lot more on the architecture; are we talking a GK? An SWR? Or an old Silvertone SS amp, or Taiwan's finest, the Gorilla series? I suspect that matters more than how hard it's run.

    (for what it's worth, I run a Gorilla for lo-fi/punk guitar tones in my home studio and it's the oldest amp I have at 37 years now. Still works!)

    The old SWR SM-400s used to fail routinely due to a poor vent design and user ignorance of how they were supposed to be rack-mounted. In fairness to those users, SWR did not include specific instructions at the time telling them not to take the feet off and to leave at least that much space under the amp. So most of them overheated and burned, usually at gigs, usually within a couple of years of purchase. Contrast that with their ST-220 or any GK amp, neither of which I have ever seen fail.
     
  6. two fingers

    two fingers Opinionated blowhard. But not mad about it. Gold Supporting Member

    Feb 7, 2005
    Eastern NC USA
    If anyone has hands-on experience in this area around here, it's gotta be @agedhorse

    (And I'm already tagging you in first thing Monday morning! I'll try to leave you alone for the remainder of the day.)
     
  7. groovaholic

    groovaholic The louder the better. Supporting Member

    Sep 19, 2004
    Mount Prospect, IL
    The primary MTBF descriptor that I remember from my EE classes was the "bathtub graph" of failures vs. time.

    The basic idea is that the curve is shaped like a bathtub:
    The spigot side of the graph representing initial failures is steep - almost vertical.
    Then there is the bottom of the tub; a long, relatively flat period of stability and reliability.
    As components reach the end of their service life, the failure rates start to climb again - represented as the angled far wall of the tub, against which you'd lean your back.
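
    One common way to sketch a bathtub curve numerically is to add three failure-rate terms: a decreasing one for early failures, a constant one for random failures, and an increasing one for wear-out. The Python below does that with Weibull hazard functions. The shape and scale numbers are invented purely to produce a tub-shaped curve and are not measurements of any real amp.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard (instantaneous failure rate) at time t > 0.

    shape < 1 gives a falling rate, shape == 1 a constant rate,
    and shape > 1 a rising rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Sum of three hazards; all parameters are illustrative assumptions."""
    infant = weibull_hazard(t_years, shape=0.5, scale=200.0)   # steep "spigot" wall
    random_failures = 0.002                                    # flat bottom of the tub
    wear_out = weibull_hazard(t_years, shape=5.0, scale=40.0)  # far wall
    return infant + random_failures + wear_out


for year in (0.1, 1, 5, 10, 20, 30, 40):
    print(f"year {year:>4}: failure rate {bathtub_hazard(year):.4f} per year")
```

    Shrinking the initial wall corresponds to lowering the first term, and stretching the bottom corresponds to pushing the wear-out term further out in time.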

    So the question becomes - how do we shrink that initial vertical wall, and make the bottom of the bathtub as long as possible?

    And that's all about keeping components in their optimal range, limiting special cause variation, and reducing the incidence of failure modes -- all while balancing reliability with cost and other practical considerations.

    If company A spends $40 to make an amp that runs perfectly for 20 years, what do they gain or lose by spending $80 on an amp that runs perfectly for 40 years? On the flipside, if company B spends $10 to make an amp that retails for 50% of company A's amp, they might sell a bunch...until company B's amp starts exhibiting lots of failures and scares buyers away.

    Company B has pocketed a bunch of cash, but also earned a poor reputation.

    And that metric is part of what's called CONQ: Cost of Non-Quality. What does it cost the company to fix the errors, process warranties, ship replacement parts and units? How do you quantify the damage to reputation, and what is the unseen impact of all that on future sales?
    [In case you can't tell, we have dipped into what I do for my "day job"]

    Bottom line is that I have a ton of appreciation for a well-engineered product, and tend to spend my money accordingly.
     
  8. ardgedee

    ardgedee

    May 13, 2018
    I haven't seen MTF used to describe complex systems as often as for components. Part of the problem is that a large-scale system might influence the expected health and lifespan of any given component for better or worse, though usually for the worse. Another part of the problem is that some items' lifespans are described in terms of interactions or duty cycles rather than time: for example, connectors by the number of times they're plugged and unplugged, switches by the number of times they're toggled, and other parts by the number of times they're powered up and powered down. (A hypothetical widget might only stand to be powered up 20 times, but if the device it's a part of is expected to run for 10 years between power cycles, it'll outlive any of us.)

    So an amplifier might have an MTF of 15 years AND 500 power-ups AND 800 connections of instrument cables AND 500 connections of speaker cables AND...
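
    To make the "AND" idea concrete, here is a small Python sketch that converts each limit into years for a given player and reports whichever one runs out first. The limits are the illustrative figures above. The usage-per-year numbers are assumptions invented for the example.

```python
# Illustrative limits from the post above (not real specs for any amp).
limits = {
    "calendar years": 15,
    "power-ups": 500,
    "instrument cable insertions": 800,
    "speaker cable insertions": 500,
}

# Hypothetical usage per year for one player.
usage_per_year = {
    "calendar years": 1,
    "power-ups": 100,
    "instrument cable insertions": 150,
    "speaker cable insertions": 50,
}


def first_limit_reached(limits, usage_per_year):
    """Return (limit name, years until it is reached) for the earliest limit."""
    years = {name: limits[name] / usage_per_year[name] for name in limits}
    earliest = min(years, key=years.get)
    return earliest, years[earliest]


name, years = first_limit_reached(limits, usage_per_year)
print(f"First limit reached: {name} after {years:.1f} years")
```

    With these made-up usage numbers the power-up count runs out long before the calendar limit, which is the point: the headline figure in years can be the least relevant one.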
     
    Goatrope and IamGroot like this.
  9. groovaholic

    groovaholic The louder the better. Supporting Member

    Sep 19, 2004
    Mount Prospect, IL
    This brings up another reliability topic: the FMEA (Failure Mode and Effects Analysis)

    My #1 amp, coming up on 8 years now, is a GK 700RB-II, which I consider a killer piece of engineering - but the first time I bought a GK, the "Achilles' Heel" was obvious; the fan inlet on the top of the amp.
    My solution was to put the amp in a 3-space rack so there was no direct path of liquids (i.e. rain or beer) onto the power amp circuit board.
    That approach was affirmed a couple years ago, when I bought an "as is" 1001RB-II that had died from liquid exposure - and it turned out the damage was catastrophic and not repairable.

    Aside from that - the #1 most frequently damaged component on ANY amp is the input jack, so I always loop my cable through the handle on my speaker cab or the amp itself.

    FMEA teaches us to identify what could go wrong, then decrease the likelihood or impact of those instances.
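
    A common way to do the ranking in an FMEA is a Risk Priority Number: score each failure mode from 1 to 10 for severity, occurrence and detection difficulty, multiply the three, and work on the biggest numbers first. A minimal Python sketch follows. The failure modes are taken from this thread, but every score is a made-up assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (nuisance) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (always caught in time) to 10 (no warning)

    @property
    def rpn(self):
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Scores below are hypothetical, chosen only to show the ranking step.
modes = [
    FailureMode("input jack damaged by a yanked cable", 5, 7, 4),
    FailureMode("liquid entering the top fan inlet", 10, 3, 8),
    FailureMode("overheating from blocked rack ventilation", 8, 2, 5),
]

ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:>4}  {mode.name}")
```

    Racking the amp lowers the occurrence score for the liquid case, and looping the cable through a handle lowers it for the jack case, so both fixes show up as a lower number.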
     
  10. Plus 1 on input jacks. I have had to replace a few being honest.
     
  11. themickster

    themickster

    Oct 4, 2015
    England
    What's the mean length of string?
     
    comatosedragon likes this.
  12. agedhorse

    agedhorse Supporting Member Commercial User

    Feb 12, 2006
    Davis, CA (USA)
    Development Engineer-Mesa Boogie, Development Engineer-Genzler (pedals), Product Support-Genz Benz
    @groovaholic and @ardgedee both provided good, solid information.

    When doing these calculations, it’s important to separate actual product failures from products broken due to abuse and products that are not actually defective (NFF, or no fault found).

    Of products that are truly defective, it’s further broken down into early or infant failures and lifecycle failures.

    For infant failures, the mean time is meaningless because the time is essentially zero, BUT the failure rate of infant failures is important.

    For lifecycle failures, the mean time for products that I design is a minimum of 40 years. This number seems to be consistent with examples of products from other “premium level” designers/manufacturers.
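
    The sorting described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not anyone's actual process: the category labels, the return records and the number of units shipped are all invented.

```python
from statistics import mean

# Hypothetical service records: (category, hours in service when returned).
returns = [
    ("infant", 2),
    ("abuse", 300),
    ("nff", 120),          # no fault found
    ("lifecycle", 61_000),
    ("infant", 0),
    ("lifecycle", 87_000),
    ("abuse", 5_000),
]
units_shipped = 1_000  # hypothetical


def summarize(returns, units_shipped):
    """Drop abuse and NFF, then report infant rate and lifecycle mean time."""
    infant = [hours for cat, hours in returns if cat == "infant"]
    lifecycle = [hours for cat, hours in returns if cat == "lifecycle"]
    return {
        # For infant failures the rate matters, not the time.
        "infant_failure_rate": len(infant) / units_shipped,
        # Mean time only over the units that actually wore out.
        "mean_hours_lifecycle_failures": mean(lifecycle) if lifecycle else None,
    }


print(summarize(returns, units_shipped))
```

    One caveat: averaging only the units that have already failed understates the true mean, because the units still working are not counted. A proper estimate has to account for those survivors.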

    Using Yamaha as an example, looking at their pro product designs at the circuit and mechanical level, it’s clear that a lot of thought went into both the design and execution for long lifecycle purposes. This is a company I studied closely at the beginning of my career, and one of the few companies that have been consistent over decades because of their strong company culture.
     
  13. Smart move, looping your cable at the source. Do it at both ends, ime. Especially if you have a bass where the input is mounted on the side.
     
    Chain_Lightning likes this.
  14. basscooker

    basscooker Commercial User

    Apr 11, 2010
    cincy ky
    Owner, ChopShopAmps
    Do you factor in consumables or not? For an amp, I'd guess power tubes and/or output devices would be consumable due to heat, i.e., under perfect conditions these would be the first components expected to drift from spec or fail.

    For example if it were not for the power tubes needing replaced, my Ampro is going on 70 years. And I have not been over the top in babying it either. As well, who knows what it went through before I got it.

    I'd guess there are maybe hundreds of 50+ year old amps working just fine here in TBers' gear piles.

    Then also factor in the "basically works" amps. Does failure mean fracked, or does it mean anything at all outside of design spec? Say a preamp isn't working, but you have really only used its power amp anyway, or vice versa. Or little details like a pot that needs a twiddle to lose some scratchiness, or an input connector that wants a wiggle, those things. All statistics can be tweaked to make things appear better or worse.
     
  15. Hounddog409

    Hounddog409

    Oct 27, 2015
    ohio
    MTBF of components doesn't care what amp they're used in.

    Any OEM of electronic equipment should list this in the specs. I know we do, as do all our competitors.
     
  16. rantbot

    rantbot

    Jul 10, 2020
    Vacuum tubes have a limited lifetime due to the (slow) evaporation rate of the tungsten filament used to heat the cathode. There are vacuum tube designs which avoid this failure mode but most of us will never see any of them. These rates have been measured experimentally, and so make a good basis for time-to-failure calculations.

    Electrolytic capacitors seem to dry out with age. I've never examined the exact mechanism, but they do eventually die even if used within their design limits. Again, I'm sure someone has measured this, though I've never come across the number.

    Semiconductors have a weird possible failure mode. They're built up on the surface of a slice of purified and polished silicon. The semiconductor junctions (all that NPN or PNP stuff) can be poisoned by slight traces of some elements - a few parts per billion may be enough. The problem is any contaminants on the bottom (or back) of the wafer - the side which doesn't have the circuitry on it. If the contaminants happen to consist of any of these killer poisons - and human fingerprints definitely do - then the finished device will work just fine . . . for a while. But atoms of the poisonous elements will eventually diffuse through the silicon, and when they reach the semiconductor junctions, you don't have a proper semiconductor any more, and the device dies. The diffusion rate is an exponential function of temperature. So "burn-in" periods are just that - a completed device is run at high temperature for a time calculated to allow contaminants to diffuse through the silicon and kill everything. If the device survives the burn-in, then it's fairly safe to assume that there are no dangerous contaminants close to the junctions, so there's nothing to diffuse into the junctions later - meaning the device should last approximately forever. (Maybe.)

    All very nice. But burn-in costs money, since devices which should be on vendors' shelves just sit around taking up space while running hot. So it's not generally used on consumer products, which are notoriously price-sensitive. The idea is that the eventual user will do the burn-in just through normal use. If the device survives for a while, it's considered equivalent to the formal burn-in, and it's probably fine after that. This is the rationale behind things like 90-day warranties. If it's going to fail via diffusion, it will happen relatively soon at high temperature, whereas at low temperature (like, while just sitting around on the shelf) it might last for decades (that being the way diffusion through solids works). So the customer does the burn-in, and if the thing fails, he brings it back and gets another. This is not so good for aerospace applications. If the thing is in orbit when it fails, it's a big deal to send it back and exchange for another one. So the aerospace biz is willing to pay for the full burn-in test procedure. But for consumers, that would just be money spent testing for something which probably won't happen.
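
    The "exponential function of temperature" part is usually modelled with the Arrhenius equation, which gives an acceleration factor between a hot burn-in and normal conditions. Here is a minimal Python sketch of that calculation. The 0.7 eV activation energy and the two temperatures are assumptions picked for illustration. Real values depend on the specific failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(t_use_c, t_stress_c, activation_energy_ev):
    """Arrhenius acceleration factor between use and stress temperatures.

    A result of N means one hour at the stress temperature ages the part
    roughly as much as N hours at the use temperature.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


# Hypothetical: 25 C on the shelf versus a 125 C burn-in, 0.7 eV assumed.
af = acceleration_factor(25.0, 125.0, 0.7)
print(f"Acceleration factor: {af:.0f}x")
print(f"48 hours of burn-in ~ {48 * af / 8760:.1f} years on the shelf")
```

    Because the factor is exponential, even a modest change in the assumed activation energy or temperatures moves the answer a lot, so treat any such figure as an order-of-magnitude guide.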

    There are other failure modes. A biggie is mechanical mounting. I ran into a strange problem on a circuit board with power transistors along the edges. The cooling tabs for these transistors extended over the edges of the board, where they were clamped between the upper and lower edge flanges of a metal cover. The cover thus did double duty as a heat sink. But the devices tended to fail inside their packages, where low-cycle fatigue eventually cracked their copper leads. The problem was that the circuit board was unsupported in the center, so the whole thing could vibrate (albeit only slightly) like a drumhead, and the flexure at the edges eventually broke the metal conductors. But something like this is a design problem; once solved it shouldn't make it into consumer gear.

    Another weird one is caused by oxidation. Tin oxide can be a semiconductor. The tin in a lead-tin solder joint can react with atmospheric oxygen to grow long thin whiskers, several millimeters long. For reasons which remain a mystery to me, mechanical stress of the joint makes such a whisker more likely. These whiskers can break off and fall across a circuit board's conductors, causing shorts. The Navy found that some AA missiles on aircraft carriers were firing off spontaneously, and this oxide was the cause. Diagnosis was complicated by the fact that tin oxide is transparent. The fix was to lower the tin content of the solder, support the joints better to reduce stress, and to paint an insulator over completed boards. But no worries, this problem should have been solved by now, even in the non-military market.

    So, there are a number of known mechanisms which can kill even semiconductors operated within their design limits, and I'd imagine there are more to be discovered. Outside their design limits, there are voltage spikes, overcurrent conditions, gross overheating, mechanical shock, ionizing radiation, blah blah. Some of these are time-dependent, some not. So calculation of MTBF is dicey. If records for repairs and overhauls are kept, as they are for, say, certificated aircraft, a useful MTBF can be measured. But aside from that, after you've designed around the failure modes you know about, in my book it's still a guessing game.
     
  17. groovaholic

    groovaholic The louder the better. Supporting Member

    Sep 19, 2004
    Mount Prospect, IL
    I do, and it makes me a bit bananas when my bandmates don’t…but I’m trying to not be THAT guy who tells other people what they should be doing
     
    scott sinner likes this.
  18. brianrost

    brianrost Gold Supporting Member

    Apr 26, 2000
    Boston, Taxachusetts
    I have a late 1950s Danelectro-made Silvertone amp...it's like a Champ, about 5W into an 8" speaker. I got it 40 years ago and have had to replace one tube since then.
     
    dbase and scott sinner like this.
  19. Bassamatic

    Bassamatic keepin' the beat since the 60's

    Nice rant, Rantbot! A lot depends on the quality of the parts used and how hard they are stressed. This is a reason cheaper products tend to fail faster. The parts are lower quality and worked close to the limit much of the time.

    I suspect that the tin whiskering thing has been mostly solved by now. If it weren't, we would be having a lot more failures. This was an extremely serious problem caused by the RoHS lead-free initiative. Created a LOT of problems for our vendors.
     
    RocknRay likes this.
  20. agedhorse

    agedhorse Supporting Member Commercial User

    Feb 12, 2006
    Davis, CA (USA)
    Development Engineer-Mesa Boogie, Development Engineer-Genzler (pedals), Product Support-Genz Benz
    Consumables are calculated separately, of course, when their life is less than the MTBF of the finished product. I typically design around a minimum of 10,000 hours for preamp tubes, and I am obviously careful to respect all the limits AND the variations/tolerances that these limits have. For the average player this represents between 10 and 20 years, and looking back at the history of my prior designs this appears to be conservative.
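
    The hours-to-years conversion is simple arithmetic, sketched below in Python. The 10,000-hour figure is from the post. The hours-per-week values are assumptions about how much an "average player" has the amp powered on.

```python
def years_of_service(rated_hours, hours_per_week):
    """Convert a life in operating hours to calendar years of use."""
    return rated_hours / (hours_per_week * 52)


TUBE_LIFE_HOURS = 10_000  # design minimum quoted above

# Hypothetical weekly powered-on time.
for hours_per_week in (5, 10, 20, 40):
    years = years_of_service(TUBE_LIFE_HOURS, hours_per_week)
    print(f"{hours_per_week:>2} h/week -> {years:.1f} years")
```

    On these numbers, 10 to 20 years corresponds to roughly 10 to 20 powered-on hours per week.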

    I don't normally consider pots to be consumable, but in humid or corrosive environments and in some circuit applications they can get a little noisy. A TINY application of Caig DeOxit D-5 will almost always resolve this, and represents the gray area of consumable/maintenance. This doesn't include pots ruined by folks spraying "miracle cure-all" substances into their pots, or hosing them down. That falls into the abuse category similar to dropping the amp. Jack issues are not consumable in general.

    MTBF of components is in fact dependent on the circuits they are used in, and the MTBF number is a variable that changes based on the parameters that the part is exposed to. For example, electrolytic caps have a series of de-rating and aging curves that are heavily dependent on temperature, ripple current and applied voltage. Respect these curves and the MTBF can be almost forever.
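
    For the temperature part of those curves, a widely used rule of thumb is that an electrolytic cap's life roughly doubles for every 10 °C it runs below its rated temperature. The Python below sketches only that rule. It ignores ripple current and applied voltage, which also matter, and the 2,000-hour at 105 °C rating is a hypothetical example rather than a specific part. A manufacturer's own curves should take precedence.

```python
def estimated_cap_life_hours(rated_life_hours, rated_temp_c, operating_temp_c):
    """Rule-of-thumb life estimate: life doubles per 10 C below rated temp.

    Temperature only; ripple current and voltage derating are not modelled.
    """
    return rated_life_hours * 2 ** ((rated_temp_c - operating_temp_c) / 10)


# Hypothetical part rated for 2,000 hours at 105 C.
for temp_c in (105, 85, 65, 45):
    hours = estimated_cap_life_hours(2_000, 105, temp_c)
    print(f"{temp_c:>3} C -> {hours:>9,.0f} hours")
```

    The same part that lasts a few thousand hours at its rated temperature lasts tens of times longer when kept cool, which is why the circuit and the layout around it matter so much.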
     