Reply by Phil Hobbs December 6, 2022
John Walliker wrote:
> On Tuesday, 6 December 2022 at 19:38:38 UTC, Don Y wrote: >> On 12/6/2022 10:05 AM, legg wrote: >>> On Mon, 5 Dec 2022 05:09:48 -0700, Don Y <blocked...@foo.invalid> >>> wrote: >>> >>>> On 12/4/2022 10:46 AM, Joe Gwinn wrote: >>>>> I was doing much the same in the late 1970s. We had a number of new >>>>> SEL 32/55 midi computers, with this brand new semiconductor RAM memory >>>>> (replacing magnetic core memory), and were having lots of early >>>>> failures. >>>>> >>>>> So, I decided to give them some hot summer days: The computers were >>>>> looping on a memory test, as before, but now with their air intakes >>>>> partially blocked by cardboard, with a thermocouple in the core so we >>>>> could adjust the cardboard to achieve the max allowed temperature. >>>>> >>>>> Initially, delivered units would fail within a day. We would remove >>>>> the cardboard et al and call the vendor, who would then find and >>>>> replace the failed memory. Rinse and repeat. >>>>> >>>>> Pretty soon, the vendor instituted a hot screening program before >>>>> delivery, it being far cheaper to fix in factory than the field, and >>>>> in a year or two semiconductor memory field reliability had improved >>>>> greatly. >>>> >>>> But, the vendor likely didn't just "block the vents" and *hope* >>>> ALL the early faults would manifest in the first 24 hours. >>>> >>>> Instead, he likely stressed a sample population over a longer >>>> period of time and recorded the failure rates, over time -- looking >>>> for the "knee" at which the failure rate leveled off. Longer burnin >>>> times would just needlessly shorten the useful life of the device; >>>> shorter would risk some number of infant mortality failures slipping >>>> through to manifest at the customer. >>>> >>>> It seems that most folks have a naive understanding of how burnin is >>>> supposed to work. That "simply" plugging the unit in before sale >>>> is enough to catch the early failures. Unless you know where (in time) >>>> those failures are probabilistically going to manifest, how can >>>> you know that 24, 48, 72, 168 hours is "enough"? Or, that 60C is >>>> the best temperature to accelerate failures? (my residential >>>> devices have to *operate* at 60C. And, -40C.) >>>> >>>> [If you're not going to approach it with a scientific basis, you're >>>> likely just looking to capitalize on your customers' ignorance: >>>> "We burn in our products for ## hours to ensure quality". Yeah. >>>> Right. "Then why did OUR unit shit the bed after two weeks?"] >>> >>> The best temperature to accellerate failures is the operating limit >>> for which the design is intended to address, under functioning >>> conditions that produce the highest intended self-generated rise. >> Yes. My point was that naively assuming N hours at T degrees is >> just silly. You need to characterize your failure pattern before >> you can figure out how long and at which conditions you should >> stress the design. >>> If you have access to early testing, you'll have some idea of the >>> margins for functional operation that this limit condition provides, >>> and the accompanying MTBF calculation for this previously-measured >>> condition. >>> >>> It is only when margins to the limits are actually exceeded that >>> predicted life is possibly compromised. >> Any time "operating" comes at the expense of "remaining useful life". >> If you can assume that the time spent operating is << the expected >> useful life, then you can ignore it. 
>> >> OTOH, if your "usage in test/burnin" represents a significant >> portion of the useful life of the device, approaching it willy-nilly >> can be costly. >> >> E.g., there are SIMM/DIMM connectors that are rated for a *handful* >> of operations. You'd not want to be designing a test plan that >> called for them to be exercised *dozens* of times. And, then wonder >> why their reliability suffered post-test! >>> Complete thermal cycling is impractical for simple burn-in. >>> It is usually restricted in application to design verification or >>> later sample process quality assurance. >> Most devices operate in a narrow temperature range -- esp if >> deployed in human-occupied environments. Something intended >> for use in a lab will likely see constant temperatures. >> >> OTOH, there are classes of devices that are not constrained by >> "human habitation". E.g., an outdoor whether station will >> typically see 100+C variations, over its lifetime -- though >> likely only ~30C in a given (short) interval. Here, I expect >> to encounter 0F to ~140F as a normal yearly range. In >> North Dakota, it might be -40F to +100F, etc. If you don't want >> to design AZ and ND models, you have to test the design at >> the union of those conditions. >> >> Then, there are devices that are intended to operate in environments >> where the operating conditions are varied by necessity (e.g., >> many manufacturing processes). >>> Cold cycling tolerance is relevant to consumer products mainly to >>> demonstrate air-shipment worthiness. >> Or, it's -26F outside and will be that way for a few days! >>> For burn in, simple on-off cycling to allow stress over self- >>> generated temperature swings is considered adequate. > > There are some assumptions about the thermodynamics of the > failure mechanisms built into many accelerated testing scenarios > which are sometimes not justified. > > John >
The idea that all failure modes follow an Arrhenius temperature dependence over many orders of magnitude is completely up a pole. MIL-HDBK-217 really ought to be relegated to a museum.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510
http://electrooptical.net
http://hobbs-eo.com
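For concreteness, the model Phil is pushing back on is the single-activation-energy Arrhenius acceleration factor. The sketch below is illustrative only: the 0.7 eV activation energy and the use/stress temperatures are assumed values, not figures from this thread.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(t_use_c: float, t_stress_c: float, ea_ev: float) -> float:
    """Acceleration factor under the single-activation-energy Arrhenius model.

    AF = exp[(Ea / k) * (1/T_use - 1/T_stress)], with temperatures in kelvin.
    The assumption being questioned is that one Ea covers every failure
    mechanism over many orders of magnitude of time.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Illustrative only: 40 C use, 85 C stress, 0.7 eV "default" activation energy.
print(round(arrhenius_af(40.0, 85.0, 0.7), 1))  # ~26x
```

Drop the assumed Ea from 0.7 eV to 0.4 eV and the claimed acceleration collapses from roughly 26x to about 6x, which is why hanging every failure mechanism on one exponential can be so misleading.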
Reply by Don Y December 6, 2022
On 12/6/2022 3:10 PM, John Walliker wrote:
> There are some assumptions about the thermodynamics of the > failure mechanisms built into many accelerated testing scenarios > which are sometimes not justified.
Again, my point is that you need to understand your design and the environment in which it will operate before you can create a burn-in strategy. And, you need to monitor the EXPECTED failures in your burnin process to determine if the characterization of the product needs to be updated (new model).

Places that take this seriously usually have staff on hand to keep track of the process on all such products. After release to manufacturing, engineering only gets re-involved when "they" discover something has changed (component suppliers, process, etc.)

Early in my career, I used an MNOS WAROM in a design. The *suggested* burnin regimen would have caused 100% failures in the first few *hours* as the device was only guaranteed for ~1,000 (that's "one thousand") write cycles. So, we had to come up with a different way of exercising the design that didn't involve repeated accesses to that device.

Similar issues exist, today, with MLC flash, etc.
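A back-of-the-envelope check in the spirit of Don's WAROM story: before adopting a burn-in plan, budget how much of an endurance-limited part's rated life the plan itself would consume. The access rate and duration below are hypothetical; only the ~1,000-cycle rating echoes the post above.

```python
def endurance_fraction(rated_cycles: int, cycles_per_hour: float, burnin_hours: float) -> float:
    """Fraction of a part's rated cycle endurance consumed by a burn-in plan.

    Anything approaching (or exceeding) 1.0 means the burn-in itself is
    eating the part's useful life.  Rates and durations are hypothetical.
    """
    return (cycles_per_hour * burnin_hours) / rated_cycles

# A ~1,000-write-cycle part exercised once a minute for a 24 h burn-in:
print(endurance_fraction(rated_cycles=1_000, cycles_per_hour=60, burnin_hours=24))  # 1.44 -> dead before the test ends
```

The same arithmetic applies to the low-mating-cycle connector example further down the thread.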
Reply by John Walliker December 6, 2022
On Tuesday, 6 December 2022 at 19:38:38 UTC, Don Y wrote:
> On 12/6/2022 10:05 AM, legg wrote: > > On Mon, 5 Dec 2022 05:09:48 -0700, Don Y <blocked...@foo.invalid> > > wrote: > > > >> On 12/4/2022 10:46 AM, Joe Gwinn wrote: > >>> I was doing much the same in the late 1970s. We had a number of new > >>> SEL 32/55 midi computers, with this brand new semiconductor RAM memory > >>> (replacing magnetic core memory), and were having lots of early > >>> failures. > >>> > >>> So, I decided to give them some hot summer days: The computers were > >>> looping on a memory test, as before, but now with their air intakes > >>> partially blocked by cardboard, with a thermocouple in the core so we > >>> could adjust the cardboard to achieve the max allowed temperature. > >>> > >>> Initially, delivered units would fail within a day. We would remove > >>> the cardboard et al and call the vendor, who would then find and > >>> replace the failed memory. Rinse and repeat. > >>> > >>> Pretty soon, the vendor instituted a hot screening program before > >>> delivery, it being far cheaper to fix in factory than the field, and > >>> in a year or two semiconductor memory field reliability had improved > >>> greatly. > >> > >> But, the vendor likely didn't just "block the vents" and *hope* > >> ALL the early faults would manifest in the first 24 hours. > >> > >> Instead, he likely stressed a sample population over a longer > >> period of time and recorded the failure rates, over time -- looking > >> for the "knee" at which the failure rate leveled off. Longer burnin > >> times would just needlessly shorten the useful life of the device; > >> shorter would risk some number of infant mortality failures slipping > >> through to manifest at the customer. > >> > >> It seems that most folks have a naive understanding of how burnin is > >> supposed to work. That "simply" plugging the unit in before sale > >> is enough to catch the early failures. Unless you know where (in time) > >> those failures are probabilistically going to manifest, how can > >> you know that 24, 48, 72, 168 hours is "enough"? Or, that 60C is > >> the best temperature to accelerate failures? (my residential > >> devices have to *operate* at 60C. And, -40C.) > >> > >> [If you're not going to approach it with a scientific basis, you're > >> likely just looking to capitalize on your customers' ignorance: > >> "We burn in our products for ## hours to ensure quality". Yeah. > >> Right. "Then why did OUR unit shit the bed after two weeks?"] > > > > The best temperature to accellerate failures is the operating limit > > for which the design is intended to address, under functioning > > conditions that produce the highest intended self-generated rise. > Yes. My point was that naively assuming N hours at T degrees is > just silly. You need to characterize your failure pattern before > you can figure out how long and at which conditions you should > stress the design. > > If you have access to early testing, you'll have some idea of the > > margins for functional operation that this limit condition provides, > > and the accompanying MTBF calculation for this previously-measured > > condition. > > > > It is only when margins to the limits are actually exceeded that > > predicted life is possibly compromised. > Any time "operating" comes at the expense of "remaining useful life". > If you can assume that the time spent operating is << the expected > useful life, then you can ignore it. 
> > OTOH, if your "usage in test/burnin" represents a significant > portion of the useful life of the device, approaching it willy-nilly > can be costly. > > E.g., there are SIMM/DIMM connectors that are rated for a *handful* > of operations. You'd not want to be designing a test plan that > called for them to be exercised *dozens* of times. And, then wonder > why their reliability suffered post-test! > > Complete thermal cycling is impractical for simple burn-in. > > It is usually restricted in application to design verification or > > later sample process quality assurance. > Most devices operate in a narrow temperature range -- esp if > deployed in human-occupied environments. Something intended > for use in a lab will likely see constant temperatures. > > OTOH, there are classes of devices that are not constrained by > "human habitation". E.g., an outdoor whether station will > typically see 100+C variations, over its lifetime -- though > likely only ~30C in a given (short) interval. Here, I expect > to encounter 0F to ~140F as a normal yearly range. In > North Dakota, it might be -40F to +100F, etc. If you don't want > to design AZ and ND models, you have to test the design at > the union of those conditions. > > Then, there are devices that are intended to operate in environments > where the operating conditions are varied by necessity (e.g., > many manufacturing processes). > > Cold cycling tolerance is relevant to consumer products mainly to > > demonstrate air-shipment worthiness. > Or, it's -26F outside and will be that way for a few days! > > For burn in, simple on-off cycling to allow stress over self- > > generated temperature swings is considered adequate.
There are some assumptions about the thermodynamics of the failure mechanisms built into many accelerated testing scenarios which are sometimes not justified. John
Reply by Don Y December 6, 2022
On 12/6/2022 10:05 AM, legg wrote:
> On Mon, 5 Dec 2022 05:09:48 -0700, Don Y <blockedofcourse@foo.invalid> > wrote: > >> On 12/4/2022 10:46 AM, Joe Gwinn wrote: >>> I was doing much the same in the late 1970s. We had a number of new >>> SEL 32/55 midi computers, with this brand new semiconductor RAM memory >>> (replacing magnetic core memory), and were having lots of early >>> failures. >>> >>> So, I decided to give them some hot summer days: The computers were >>> looping on a memory test, as before, but now with their air intakes >>> partially blocked by cardboard, with a thermocouple in the core so we >>> could adjust the cardboard to achieve the max allowed temperature. >>> >>> Initially, delivered units would fail within a day. We would remove >>> the cardboard et al and call the vendor, who would then find and >>> replace the failed memory. Rinse and repeat. >>> >>> Pretty soon, the vendor instituted a hot screening program before >>> delivery, it being far cheaper to fix in factory than the field, and >>> in a year or two semiconductor memory field reliability had improved >>> greatly. >> >> But, the vendor likely didn't just "block the vents" and *hope* >> ALL the early faults would manifest in the first 24 hours. >> >> Instead, he likely stressed a sample population over a longer >> period of time and recorded the failure rates, over time -- looking >> for the "knee" at which the failure rate leveled off. Longer burnin >> times would just needlessly shorten the useful life of the device; >> shorter would risk some number of infant mortality failures slipping >> through to manifest at the customer. >> >> It seems that most folks have a naive understanding of how burnin is >> supposed to work. That "simply" plugging the unit in before sale >> is enough to catch the early failures. Unless you know where (in time) >> those failures are probabilistically going to manifest, how can >> you know that 24, 48, 72, 168 hours is "enough"? Or, that 60C is >> the best temperature to accelerate failures? (my residential >> devices have to *operate* at 60C. And, -40C.) >> >> [If you're not going to approach it with a scientific basis, you're >> likely just looking to capitalize on your customers' ignorance: >> "We burn in our products for ## hours to ensure quality". Yeah. >> Right. "Then why did OUR unit shit the bed after two weeks?"] > > The best temperature to accellerate failures is the operating limit > for which the design is intended to address, under functioning > conditions that produce the highest intended self-generated rise.
Yes. My point was that naively assuming N hours at T degrees is just silly. You need to characterize your failure pattern before you can figure out how long and at which conditions you should stress the design.
> If you have access to early testing, you'll have some idea of the > margins for functional operation that this limit condition provides, > and the accompanying MTBF calculation for this previously-measured > condition. > > It is only when margins to the limits are actually exceeded that > predicted life is possibly compromised.
Any time "operating" comes at the expense of "remaining useful life". If you can assume that the time spent operating is << the expected useful life, then you can ignore it. OTOH, if your "usage in test/burnin" represents a significant portion of the useful life of the device, approaching it willy-nilly can be costly. E.g., there are SIMM/DIMM connectors that are rated for a *handful* of operations. You'd not want to be designing a test plan that called for them to be exercised *dozens* of times. And, then wonder why their reliability suffered post-test!
> Complete thermal cycling is impractical for simple burn-in. > It is usually restricted in application to design verification or > later sample process quality assurance.
Most devices operate in a narrow temperature range -- esp if deployed in human-occupied environments. Something intended for use in a lab will likely see constant temperatures.

OTOH, there are classes of devices that are not constrained by "human habitation". E.g., an outdoor weather station will typically see 100+C variations, over its lifetime -- though likely only ~30C in a given (short) interval. Here, I expect to encounter 0F to ~140F as a normal yearly range. In North Dakota, it might be -40F to +100F, etc. If you don't want to design AZ and ND models, you have to test the design at the union of those conditions.

Then, there are devices that are intended to operate in environments where the operating conditions are varied by necessity (e.g., many manufacturing processes).
> Cold cycling tolerance is relevant to consumer products mainly to > demonstrate air-shipment worthiness.
Or, it's -26F outside and will be that way for a few days!
> For burn in, simple on-off cycling to allow stress over self- > generated temperature swings is considered adequate.
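On the thermal-cycling side of this exchange, the scaling usually invoked is the simplified Coffin-Manson model for cycle fatigue. A sketch only: the exponent is an assumed, material-dependent value (figures around 2 to 2.5 are often quoted for solder joints), and the temperature swings merely echo the ~30 C field interval mentioned above against a wide chamber cycle.

```python
def coffin_manson_af(delta_t_use: float, delta_t_test: float, exponent: float = 2.0) -> float:
    """Simplified Coffin-Manson acceleration factor for thermal-cycle fatigue.

    AF = (dT_test / dT_use) ** m.  The exponent m is material- and
    joint-dependent; both it and the swings used here are assumptions
    for illustration, not characterization data for any real product.
    """
    return (delta_t_test / delta_t_use) ** exponent

# ~30 C daily field swing vs. a -40 C to +100 C chamber cycle (140 C swing):
print(round(coffin_manson_af(delta_t_use=30.0, delta_t_test=140.0), 1))  # ~21.8x damage per cycle
```

That large per-cycle multiplier is one reason full thermal cycling tends to be reserved for design verification and sample QA rather than routine burn-in, as legg notes.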
Reply by legg December 6, 2022
On Mon, 5 Dec 2022 05:09:48 -0700, Don Y <blockedofcourse@foo.invalid>
wrote:

>On 12/4/2022 10:46 AM, Joe Gwinn wrote: >> I was doing much the same in the late 1970s. We had a number of new >> SEL 32/55 midi computers, with this brand new semiconductor RAM memory >> (replacing magnetic core memory), and were having lots of early >> failures. >> >> So, I decided to give them some hot summer days: The computers were >> looping on a memory test, as before, but now with their air intakes >> partially blocked by cardboard, with a thermocouple in the core so we >> could adjust the cardboard to achieve the max allowed temperature. >> >> Initially, delivered units would fail within a day. We would remove >> the cardboard et al and call the vendor, who would then find and >> replace the failed memory. Rinse and repeat. >> >> Pretty soon, the vendor instituted a hot screening program before >> delivery, it being far cheaper to fix in factory than the field, and >> in a year or two semiconductor memory field reliability had improved >> greatly. > >But, the vendor likely didn't just "block the vents" and *hope* >ALL the early faults would manifest in the first 24 hours. > >Instead, he likely stressed a sample population over a longer >period of time and recorded the failure rates, over time -- looking >for the "knee" at which the failure rate leveled off. Longer burnin >times would just needlessly shorten the useful life of the device; >shorter would risk some number of infant mortality failures slipping >through to manifest at the customer. > >It seems that most folks have a naive understanding of how burnin is >supposed to work. That "simply" plugging the unit in before sale >is enough to catch the early failures. Unless you know where (in time) >those failures are probabilistically going to manifest, how can >you know that 24, 48, 72, 168 hours is "enough"? Or, that 60C is >the best temperature to accelerate failures? (my residential >devices have to *operate* at 60C. And, -40C.) > >[If you're not going to approach it with a scientific basis, you're >likely just looking to capitalize on your customers' ignorance: >"We burn in our products for ## hours to ensure quality". Yeah. >Right. "Then why did OUR unit shit the bed after two weeks?"]
The best temperature to accelerate failures is the operating limit that the design is intended to address, under functioning conditions that produce the highest intended self-generated rise.

If you have access to early testing, you'll have some idea of the margins for functional operation that this limit condition provides, and the accompanying MTBF calculation for this previously-measured condition.

It is only when margins to the limits are actually exceeded that predicted life is possibly compromised.

Complete thermal cycling is impractical for simple burn-in. It is usually restricted in application to design verification or later sample process quality assurance.

Cold cycling tolerance is relevant to consumer products mainly to demonstrate air-shipment worthiness.

For burn-in, simple on-off cycling to allow stress over self-generated temperature swings is considered adequate.

RL
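Where legg mentions "the accompanying MTBF calculation", the usual arithmetic behind such a figure is the constant-failure-rate (exponential) model. A minimal sketch with made-up numbers; it only applies to the flat region of the bathtub curve, after infant mortality has been screened out and before wear-out begins.

```python
import math

def survival_probability(mtbf_hours: float, mission_hours: float) -> float:
    """R(t) = exp(-t / MTBF): the constant-failure-rate (exponential) model.

    Valid, at best, on the flat portion of the bathtub curve -- after
    infant mortality has been screened out and before wear-out starts.
    """
    return math.exp(-mission_hours / mtbf_hours)

# Illustrative: a module with a 200,000 h MTBF over a 5-year (43,800 h) mission.
print(round(survival_probability(mtbf_hours=200_000, mission_hours=43_800), 2))  # ~0.80
```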
Reply by Don Y December 5, 2022
On 12/4/2022 10:46 AM, Joe Gwinn wrote:
> I was doing much the same in the late 1970s. We had a number of new > SEL 32/55 midi computers, with this brand new semiconductor RAM memory > (replacing magnetic core memory), and were having lots of early > failures. > > So, I decided to give them some hot summer days: The computers were > looping on a memory test, as before, but now with their air intakes > partially blocked by cardboard, with a thermocouple in the core so we > could adjust the cardboard to achieve the max allowed temperature. > > Initially, delivered units would fail within a day. We would remove > the cardboard et al and call the vendor, who would then find and > replace the failed memory. Rinse and repeat. > > Pretty soon, the vendor instituted a hot screening program before > delivery, it being far cheaper to fix in factory than the field, and > in a year or two semiconductor memory field reliability had improved > greatly.
But, the vendor likely didn't just "block the vents" and *hope* ALL the early faults would manifest in the first 24 hours.

Instead, he likely stressed a sample population over a longer period of time and recorded the failure rates, over time -- looking for the "knee" at which the failure rate leveled off. Longer burnin times would just needlessly shorten the useful life of the device; shorter would risk some number of infant mortality failures slipping through to manifest at the customer.

It seems that most folks have a naive understanding of how burnin is supposed to work. That "simply" plugging the unit in before sale is enough to catch the early failures. Unless you know where (in time) those failures are probabilistically going to manifest, how can you know that 24, 48, 72, 168 hours is "enough"? Or, that 60C is the best temperature to accelerate failures? (my residential devices have to *operate* at 60C. And, -40C.)

[If you're not going to approach it with a scientific basis, you're likely just looking to capitalize on your customers' ignorance: "We burn in our products for ## hours to ensure quality". Yeah. Right. "Then why did OUR unit shit the bed after two weeks?"]
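One way to put numbers on the "knee" Don describes is to fit early-life failure data to a Weibull distribution and look at where the hazard rate flattens out. This is a sketch only: the failure times are synthetic, the "flat enough" threshold is arbitrary, and nothing here reflects any real product's data.

```python
import numpy as np
from scipy.stats import weibull_min

# Synthetic times-to-failure (hours) with shape < 1, i.e. a decreasing hazard
# (infant mortality).  Real data would come from the stressed sample population.
failures = weibull_min.rvs(0.6, scale=200.0, size=400, random_state=0)

# Fit shape (beta) and characteristic life (eta) with the location fixed at 0.
beta, _, eta = weibull_min.fit(failures, floc=0)

# Weibull hazard: h(t) = (beta/eta) * (t/eta)**(beta - 1).  Call the "knee" the
# first time the hazard is within 20% of its value at 1,000 h (arbitrary cutoff).
t = np.linspace(1.0, 1_000.0, 1_000)
hazard = (beta / eta) * (t / eta) ** (beta - 1.0)
knee = t[np.argmax(hazard <= 1.2 * hazard[-1])]
print(f"beta = {beta:.2f}, eta = {eta:.0f} h, hazard flattens near {knee:.0f} h")
```

A fitted shape parameter well below 1 confirms a decreasing hazard (infant mortality worth screening for); a shape near 1 says burn-in is buying you little.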
Reply by Joe Gwinn December 4, 2022
On Sat, 3 Dec 2022 20:07:59 +0200, Dimiter_Popoff <dp@tgi-sci.com>
wrote:

>On 12/2/2022 19:53, Don Y wrote: >> We're making devices that are typically built from 3+ "modules" >> (form factor is highly constrained so board space is similarly). >> >> Presently looking at the cost-benefit assessment of burning-in >> the modules and/or assembled devices. >> >> *Modules* will be made offshore so any test/burnin has to be part >> of the quoted manufacturing cost. >> >> Modules will be "post processed" domestically to install final >> firmware, S/Ns, private keys and watermarks. This allows for >> all of that information to be tracked, here (instead of relying >> on an offshore vendor who may decide to copy IP). >> >> Aside from installing the above, the only real "mechanical" >> modifications are connecting modules together and packaging. >> >> So, a "final test" can just verify proper operation of >> the *device* (instead of its constituent modules). >> >> If modules are burned-in and tested prior to acceptance, domesticly, >> the number of failures after assembly should be minimal. (Yet, >> you'd still want to verify proper operation as the cost to repair >> or replace far exceeds the price of the devices) >> >> OTOH, an assembly problem that could manifest during burn-in >> would want some assurances that it wouldn't sneek past final >> inspection in the absence of a post-assembly burn-in phase. >> >> At the very least, this sort of question would apply to anyone who >> installs firmware after offshore manufacturing. Or, assembles >> subsystems sourced offshore. >> >> So, what is the best practices guidance? >> >Apart from the burn-in done here I already told you about >I remember one more example I know of. >During the early 80-s, a friend of mine worked as a production >engineer at a factory which made clones of the PDP-11, they >were shipped to the USSR. Many boards, full of TTL chips, many of which >Russian (they used to make things up to say 4 bit counters >etc., like the 74193 etc. under names no one could repeat) >so the failure rate was huge. >Their standard procedure was - as I remember the stories, never >witnessed these - some time in a 40C chamber (don't know how >many hours) where most of the failures would manifest, then >on some rattling machine for a vibration test, may be more I >don't know of.
I was doing much the same in the late 1970s. We had a number of new SEL 32/55 midi computers, with this brand new semiconductor RAM memory (replacing magnetic core memory), and were having lots of early failures.

So, I decided to give them some hot summer days: The computers were looping on a memory test, as before, but now with their air intakes partially blocked by cardboard, with a thermocouple in the core so we could adjust the cardboard to achieve the max allowed temperature.

Initially, delivered units would fail within a day. We would remove the cardboard et al and call the vendor, who would then find and replace the failed memory. Rinse and repeat.

Pretty soon, the vendor instituted a hot screening program before delivery, it being far cheaper to fix in factory than the field, and in a year or two semiconductor memory field reliability had improved greatly.

Joe Gwinn
Reply by Don Y December 3, 2022
On 12/3/2022 11:07 AM, Dimiter_Popoff wrote:
> During the early 80-s, a friend of mine worked as a production > engineer at a factory which made clones of the PDP-11, they > were shipped to the USSR. Many boards, full of TTL chips, many of which > Russian (they used to make things up to say 4 bit counters > etc., like the 74193 etc. under names no one could repeat) > so the failure rate was huge. > Their standard procedure was - as I remember the stories, never > witnessed these - some time in a 40C chamber (don't know how > many hours) where most of the failures would manifest, then > on some rattling machine for a vibration test, may be more I > don't know of.
The point being to operate the device at "extreme" conditions to effectively "age" it faster than real time to a point in its useful life AFTER most of the infant mortality failures have surfaced. Presumably, it wasn't NORMALLY operated in a 40C environment *or* under high vibration. You can also play games with the power supplies to make the components (and design) "uncomfortable".

[Otherwise, you would have to age it at normal operating conditions for that full period of time -- a foolish strategy when you've got resources tied up for much longer than necessary! Imagine infant mortalities manifesting after weeks of normal operation... would you want to leave units running at "STP" for weeks just to be sure to capture all of those failures pre-sale?]

Done correctly, it's an ongoing *process* where you track failure rates (and modes) and revise your model so you continue to capture the failures of interest. In shops where we've done this, there was a *department* dedicated to tracking product quality. Way too much "Profanity and Sadistics" for non-math types!
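The "Profanity and Sadistics" is largely calculations like the following: given the accumulated burn-in hours and the failures observed, what failure rate can you actually claim with confidence? A sketch of the standard chi-square bound for a time-terminated test, with made-up unit counts and hours; it also assumes a constant failure rate, so it describes the screened population rather than the infant-mortality region itself.

```python
from scipy.stats import chi2

def mtbf_lower_bound(total_unit_hours: float, failures: int, confidence: float = 0.90) -> float:
    """One-sided lower confidence bound on MTBF for a time-terminated test.

    MTBF_lower = 2 * T / chi2(confidence, 2r + 2), with T the accumulated
    unit-hours and r the number of failures observed.
    """
    return 2.0 * total_unit_hours / chi2.ppf(confidence, 2 * failures + 2)

# Hypothetical numbers: 500 units x 168 h of burn-in, 2 failures observed.
print(round(mtbf_lower_bound(total_unit_hours=500 * 168, failures=2)))  # ~15,783 h demonstrated at 90%
```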
Reply by John Larkin December 3, 2022
On Sat, 3 Dec 2022 13:09:52 -0800 (PST), Fred Bloggs
<bloggs.fredbloggs.fred@gmail.com> wrote:

>On Friday, December 2, 2022 at 11:28:06 PM UTC-5, John Larkin wrote: >> On Fri, 2 Dec 2022 15:38:08 -0800 (PST), Fred Bloggs >> <bloggs.fred...@gmail.com> wrote: >> >> >On Friday, December 2, 2022 at 6:22:48 PM UTC-5, John Larkin wrote: >> >> On Fri, 2 Dec 2022 15:09:26 -0800 (PST), Fred Bloggs >> >> <bloggs.fred...@gmail.com> wrote: >> >> >> >> >On Friday, December 2, 2022 at 1:08:08 PM UTC-5, John Larkin wrote: >> >> >> On Fri, 2 Dec 2022 10:53:51 -0700, Don Y <blocked...@foo.invalid> >> >> >> wrote: >> >> >> >We're making devices that are typically built from 3+ "modules" >> >> >> >(form factor is highly constrained so board space is similarly). >> >> >> > >> >> >> >Presently looking at the cost-benefit assessment of burning-in >> >> >> >the modules and/or assembled devices. >> >> >> > >> >> >> >*Modules* will be made offshore so any test/burnin has to be part >> >> >> >of the quoted manufacturing cost. >> >> >> > >> >> >> >Modules will be "post processed" domestically to install final >> >> >> >firmware, S/Ns, private keys and watermarks. This allows for >> >> >> >all of that information to be tracked, here (instead of relying >> >> >> >on an offshore vendor who may decide to copy IP). >> >> >> > >> >> >> >Aside from installing the above, the only real "mechanical" >> >> >> >modifications are connecting modules together and packaging. >> >> >> > >> >> >> >So, a "final test" can just verify proper operation of >> >> >> >the *device* (instead of its constituent modules). >> >> >> > >> >> >> >If modules are burned-in and tested prior to acceptance, domesticly, >> >> >> >the number of failures after assembly should be minimal. (Yet, >> >> >> >you'd still want to verify proper operation as the cost to repair >> >> >> >or replace far exceeds the price of the devices) >> >> >> > >> >> >> >OTOH, an assembly problem that could manifest during burn-in >> >> >> >would want some assurances that it wouldn't sneek past final >> >> >> >inspection in the absence of a post-assembly burn-in phase. >> >> >> > >> >> >> >At the very least, this sort of question would apply to anyone who >> >> >> >installs firmware after offshore manufacturing. Or, assembles >> >> >> >subsystems sourced offshore. >> >> >> > >> >> >> >So, what is the best practices guidance? >> >> >> We were doing overnight burnin+test of our products, after automated >> >> >> test and cal, but the failure rate was zero. Useful burnin might take >> >> >> weeks and temperature cycling or something expensive like that. >> >> >> >> >> >> Temperature cycling is probably the biggest stressor of parts and >> >> >> solder joints and design margins. Shock+vibration next. Just benign >> >> >> burnin doesn't seem to do much. >> >> > >> >> >Infant Mortality--The Lesser Known Reliability Issue >> >> >https://ieeexplore.ieee.org/document/4274831 >> >> Paywalled. What's a reasonable burn time to catch infant mortality? If >> >> it's months, it wouldn't be practical for commercial gear. >> > >> >It's not the only game in town. There's tons literature on infant mortality with testing techniques dating to WW2. >> Military electronics used to require JAN/TX transistors, carefully >> assembled and tested and burned in and fabulously expensive. Then >> someone determined that regular equivalents were more reliable. > >Their main strength was hermetic encapsulation. That was something they knew how to do. 
Now look at them, installing parts salvaged off junk consumer products from China, washed off in some dirty river, after being removed en masse from a circuit board with a gas torch, and then re-labeled with some JAN code. The main criterion for acceptance was that they looked shiny and new.
Plastic packaged transistors turned out to be just as good as TO-can parts too. We buy real parts from authorized distributors. It's amazing how much more reliable parts are now, vs the earlier days of ICs. We rarely have a bad part on newly built boards. When we see a pattern of "bad parts" it usually turns out to be a design issue.
Reply by Fred Bloggs December 3, 2022
On Friday, December 2, 2022 at 6:22:48 PM UTC-5, John Larkin wrote:
> On Fri, 2 Dec 2022 15:09:26 -0800 (PST), Fred Bloggs > <bloggs.fred...@gmail.com> wrote: > > >On Friday, December 2, 2022 at 1:08:08 PM UTC-5, John Larkin wrote: > >> On Fri, 2 Dec 2022 10:53:51 -0700, Don Y <blocked...@foo.invalid> > >> wrote: > >> >We're making devices that are typically built from 3+ "modules" > >> >(form factor is highly constrained so board space is similarly). > >> > > >> >Presently looking at the cost-benefit assessment of burning-in > >> >the modules and/or assembled devices. > >> > > >> >*Modules* will be made offshore so any test/burnin has to be part > >> >of the quoted manufacturing cost. > >> > > >> >Modules will be "post processed" domestically to install final > >> >firmware, S/Ns, private keys and watermarks. This allows for > >> >all of that information to be tracked, here (instead of relying > >> >on an offshore vendor who may decide to copy IP). > >> > > >> >Aside from installing the above, the only real "mechanical" > >> >modifications are connecting modules together and packaging. > >> > > >> >So, a "final test" can just verify proper operation of > >> >the *device* (instead of its constituent modules). > >> > > >> >If modules are burned-in and tested prior to acceptance, domesticly, > >> >the number of failures after assembly should be minimal. (Yet, > >> >you'd still want to verify proper operation as the cost to repair > >> >or replace far exceeds the price of the devices) > >> > > >> >OTOH, an assembly problem that could manifest during burn-in > >> >would want some assurances that it wouldn't sneek past final > >> >inspection in the absence of a post-assembly burn-in phase. > >> > > >> >At the very least, this sort of question would apply to anyone who > >> >installs firmware after offshore manufacturing. Or, assembles > >> >subsystems sourced offshore. > >> > > >> >So, what is the best practices guidance? > >> We were doing overnight burnin+test of our products, after automated > >> test and cal, but the failure rate was zero. Useful burnin might take > >> weeks and temperature cycling or something expensive like that. > >> > >> Temperature cycling is probably the biggest stressor of parts and > >> solder joints and design margins. Shock+vibration next. Just benign > >> burnin doesn't seem to do much. > > > >Infant Mortality--The Lesser Known Reliability Issue > >https://ieeexplore.ieee.org/document/4274831 > Paywalled. What's a reasonable burn time to catch infant mortality? If > it's months, it wouldn't be practical for commercial gear.
Right. FedEx and UPS will put your gear through the shake, rattle and roll test. Maybe do some dummy shipments out and back and see how that works.