Electronics-Related.com
Forums

Tracking bug report frequency

Started by Don Y September 4, 2023
On Tue, 5 Sep 2023 17:47:41 +0100, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

>On 05/09/2023 16:57, John Larkin wrote:
>> On Tue, 5 Sep 2023 13:13:51 +0100, Martin Brown
>> <'''newspam'''@nonad.co.uk> wrote:
>>
>>> On 04/09/2023 14:30, Don Y wrote:
>>>> Anyone else use bug reporting frequency as a gross indicator
>>>> of system stability?
>>>
>>> Just about everyone who runs a beta test program.
>>> MTBF is another metric that can be used for something that is intended
>>> to run 24/7 and recover gracefully from anything that may happen to it.
>>>
>>> It is inevitable that a new release will have some bugs and minor
>>> differences from its predecessor that real life users will find PDQ.
>>
>> That's the story of software: bugs are inevitable, so why bother to be
>> careful coding or testing? You can always wait for bug reports from
>> users and post regular fixes of the worst ones.
>
>Don't blame the engineers for that - it is the ship it and be damned
>senior management that is responsible for most buggy code being shipped.
>Even more so now that 1+GB upgrades are essentially free. :(
>
>First to market is worth enough that people live with buggy code. The
>worst major release I can recall in a very long time was MS Excel 2007
>(although bugs in Vista took a lot more flack - rather unfairly IMHO).
>
>(which reminds me it is a MS patch Tuesday today)
>
>>> The trick is to gain enough information from each in service failure to
>>> identify and fix the root cause bug in a single iteration and without
>>> breaking something else. Modern optimisers make that more difficult now
>>> than it used to be back when I was involved in commercial development.
>>
>> There have been various drives to write reliable code, but none were
>> popular. Quite the contrary, the software world loves abstraction and
>> ever new, bizarre languages... namely playing games instead of coding
>> boring, reliable applications in some klunky, reliable language.
>
>The only ones which actually could be truly relied upon used formal
>mathematical proof techniques to ensure reliability. Very few
>practitioners are able to do it properly and it is pretty much reserved
>for ultra high reliability safety and mission critical code.
>
>It could all be done to that standard iff commercial developers and
>their customers were prepared to pay for it. However, they want it now
>and they keep changing their minds about what it is they actually want
>so the goalposts are forever shifting around. That sort of functionality
>creep is much less common in hardware.
>
>UK's NATS system is supposedly 6 sigma coding but its misbehaviour on
>Bank Holiday Monday peak travel time was somewhat disastrous. It seems
>someone managed to input the halt and catch fire instruction and the
>buffers ran out before they were able to fix it. There will be a
>technical report out in due course - my guess is that they have reduced
>overheads and no longer have some of the key people who understand its
>internals. Malformed flight plan data should not have been able to kill
>it stone dead - but apparently that is exactly what happened!
>
>https://www.ft.com/content/9fe22207-5867-4c4f-972b-620cdab10790
>(might be paywalled)
>
>If so Google "UK air traffic control outage caused by unusual flight
>plan data"
>
>> Electronic design, and FPGA coding, are intended to be bug-free first
>> pass and often are, when done right.
>
>But using design and simulation *software* that you fail to acknowledge
>is actually pretty good. If you had to do it with pencil and paper you
>would be there forever.
We did serious electronic design without simulation, and most of it worked first time, or had dumb mistake hard failures that were easily hacked. It didn't take forever. If one didn't understand some part or circuit, it could be breadboarded and tested.
>
>> FPGAs are halfway software, so the coders tend to be less careful than
>> hardware designers. FPGA bug fixes are easy, so why bother to read
>> your own code?
>>
>> That's ironic, when you think about it. The hardest bits, the physical
>> electronics, has the least bugs.
>
>So do physical mechanical interlocks. I don't trust software or even
>electronic interlocks to protect me compared to a damn great beam stop
>and a padlock on it with the key in my pocket.
On 9/5/2023 9:47 AM, Martin Brown wrote:
> Don't blame the engineers for that - it is the ship it and be damned senior > management that is responsible for most buggy code being shipped. Even more so > now that 1+GB upgrades are essentially free. :(
Note how the latest coding styles inherently acknowledge that. Agile? How-to-write-code-without-knowing-what-it-has-to-do?
> First to market is worth enough that people live with buggy code. The worst
Of course! Anyone think their Windows/Linux box is bug-free? USENET client? Browser? Yet, somehow, they all seem to provide real value to their users!
> major release I can recall in a very long time was MS Excel 2007 (although bugs > in Vista took a lot more flack - rather unfairly IMHO).
Of course. Folks run Linux with 20M+ LoC? So, a ballpark estimate of 20K+ *bugs* in the RELEASED product??

<https://en.wikipedia.org/wiki/Linux_kernel#/media/File:Linux_kernel_map.png>

The era of monolithic kernels is over. Unless folks keep wanting to DONATE their time to maintaining them.

<https://en.wikipedia.org/wiki/Linux_kernel#/media/File:Redevelopment_costs_of_Linux_kernel.png>

Amusing that it's pursuing a 50 year old dream... (let's get together an effort to recreate the Wright flyer so we can all take 100 yard flights!)
> (which reminds me it is a MS patch Tuesday today)
Surrender your internet connection, for the day...
> The only ones which actually could be truly relied upon used formal > mathematical proof techniques to ensure reliability. Very few practitioners are > able to do it properly and it is pretty much reserved for ultra high > reliability safety and mission critical code.
And only applies to the smallest parts of the codebase. The "engineering" comes in figuring out how to live with systems that aren't verifiable. (you can't ensure hardware WILL work as advertised unless you have tested every component that you put into the fabrication -- ah, but you can blame someone else for YOUR system's failure)
> It could be all be done to that standard iff commercial developers and their > customers were prepared to pay for it. However, they want it now and they keep > changing their minds about what it is they actually want so the goalposts are > forever shifting around. That sort of functionality creep is much less common > in hardware.
Exactly. And, software often is told to COMPENSATE for hardware shortcomings.

One of the sound systems used in early video games used a CVSD as an ARB. But, the idiot who designed the hardware was 200% clueless about how the software would use the hardware. So, the (dedicated!) processor had to sit in a tight loop SHIFTING bits into the CVSD. Of course, each path through the loop had to be balanced in terms of execution time lest you get a beat component (as every 8th bit requires a new byte to be fetched -- which takes a different amount of time than shifting the current byte by one bit).

Hardware designers are typically clueless as to how their decisions impact the software. And, as the company may have invested a "couple of kilobucks" on a design and layout, Manglement's shortsightedness fails to realize the tens of kilobucks that their penny-pinching will cost!

[I once had a spectacular FAIL in a bit of hardware that I designed. It was a custom CPU ("chip"). The guy writing the code (and the tools to write it!) assumed addresses were byte-oriented. But, the processor was truly a 16b machine and all of the addresses were for 16b objects. So, all of the addresses generated by his tools were exactly twice what they should have been ("Didn't you notice how the LSb was ALWAYS '0'?"). Simple fix but embarrassing, as we each relied on assumptions that seemed natural to us where the wiser approach would have made that statement explicit.]
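[For illustration, a C-flavored sketch of the kind of balanced bit-banging loop described above. The port name, buffer, and sizes are invented stand-ins; the real thing would have been hand-counted assembler on the dedicated CPU, and the cycle-balancing is only indicated in the comments:]

    /* Sketch only: CVSD_DATA_PORT and sample_buf stand in for the real
       memory-mapped hardware. */
    #include <stdint.h>

    #define SAMPLE_LEN 1024
    static volatile uint8_t CVSD_DATA_PORT;   /* stand-in for the CVSD's 1-bit input latch */
    static uint8_t sample_buf[SAMPLE_LEN];

    void play_sample(void)
    {
        for (unsigned i = 0; i < SAMPLE_LEN; i++) {
            uint8_t byte = sample_buf[i];     /* every 8th bit costs a byte fetch... */
            for (unsigned bit = 0; bit < 8; bit++) {
                CVSD_DATA_PORT = byte & 1u;   /* present the next bit to the CVSD */
                byte >>= 1;
                /* On the real part, the byte-fetch path and the shift-only path
                   had to be padded (NOPs) to identical cycle counts; otherwise
                   the longer every-8th-bit path shows up as a beat at bit-rate/8. */
            }
        }
    }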
> UK's NATS system is supposedly 6 sigma coding but its misbehaviour on Bank > Holiday Monday peak travel time was somewhat disastrous. It seems someone > managed to input the halt and catch fire instruction and the buffers ran out > before they were able to fix it. There will be a technical report out in due > course - my guess is that they have reduced overheads and no longer have some > of the key people who understand its internals. Malformed flight plan data > should not have been able to kill it stone dead - but apparently that is > exactly what happened!
Lunar landers, etc. Software is complex. Hardware is a walk in the park. For anything but a trivial piece of code, you can't see all of the interconnects/interdependencies.
> https://www.ft.com/content/9fe22207-5867-4c4f-972b-620cdab10790 > (might be paywalled) > > If so Google "UK air traffic control outage caused by unusual flight plan data" > >> Electronic design, and FPGA coding, are intended to be bug-free first >> pass and often are, when done right. > > But using design and simulation *software* that you fail to acknowledge is > actually pretty good. If you had to do it with pencil and paper your would be > there forever.
When was the last time your calculator PROGRAM produced a verifiable error? And, desktop software is considerably less complex than software used in products where interactions arising from temporal differences can prove unpredictable.

We bought a new stove/oven some time ago. Specify which oven, heat source, setpoint temperature and time. START.

Ah, but if you want to change the time remaining (because you peeked at the item and realize it could use another few minutes) AND the timer expires WHILE YOU ARE TRYING TO CHANGE IT, the user interface locks up (!). Your recourse is to shut off the oven (abort the process) and then restart it using the settings you just CANCELED.

It's easy to see how this can evade testing -- if the test engineer didn't have a good understanding of how the code worked so he could challenge it with specially crafted test cases.

When drafting system specifications, I (try to) imagine every situation that can come up and describe how each should be handled. So, the test scaffolding and actual tests can be designed to verify that behavior in the resulting product.

[How do you test for the case where the user tries to change the remaining time AS the timer is expiring? How do you test for the case where the process on the remote host crashes AFTER it has received a request for service but before it has acknowledged that? Or, BEFORE it receives it? Or, WHILE acknowledging it?]

Hardware is easy to test: set voltage/current/freq/etc. and observe result.

[We purchased a glass titty many years ago. At one point, we turned it on, then off, then on again -- in relatively short order. I guess the guy who designed the power supply hadn't considered this possibility as the magic smoke rushed out of it! How hard can it be to design a power supply???]
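[A sketch of how the "timer expires while you're editing it" case can be forced deterministically in a test, using a toy model of the oven's timer logic. The states, events, and chosen recovery behavior are invented for illustration, not the real appliance's:]

    /* Toy model plus a directed test that forces the racy interleaving. */
    #include <assert.h>

    typedef enum { EV_BEGIN_EDIT, EV_COMMIT_EDIT, EV_TIMER_EXPIRED } event_t;
    typedef enum { COOKING, EDITING, DONE } state_t;

    static state_t state = COOKING;

    static void oven_step(event_t ev)
    {
        switch (state) {
        case COOKING:
            if (ev == EV_BEGIN_EDIT)          state = EDITING;
            else if (ev == EV_TIMER_EXPIRED)  state = DONE;
            break;
        case EDITING:
            /* The spec has to say what happens here; one defensible choice is
               that expiry wins and the pending edit is discarded. The lockup
               described above is the cell nobody specified or tested. */
            if (ev == EV_TIMER_EXPIRED)       state = DONE;
            else if (ev == EV_COMMIT_EDIT)    state = COOKING;
            break;
        case DONE:
            break;                            /* further events are ignored */
        }
    }

    int main(void)
    {
        oven_step(EV_BEGIN_EDIT);      /* user opens "change remaining time" */
        oven_step(EV_TIMER_EXPIRED);   /* the race: timer fires mid-edit */
        oven_step(EV_COMMIT_EDIT);     /* user finishes the edit anyway */
        assert(state == DONE);         /* must end in a defined state, not locked up */
        return 0;
    }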
>> FPGAs are halfway software, so the coders tend to be less careful than >> hardware designers. FPGA bug fixes are easy, so why bother to read >> your own code? >> >> That's ironic, when you think about it. The hardest bits, the physical >> electronics, has the least bugs.
No, the physical electronics are the EASIEST bits. If designing hardware was so difficult, then the solution to the software "problem" would be to just have all the hardware designers switch over to designing software! Problem solved INSTANTLY! In practice, the problem would be worsened by a few orders of magnitude as they suddenly found themselves living in an opaque world.
> So do physical mechanical interlocks. I don't trust software or even electronic > interlocks to protect me compared to a damn great beam stop and a padlock on it > with the key in my pocket.
Note the miswired motor example, above. If the limit switches had been hardwired, the problem still would have been present as the problem was in the hardware -- the wiring of the motor.
On 9/5/2023 9:45 AM, Joe Gwinn wrote:
> There is a complication. Modern software is tens of millions of lines > of code, far exceeding the inspection capabilities of humans. Hardware > is far simpler in terms of lines of FPGA code. But it's creeping up.
Even small projects defy hardware implementations. BUILD a speech synthesizer, entirely out of hardware. Make sure there is a way the user can adjust the voice their individual liking. (*you*, not your TEAM, have 3 months to produce a working prototype). Or, something that recognizes faces, voices, etc. Or, something that knows which plants should be watered, today (if any), and how much water to dispense. Or, something that examines the text in a document and flags grammatical and spelling errors. Or...
> On a project some decades ago, the customer wanted us to verify every > path through the code, which was about 100,000 lines (large at the > time) of C or assembler (don't recall, doesn't actually matter). > > In round numbers, one in five lines of code is an IF statement, so in > 100,000 lines of code there will be 20,000 IF statements. So, there > are up to 2^20000 unique paths through the code. Which chokes my HP > calculator, so we must resort to logarithms, yielding 10^6021, which > is a *very* large number. The age of the Universe is only 14 billion > years, call it 10^10 years, so one would never be able to test even a > tiny fraction of the possible paths.
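[Aside: the 10^6021 figure is easy to reproduce by working in logarithms, exactly as described:]

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* 2^20000 won't fit in any native type, so take logs:
           log10(2^20000) = 20000 * log10(2) ~= 6020.6 */
        printf("2^20000 ~= 10^%.0f\n", 20000.0 * log10(2.0));   /* prints 10^6021 */
        return 0;
    }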
The *first* problem is codifying how the code should behave in *each* of those test cases.
> The customer withdrew the requirement.
"Verify your sqrt() function produces correct answers over the range of inputs"
On 9/5/2023 10:02 AM, Martin Brown wrote:
>> In round numbers, one in five lines of code is an IF statement, so in >> 100,000 lines of code there will be 20,000 IF statements. So, there >> are up to 2^20000 unique paths through the code. Which chokes my HP > > Although that is true it is also true that a small number of cunningly > constructed test datasets can explore a very high proportion of the most > frequently traversed paths in any given codebase. One snag is that testing is > invariably cut short by management when development overruns.
"We'll fix it in version 2" I always found this an amusing delusion. If the product is successful, there will be lots of people clamoring for fixes so you won't have any manpower to devote to designing version 2 (but your competitors will see the appeal your product has and will start designing THEIR replacement for it!) If the product is a dud (possibly because of these problems), there won't be a need for a version 2.
> The bits that fail to get explored tend to be weird error recovery routines. I
Because, by design, they are seldom encountered. So, don't benefit from being exercised in the normal course of operation.
> recall one latent on the VAX for ages which was that when it ran out of IO > handles (because someone was opening them inside a loop) the first thing the > recovery routine tried to do was open an IO channel! > >> calculator, so we must resort to logarithms, yielding 10^6021, which >> is a *very* large number. The age of the Universe is only 14 billion >> years, call it 10^10 years, so one would never be able to test even a >> tiny fraction of the possible paths. > > McCabe's complexity metric provides a way to test paths in components and > subsystems reasonably thoroughly and catch most of the common programmer > errors. Static dataflow analysis is also a lot better now than in the past.
But some test cases can mask other paths through the code. There is no guarantee that a given piece of code *can* be thoroughly tested -- especially if you take into account the fact that the underlying hardware isn't infallible; "if (x % 2)" can yield one result, now, and a different result, 5 lines later -- even though x hasn't been altered (but the hardware farted). So:

    if (x % 2) {
        do this;
        do that;
        do another_thing;
    } else {
        do that;
    }

can execute differently than:

    if (x % 2) {
        do this;
    }
    do that;
    if (x % 2) {
        do another_thing;
    }

Years ago, this possibility wasn't ever considered. [Yes, optimizers can twiddle this but the point remains.] And, that doesn't begin to address hostile actors in a system!
> Then you only need at most 40000 test vectors to take each branch of every > binary if statement (60000 if it is Fortran with 3 way branches all used). That > is a rather more tractable number (although still large). > > Any routine with too high a CCI count is practically certain to contain latent > bugs - which makes it worth looking at more carefully.
"A 'program' should fit on a single piece of paper"
On Tue, 5 Sep 2023 10:44:08 -0700, Don Y <blockedofcourse@foo.invalid>
wrote:

>[...]
>
>No, the physical electronics are the EASIEST bits. If designing
>hardware was so difficult, then the solution to the software
>"problem" would be to just have all the hardware designers switch
>over to designing software! Problem solved INSTANTLY!
The state of software development is a disgrace. We are plagued with absurd user interfaces, hidden states, and massive numbers of bugs. There is no science, math, or discipline to programming. What famous person said that "anybody can learn to code"? One study found that English majors, on average, were better programmers than CE or CS majors.
>
>In practice, the problem would be worsened by a few orders of
>magnitude as they suddenly found themselves living in an opaque world.
>
>> So do physical mechanical interlocks. I don't trust software or even electronic
>> interlocks to protect me compared to a damn great beam stop and a padlock on it
>> with the key in my pocket.
>
>Note the miswired motor example, above. If the limit switches had
>been hardwired, the problem still would have been present as the
>problem was in the hardware -- the wiring of the motor.
I wonder if the programmer had ever wired or worked with actual motors. One of our neighbors is a highly-paid Apple software engineer and might kill himself if you handed him a screwdriver. He is entirely clueless about electricity. We always consider user wiring error effects, as in a recent remote-sense power supply. No connection can damage it or make the voltage go more than 2 volts over or under the programmed value.
On 9/5/2023 10:03 AM, Don Y wrote:
> Good problem decomposition goes a long way towards that goal. > If you try to do "too much" you quickly overwhelm the developer's > ability to manage complexity (7 items in STM?). And, as you can't > *see* the entire implementation, there's nothing to REMIND you > of some salient issue that might impact your local efforts. > > [Hence the value of eschewing globals and the languages that > tolerate/encourage them! This dramatically cuts down the > number of ways X can influence Y.]
Of course, if you've never had any formal training ("you're just a coder"), then you don't even realize these hazards exist! You just pick at your code until it SEEMS to work and then walk away.

Hence the need for the "managed environments" and languages du jour that try to compensate for the lack of formal training in schools and businesses.

[I worked with a Fortune *100* company on a 30 man project where The Boss assigned the software for the product to a *technician* whose sole qualification was that he had a CoCo at home! Really? You're putting your good name in the hands of a tinkerer??]

Sadly, most businessmen don't understand software or the process and, rather than admit their ignorance, blunder onward wondering (later) why everything turns to shite. Anyone who's had to explain why a "little change" in the product specification requires a major change to the schedule understands the "ignorance at the top".

[I had a manager who wrote BASIC programs to keep track of the DOG SHOWS that he'd entered (what is that? just a bunch of PRINT statements??) and considered himself qualified to make decisions regarding the software in the products for which he was responsible. *Anyone* can write code.]

And, engineers turned managers tend to be the worst as they THINK they understand the current state of the art (because they used to practice it) without realizing that it's a moving target and if you're using last year's technology, you are 2 or 3 (!) years out of date!

Would you promote a *technician* to run an electronics DESIGN department and expect him to be current wrt the latest generation of components, design and manufacturing practices? If he *thought* he was, how quickly would you disabuse him of that belief?
On Tue, 5 Sep 2023 18:02:05 +0100, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

>On 05/09/2023 17:45, Joe Gwinn wrote:
>
>[...]
>
>> In round numbers, one in five lines of code is an IF statement, so in
>> 100,000 lines of code there will be 20,000 IF statements. So, there
>> are up to 2^20000 unique paths through the code. Which chokes my HP
>> calculator, so we must resort to logarithms, yielding 10^6021, which
>> is a *very* large number. The age of the Universe is only 14 billion
>> years, call it 10^10 years, so one would never be able to test even a
>> tiny fraction of the possible paths.
>
>Although that is true it is also true that a small number of cunningly
>constructed test datasets can explore a very high proportion of the most
>frequently traversed paths in any given codebase. One snag is that
>testing is invariably cut short by management when development overruns.
>
>The bits that fail to get explored tend to be weird error recovery
>routines. I recall one latent on the VAX for ages which was that when it
>ran out of IO handles (because someone was opening them inside a loop)
>the first thing the recovery routine tried to do was open an IO channel!
>
>McCabe's complexity metric provides a way to test paths in components
>and subsystems reasonably thoroughly and catch most of the common
>programmer errors. Static dataflow analysis is also a lot better now
>than in the past.
>
>Then you only need at most 40000 test vectors to take each branch of
>every binary if statement (60000 if it is Fortran with 3 way branches
>all used). That is a rather more tractable number (although still large).
>
>Any routine with too high a CCI count is practically certain to contain
>latent bugs - which makes it worth looking at more carefully.
I must say that I fail to see how this can overcome 10^6021 paths, even if it is wondrously effective, reducing the space to be tested by a trillion to one (10^-12) - only 10^6009 paths to explore.

Joe Gwinn
On Tue, 05 Sep 2023 10:19:15 -0700, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

>On Tue, 05 Sep 2023 12:45:01 -0400, Joe Gwinn <joegwinn@comcast.net>
>wrote:
>
>[...]
>
>>There is a complication. Modern software is tens of millions of lines
>>of code, far exceeding the inspection capabilities of humans.
>
>After you type a line of code, read it. When we did that, entire
>applications often worked first try.
>
>Hardware
>>is far simpler in terms of lines of FPGA code. But it's creeping up.
>
>FPGAs are at least (usually) organized state machines. Mistakes are
>typically hard failures, not low-rate bugs discovered in the field.
>Avoiding race and metastability hazards is common practise.
>
>>On a project some decades ago, the customer wanted us to verify every
>>path through the code, which was about 100,000 lines (large at the
>>time) of C or assembler (don't recall, doesn't actually matter).
>
>Software provability was a brief fad once. It wasn't popular or, as
>code is now done, possible.
>
>>In round numbers, one in five lines of code is an IF statement, so in
>>100,000 lines of code there will be 20,000 IF statements. So, there
>>are up to 2^20000 unique paths through the code. Which chokes my HP
>>calculator, so we must resort to logarithms, yielding 10^6021, which
>>is a *very* large number. The age of the Universe is only 14 billion
>>years, call it 10^10 years, so one would never be able to test even a
>>tiny fraction of the possible paths.
>
>An FPGA is usually coded as a state machine, where the designer
>understands that the machine has a finite number of states and handles
>every one. A computer program has an impossibly large number of
>states, unknown and certainly not managed. Code is like hairball async
>logic design.
In recent FPGAs you have done, how many states and events (their Cartesian product being the entire state table) are there? By the way, back in the day when I was specifying state machines (often for implementation in software), I had a rule that all cells would have an entry, even the combinations of state and event that "couldn't happen". This was essential for achieving robustness in practice.
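[For illustration, a sketch of that rule in software form: the dispatch table is the full states-by-events Cartesian product, and even the combinations that "can't happen" get a deliberate handler rather than undefined behavior. The states, events, and handlers are invented:]

    /* Every cell of the state/event table is filled in, including the
       "can't happen" cells. */
    #include <stdio.h>

    typedef enum { ST_IDLE, ST_RUNNING, ST_DONE, NUM_STATES } state_t;
    typedef enum { EV_START, EV_TICK, EV_STOP, NUM_EVENTS } event_t;

    typedef state_t (*handler_t)(void);

    static state_t start_run(void)    { return ST_RUNNING; }
    static state_t keep_running(void) { return ST_RUNNING; }
    static state_t finish(void)       { return ST_DONE; }
    static state_t ignore_idle(void)  { return ST_IDLE; }
    static state_t ignore_done(void)  { return ST_DONE; }

    /* "Can't happen" combinations get an explicit, deliberate handler:
       log it and land in a known recovery state. */
    static state_t cant_happen(void)
    {
        fprintf(stderr, "'impossible' state/event combination reached\n");
        return ST_IDLE;
    }

    static const handler_t table[NUM_STATES][NUM_EVENTS] = {
        /*               EV_START      EV_TICK       EV_STOP      */
        /* ST_IDLE    */ { start_run,   ignore_idle,  cant_happen },
        /* ST_RUNNING */ { cant_happen, keep_running, finish      },
        /* ST_DONE    */ { cant_happen, ignore_done,  ignore_done },
    };

    state_t dispatch(state_t s, event_t e) { return table[s][e](); }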
>>The customer withdrew the requirement. > >It was naive of him to want correct code.
No, only a bit unrealistic. But it was naive of him to think that total correctness can be tested into anything.

The state of the art in verifying safety-critical code (as in for safety of flight) is DO-178, which is an immensely heavy process. The original objective was a probability of error not exceeding 10^-6; this has been tightened to 10^-7 or 10^-8 because of the "headline risk".

.<https://en.wikipedia.org/wiki/DO-178C>

Correctness can be mathematically proven only for extremely simple mechanisms, using a sharply restricted set of allowed operations. See The Halting Problem.

.<https://en.wikipedia.org/wiki/Halting_problem#:~:text=The%20halting%20problem%20is%20undecidable,usually%20via%20a%20Turing%20machine.>

Joe Gwinn
On Tue, 05 Sep 2023 18:33:47 -0400, Joe Gwinn <joegwinn@comcast.net>
wrote:

>On Tue, 05 Sep 2023 10:19:15 -0700, John Larkin
><jlarkin@highlandSNIPMEtechnology.com> wrote:
>
>[...]
>
>>An FPGA is usually coded as a state machine, where the designer
>>understands that the machine has a finite number of states and handles
>>every one. A computer program has an impossibly large number of
>>states, unknown and certainly not managed. Code is like hairball async
>>logic design.
>
>In recent FPGAs you have done, how many states and events (their
>Cartesian product being the entire state table) are there?
A useful state machine might have 4 or maybe 16 states. I'm not sure what you mean by 'events'. Sometimes we have a state word and a counter, which technically gives us more states but it's convenient to think of them separately. As in "repeat state 4 until the counter hits zero."

A state machine can have many more inputs and outputs than it has states. It is critical that no inputs can be changing when the clock ticks.
On Tue, 05 Sep 2023 17:00:13 -0700, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

>On Tue, 05 Sep 2023 18:33:47 -0400, Joe Gwinn <joegwinn@comcast.net>
>wrote:
>
>[...]
>
>>In recent FPGAs you have done, how many states and events (their
>>Cartesian product being the entire state table) are there?
>
>A useful state machine might have 4 or maybe 16 states. I'm not sure
>what you mean by 'events'. Sometimes we have a state word and a
>counter, which technically gives us more states but it's convenient to
>think of them separately. As in "repeat state 4 until the counter hits
>zero."
We'll call it 16 states for the present purposes. An event is anything that can cause the state to change, including expiration of a timer. This is basically a design choice.
>A state machine can have many more inputs and outputs than it has >states.
Yes, that's typical.
> It is critical that no inputs can be changing when the clock >ticks.
That's also essential in hardware state machines.

In software state machines, events are most often the arrival of messages, and the mechanism that provides these messages ensures that they are presented in serial order (even if the underlying hardware does not ensure ordering).

Joe Gwinn
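[For illustration, a minimal sketch of that software pattern: every event is a message on a single queue, and a single consumer thread drains it, so the state machine sees events one at a time and in a definite order. The names and the queue itself are invented; overflow handling is omitted for brevity:]

    #include <pthread.h>
    #include <stddef.h>

    typedef struct { int type; int payload; } msg_t;

    extern void handle_event(msg_t m);       /* the state machine proper */

    #define QLEN 64
    static msg_t q[QLEN];
    static int head, tail, count;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

    void post(msg_t m)                       /* producers: timer ticks, I/O completions, other threads */
    {
        pthread_mutex_lock(&lock);
        q[tail] = m;
        tail = (tail + 1) % QLEN;
        count++;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
    }

    void *state_machine_thread(void *arg)    /* the only place state ever changes */
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0)
                pthread_cond_wait(&nonempty, &lock);
            msg_t m = q[head];
            head = (head + 1) % QLEN;
            count--;
            pthread_mutex_unlock(&lock);
            handle_event(m);                 /* software analogue of "no inputs change on the clock tick" */
        }
        return NULL;
    }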