Electronics-Related.com
Forums

Tracking bug report frequency

Started by Don Y September 4, 2023
On Tue, 5 Sep 2023 17:47:41 +0100, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

>On 05/09/2023 16:57, John Larkin wrote:
>> On Tue, 5 Sep 2023 13:13:51 +0100, Martin Brown
>> <'''newspam'''@nonad.co.uk> wrote:
>>
>>> On 04/09/2023 14:30, Don Y wrote:
>>>> Anyone else use bug reporting frequency as a gross indicator
>>>> of system stability?
>>>
>>> Just about everyone who runs a beta test program.
>>> MTBF is another metric that can be used for something that is intended
>>> to run 24/7 and recover gracefully from anything that may happen to it.
>>>
>>> It is inevitable that a new release will have some bugs and minor
>>> differences from its predecessor that real life users will find PDQ.
>>
>> That's the story of software: bugs are inevitable, so why bother to be
>> careful coding or testing? You can always wait for bug reports from
>> users and post regular fixes of the worst ones.
>
>Don't blame the engineers for that - it is the ship it and be damned
>senior management that is responsible for most buggy code being shipped.
>Even more so now that 1+GB upgrades are essentially free. :(
>
>First to market is worth enough that people live with buggy code. The
>worst major release I can recall in a very long time was MS Excel 2007
>(although bugs in Vista took a lot more flack - rather unfairly IMHO).
>
>(which reminds me it is a MS patch Tuesday today)
>
>>> The trick is to gain enough information from each in service failure to
>>> identify and fix the root cause bug in a single iteration and without
>>> breaking something else. Modern optimisers make that more difficult now
>>> than it used to be back when I was involved in commercial development.
>>
>> There have been various drives to write reliable code, but none were
>> popular. Quite the contrary, the software world loves abstraction and
>> ever new, bizarre languages... namely playing games instead of coding
>> boring, reliable applications in some klunky, reliable language.
>
>The only ones which actually could be truly relied upon used formal
>mathematical proof techniques to ensure reliability. Very few
>practitioners are able to do it properly and it is pretty much reserved
>for ultra high reliability safety and mission critical code.
>
>It could all be done to that standard iff commercial developers and
>their customers were prepared to pay for it. However, they want it now
>and they keep changing their minds about what it is they actually want
>so the goalposts are forever shifting around. That sort of functionality
>creep is much less common in hardware.
>
>UK's NATS system is supposedly 6 sigma coding but its misbehaviour on
>Bank Holiday Monday peak travel time was somewhat disastrous. It seems
>someone managed to input the halt and catch fire instruction and the
>buffers ran out before they were able to fix it. There will be a
>technical report out in due course - my guess is that they have reduced
>overheads and no longer have some of the key people who understand its
>internals. Malformed flight plan data should not have been able to kill
>it stone dead - but apparently that is exactly what happened!
>
>https://www.ft.com/content/9fe22207-5867-4c4f-972b-620cdab10790
>(might be paywalled)
>
>If so Google "UK air traffic control outage caused by unusual flight
>plan data"
>
>> Electronic design, and FPGA coding, are intended to be bug-free first
>> pass and often are, when done right.
>
>But using design and simulation *software* that you fail to acknowledge
>is actually pretty good. If you had to do it with pencil and paper you
>would be there forever.
We did serious electronic design without simulation, and most of it worked first time, or had dumb mistake hard failures that were easily hacked. It didn't take forever. If one didn't understand some part or circuit, it could be breadboarded and tested.
>
>> FPGAs are halfway software, so the coders tend to be less careful than
>> hardware designers. FPGA bug fixes are easy, so why bother to read
>> your own code?
>>
>> That's ironic, when you think about it. The hardest bits, the physical
>> electronics, has the least bugs.
>
>So do physical mechanical interlocks. I don't trust software or even
>electronic interlocks to protect me compared to a damn great beam stop
>and a padlock on it with the key in my pocket.
On 9/5/2023 9:47 AM, Martin Brown wrote:
> Don't blame the engineers for that - it is the ship it and be damned senior > management that is responsible for most buggy code being shipped. Even more so > now that 1+GB upgrades are essentially free. :(
Note how the latest coding styles inherently acknowledge that. Agile? How-to-write-code-without-knowing-what-it-has-to-do?
> First to market is worth enough that people live with buggy code. The worst
Of course! Anyone think their Windows/Linux box is bug-free? USENET client? Browser? Yet, somehow, they all seem to provide real value to their users!
> major release I can recall in a very long time was MS Excel 2007 (although bugs > in Vista took a lot more flack - rather unfairly IMHO).
Of course. Folks run Linux with 20M+ LoC? So, a ballpark estimate of 20K+ *bugs* in the RELEASED product??

<https://en.wikipedia.org/wiki/Linux_kernel#/media/File:Linux_kernel_map.png>

The era of monolithic kernels is over. Unless folks keep wanting to DONATE their time to maintaining them.

<https://en.wikipedia.org/wiki/Linux_kernel#/media/File:Redevelopment_costs_of_Linux_kernel.png>

Amusing that it's pursuing a 50 year old dream... (let's get together an effort to recreate the Wright flyer so we can all take 100 yard flights!)
> (which reminds me it is a MS patch Tuesday today)
Surrender your internet connection, for the day...
> The only ones which actually could be truly relied upon used formal > mathematical proof techniques to ensure reliability. Very few practitioners are > able to do it properly and it is pretty much reserved for ultra high > reliability safety and mission critical code.
And only applies to the smallest parts of the codebase. The "engineering" comes in figuring out how to live with systems that aren't verifiable. (you can't ensure hardware WILL work as advertised unless you have tested every component that you put into the fabrication -- ah, but you can blame someone else for YOUR system's failure)
> It could be all be done to that standard iff commercial developers and their > customers were prepared to pay for it. However, they want it now and they keep > changing their minds about what it is they actually want so the goalposts are > forever shifting around. That sort of functionality creep is much less common > in hardware.
Exactly. And, software often is told to COMPENSATE for hardware shortcomings.

One of the sound systems used in early video games used a CVSD as an ARB. But, the idiot who designed the hardware was 200% clueless about how the software would use the hardware. So, the (dedicated!) processor had to sit in a tight loop SHIFTING bits into the CVSD. Of course, each path through the loop had to be balanced in terms of execution time lest you get a beat component (as every 8th bit requires a new byte to be fetched -- which takes a different amount of time than shifting the current byte by one bit).

Hardware designers are typically clueless as to how their decisions impact the software. And, as the company may have invested a "couple of kilobucks" on a design and layout, Manglement's shortsightedness fails to realize the tens of kilobucks that their penny-pinching will cost!

[I once had a spectacular FAIL in a bit of hardware that I designed. It was a custom CPU ("chip"). The guy writing the code (and the tools to write it!) assumed addresses were byte-oriented. But, the processor was truly a 16b machine and all of the addresses were for 16b objects. So, all of the addresses generated by his tools were exactly twice what they should have been ("Didn't you notice how the LSb was ALWAYS '0'?"). Simple fix but embarrassing, as we each relied on assumptions that seemed natural to us where the wiser approach would have made that statement explicit.]
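[For illustration, a C-flavored sketch of the kind of balanced bit-banging loop described above. The port name, buffer, and sizes are invented stand-ins; the real thing would have been hand-counted assembler on the dedicated CPU, and the cycle-balancing is only indicated in the comments:]

    /* Sketch only: CVSD_DATA_PORT and sample_buf stand in for the real
       memory-mapped hardware. */
    #include <stdint.h>

    #define SAMPLE_LEN 1024
    static volatile uint8_t CVSD_DATA_PORT;   /* stand-in for the CVSD's 1-bit input latch */
    static uint8_t sample_buf[SAMPLE_LEN];

    void play_sample(void)
    {
        for (unsigned i = 0; i < SAMPLE_LEN; i++) {
            uint8_t byte = sample_buf[i];     /* every 8th bit costs a byte fetch... */
            for (unsigned bit = 0; bit < 8; bit++) {
                CVSD_DATA_PORT = byte & 1u;   /* present the next bit to the CVSD */
                byte >>= 1;
                /* On the real part, the byte-fetch path and the shift-only path
                   had to be padded (NOPs) to identical cycle counts; otherwise
                   the longer every-8th-bit path shows up as a beat at bit-rate/8. */
            }
        }
    }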
> UK's NATS system is supposedly 6 sigma coding but its misbehaviour on Bank > Holiday Monday peak travel time was somewhat disastrous. It seems someone > managed to input the halt and catch fire instruction and the buffers ran out > before they were able to fix it. There will be a technical report out in due > course - my guess is that they have reduced overheads and no longer have some > of the key people who understand its internals. Malformed flight plan data > should not have been able to kill it stone dead - but apparently that is > exactly what happened!
Lunar landers, etc. Software is complex. Hardware is a walk in the park. For anything but a trivial piece of code, you can't see all of the interconnects/interdependencies.
> https://www.ft.com/content/9fe22207-5867-4c4f-972b-620cdab10790 > (might be paywalled) > > If so Google "UK air traffic control outage caused by unusual flight plan data" > >> Electronic design, and FPGA coding, are intended to be bug-free first >> pass and often are, when done right. > > But using design and simulation *software* that you fail to acknowledge is > actually pretty good. If you had to do it with pencil and paper your would be > there forever.
When was the last time your calculator PROGRAM produced a verifiable error? And, desktop software is considerably less complex than software used in products where interactions arising from temporal differences can prove unpredictable.

We bought a new stove/oven some time ago. Specify which oven, heat source, setpoint temperature and time. START.

Ah, but if you want to change the time remaining (because you peeked at the item and realize it could use another few minutes) AND the timer expires WHILE YOU ARE TRYING TO CHANGE IT, the user interface locks up (!). Your recourse is to shut off the oven (abort the process) and then restart it using the settings you just CANCELED.

It's easy to see how this can evade testing -- if the test engineer didn't have a good understanding of how the code worked so he could challenge it with specially crafted test cases.

When drafting system specifications, I (try to) imagine every situation that can come up and describe how each should be handled. So, the test scaffolding and actual tests can be designed to verify that behavior in the resulting product.

[How do you test for the case where the user tries to change the remaining time AS the timer is expiring? How do you test for the case where the process on the remote host crashes AFTER it has received a request for service but before it has acknowledged that? Or, BEFORE it receives it? Or, WHILE acknowledging it?]

Hardware is easy to test: set voltage/current/freq/etc. and observe result.

[We purchased a glass titty many years ago. At one point, we turned it on, then off, then on again -- in relatively short order. I guess the guy who designed the power supply hadn't considered this possibility as the magic smoke rushed out of it! How hard can it be to design a power supply???]
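[A sketch of how the "timer expires while you're editing it" case can be forced deterministically in a test, using a toy model of the oven's timer logic. The states, events, and chosen recovery behavior are invented for illustration, not the real appliance's:]

    /* Toy model plus a directed test that forces the racy interleaving. */
    #include <assert.h>

    typedef enum { EV_BEGIN_EDIT, EV_COMMIT_EDIT, EV_TIMER_EXPIRED } event_t;
    typedef enum { COOKING, EDITING, DONE } state_t;

    static state_t state = COOKING;

    static void oven_step(event_t ev)
    {
        switch (state) {
        case COOKING:
            if (ev == EV_BEGIN_EDIT)          state = EDITING;
            else if (ev == EV_TIMER_EXPIRED)  state = DONE;
            break;
        case EDITING:
            /* The spec has to say what happens here; one defensible choice is
               that expiry wins and the pending edit is discarded. The lockup
               described above is the cell nobody specified or tested. */
            if (ev == EV_TIMER_EXPIRED)       state = DONE;
            else if (ev == EV_COMMIT_EDIT)    state = COOKING;
            break;
        case DONE:
            break;                            /* further events are ignored */
        }
    }

    int main(void)
    {
        oven_step(EV_BEGIN_EDIT);      /* user opens "change remaining time" */
        oven_step(EV_TIMER_EXPIRED);   /* the race: timer fires mid-edit */
        oven_step(EV_COMMIT_EDIT);     /* user finishes the edit anyway */
        assert(state == DONE);         /* must end in a defined state, not locked up */
        return 0;
    }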
>> FPGAs are halfway software, so the coders tend to be less careful than >> hardware designers. FPGA bug fixes are easy, so why bother to read >> your own code? >> >> That's ironic, when you think about it. The hardest bits, the physical >> electronics, has the least bugs.
No, the physical electronics are the EASIEST bits. If designing hardware was so difficult, then the solution to the software "problem" would be to just have all the hardware designers switch over to designing software! Problem solved INSTANTLY! In practice, the problem would be worsened by a few orders of magnitude as they suddenly found themselves living in an opaque world.
> So do physical mechanical interlocks. I don't trust software or even electronic > interlocks to protect me compared to a damn great beam stop and a padlock on it > with the key in my pocket.
Note the miswired motor example, above. If the limit switches had been hardwired, the problem still would have been present as the problem was in the hardware -- the wiring of the motor.
On 9/5/2023 9:45 AM, Joe Gwinn wrote:
> There is a complication. Modern software is tens of millions of lines > of code, far exceeding the inspection capabilities of humans. Hardware > is far simpler in terms of lines of FPGA code. But it's creeping up.
Even small projects defy hardware implementations. BUILD a speech synthesizer, entirely out of hardware. Make sure there is a way the user can adjust the voice their individual liking. (*you*, not your TEAM, have 3 months to produce a working prototype). Or, something that recognizes faces, voices, etc. Or, something that knows which plants should be watered, today (if any), and how much water to dispense. Or, something that examines the text in a document and flags grammatical and spelling errors. Or...
> On a project some decades ago, the customer wanted us to verify every > path through the code, which was about 100,000 lines (large at the > time) of C or assembler (don't recall, doesn't actually matter). > > In round numbers, one in five lines of code is an IF statement, so in > 100,000 lines of code there will be 20,000 IF statements. So, there > are up to 2^20000 unique paths through the code. Which chokes my HP > calculator, so we must resort to logarithms, yielding 10^6021, which > is a *very* large number. The age of the Universe is only 14 billion > years, call it 10^10 years, so one would never be able to test even a > tiny fraction of the possible paths.
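[Aside: the 10^6021 figure is easy to reproduce by working in logarithms, exactly as described:]

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* 2^20000 won't fit in any native type, so take logs:
           log10(2^20000) = 20000 * log10(2) ~= 6020.6 */
        printf("2^20000 ~= 10^%.0f\n", 20000.0 * log10(2.0));   /* prints 10^6021 */
        return 0;
    }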
The *first* problem is codifying how the code should behave in *each* of those test cases.
> The customer withdrew the requirement.
"Verify your sqrt() function produces correct answers over the range of inputs"
On 9/5/2023 10:02 AM, Martin Brown wrote:
>> In round numbers, one in five lines of code is an IF statement, so in >> 100,000 lines of code there will be 20,000 IF statements. So, there >> are up to 2^20000 unique paths through the code. Which chokes my HP > > Although that is true it is also true that a small number of cunningly > constructed test datasets can explore a very high proportion of the most > frequently traversed paths in any given codebase. One snag is that testing is > invariably cut short by management when development overruns.
"We'll fix it in version 2" I always found this an amusing delusion. If the product is successful, there will be lots of people clamoring for fixes so you won't have any manpower to devote to designing version 2 (but your competitors will see the appeal your product has and will start designing THEIR replacement for it!) If the product is a dud (possibly because of these problems), there won't be a need for a version 2.
> The bits that fail to get explored tend to be weird error recovery routines. I
Because, by design, they are seldom encountered. So, don't benefit from being exercised in the normal course of operation.
> recall one latent on the VAX for ages which was that when it ran out of IO > handles (because someone was opening them inside a loop) the first thing the > recovery routine tried to do was open an IO channel! > >> calculator, so we must resort to logarithms, yielding 10^6021, which >> is a *very* large number. The age of the Universe is only 14 billion >> years, call it 10^10 years, so one would never be able to test even a >> tiny fraction of the possible paths. > > McCabe's complexity metric provides a way to test paths in components and > subsystems reasonably thoroughly and catch most of the common programmer > errors. Static dataflow analysis is also a lot better now than in the past.
But some test cases can mask other paths through the code. There is no guarantee that a given piece of code *can* be thoroughly tested -- especially if you take into account the fact that the underlying hardware isn't infallible; "if (x % 2)" can yield one result, now, and a different result, 5 lines later -- even though x hasn't been altered (but the hardware farted). So:

    if (x % 2) {
        do this;
        do that;
        do another_thing;
    } else {
        do that;
    }

can execute differently than:

    if (x % 2) {
        do this;
    }
    do that;
    if (x % 2) {
        do another_thing;
    }

Years ago, this possibility wasn't ever considered. [Yes, optimizers can twiddle this but the point remains.] And, that doesn't begin to address hostile actors in a system!
> Then you only need at most 40000 test vectors to take each branch of every > binary if statement (60000 if it is Fortran with 3 way branches all used). That > is a rather more tractable number (although still large). > > Any routine with too high a CCI count is practically certain to contain latent > bugs - which makes it worth looking at more carefully.
"A 'program' should fit on a single piece of paper"
On Tue, 5 Sep 2023 10:44:08 -0700, Don Y <blockedofcourse@foo.invalid>
wrote:

>[...]
>
>No, the physical electronics are the EASIEST bits. If designing
>hardware was so difficult, then the solution to the software
>"problem" would be to just have all the hardware designers switch
>over to designing software! Problem solved INSTANTLY!
The state of software development is a disgrace. We are plagued with absurd user interfaces, hidden states, and massive numbers of bugs. There is no science, math, or discipline to programming. What famous person said that "anybody can learn to code"? One study found that English majors, on average, were better programmers than CE or CS majors.
>
>In practice, the problem would be worsened by a few orders of
>magnitude as they suddenly found themselves living in an opaque world.
>
>> So do physical mechanical interlocks. I don't trust software or even electronic
>> interlocks to protect me compared to a damn great beam stop and a padlock on it
>> with the key in my pocket.
>
>Note the miswired motor example, above. If the limit switches had
>been hardwired, the problem still would have been present as the
>problem was in the hardware -- the wiring of the motor.
I wonder if the programmer had ever wired or worked with actual motors. One of our neighbors is a highly-paid Apple software engineer and might kill himself if you handed him a screwdriver. He is entirely clueless about electricity. We always consider user wiring error effects, as in a recent remote-sense power supply. No connection can damage it or make the voltage go more than 2 volts over or under the programmed value.
On 9/5/2023 10:03 AM, Don Y wrote:
> Good problem decomposition goes a long way towards that goal. > If you try to do "too much" you quickly overwhelm the developer's > ability to manage complexity (7 items in STM?). And, as you can't > *see* the entire implementation, there's nothing to REMIND you > of some salient issue that might impact your local efforts. > > [Hence the value of eschewing globals and the languages that > tolerate/encourage them! This dramatically cuts down the > number of ways X can influence Y.]
Of course, if you've never had any formal training ("you're just a coder"), then you don't even realize these hazards exist! You just pick at your code until it SEEMS to work and then walk away.

Hence the need for the "managed environments" and languages du jour that try to compensate for the lack of formal training in schools and businesses.

[I worked with a Fortune *100* company on a 30 man project where The Boss assigned the software for the product to a *technician* whose sole qualification was that he had a CoCo at home! Really? You're putting your good name in the hands of a tinkerer??]

Sadly, most businessmen don't understand software or the process and, rather than admit their ignorance, blunder onward wondering (later) why everything turns to shite. Anyone who's had to explain why a "little change" in the product specification requires a major change to the schedule understands the "ignorance at the top".

[I had a manager who wrote BASIC programs to keep track of the DOG SHOWS that he'd entered (what is that? just a bunch of PRINT statements??) and considered himself qualified to make decisions regarding the software in the products for which he was responsible. *Anyone* can write code.]

And, engineers turned managers tend to be the worst as they THINK they understand the current state of the art (because they used to practice it) without realizing that it's a moving target and if you're using last year's technology, you are 2 or 3 (!) years out of date!

Would you promote a *technician* to run an electronics DESIGN department and expect him to be current wrt the latest generation of components, design and manufacturing practices? If he *thought* he was, how quickly would you disabuse him of that belief?
On Tue, 5 Sep 2023 18:02:05 +0100, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

>On 05/09/2023 17:45, Joe Gwinn wrote:
>
>[...]
>
>> In round numbers, one in five lines of code is an IF statement, so in
>> 100,000 lines of code there will be 20,000 IF statements. So, there
>> are up to 2^20000 unique paths through the code. Which chokes my HP
>> calculator, so we must resort to logarithms, yielding 10^6021, which
>> is a *very* large number. The age of the Universe is only 14 billion
>> years, call it 10^10 years, so one would never be able to test even a
>> tiny fraction of the possible paths.
>
>Although that is true it is also true that a small number of cunningly
>constructed test datasets can explore a very high proportion of the most
>frequently traversed paths in any given codebase. One snag is that
>testing is invariably cut short by management when development overruns.
>
>The bits that fail to get explored tend to be weird error recovery
>routines. I recall one latent on the VAX for ages which was that when it
>ran out of IO handles (because someone was opening them inside a loop)
>the first thing the recovery routine tried to do was open an IO channel!
>
>McCabe's complexity metric provides a way to test paths in components
>and subsystems reasonably thoroughly and catch most of the common
>programmer errors. Static dataflow analysis is also a lot better now
>than in the past.
>
>Then you only need at most 40000 test vectors to take each branch of
>every binary if statement (60000 if it is Fortran with 3 way branches
>all used). That is a rather more tractable number (although still large).
>
>Any routine with too high a CCI count is practically certain to contain
>latent bugs - which makes it worth looking at more carefully.
I must say that I fail to see how this can overcome 10^6021 paths, even if it is wondrously effective, reducing the space to be tested by a trillion to one (10^-12) - only 10^6009 paths to explore.

Joe Gwinn
On Tue, 05 Sep 2023 10:19:15 -0700, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

>On Tue, 05 Sep 2023 12:45:01 -0400, Joe Gwinn <joegwinn@comcast.net>
>wrote:
>
>[...]
>
>>There is a complication. Modern software is tens of millions of lines
>>of code, far exceeding the inspection capabilities of humans.
>
>After you type a line of code, read it. When we did that, entire
>applications often worked first try.
>
>Hardware
>>is far simpler in terms of lines of FPGA code. But it's creeping up.
>
>FPGAs are at least (usually) organized state machines. Mistakes are
>typically hard failures, not low-rate bugs discovered in the field.
>Avoiding race and metastability hazards is common practise.
>
>>On a project some decades ago, the customer wanted us to verify every
>>path through the code, which was about 100,000 lines (large at the
>>time) of C or assembler (don't recall, doesn't actually matter).
>
>Software provability was a brief fad once. It wasn't popular or, as
>code is now done, possible.
>
>>In round numbers, one in five lines of code is an IF statement, so in
>>100,000 lines of code there will be 20,000 IF statements. So, there
>>are up to 2^20000 unique paths through the code. Which chokes my HP
>>calculator, so we must resort to logarithms, yielding 10^6021, which
>>is a *very* large number. The age of the Universe is only 14 billion
>>years, call it 10^10 years, so one would never be able to test even a
>>tiny fraction of the possible paths.
>
>An FPGA is usually coded as a state machine, where the designer
>understands that the machine has a finite number of states and handles
>every one. A computer program has an impossibly large number of
>states, unknown and certainly not managed. Code is like hairball async
>logic design.
In recent FPGAs you have done, how many states and events (their Cartesian product being the entire state table) are there? By the way, back in the day when I was specifying state machines (often for implementation in software), I had a rule that all cells would have an entry, even the combinations of state and event that "couldn't happen". This was essential for achieving robustness in practice.
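[For illustration, a sketch of that rule in software form: the dispatch table is the full states-by-events Cartesian product, and even the combinations that "can't happen" get a deliberate handler rather than undefined behavior. The states, events, and handlers are invented:]

    /* Every cell of the state/event table is filled in, including the
       "can't happen" cells. */
    #include <stdio.h>

    typedef enum { ST_IDLE, ST_RUNNING, ST_DONE, NUM_STATES } state_t;
    typedef enum { EV_START, EV_TICK, EV_STOP, NUM_EVENTS } event_t;

    typedef state_t (*handler_t)(void);

    static state_t start_run(void)    { return ST_RUNNING; }
    static state_t keep_running(void) { return ST_RUNNING; }
    static state_t finish(void)       { return ST_DONE; }
    static state_t ignore_idle(void)  { return ST_IDLE; }
    static state_t ignore_done(void)  { return ST_DONE; }

    /* "Can't happen" combinations get an explicit, deliberate handler:
       log it and land in a known recovery state. */
    static state_t cant_happen(void)
    {
        fprintf(stderr, "'impossible' state/event combination reached\n");
        return ST_IDLE;
    }

    static const handler_t table[NUM_STATES][NUM_EVENTS] = {
        /*               EV_START      EV_TICK       EV_STOP      */
        /* ST_IDLE    */ { start_run,   ignore_idle,  cant_happen },
        /* ST_RUNNING */ { cant_happen, keep_running, finish      },
        /* ST_DONE    */ { cant_happen, ignore_done,  ignore_done },
    };

    state_t dispatch(state_t s, event_t e) { return table[s][e](); }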
>>The customer withdrew the requirement. > >It was naive of him to want correct code.
No, only a bit unrealistic. But it was naive of him to think that total correctness can be tested into anything.

The state of the art in verifying safety-critical code (as in for safety of flight) is DO-178, which is an immensely heavy process. The original objective was a probability of error not exceeding 10^-6; this has been tightened to 10^-7 or 10^-8 because of the "headline risk".

.<https://en.wikipedia.org/wiki/DO-178C>

Correctness can be mathematically proven only for extremely simple mechanisms, using a sharply restricted set of allowed operations. See The Halting Problem.

.<https://en.wikipedia.org/wiki/Halting_problem#:~:text=The%20halting%20problem%20is%20undecidable,usually%20via%20a%20Turing%20machine.>

Joe Gwinn
On Tue, 05 Sep 2023 18:33:47 -0400, Joe Gwinn <joegwinn@comcast.net>
wrote:

>On Tue, 05 Sep 2023 10:19:15 -0700, John Larkin
><jlarkin@highlandSNIPMEtechnology.com> wrote:
>
>[...]
>
>>An FPGA is usually coded as a state machine, where the designer
>>understands that the machine has a finite number of states and handles
>>every one. A computer program has an impossibly large number of
>>states, unknown and certainly not managed. Code is like hairball async
>>logic design.
>
>In recent FPGAs you have done, how many states and events (their
>Cartesian product being the entire state table) are there?
A useful state machine might have 4 or maybe 16 states. I'm not sure what you mean by 'events'. Sometimes we have a state word and a counter, which technically gives us more states but it's convenient to think of them separately. As in "repeat state 4 until the counter hits zero."

A state machine can have many more inputs and outputs than it has states. It is critical that no inputs can be changing when the clock ticks.
On Tue, 05 Sep 2023 17:00:13 -0700, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

>On Tue, 05 Sep 2023 18:33:47 -0400, Joe Gwinn <joegwinn@comcast.net>
>wrote:
>
>[...]
>
>>In recent FPGAs you have done, how many states and events (their
>>Cartesian product being the entire state table) are there?
>
>A useful state machine might have 4 or maybe 16 states. I'm not sure
>what you mean by 'events'. Sometimes we have a state word and a
>counter, which technically gives us more states but it's convenient to
>think of them separately. As in "repeat state 4 until the counter hits
>zero."
We'll call it 16 states for the present purposes. An event is anything that can cause the state to change, including expiration of a timer. This is basically a design choice.
>A state machine can have many more inputs and outputs than it has >states.
Yes, that's typical.
> It is critical that no inputs can be changing when the clock >ticks.
That's also essential in hardware state machines.

In software state machines, events are most often the arrival of messages, and the mechanism that provides these messages ensures that they are presented in serial order (even if the underlying hardware does not ensure ordering).

Joe Gwinn
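[For illustration, a minimal sketch of that software pattern: every event is a message on a single queue, and a single consumer thread drains it, so the state machine sees events one at a time and in a definite order. The names and the queue itself are invented; overflow handling is omitted for brevity:]

    #include <pthread.h>
    #include <stddef.h>

    typedef struct { int type; int payload; } msg_t;

    extern void handle_event(msg_t m);       /* the state machine proper */

    #define QLEN 64
    static msg_t q[QLEN];
    static int head, tail, count;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

    void post(msg_t m)                       /* producers: timer ticks, I/O completions, other threads */
    {
        pthread_mutex_lock(&lock);
        q[tail] = m;
        tail = (tail + 1) % QLEN;
        count++;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
    }

    void *state_machine_thread(void *arg)    /* the only place state ever changes */
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0)
                pthread_cond_wait(&nonempty, &lock);
            msg_t m = q[head];
            head = (head + 1) % QLEN;
            count--;
            pthread_mutex_unlock(&lock);
            handle_event(m);                 /* software analogue of "no inputs change on the clock tick" */
        }
        return NULL;
    }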