
Tracking bug report frequency

Started by Don Y September 4, 2023
Anyone else use bug reporting frequency as a gross indicator
of system stability?

On Mon, 4 Sep 2023 06:30:44 -0700, Don Y <blockedofcourse@foo.invalid>
wrote:

>Anyone else use bug reporting frequency as a gross indicator
>of system stability?
One bug is an engineering failure and should be fixed immediately.
On Monday, September 4, 2023 at 11:30:55 PM UTC+10, Don Y wrote:
> Anyone else use bug reporting frequency as a gross indicator
> of system stability?
IBM claimed to be doing that some thirty years ago. They also claimed to
have used their debugging system on Bill Gates' MS-DOS software and made
it much more reliable. It must have been truly appalling when Bill Gates
originally offered it to them for the IBM PC.

-- Bill Sloman, Sydney
On 04/09/2023 14:30, Don Y wrote:
> Anyone else use bug reporting frequency as a gross indicator
> of system stability?
Just about everyone who runs a beta test program. MTBF is another metric
that can be used for something that is intended to run 24/7 and recover
gracefully from anything that may happen to it.

It is inevitable that a new release will have some bugs and minor
differences from its predecessor that real life users will find PDQ. The
trick is to gain enough information from each in service failure to
identify and fix the root cause bug in a single iteration and without
breaking something else. Modern optimisers make that more difficult now
than it used to be back when I was involved in commercial development.

-- Martin Brown
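For a running system, the MTBF tally can be as simple as averaging the
gaps between failure timestamps. A minimal sketch in Python; the
timestamps and the function name are illustrative, not from any real
log:

from datetime import datetime

failures = [
    datetime(2023, 8, 1, 3, 14),    # illustrative failure log
    datetime(2023, 8, 9, 17, 2),
    datetime(2023, 8, 30, 22, 45),
]

def mtbf_hours(failure_times):
    """Mean time between consecutive failures, in hours."""
    times = sorted(failure_times)
    gaps = [(b - a).total_seconds() / 3600.0
            for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

print(f"MTBF ~ {mtbf_hours(failures):.0f} hours")   # ~358 hours here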
On Tue, 5 Sep 2023 13:13:51 +0100, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

>On 04/09/2023 14:30, Don Y wrote:
>> Anyone else use bug reporting frequency as a gross indicator
>> of system stability?
>
>Just about everyone who runs a beta test program.
>MTBF is another metric that can be used for something that is intended
>to run 24/7 and recover gracefully from anything that may happen to it.
>
>It is inevitable that a new release will have some bugs and minor
>differences from its predecessor that real life users will find PDQ.
That's the story of software: bugs are inevitable, so why bother to be careful coding or testing? You can always wait for bug reports from users and post regular fixes of the worst ones.
>The trick is to gain enough information from each in service failure to
>identify and fix the root cause bug in a single iteration and without
>breaking something else. Modern optimisers make that more difficult now
>than it used to be back when I was involved in commercial development.
There have been various drives to write reliable code, but none were
popular. Quite the contrary: the software world loves abstraction and
ever new, bizarre languages... namely playing games instead of coding
boring, reliable applications in some klunky, reliable language.

Electronic design, and FPGA coding, are intended to be bug-free first
pass and often are, when done right.

FPGAs are halfway software, so the coders tend to be less careful than
hardware designers. FPGA bug fixes are easy, so why bother to read your
own code?

That's ironic, when you think about it. The hardest bits, the physical
electronics, have the fewest bugs.
On Tue, 05 Sep 2023 08:57:22 -0700, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

>On Tue, 5 Sep 2023 13:13:51 +0100, Martin Brown
><'''newspam'''@nonad.co.uk> wrote:
>
>[...]
>
>There have been various drives to write reliable code, but none were
>popular. Quite the contrary: the software world loves abstraction and
>ever new, bizarre languages... namely playing games instead of coding
>boring, reliable applications in some klunky, reliable language.
>
>Electronic design, and FPGA coding, are intended to be bug-free first
>pass and often are, when done right.
>
>FPGAs are halfway software, so the coders tend to be less careful than
>hardware designers. FPGA bug fixes are easy, so why bother to read
>your own code?
>
>That's ironic, when you think about it. The hardest bits, the physical
>electronics, have the fewest bugs.
There is a complication. Modern software is tens of millions of lines of
code, far exceeding the inspection capabilities of humans. Hardware is
far simpler in terms of lines of FPGA code. But it's creeping up.

On a project some decades ago, the customer wanted us to verify every
path through the code, which was about 100,000 lines (large at the time)
of C or assembler (don't recall, doesn't actually matter).

In round numbers, one in five lines of code is an IF statement, so in
100,000 lines of code there will be 20,000 IF statements. So, there are
up to 2^20000 unique paths through the code. Which chokes my HP
calculator, so we must resort to logarithms, yielding 10^6021, which is
a *very* large number. The age of the Universe is only 14 billion years,
call it 10^10 years, so one would never be able to test even a tiny
fraction of the possible paths.

The customer withdrew the requirement.

Joe Gwinn
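The arithmetic checks out; the logarithm step is a one-liner (Python
here, purely to reproduce the figures in the post):

import math

if_statements = 100_000 // 5             # one in five lines is an IF
exponent = if_statements * math.log10(2)
print(f"2^{if_statements} ~ 10^{exponent:.0f}")   # prints: 2^20000 ~ 10^6021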
On 05/09/2023 16:57, John Larkin wrote:
> On Tue, 5 Sep 2023 13:13:51 +0100, Martin Brown
> <'''newspam'''@nonad.co.uk> wrote:
>
>> [...]
>> It is inevitable that a new release will have some bugs and minor
>> differences from its predecessor that real life users will find PDQ.
>
> That's the story of software: bugs are inevitable, so why bother to be
> careful coding or testing? You can always wait for bug reports from
> users and post regular fixes of the worst ones.
Don't blame the engineers for that - it is the "ship it and be damned"
senior management that is responsible for most buggy code being shipped.
Even more so now that 1+GB upgrades are essentially free. :(

First to market is worth enough that people live with buggy code. The
worst major release I can recall in a very long time was MS Excel 2007
(although bugs in Vista took a lot more flak - rather unfairly IMHO).

(which reminds me it is a MS patch Tuesday today)
>> The trick is to gain enough information from each in service failure to
>> identify and fix the root cause bug in a single iteration and without
>> breaking something else. Modern optimisers make that more difficult now
>> than it used to be back when I was involved in commercial development.
>
> There have been various drives to write reliable code, but none were
> popular. Quite the contrary: the software world loves abstraction and
> ever new, bizarre languages... namely playing games instead of coding
> boring, reliable applications in some klunky, reliable language.
The only ones which actually could be truly relied upon used formal
mathematical proof techniques to ensure reliability. Very few
practitioners are able to do it properly and it is pretty much reserved
for ultra high reliability safety and mission critical code.

It could all be done to that standard iff commercial developers and
their customers were prepared to pay for it. However, they want it now
and they keep changing their minds about what it is they actually want,
so the goalposts are forever shifting around. That sort of functionality
creep is much less common in hardware.

UK's NATS system is supposedly six-sigma coding, but its misbehaviour on
Bank Holiday Monday peak travel time was somewhat disastrous. It seems
someone managed to input the halt and catch fire instruction and the
buffers ran out before they were able to fix it. There will be a
technical report out in due course - my guess is that they have reduced
overheads and no longer have some of the key people who understand its
internals. Malformed flight plan data should not have been able to kill
it stone dead - but apparently that is exactly what happened!

https://www.ft.com/content/9fe22207-5867-4c4f-972b-620cdab10790

(might be paywalled) If so, Google "UK air traffic control outage caused
by unusual flight plan data"
> Electronic design, and FPGA coding, are intended to be bug-free first
> pass and often are, when done right.
But that is done using design and simulation *software* which you fail
to acknowledge is actually pretty good. If you had to do it with pencil
and paper you would be there forever.
> FPGAs are halfway software, so the coders tend to be less careful than
> hardware designers. FPGA bug fixes are easy, so why bother to read
> your own code?
>
> That's ironic, when you think about it. The hardest bits, the physical
> electronics, have the fewest bugs.
So do physical mechanical interlocks. I don't trust software or even
electronic interlocks to protect me compared to a damn great beam stop
and a padlock on it with the key in my pocket.

-- Martin Brown
On 05/09/2023 17:45, Joe Gwinn wrote:
> On Tue, 05 Sep 2023 08:57:22 -0700, John Larkin
> <jlarkin@highlandSNIPMEtechnology.com> wrote:
>
> [...]
>
> There is a complication. Modern software is tens of millions of lines
> of code, far exceeding the inspection capabilities of humans. Hardware
> is far simpler in terms of lines of FPGA code. But it's creeping up.
>
> On a project some decades ago, the customer wanted us to verify every
> path through the code, which was about 100,000 lines (large at the
> time) of C or assembler (don't recall, doesn't actually matter).
>
> In round numbers, one in five lines of code is an IF statement, so in
> 100,000 lines of code there will be 20,000 IF statements. So, there
> are up to 2^20000 unique paths through the code. Which chokes my HP
Although that is true, it is also true that a small number of cunningly
constructed test datasets can explore a very high proportion of the most
frequently traversed paths in any given codebase. One snag is that
testing is invariably cut short by management when development overruns.

The bits that fail to get explored tend to be weird error recovery
routines. I recall one latent on the VAX for ages which was that when it
ran out of IO handles (because someone was opening them inside a loop)
the first thing the recovery routine tried to do was open an IO channel!
> calculator, so we must resort to logarithms, yielding 10^6021, which
> is a *very* large number. The age of the Universe is only 14 billion
> years, call it 10^10 years, so one would never be able to test even a
> tiny fraction of the possible paths.
McCabe's complexity metric provides a way to test paths in components
and subsystems reasonably thoroughly and catch most of the common
programmer errors. Static dataflow analysis is also a lot better now
than in the past.

Then you only need at most 40000 test vectors to take each branch of
every binary if statement (60000 if it is Fortran with 3-way branches
all used). That is a rather more tractable number (although still
large). Any routine with too high a CCI count is practically certain to
contain latent bugs - which makes it worth looking at more carefully.

-- Martin Brown
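For a sense of what counting decision points looks like in practice,
here is a rough sketch of McCabe's metric for a single Python function;
it counts only the obvious branch nodes (real tools, such as the mccabe
package, also handle boolean operators and more):

import ast

def cyclomatic_complexity(source: str) -> int:
    """McCabe's metric for one function: decision points + 1."""
    decisions = (ast.If, ast.While, ast.For, ast.IfExp, ast.ExceptHandler)
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, decisions) for node in ast.walk(tree))

src = """
def classify(x):
    if x < 0:
        return "negative"
    for d in range(2, 10):
        if x % d == 0:
            return "divisible"
    return "other"
"""
print(cyclomatic_complexity(src))   # 4: three decision points + 1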
On 9/5/2023 5:13 AM, Martin Brown wrote:
> On 04/09/2023 14:30, Don Y wrote:
>> Anyone else use bug reporting frequency as a gross indicator
>> of system stability?
>
> Just about everyone who runs a beta test program.
> MTBF is another metric that can be used for something that is intended
> to run 24/7 and recover gracefully from anything that may happen to it.
I'm looking at the pre-release period (you wouldn't want to release
something that wasn't "stable").

I commit often (dozens of times a day) so I can have a record of each
problem encountered and, thereafter, how it was "fixed". As the number
of messages related to fixups decreases, confidence in the codebase
rises.
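That tally is trivial to automate. A minimal sketch, assuming the
repository is at hand and that fixes are identifiable by the word "fix"
in the commit subject (both are assumptions, not anything specified
above):

import subprocess
from collections import Counter

# Week-stamped commit subjects from the current repository.
log = subprocess.run(
    ["git", "log", "--date=format:%Y-%W", "--pretty=%ad %s"],
    capture_output=True, text=True, check=True,
).stdout

fixes_per_week = Counter()
for line in log.splitlines():
    week, _, subject = line.partition(" ")
    if "fix" in subject.lower():
        fixes_per_week[week] += 1

# A falling bar length is the "confidence rising" signal.
for week in sorted(fixes_per_week):
    print(week, "#" * fixes_per_week[week])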
> It is inevitable that a new release will have some bugs and minor
> differences from its predecessor that real life users will find PDQ.
The "bugs" that tend to show up after release are specification shortcomings. E.g., I had a case where a guy wired a motor incorrectly and the software just kept driving it further and further from it's desired setpoint -- until it smashed into the "wrong" limit switches (which, of course, weren't examined because it wasn't SUPPOSED to be traveling in that direction). When you've got 7-figures at stake, you can't resort to blaming the "electrician" for the failure ("Why didn't the software sense that it was running the wrong way?" Um, why didn't it sense that the electrician's wife had been ragging on him before he came to work and left him in a distracted mood??) Bugs (as in "coding errors") should never leave the lab.
> The trick is to gain enough information from each in service failure to
> identify and fix the root cause bug in a single iteration and without
> breaking something else. Modern optimisers make that more difficult now
> than it used to be back when I was involved in commercial development.
Good problem decomposition goes a long way towards that goal. If you try
to do "too much" you quickly overwhelm the developer's ability to manage
complexity (7 items in STM?). And, as you can't *see* the entire
implementation, there's nothing to REMIND you of some salient issue that
might impact your local efforts.

[Hence the value of eschewing globals and the languages that
tolerate/encourage them! This dramatically cuts down the number of ways
X can influence Y.]
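A toy contrast of the globals point, in Python (names are illustrative):
with module-level state, every function in the module is a suspect when
the value goes wrong; passed explicitly, the ways X can influence Y are
visible at the call sites.

# Global style: any function in the module might have touched `total`.
total = 0

def add_global(x):
    global total
    total += x

# Explicit style: influence is confined to arguments and return values.
def add_explicit(total, x):
    return total + x

t = 0
t = add_explicit(t, 5)
print(t)   # 5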
On Tue, 05 Sep 2023 12:45:01 -0400, Joe Gwinn <joegwinn@comcast.net>
wrote:

>[...]
>
>There is a complication. Modern software is tens of millions of lines
>of code, far exceeding the inspection capabilities of humans.
After you type a line of code, read it. When we did that, entire
applications often worked first try.

>Hardware
>is far simpler in terms of lines of FPGA code. But it's creeping up.
FPGAs are at least (usually) organized state machines. Mistakes are
typically hard failures, not low-rate bugs discovered in the field.
Avoiding race and metastability hazards is common practice.
>On a project some decades ago, the customer wanted us to verify every
>path through the code, which was about 100,000 lines (large at the
>time) of C or assembler (don't recall, doesn't actually matter).
Software provability was a brief fad once. It wasn't popular or, as code is now done, possible.
>In round numbers, one in five lines of code is an IF statement, so in
>100,000 lines of code there will be 20,000 IF statements. So, there
>are up to 2^20000 unique paths through the code. Which chokes my HP
>calculator, so we must resort to logarithms, yielding 10^6021, which
>is a *very* large number. The age of the Universe is only 14 billion
>years, call it 10^10 years, so one would never be able to test even a
>tiny fraction of the possible paths.
An FPGA is usually coded as a state machine, where the designer understands that the machine has a finite number of states and handles every one. A computer program has an impossibly large number of states, unknown and certainly not managed. Code is like hairball async logic design.
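The state-machine discipline translates directly to software: enumerate
every state and make the dispatcher reject anything unhandled. A minimal
sketch (the states are invented for illustration):

from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    FAULT = auto()

def step(state, start, error):
    """Advance one tick; every state is handled explicitly."""
    if state is State.IDLE:
        return State.RUNNING if start else State.IDLE
    if state is State.RUNNING:
        return State.FAULT if error else State.RUNNING
    if state is State.FAULT:
        return State.FAULT                  # latched until reset
    raise AssertionError(f"unhandled state: {state}")   # no silent drift

print(step(State.IDLE, start=True, error=False))   # State.RUNNING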
>The customer withdrew the requirement.
It was naive of him to want correct code.