
Tracking bug report frequency

Started by Don Y September 4, 2023
On 9/5/2023 3:14 PM, Joe Gwinn wrote:
>> Then you only need at most 40000 test vectors to take each branch of
>> every binary if statement (60000 if it is Fortran with 3 way branches
>> all used). That is a rather more tractable number (although still large).
>>
>> Any routine with too high a CCI count is practically certain to contain
>> latent bugs - which makes it worth looking at more carefully.
>
> I must say that I fail to see how this can overcome 10^6021 paths,
> even if it is wondrously effective, reducing the space to be tested by
> a trillion to one (10^-12) - only 10^6009 paths to explore.
You don't have to exercise every path with a unique set of stimuli.
The number of N-way branches only sets an upper limit on the number
of paths through a piece of code. The actual number of paths can be
much less:

if (x == 1)
   doXisone;

if (x == 2)
   doXistwo;

if (x == 3)
   doXisthree;

has three 2-way branches but only three distinct paths through the
code (instead of 2^3 = 8)

The point of complexity metrics is to alert you that maybe you have
factored the problem poorly. Or, failed to recognize some underlying
relationship(s) that could simplify the process.

What's more amusing is how few organizations USE any of these
measures. And, how poorly they institute design controls on software,
in general!

[I had a colleague ask me to help one of his clients because their
codebase "suddenly" stopped working. It was "suggested" that an
employee who had been "made redundant" may have sabotaged the
codebase.

"Simple: retrieve the snapshot that would have been current on the
day before he was aware that he would be terminated. Then, step
forward until the day he was actually *gone* and look for changes..."

But, there were no controls on who had access to the repository (nor
guarantees that the identity of the ACTOR was accurately stored with
each ACTION). The employee had gone back and cooked the contents of
the repository, so a more detailed forensic examination was required;
you couldn't just check out a particular release and be assured it
represented the state of the codebase on the assumed day!

Can YOUR developers dick with the repository in unexpected ways?]
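[A minimal sketch of the point in C (the doXis* routines are stand-in stubs, not anything from the thread): three 2-way branches admit 2^3 = 8 path labels, but because the conditions are mutually exclusive only a handful are actually feasible.]

#include <stdio.h>

/* Hypothetical stand-ins for the actions in the example above. */
static void doXisone(void)   { }
static void doXistwo(void)   { }
static void doXisthree(void) { }

/* Encode which of the three branches fired as a 3-bit path id. */
static int path_taken(int x)
{
    int path = 0;

    if (x == 1) { doXisone();   path |= 1; }
    if (x == 2) { doXistwo();   path |= 2; }
    if (x == 3) { doXisthree(); path |= 4; }

    return path;
}

int main(void)
{
    /* Three 2-way branches allow 8 path ids, but the conditions are
       mutually exclusive, so only a few can occur: none taken, or
       exactly one of the three. */
    int seen[8] = { 0 };
    int count = 0;

    for (int x = 0; x <= 4; x++)
        seen[path_taken(x)] = 1;

    for (int i = 0; i < 8; i++)
        count += seen[i];

    printf("feasible paths: %d of 8\n", count);   /* prints 4 */
    return 0;
}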
On 9/5/2023 3:33 PM, Joe Gwinn wrote:
> In recent FPGAs you have done, how many states and events (their
> Cartesian product being the entire state table) are there?
There is undoubtedly far more state *in* the CPU (neglecting the application!) than in most FPGA designs!
> By the way, back in the day when I was specifying state machines
> (often for implementation in software), I had a rule that all cells
> would have an entry, even the combinations of state and event that
> "couldn't happen". This was essential for achieving robustness in
> practice.
It also helps if the machine can power up into a "random" state (no "reset" to the state variables)... just let it run for a few clocks until it finds its way to a known state.
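[As a rough illustration in C (hypothetical states and events, not any particular design): every (state, event) cell gets an explicit entry, and a state variable holding garbage -- say, at power-up with no reset -- is steered back to a known state within a step or two.]

#include <stdio.h>

/* Every (state, event) cell is filled in, including the "can't happen" ones. */
enum state { IDLE, RUNNING, FAULT, NUM_STATES };
enum event { EV_START, EV_STOP, EV_ERROR, NUM_EVENTS };

static const enum state next[NUM_STATES][NUM_EVENTS] = {
    /*             EV_START  EV_STOP  EV_ERROR */
    /* IDLE    */ { RUNNING,  IDLE,    FAULT },
    /* RUNNING */ { RUNNING,  IDLE,    FAULT },
    /* FAULT   */ { FAULT,    IDLE,    FAULT },
};

/* A garbage state value is mapped to a known state rather than being
   allowed to index off the end of the table. */
static enum state step(unsigned current, enum event ev)
{
    if (current >= NUM_STATES)
        return FAULT;
    return next[current][ev];
}

int main(void)
{
    unsigned s = 0xDEADBEEFu;      /* pretend power-up left it random */

    s = step(s, EV_STOP);          /* garbage -> FAULT (a known state) */
    s = step(s, EV_STOP);          /* FAULT + EV_STOP -> IDLE          */
    printf("state after recovery: %u\n", s);   /* 0 == IDLE */
    return 0;
}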
> The state of the art in verifying safety-critical code (as in for
> safety of flight) is DO-178, which is an immensely heavy process. The
> original objective was a probability of error not exceeding 10^-6,
> this has been tightened to 10^-7 or 10^-8 because of the "headline
> risk".
>
> .<https://en.wikipedia.org/wiki/DO-178C>
But, this is still just a game of chance. And, with the number of external things that can potentially impact the operation of the state machine, even a provably correct implementation can fail because the hardware isn't "ideal".
> Correctness can be mathematically proven only for extremely simple
> mechanisms, using a sharply restricted set of allowed operations. See
Exactly. Just like provably correct programs are extremely simple. And, if you try to *build* a (more complex) system atop that proven (sub)system, you discover you're right back where you started. (Else, if knowing the underlying system was provably correct, you would rationalize that ANY program is correct because the opcodes on which it relies are correctly implemented!)
> The Halting Problem.
>
> .<https://en.wikipedia.org/wiki/Halting_problem#:~:text=The%20halting%20problem%20is%20undecidable,usually%20via%20a%20Turing%20machine.>
On 9/5/2023 5:11 PM, Joe Gwinn wrote:
> In software state machines, events are most often the arrival of
> messages, and the mechanism that provides these messages ensures that
> they are presented in serial order (even if the underlying hardware
> does not ensure ordering).
But software state machines can EASILY get creative in how they
process events. E.g., "go back from whence you came" (which isn't
present in most hardware machines).

A typical user interface can have hundreds of states, not counting
the information/controls they are "accumulating".
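[A small sketch of that kind of creativity in C (made-up UI states): the machine remembers which state invoked a shared sub-dialog, so "back" returns to the caller -- a transition a fixed hardware state table rarely encodes.]

/* Made-up UI states, for illustration only. */
enum ui_state { MENU, EDIT_VALUE, CONFIRM, HELP };

struct ui {
    enum ui_state state;
    enum ui_state return_to;    /* where "back" should land */
};

/* Any state can invoke the shared help screen... */
static void enter_help(struct ui *ui)
{
    ui->return_to = ui->state;  /* remember whence we came */
    ui->state     = HELP;
}

/* ...and leaving it goes back from whence you came, instead of to a
   single hard-wired successor state. */
static void leave_help(struct ui *ui)
{
    ui->state = ui->return_to;
}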
On 9/5/2023 6:04 PM, Don Y wrote:
> On 9/5/2023 3:14 PM, Joe Gwinn wrote:
>>> Then you only need at most 40000 test vectors to take each branch of
>>> every binary if statement (60000 if it is Fortran with 3 way branches
>>> all used). That is a rather more tractable number (although still large).
>>>
>>> Any routine with too high a CCI count is practically certain to contain
>>> latent bugs - which makes it worth looking at more carefully.
>>
>> I must say that I fail to see how this can overcome 10^6021 paths,
>> even if it is wondrously effective, reducing the space to be tested by
>> a trillion to one (10^-12) - only 10^6009 paths to explore.
>
> You don't have to exercise every path with a unique set of stimuli.
> The number of N-way branches only sets an upper limit on the
> number of paths through a piece of code. The actual number of
> paths can be much less:
>
> if (x == 1)
>    doXisone;
>
> if (x == 2)
>    doXistwo;
>
> if (x == 3)
>    doXisthree;
actually, four
On 05/09/2023 23:14, Joe Gwinn wrote:
> On Tue, 5 Sep 2023 18:02:05 +0100, Martin Brown
> <'''newspam'''@nonad.co.uk> wrote:
>
>> On 05/09/2023 17:45, Joe Gwinn wrote:
>>> calculator, so we must resort to logarithms, yielding 10^6021, which
>>> is a *very* large number. The age of the Universe is only 14 billion
>>> years, call it 10^10 years, so one would never be able to test even a
>>> tiny fraction of the possible paths.
>>
>> McCabe's complexity metric provides a way to test paths in components
>> and subsystems reasonably thoroughly and catch most of the common
>> programmer errors. Static dataflow analysis is also a lot better now
>> than in the past.
>>
>> Then you only need at most 40000 test vectors to take each branch of
>> every binary if statement (60000 if it is Fortran with 3 way branches
>> all used). That is a rather more tractable number (although still large).
>>
>> Any routine with too high a CCI count is practically certain to contain
>> latent bugs - which makes it worth looking at more carefully.
>
> I must say that I fail to see how this can overcome 10^6021 paths,
> even if it is wondrously effective, reducing the space to be tested by
> a trillion to one (10^-12) - only 10^6009 paths to explore.
It is a divide and conquer strategy. It guarantees to execute every path
at least once and common core paths many times with different data. The
number of test vectors required scales as log2 of the number of paths.

That is an enormous reduction in the state space and makes coverage
testing possible. There are even some automatic tools that can help
create the test vectors now. Not a recommendation but just to make you
aware of some of the options for this sort of thing.

www.accelq.com/blog/test-coverage-techniques/

Here is a simple example for N = 5 (you wouldn't code it this way)

if (i<16)
{
 if (i<8)
 {
  if (i<4)
  {
   if (i<2)
   {
    if (i<1) //zero
    else // one
   }
   else
   {
    if (i<3) // two
    else // three
   }
   else
   {
   }
  }
  else
  {
  }
 }
 else
 {
 }
}
else
{
}

So that at each level i=0..N in the trivial example there are 2^N if
statements to select between numbers in the range 0 to 31 inclusive.
There is a deliberate error left in.

Counted your way there are 2^(N+1)-1 if statements and so 2^(2^(N+1))
distinct paths through the software (plus a few more with invalid data).

However integers -1 through 2^N will be sufficient to explore every path
at least once and test for high/low fence post errors.

Concrete example of N = 5
  total if statements                   63
  naive paths through code              2^63 ~ 10^19
  CCI test vectors to test every path   34

The example is topologically equivalent to real* code you merely have to
construct input data that will force execution down each binary choice
in turn at every level. Getting the absolute minimum number of test
vectors for full coverage is a much harder problem but a good enough
solution is possible in most practical cases.

--
Martin Brown

*real code can be grubby in reality with different depths of tree.
(I'd hope that few routines go deeper than 5 nested if statements)
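[A rough, compilable stand-in for the tree above (the deliberate error is not reproduced, and the 63 nested ifs are collapsed into one comparison per level): it checks that the 34 vectors -1 through 32 reach every leaf, even though the naive path count is ~10^19.]

#include <stdio.h>
#include <string.h>

/* Makes the same binary choices the nested ifs would: narrow the leaf
   range [lo, hi) by halves, one 2-way branch per level. */
static int classify(int i)
{
    int lo = 0;

    for (int width = 16; width >= 1; width /= 2) {
        if (i < lo + width)
            ;                   /* stay in the lower half */
        else
            lo = lo + width;    /* move to the upper half */
    }
    return lo;                  /* leaf index 0..31 */
}

int main(void)
{
    int leaf_hit[32];
    memset(leaf_hit, 0, sizeof leaf_hit);

    /* The 34 test vectors: -1 .. 32, i.e. every leaf value plus the
       low and high fence posts. */
    for (int i = -1; i <= 32; i++)
        leaf_hit[classify(i)]++;

    int covered = 0;
    for (int k = 0; k < 32; k++)
        covered += (leaf_hit[k] != 0);

    printf("leaves covered: %d of 32 with 34 vectors\n", covered);
    return 0;
}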
On 05/09/2023 19:06, Don Y wrote:
> On 9/5/2023 10:02 AM, Martin Brown wrote:
>>> In round numbers, one in five lines of code is an IF statement, so in
>>> 100,000 lines of code there will be 20,000 IF statements. So, there
>>> are up to 2^20000 unique paths through the code. Which chokes my HP
>>
>> Although that is true it is also true that a small number of cunningly
>> constructed test datasets can explore a very high proportion of the
>> most frequently traversed paths in any given codebase. One snag is
>> that testing is invariably cut short by management when development
>> overruns.
>
> "We'll fix it in version 2"
>
> I always found this an amusing delusion.
>
> If the product is successful, there will be lots of people clamoring
> for fixes so you won't have any manpower to devote to designing
> version 2 (but your competitors will see the appeal your product
> has and will start designing THEIR replacement for it!)
>
> If the product is a dud (possibly because of these problems),
> there won't be a need for a version 2.
It depends a bit on how special the hardware being controlled is. The industry I am most familiar with was high end scientific mass spectrometers where management viewed the user software (and embedded code) as a necessary evil to make the hardware work properly.
>> The bits that fail to get explored tend to be weird error recovery
>> routines.
>
> Because, by design, they are seldom encountered.
> So, don't benefit from being exercised in the normal
> course of operation.
It is always the rare cases that bite you in the backside (eventually). Back in the day when (even for a dry site) almost everyone went to the pub on Friday lunchtimes we had a rule that you could test software and note defects but not make any alterations that afternoon. The finger trouble caused by the odd person having had one too many replicated typical user errors rather well!
>> McCabe's complexity metric provides a way to test paths in components
>> and subsystems reasonably thoroughly and catch most of the common
>> programmer errors. Static dataflow analysis is also a lot better now
>> than in the past.
>
> But some test cases can mask other paths through the code.
> There is no guarantee that a given piece of code *can* be
> thoroughly tested -- especially if you take into account the
> fact that the underlying hardware isn't infallible;
The occasions when I have seen hardware failures masquerading as
software bugs I can count on the fingers of one hand. They are
memorable for that:

1. Memory management unit on ND500 supermini
2. Disk controller DMA flaw on certain HP models in the 386 era
3. Embedded CPU where RTI failed about 1:10^9 times
4. Intel FPU divide bug

I know that cosmic ray single bit flips have to be allowed for in some
of the space probes. OK in error corrected memory but really tough if
it happens in an unprotected CPU register.
> "if (x % )" can yield one result, now, and a different > result, 5 lines later -- even though x hasn't been > altered (but the hardware farted). > > So: > > &nbsp;&nbsp;&nbsp; if (x % 2) { > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do this; > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do that; > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do another_thing; > &nbsp;&nbsp;&nbsp; } else { > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do that; > &nbsp;&nbsp;&nbsp; } > > can execute differently than: > > &nbsp;&nbsp;&nbsp; if (x % 2) { > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do this; > &nbsp;&nbsp;&nbsp; } > > &nbsp;&nbsp;&nbsp; do that; > > &nbsp;&nbsp;&nbsp; if (x % 2) { > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do another_thing; > &nbsp;&nbsp;&nbsp; } > > Years ago, this possibility wasn't ever considered. > > [Yes, optimizers can twiddle this but the point remains] > > And, that doesn't begin to address hostile actors in a > system!
It is optimisers rearranging things that can make testing these days so
much more difficult. CPU out-of-order and speculative execution also
mean that the old rules about if statements no longer hold true, even
more so if the same path is taken many times. The old loop unrolling
trick can actually work against you now if it means that the innermost
loop no longer fits nicely inside a cache line.
>> Then you only need at most 40000 test vectors to take each branch of
>> every binary if statement (60000 if it is Fortran with 3 way branches
>> all used). That is a rather more tractable number (although still large).
>>
>> Any routine with too high a CCI count is practically certain to
>> contain latent bugs - which makes it worth looking at more carefully.
>
> "A 'program' should fit on a single piece of paper"
The 7 +/- 2 rule for each hierarchical level of a software design is
still quite a good heuristic unless there are special circumstances.

--
Martin Brown
On 9/6/2023 2:58 AM, Martin Brown wrote:
> On 05/09/2023 19:06, Don Y wrote:
>> On 9/5/2023 10:02 AM, Martin Brown wrote:
>>>> In round numbers, one in five lines of code is an IF statement, so in
>>>> 100,000 lines of code there will be 20,000 IF statements. So, there
>>>> are up to 2^20000 unique paths through the code. Which chokes my HP
>>>
>>> Although that is true it is also true that a small number of cunningly
>>> constructed test datasets can explore a very high proportion of the most
>>> frequently traversed paths in any given codebase. One snag is that testing
>>> is invariably cut short by management when development overruns.
>>
>> "We'll fix it in version 2"
>>
>> I always found this an amusing delusion.
>>
>> If the product is successful, there will be lots of people clamoring
>> for fixes so you won't have any manpower to devote to designing
>> version 2 (but your competitors will see the appeal your product
>> has and will start designing THEIR replacement for it!)
>>
>> If the product is a dud (possibly because of these problems),
>> there won't be a need for a version 2.
>
> It depends a bit on how special the hardware being controlled is.
>
> The industry I am most familiar with was high end scientific mass
> spectrometers where management viewed the user software (and embedded
> code) as a necessary evil to make the hardware work properly.
Note that they were LIKELY reasonably skilled users. So, they could recognize things that "didn't quite make sense" and not be victimized by them. When users are less savvy, the consequences of a system problem can be more dire -- as they may not be able to recognize and compensate for it. E.g., the stove/oven bug I described -- if manifested -- is likely to leave its victims perplexed as to how to escape its grip.
>>> The bits that fail to get explored tend to be weird error recovery
>>> routines.
>>
>> Because, by design, they are seldom encountered.
>> So, don't benefit from being exercised in the normal
>> course of operation.
>
> It is always the rare cases that bite you in the backside (eventually).
Because they *are* rare.

Folks often don't check for memory allocation failures because they
don't encounter them in the normal course of the program's execution.
So, when the system is *stressed* and they manifest, the code to
handle them is conveniently missing!

[Oh, you don't have mechanisms in place to stress a system as it is
being tested? Don't worry. Your CUSTOMERS will find a way in the
normal course of operation... won't YOU be embarrassed! :> ]

I find that most folks haven't a clue as to how to deal with
unexpected errors. And, you often just see them propagated along
until *something* panic()s.

My current design is entirely real-time. EVERY task has to specify a
deadline. AND, a deadline handler -- what do you want the system to
do WHEN your deadline isn't met?

Don't want to think about that? Then it will just be killed off --
and, by extension, anything that relies on the services it provides
will be killed off. Not going to leave a good impression on the user
that installed your applet!
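[A purely hypothetical sketch in C (invented names, not the actual API of Don's system) of what "every task specifies a deadline AND a deadline handler" might look like:]

#include <stdbool.h>
#include <stddef.h>

/* Invented types -- a sketch, not a real scheduler interface. */
typedef void (*deadline_handler_t)(void *ctx);

struct task_spec {
    void (*body)(void *ctx);       /* the work itself                  */
    unsigned deadline_ms;          /* must complete within this budget */
    deadline_handler_t on_miss;    /* policy for a missed deadline     */
    void *ctx;
};

/* Example miss policy: degrade gracefully -- mark the cached result
   stale and carry on -- rather than letting the task (and everything
   that depends on it) be killed off. */
static void use_last_good_result(void *ctx)
{
    bool *result_is_stale = ctx;
    *result_is_stale = true;
}

/* A task that refuses to think about misses would pass on_miss = NULL,
   and the scheduler's default (kill the task) would apply. */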
> Back in the day when (even for a dry site) almost everyone went to the
> pub on Friday lunchtimes we had a rule that you could test software and
> note defects but not make any alterations that afternoon.
>
> The finger trouble caused by the odd person having had one too many
> replicated typical user errors rather well!
I have a delightful knack for tickling the unexplored corners of
applications. It's something colleagues have grown to love -- and
hate! "Have Don play with it for a while..." It's rare that I can't
find something on a *finished* piece of software that doesn't work --
in just a few minutes of playing around!

[I've got a Dragon cassette deck. It "autoreverses" by switching to a
separate pair of tracks in the head that are in-line with the "back
side" stereo channels on the tape -- so the cassette doesn't have to
be mechanically flipped over. Its "tape counter" is a simple
revolutions counter. So, if it starts at 0000 at the start of side A
and reaches XXXX at the end of side A, it will count *backwards* back
down to 0000 while it is playing side B. But, there is a race in the
system logic that can repeatably cause it to count FORWARDS while
*moving* BACKWARDS (there are several MCUs in the product and comms
take finite time! :> )

It's a genuine defect because I have two of them and they both
exhibit the same behavior. For a $2K (30+ years ago) device you would
think they'd pay a bit more attention to detail!]

I had a buddy "finish" a product, completing all of the testing prior
to FDA approvals. I happened to be walking past him as he made his
"announcement". I reached over his shoulder and typed something on
the machine's console -- and it crashed spectacularly!

"What did you do????!"
"This:"
"You're not supposed to do that!"
"Why did your system LET ME?"
>>> McCabe's complexity metric provides a way to test paths in components and
>>> subsystems reasonably thoroughly and catch most of the common programmer
>>> errors. Static dataflow analysis is also a lot better now than in the past.
>>
>> But some test cases can mask other paths through the code.
>> There is no guarantee that a given piece of code *can* be
>> thoroughly tested -- especially if you take into account the
>> fact that the underlying hardware isn't infallible;
>
> The occasions when I have seen hardware failures masquerading as software
> bugs I can count on the fingers of one hand. They are memorable for that:
>
> 1. Memory management unit on ND500 supermini
> 2. Disk controller DMA flaw on certain HP models in the 386 era
> 3. Embedded CPU where RTI failed about 1:10^9 times
> 4. Intel FPU divide bug
>
> I know that cosmic ray single bit flips have to be allowed for in some of
> the space probes. OK in error corrected memory but really tough if it
> happens in an unprotected CPU register.
The fact that ECCs are now common on datapaths (and structures)
*inside* CPUs suggests they already know they are pushing the envelope
in terms of reliable behavior (in the absence of those mechanisms).
Look at MLC flash where states are resolved based on a few handfuls of
electrons!

When do you -- as a user or designer -- lose faith in a system that is
reporting CORRECTED errors? At what level do you start seriously
wondering about the number of UNCORRECTED errors that you *might* be
experiencing? What about systems that don't have provisions for
reporting same? Do you just assume all is well?

When do you start wondering if that flash memory that holds your
system image, FPGA configuration, etc. is suffering from read wear?
Do you have a mechanism for detecting that? Reporting it? Remedying
it?

The software can't perform as designed if the hardware can't be relied
upon to reliably reproduce it. Are you sure your *intended* code isn't
triggering an exploitable feature accidentally? (Are you sure any
"foreign" code that you are hosting isn't deliberately trying to do
so?)
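[One illustrative answer to the read-wear question, as a hedged sketch in C (hypothetical storage layer and names): periodically re-read the stored image, check it against the CRC recorded when it was written, and report/refresh rather than fail silently.]

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
static uint32_t crc32_simple(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (n--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Scrub pass: verify the image against its stored CRC and make the
   failure visible. Returns nonzero if the image should be refreshed. */
static int scrub_image(const uint8_t *image, size_t len,
                       uint32_t expected_crc, unsigned *detected_errors)
{
    if (crc32_simple(image, len) != expected_crc) {
        (*detected_errors)++;
        fprintf(stderr, "image scrub: CRC mismatch, scheduling refresh\n");
        return 1;
    }
    return 0;
}

int main(void)
{
    static const uint8_t image[] = "pretend this is the system image";
    uint32_t good_crc = crc32_simple(image, sizeof image);
    unsigned errs = 0;

    scrub_image(image, sizeof image, good_crc, &errs);       /* clean      */
    scrub_image(image, sizeof image, good_crc ^ 1u, &errs);  /* "worn" CRC */
    printf("detected errors so far: %u\n", errs);            /* prints 1   */
    return 0;
}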
>> "if (x % )" can yield one result, now, and a different >> result, 5 lines later -- even though x hasn't been >> altered (but the hardware farted). >> >> So: >> >> &nbsp;&nbsp;&nbsp;&nbsp; if (x % 2) { >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do this; >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do that; >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do another_thing; >> &nbsp;&nbsp;&nbsp;&nbsp; } else { >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do that; >> &nbsp;&nbsp;&nbsp;&nbsp; } >> >> can execute differently than: >> >> &nbsp;&nbsp;&nbsp;&nbsp; if (x % 2) { >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do this; >> &nbsp;&nbsp;&nbsp;&nbsp; } >> >> &nbsp;&nbsp;&nbsp;&nbsp; do that; >> >> &nbsp;&nbsp;&nbsp;&nbsp; if (x % 2) { >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do another_thing; >> &nbsp;&nbsp;&nbsp;&nbsp; } >> >> Years ago, this possibility wasn't ever considered. >> >> [Yes, optimizers can twiddle this but the point remains] >> >> And, that doesn't begin to address hostile actors in a >> system! > > It is optimisers rearranging things that can make testing these days so much > more difficult. The CPU out of order and speculative execution profile also > means that the old rules about if statements no longer hold true. Even more so > if the same path is taken many times. The old loop unrolling trick can actually > work against you now if it means that the innermost loop no longer fits nicely > inside a cache line.
Most "programmers" are clueless of these issues; they just concentrate on getting the "logic correct". Hardware designers are even MORE clueless because they can't even imagine the performance consequences. I've had to refactor many of the critical path structures in my current design to "less intuitive" structures because of the cache hits incurred when designing for a more intuitive implementation (processor doesn't care about what *seems* related -- it only cares about what *is* related, in terms of execution).
>>> Then you only need at most 40000 test vectors to take each branch of every
>>> binary if statement (60000 if it is Fortran with 3 way branches all used).
>>> That is a rather more tractable number (although still large).
>>>
>>> Any routine with too high a CCI count is practically certain to contain
>>> latent bugs - which makes it worth looking at more carefully.
>>
>> "A 'program' should fit on a single piece of paper"
>
> The 7 +/- 2 rule for each hierarchical level of a software design is still
> quite a good heuristic unless there are special circumstances.
Makes you wonder how folks can still think things like monolithic
kernels "make sense": how many crania do you need to gather to have
even an APPROXIMATE understanding of what's happening in the machine,
*now*? With multi-threaded kernels, are you sure you know everything
that *might* be happening WHILE "this" is happening?

*SO* much easier when you're trying to get some trivial piece of code
to work than something CONSIDERABLY more complex!

[It has been an eye-opener dealing with odd concepts like transport
delays in designing and evaluating performance when you always took
the cost of comms INSIDE a device to be zero! Think of all the
critical regions that are inherently exposed that you never had to
worry about: "What are the chances of <something> happening *between*
the execution of statements 23 and 24?" Ans: 100% if the user is
running the device! Now, how do you TEST for that inevitability?]
On 9/6/2023 4:42 AM, Don Y wrote:
> [I've got a Dragon cassette deck. It "autoreverses" by switching to
> a separate pair of tracks in the head that are in-line with the
> "back side" stereo channels on the tape -- so the cassette
> doesn't have to be mechanically flipped over. It's "tape counter"
> is a simple revolutions counter. So, if it starts at 0000 at
> the start of side A and reaches XXXX at the end of side A, it
> will count *backwards* back down to 0000 while it is playing side B.
> But, there is a race in the system logic that can repeatably
> cause it to count FORWARDS while *moving* BACKWARDS (there are
> several MCUs in the product and comms take finite time! :> )
No, I think the flaw has it counting backwards while moving forwards
(I haven't tickled that bug recently; I know it's there, so why give
it a chance to manifest?)

But, only if you act in a small (1-2 second) window of time. So,
there is no way to claim that it's an intentional feature, as the
deck shouldn't care if you waited 1 second or 4!
> It's a genuine defect because I have two of them and they both
> exhibit the same behavior. For a $2K (30+ years ago) device
> you would think they'd pay a bit more attention to detail!]
On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

>On 05/09/2023 23:14, Joe Gwinn wrote:
>> I must say that I fail to see how this can overcome 10^6021 paths,
>> even if it is wondrously effective, reducing the space to be tested by
>> a trillion to one (10^-12) - only 10^6009 paths to explore.
>
>It is a divide and conquer strategy. It guarantees to execute every path
>at least once and common core paths many times with different data. The
>number of test vectors required scales as log2 of the number of paths.
>
>[snip worked example for N = 5]
>
>The example is topologically equivalent to real* code you merely have to
>construct input data that will force execution down each binary choice
>in turn at every level. Getting the absolute minimum number of test
>vectors for full coverage is a much harder problem but a good enough
>solution is possible in most practical cases.
In practice, this is certainly pretty effective, but the proposed
requirement did not allow for such shortcuts, rendering the requirement
intractable - the Sun will blow up first.

Also, in practice we do a combination of random probing and fuzzing.

.<https://en.wikipedia.org/wiki/Fuzzing>

Joe Gwinn
On 06/09/2023 19:20, Joe Gwinn wrote:
> On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown
> <'''newspam'''@nonad.co.uk> wrote:
>
>> The example is topologically equivalent to real* code you merely have to
>> construct input data that will force execution down each binary choice
>> in turn at every level. Getting the absolute minimum number of test
>> vectors for full coverage is a much harder problem but a good enough
>> solution is possible in most practical cases.
>
> In practice, this is certainly pretty effective, but the proposed
> requirement did not allow for such shortcuts, rendering the
> requirement intractable - the Sun will blow up first.
>
> Also, in practice we do a combination of random probing and fuzzing.
>
> .<https://en.wikipedia.org/wiki/Fuzzing>
One tactic I have adopted for testing numerical code is very similar.
Basically a biased random number generator creates test data for the
routines under test and the answers are then verified. It is extra work
to do both a solver and a verifier, but not that much, and the
verification provides a basis for regression testing.

In most software the computation being done may be very difficult, but
the inverse is often relatively easy by comparison. Finding all real
roots of a function f(x) = 0 to maximum accuracy is quite tricky, but
given a supposed root x0, computing the value of the equation f(x0) and
its derivative f'(x0) is easy. Then you can use NR to see if the
correction is acceptably small; if not, rinse and repeat.

I found a new bug in a cubic solver that is as robust as any on the
planet quite recently. It required a very specific, near exact
combination of three 64 bit parameters to create a catastrophic numeric
cancellation down a seldom trodden path where the cubic equation has
three real roots and you want the one that it can't compute accurately.
For most of these problems we try very hard to have only one real
root...

My initial reaction was that it was tested library code so it must be
my problem -- until I traced into it and saw how it failed. It gives 8
instead of 16 sig figs in double precision for these pathological data.

--
Martin Brown
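[A minimal sketch of the solve-then-verify idea in C (hypothetical helper names; the solver under test is not shown): build a cubic from known roots, then accept a reported root only if the Newton-Raphson correction f(x0)/f'(x0) is tiny relative to the root.]

#include <float.h>
#include <math.h>
#include <stdio.h>

/* Cubic x^3 + b*x^2 + c*x + d, evaluated in Horner form. */
static double f(double x, double b, double c, double d)
{
    return ((x + b) * x + c) * x + d;
}

static double fprime(double x, double b, double c)
{
    return (3.0 * x + 2.0 * b) * x + c;
}

/* Accept x0 only if the NR correction is tiny relative to the root
   (rough tolerance; a near-zero f'(x0), e.g. at a double root, would
   need separate handling). */
static int verify_root(double x0, double b, double c, double d)
{
    double correction = f(x0, b, c, d) / fprime(x0, b, c);
    return fabs(correction) <= 64.0 * DBL_EPSILON * (fabs(x0) + 1.0);
}

int main(void)
{
    /* Choose the roots, expand to coefficients, then check what a
       solver (not shown) reports against them. */
    double r1 = 1.0, r2 = 2.0, r3 = 1e-7;   /* awkward spread on purpose */
    double b = -(r1 + r2 + r3);
    double c = r1 * r2 + r1 * r3 + r2 * r3;
    double d = -(r1 * r2 * r3);

    printf("r1 verified: %d\n", verify_root(r1, b, c, d));
    printf("r3 verified: %d\n", verify_root(r3, b, c, d));
    return 0;
}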