
Tracking bug report frequency

Started by Don Y September 4, 2023
On 9/5/2023 3:14 PM, Joe Gwinn wrote:
>> Then you only need at most 40000 test vectors to take each branch of
>> every binary if statement (60000 if it is Fortran with 3 way branches
>> all used). That is a rather more tractable number (although still large).
>>
>> Any routine with too high a CCI count is practically certain to contain
>> latent bugs - which makes it worth looking at more carefully.
>
> I must say that I fail to see how this can overcome 10^6021 paths,
> even if it is wondrously effective, reducing the space to be tested by
> a trillion to one (10^-12) - only 10^6009 paths to explore.
You don't have to exercise every path with a unique set of stimuli.
The number of N-way branches only sets an upper limit on the number
of paths through a piece of code. The actual number of paths can be
much less:

if (x == 1)
   doXisone;

if (x == 2)
   doXistwo;

if (x == 3)
   doXisthree;

has three 2-way branches but only three distinct paths through the
code (instead of 2^3 = 8)

The point of complexity metrics is to alert you that maybe you have
factored the problem poorly. Or, failed to recognize some underlying
relationship(s) that could simplify the process.

What's more amusing is how few organizations USE any of these
measures. And, how poorly they institute design controls on software,
in general!

[I had a colleague ask me to help one of his clients because their
codebase "suddenly" stopped working. It was "suggested" that an
employee who had been "made redundant" may have sabotaged the
codebase.

"Simple: retrieve the snapshot that would have been current on the
day before he was aware that he would be terminated. Then, step
forward until the day he was actually *gone* and look for changes..."

But, there were no controls on who had access to the repository (nor
guarantees that the identity of the ACTOR was accurately stored with
each ACTION). The employee had gone back and cooked the contents of
the repository, so a more detailed forensic examination was required;
you couldn't just check out a particular release and be assured it
represented the state of the codebase on the assumed day!

Can YOUR developers dick with the repository in unexpected ways?]
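[A minimal sketch of the point in C (the doXis* routines are stand-in stubs, not anything from the thread): three 2-way branches admit 2^3 = 8 path labels, but because the conditions are mutually exclusive only a handful are actually feasible.]

#include <stdio.h>

/* Hypothetical stand-ins for the actions in the example above. */
static void doXisone(void)   { }
static void doXistwo(void)   { }
static void doXisthree(void) { }

/* Encode which of the three branches fired as a 3-bit path id. */
static int path_taken(int x)
{
    int path = 0;

    if (x == 1) { doXisone();   path |= 1; }
    if (x == 2) { doXistwo();   path |= 2; }
    if (x == 3) { doXisthree(); path |= 4; }

    return path;
}

int main(void)
{
    /* Three 2-way branches allow 8 path ids, but the conditions are
       mutually exclusive, so only a few can occur: none taken, or
       exactly one of the three. */
    int seen[8] = { 0 };
    int count = 0;

    for (int x = 0; x <= 4; x++)
        seen[path_taken(x)] = 1;

    for (int i = 0; i < 8; i++)
        count += seen[i];

    printf("feasible paths: %d of 8\n", count);   /* prints 4 */
    return 0;
}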
On 9/5/2023 3:33 PM, Joe Gwinn wrote:
> In recent FPGAs you have done, how many states and events (their
> Cartesian product being the entire state table) are there?
There is undoubtedly far more state *in* the CPU (neglecting the application!) than in most FPGA designs!
> By the way, back in the day when I was specifying state machines
> (often for implementation in software), I had a rule that all cells
> would have an entry, even the combinations of state and event that
> "couldn't happen". This was essential for achieving robustness in
> practice.
It also helps if the machine can power up into a "random" state (no "reset" to the state variables)... just let it run for a few clocks until it finds its way to a known state.
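[As a rough illustration in C (hypothetical states and events, not any particular design): every (state, event) cell gets an explicit entry, and a state variable holding garbage -- say, at power-up with no reset -- is steered back to a known state within a step or two.]

#include <stdio.h>

/* Every (state, event) cell is filled in, including the "can't happen" ones. */
enum state { IDLE, RUNNING, FAULT, NUM_STATES };
enum event { EV_START, EV_STOP, EV_ERROR, NUM_EVENTS };

static const enum state next[NUM_STATES][NUM_EVENTS] = {
    /*             EV_START  EV_STOP  EV_ERROR */
    /* IDLE    */ { RUNNING,  IDLE,    FAULT },
    /* RUNNING */ { RUNNING,  IDLE,    FAULT },
    /* FAULT   */ { FAULT,    IDLE,    FAULT },
};

/* A garbage state value is mapped to a known state rather than being
   allowed to index off the end of the table. */
static enum state step(unsigned current, enum event ev)
{
    if (current >= NUM_STATES)
        return FAULT;
    return next[current][ev];
}

int main(void)
{
    unsigned s = 0xDEADBEEFu;      /* pretend power-up left it random */

    s = step(s, EV_STOP);          /* garbage -> FAULT (a known state) */
    s = step(s, EV_STOP);          /* FAULT + EV_STOP -> IDLE          */
    printf("state after recovery: %u\n", s);   /* 0 == IDLE */
    return 0;
}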
> The state of the art in verifying safety-critical code (as in for
> safety of flight) is DO-178, which is an immensely heavy process. The
> original objective was a probability of error not exceeding 10^-6,
> this has been tightened to 10^-7 or 10^-8 because of the "headline
> risk".
>
> .<https://en.wikipedia.org/wiki/DO-178C>
But, this is still just a game of chance. And, with the number of external things that can potentially impact the operation of the state machine, even a provably correct implementation can fail because the hardware isn't "ideal".
> Correctness can be mathematically proven only for extremely simple
> mechanisms, using a sharply restricted set of allowed operations. See
Exactly. Just like provably correct programs are extremely simple. And, if you try to *build* a (more complex) system atop that proven (sub)system, you discover you're right back where you started. (Else, if knowing the underlying system was provably correct, you would rationalize that ANY program is correct because the opcodes on which it relies are correctly implemented!)
> The Halting Problem.
>
> .<https://en.wikipedia.org/wiki/Halting_problem#:~:text=The%20halting%20problem%20is%20undecidable,usually%20via%20a%20Turing%20machine.>
On 9/5/2023 5:11 PM, Joe Gwinn wrote:
> In software state machines, events are most often the arrival of
> messages, and the mechanism that provides these messages ensures that
> they are presented in serial order (even if the underlying hardware
> does not ensure ordering).
But software state machines can EASILY get creative in how they
process events. E.g., "go back from whence you came" (which isn't
present in most hardware machines).

A typical user interface can have hundreds of states, not counting
the information/controls they are "accumulating".
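[A small sketch of that kind of creativity in C (made-up UI states): the machine remembers which state invoked a shared sub-dialog, so "back" returns to the caller -- a transition a fixed hardware state table rarely encodes.]

/* Made-up UI states, for illustration only. */
enum ui_state { MENU, EDIT_VALUE, CONFIRM, HELP };

struct ui {
    enum ui_state state;
    enum ui_state return_to;    /* where "back" should land */
};

/* Any state can invoke the shared help screen... */
static void enter_help(struct ui *ui)
{
    ui->return_to = ui->state;  /* remember whence we came */
    ui->state     = HELP;
}

/* ...and leaving it goes back from whence you came, instead of to a
   single hard-wired successor state. */
static void leave_help(struct ui *ui)
{
    ui->state = ui->return_to;
}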
On 9/5/2023 6:04 PM, Don Y wrote:
> On 9/5/2023 3:14 PM, Joe Gwinn wrote:
>>> Then you only need at most 40000 test vectors to take each branch of
>>> every binary if statement (60000 if it is Fortran with 3 way branches
>>> all used). That is a rather more tractable number (although still large).
>>>
>>> Any routine with too high a CCI count is practically certain to contain
>>> latent bugs - which makes it worth looking at more carefully.
>>
>> I must say that I fail to see how this can overcome 10^6021 paths,
>> even if it is wondrously effective, reducing the space to be tested by
>> a trillion to one (10^-12) - only 10^6009 paths to explore.
>
> You don't have to exercise every path with a unique set of stimuli.
> The number of N-way branches only sets an upper limit on the
> number of paths through a piece of code. The actual number of
> paths can be much less:
>
> if (x == 1)
>    doXisone;
>
> if (x == 2)
>    doXistwo;
>
> if (x == 3)
>    doXisthree;
actually, four
On 05/09/2023 23:14, Joe Gwinn wrote:
> On Tue, 5 Sep 2023 18:02:05 +0100, Martin Brown
> <'''newspam'''@nonad.co.uk> wrote:
>
>> On 05/09/2023 17:45, Joe Gwinn wrote:
>>> calculator, so we must resort to logarithms, yielding 10^6021, which
>>> is a *very* large number. The age of the Universe is only 14 billion
>>> years, call it 10^10 years, so one would never be able to test even a
>>> tiny fraction of the possible paths.
>>
>> McCabe's complexity metric provides a way to test paths in components
>> and subsystems reasonably thoroughly and catch most of the common
>> programmer errors. Static dataflow analysis is also a lot better now
>> than in the past.
>>
>> Then you only need at most 40000 test vectors to take each branch of
>> every binary if statement (60000 if it is Fortran with 3 way branches
>> all used). That is a rather more tractable number (although still large).
>>
>> Any routine with too high a CCI count is practically certain to contain
>> latent bugs - which makes it worth looking at more carefully.
>
> I must say that I fail to see how this can overcome 10^6021 paths,
> even if it is wondrously effective, reducing the space to be tested by
> a trillion to one (10^-12) - only 10^6009 paths to explore.
It is a divide and conquer strategy. It guarantees to execute every path
at least once and common core paths many times with different data. The
number of test vectors required scales as log2 of the number of paths.

That is an enormous reduction in the state space and makes coverage
testing possible. There are even some automatic tools that can help
create the test vectors now. Not a recommendation but just to make you
aware of some of the options for this sort of thing.

www.accelq.com/blog/test-coverage-techniques/

Here is a simple example for N = 5 (you wouldn't code it this way)

if (i<16)
{
 if (i<8)
 {
  if (i<4)
  {
   if (i<2)
   {
    if (i<1) //zero
    else // one
   }
   else
   {
    if (i<3) // two
    else // three
   }
   else
   {
   }
  }
  else
  {
  }
 }
 else
 {
 }
}
else
{
}

So that at each level i=0..N in the trivial example there are 2^N if
statements to select between numbers in the range 0 to 31 inclusive.
There is a deliberate error left in.

Counted your way there are 2^(N+1)-1 if statements and so 2^(2^(N+1))
distinct paths through the software (plus a few more with invalid data).

However integers -1 through 2^N will be sufficient to explore every path
at least once and test for high/low fence post errors.

Concrete example of N = 5
  total if statements                   63
  naive paths through code              2^63 ~ 10^19
  CCI test vectors to test every path   34

The example is topologically equivalent to real* code you merely have to
construct input data that will force execution down each binary choice
in turn at every level. Getting the absolute minimum number of test
vectors for full coverage is a much harder problem but a good enough
solution is possible in most practical cases.

--
Martin Brown

*real code can be grubby in reality with different depths of tree.
(I'd hope that few routines go deeper than 5 nested if statements)
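[A rough, compilable stand-in for the tree above (the deliberate error is not reproduced, and the 63 nested ifs are collapsed into one comparison per level): it checks that the 34 vectors -1 through 32 reach every leaf, even though the naive path count is ~10^19.]

#include <stdio.h>
#include <string.h>

/* Makes the same binary choices the nested ifs would: narrow the leaf
   range [lo, hi) by halves, one 2-way branch per level. */
static int classify(int i)
{
    int lo = 0;

    for (int width = 16; width >= 1; width /= 2) {
        if (i < lo + width)
            ;                   /* stay in the lower half */
        else
            lo = lo + width;    /* move to the upper half */
    }
    return lo;                  /* leaf index 0..31 */
}

int main(void)
{
    int leaf_hit[32];
    memset(leaf_hit, 0, sizeof leaf_hit);

    /* The 34 test vectors: -1 .. 32, i.e. every leaf value plus the
       low and high fence posts. */
    for (int i = -1; i <= 32; i++)
        leaf_hit[classify(i)]++;

    int covered = 0;
    for (int k = 0; k < 32; k++)
        covered += (leaf_hit[k] != 0);

    printf("leaves covered: %d of 32 with 34 vectors\n", covered);
    return 0;
}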
On 05/09/2023 19:06, Don Y wrote:
> On 9/5/2023 10:02 AM, Martin Brown wrote:
>>> In round numbers, one in five lines of code is an IF statement, so in
>>> 100,000 lines of code there will be 20,000 IF statements. So, there
>>> are up to 2^20000 unique paths through the code. Which chokes my HP
>>
>> Although that is true it is also true that a small number of cunningly
>> constructed test datasets can explore a very high proportion of the
>> most frequently traversed paths in any given codebase. One snag is
>> that testing is invariably cut short by management when development
>> overruns.
>
> "We'll fix it in version 2"
>
> I always found this an amusing delusion.
>
> If the product is successful, there will be lots of people clamoring
> for fixes so you won't have any manpower to devote to designing
> version 2 (but your competitors will see the appeal your product
> has and will start designing THEIR replacement for it!)
>
> If the product is a dud (possibly because of these problems),
> there won't be a need for a version 2.
It depends a bit on how special the hardware being controlled is. The industry I am most familiar with was high end scientific mass spectrometers where management viewed the user software (and embedded code) as a necessary evil to make the hardware work properly.
>> The bits that fail to get explored tend to be weird error recovery
>> routines.
>
> Because, by design, they are seldom encountered.
> So, don't benefit from being exercised in the normal
> course of operation.
It is always the rare cases that bite you in the backside (eventually). Back in the day when (even for a dry site) almost everyone went to the pub on Friday lunchtimes we had a rule that you could test software and note defects but not make any alterations that afternoon. The finger trouble caused by the odd person having had one too many replicated typical user errors rather well!
>> McCabe's complexity metric provides a way to test paths in components
>> and subsystems reasonably thoroughly and catch most of the common
>> programmer errors. Static dataflow analysis is also a lot better now
>> than in the past.
>
> But some test cases can mask other paths through the code.
> There is no guarantee that a given piece of code *can* be
> thoroughly tested -- especially if you take into account the
> fact that the underlying hardware isn't infallible;
The occasions when I have seen hardware failures masquerading as
software bugs I can count on the fingers of one hand. They are
memorable for that:

1. Memory management unit on ND500 supermini
2. Disk controller DMA flaw on certain HP models in the 386 era
3. Embedded CPU where RTI failed about 1:10^9 times
4. Intel FPU divide bug

I know that cosmic ray single bit flips have to be allowed for in some
of the space probes. OK in error corrected memory but really tough if
it happens in an unprotected CPU register.
> "if (x % )" can yield one result, now, and a different > result, 5 lines later -- even though x hasn't been > altered (but the hardware farted). > > So: > > &nbsp;&nbsp;&nbsp; if (x % 2) { > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do this; > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do that; > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do another_thing; > &nbsp;&nbsp;&nbsp; } else { > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do that; > &nbsp;&nbsp;&nbsp; } > > can execute differently than: > > &nbsp;&nbsp;&nbsp; if (x % 2) { > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do this; > &nbsp;&nbsp;&nbsp; } > > &nbsp;&nbsp;&nbsp; do that; > > &nbsp;&nbsp;&nbsp; if (x % 2) { > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do another_thing; > &nbsp;&nbsp;&nbsp; } > > Years ago, this possibility wasn't ever considered. > > [Yes, optimizers can twiddle this but the point remains] > > And, that doesn't begin to address hostile actors in a > system!
It is optimisers rearranging things that can make testing these days so
much more difficult. CPU out-of-order and speculative execution also
mean that the old rules about if statements no longer hold true, even
more so if the same path is taken many times. The old loop unrolling
trick can actually work against you now if it means that the innermost
loop no longer fits nicely inside a cache line.
>> Then you only need at most 40000 test vectors to take each branch of
>> every binary if statement (60000 if it is Fortran with 3 way branches
>> all used). That is a rather more tractable number (although still large).
>>
>> Any routine with too high a CCI count is practically certain to
>> contain latent bugs - which makes it worth looking at more carefully.
>
> "A 'program' should fit on a single piece of paper"
The 7 +/- 2 rule for each hierarchical level of a software design is
still quite a good heuristic unless there are special circumstances.

--
Martin Brown
On 9/6/2023 2:58 AM, Martin Brown wrote:
> On 05/09/2023 19:06, Don Y wrote:
>> On 9/5/2023 10:02 AM, Martin Brown wrote:
>>>> In round numbers, one in five lines of code is an IF statement, so in
>>>> 100,000 lines of code there will be 20,000 IF statements. So, there
>>>> are up to 2^20000 unique paths through the code. Which chokes my HP
>>>
>>> Although that is true it is also true that a small number of cunningly
>>> constructed test datasets can explore a very high proportion of the most
>>> frequently traversed paths in any given codebase. One snag is that testing
>>> is invariably cut short by management when development overruns.
>>
>> "We'll fix it in version 2"
>>
>> I always found this an amusing delusion.
>>
>> If the product is successful, there will be lots of people clamoring
>> for fixes so you won't have any manpower to devote to designing
>> version 2 (but your competitors will see the appeal your product
>> has and will start designing THEIR replacement for it!)
>>
>> If the product is a dud (possibly because of these problems),
>> there won't be a need for a version 2.
>
> It depends a bit on how special the hardware being controlled is.
>
> The industry I am most familiar with was high end scientific mass
> spectrometers where management viewed the user software (and embedded
> code) as a necessary evil to make the hardware work properly.
Note that they were LIKELY reasonably skilled users. So, they could recognize things that "didn't quite make sense" and not be victimized by them. When users are less savvy, the consequences of a system problem can be more dire -- as they may not be able to recognize and compensate for it. E.g., the stove/oven bug I described -- if manifested -- is likely to leave its victims perplexed as to how to escape its grip.
>>> The bits that fail to get explored tend to be weird error recovery
>>> routines.
>>
>> Because, by design, they are seldom encountered.
>> So, don't benefit from being exercised in the normal
>> course of operation.
>
> It is always the rare cases that bite you in the backside (eventually).
Because they *are* rare.

Folks often don't check for memory allocation failures because they
don't encounter them in the normal course of the program's execution.
So, when the system is *stressed* and they manifest, the code to
handle them is conveniently missing!

[Oh, you don't have mechanisms in place to stress a system as it is
being tested? Don't worry. Your CUSTOMERS will find a way in the
normal course of operation... won't YOU be embarrassed! :> ]

I find that most folks haven't a clue as to how to deal with
unexpected errors. And, you often just see them propagated along
until *something* panic()s.

My current design is entirely real-time. EVERY task has to specify a
deadline. AND, a deadline handler -- what do you want the system to
do WHEN your deadline isn't met?

Don't want to think about that? Then it will just be killed off --
and, by extension, anything that relies on the services it provides
will be killed off. Not going to leave a good impression on the user
that installed your applet!
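[A purely hypothetical sketch in C (invented names, not the actual API of Don's system) of what "every task specifies a deadline AND a deadline handler" might look like:]

#include <stdbool.h>
#include <stddef.h>

/* Invented types -- a sketch, not a real scheduler interface. */
typedef void (*deadline_handler_t)(void *ctx);

struct task_spec {
    void (*body)(void *ctx);       /* the work itself                  */
    unsigned deadline_ms;          /* must complete within this budget */
    deadline_handler_t on_miss;    /* policy for a missed deadline     */
    void *ctx;
};

/* Example miss policy: degrade gracefully -- mark the cached result
   stale and carry on -- rather than letting the task (and everything
   that depends on it) be killed off. */
static void use_last_good_result(void *ctx)
{
    bool *result_is_stale = ctx;
    *result_is_stale = true;
}

/* A task that refuses to think about misses would pass on_miss = NULL,
   and the scheduler's default (kill the task) would apply. */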
> Back in the day when (even for a dry site) almost everyone went to the
> pub on Friday lunchtimes we had a rule that you could test software and
> note defects but not make any alterations that afternoon.
>
> The finger trouble caused by the odd person having had one too many
> replicated typical user errors rather well!
I have a delightful knack for tickling the unexplored corners of
applications. It's something colleagues have grown to love -- and
hate! "Have Don play with it for a while..." It's rare that I can't
find something on a *finished* piece of software that doesn't work --
in just a few minutes of playing around!

[I've got a Dragon cassette deck. It "autoreverses" by switching to a
separate pair of tracks in the head that are in-line with the "back
side" stereo channels on the tape -- so the cassette doesn't have to
be mechanically flipped over. Its "tape counter" is a simple
revolutions counter. So, if it starts at 0000 at the start of side A
and reaches XXXX at the end of side A, it will count *backwards* back
down to 0000 while it is playing side B. But, there is a race in the
system logic that can repeatably cause it to count FORWARDS while
*moving* BACKWARDS (there are several MCUs in the product and comms
take finite time! :> )

It's a genuine defect because I have two of them and they both
exhibit the same behavior. For a $2K (30+ years ago) device you would
think they'd pay a bit more attention to detail!]

I had a buddy "finish" a product, completing all of the testing prior
to FDA approvals. I happened to be walking past him as he made his
"announcement". I reached over his shoulder and typed something on
the machine's console -- and it crashed spectacularly!

"What did you do????!"
"This:"
"You're not supposed to do that!"
"Why did your system LET ME?"
>>> McCabe's complexity metric provides a way to test paths in components and
>>> subsystems reasonably thoroughly and catch most of the common programmer
>>> errors. Static dataflow analysis is also a lot better now than in the past.
>>
>> But some test cases can mask other paths through the code.
>> There is no guarantee that a given piece of code *can* be
>> thoroughly tested -- especially if you take into account the
>> fact that the underlying hardware isn't infallible;
>
> The occasions when I have seen hardware failures masquerading as software
> bugs I can count on the fingers of one hand. They are memorable for that:
>
> 1. Memory management unit on ND500 supermini
> 2. Disk controller DMA flaw on certain HP models in the 386 era
> 3. Embedded CPU where RTI failed about 1:10^9 times
> 4. Intel FPU divide bug
>
> I know that cosmic ray single bit flips have to be allowed for in some of
> the space probes. OK in error corrected memory but really tough if it
> happens in an unprotected CPU register.
The fact that ECCs are now common on datapaths (and structures)
*inside* CPUs suggests they already know they are pushing the envelope
in terms of reliable behavior (in the absence of those mechanisms).
Look at MLC flash where states are resolved based on a few handfuls of
electrons!

When do you -- as a user or designer -- lose faith in a system that is
reporting CORRECTED errors? At what level do you start seriously
wondering about the number of UNCORRECTED errors that you *might* be
experiencing? What about systems that don't have provisions for
reporting same? Do you just assume all is well?

When do you start wondering if that flash memory that holds your
system image, FPGA configuration, etc. is suffering from read wear?
Do you have a mechanism for detecting that? Reporting it? Remedying
it?

The software can't perform as designed if the hardware can't be relied
upon to reliably reproduce it. Are you sure your *intended* code isn't
triggering an exploitable feature accidentally? (Are you sure any
"foreign" code that you are hosting isn't deliberately trying to do
so?)
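[One illustrative answer to the read-wear question, as a hedged sketch in C (hypothetical storage layer and names): periodically re-read the stored image, check it against the CRC recorded when it was written, and report/refresh rather than fail silently.]

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
static uint32_t crc32_simple(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (n--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Scrub pass: verify the image against its stored CRC and make the
   failure visible. Returns nonzero if the image should be refreshed. */
static int scrub_image(const uint8_t *image, size_t len,
                       uint32_t expected_crc, unsigned *detected_errors)
{
    if (crc32_simple(image, len) != expected_crc) {
        (*detected_errors)++;
        fprintf(stderr, "image scrub: CRC mismatch, scheduling refresh\n");
        return 1;
    }
    return 0;
}

int main(void)
{
    static const uint8_t image[] = "pretend this is the system image";
    uint32_t good_crc = crc32_simple(image, sizeof image);
    unsigned errs = 0;

    scrub_image(image, sizeof image, good_crc, &errs);       /* clean      */
    scrub_image(image, sizeof image, good_crc ^ 1u, &errs);  /* "worn" CRC */
    printf("detected errors so far: %u\n", errs);            /* prints 1   */
    return 0;
}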
>> "if (x % )" can yield one result, now, and a different >> result, 5 lines later -- even though x hasn't been >> altered (but the hardware farted). >> >> So: >> >> &nbsp;&nbsp;&nbsp;&nbsp; if (x % 2) { >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do this; >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do that; >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do another_thing; >> &nbsp;&nbsp;&nbsp;&nbsp; } else { >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do that; >> &nbsp;&nbsp;&nbsp;&nbsp; } >> >> can execute differently than: >> >> &nbsp;&nbsp;&nbsp;&nbsp; if (x % 2) { >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do this; >> &nbsp;&nbsp;&nbsp;&nbsp; } >> >> &nbsp;&nbsp;&nbsp;&nbsp; do that; >> >> &nbsp;&nbsp;&nbsp;&nbsp; if (x % 2) { >> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do another_thing; >> &nbsp;&nbsp;&nbsp;&nbsp; } >> >> Years ago, this possibility wasn't ever considered. >> >> [Yes, optimizers can twiddle this but the point remains] >> >> And, that doesn't begin to address hostile actors in a >> system! > > It is optimisers rearranging things that can make testing these days so much > more difficult. The CPU out of order and speculative execution profile also > means that the old rules about if statements no longer hold true. Even more so > if the same path is taken many times. The old loop unrolling trick can actually > work against you now if it means that the innermost loop no longer fits nicely > inside a cache line.
Most "programmers" are clueless of these issues; they just concentrate on getting the "logic correct". Hardware designers are even MORE clueless because they can't even imagine the performance consequences. I've had to refactor many of the critical path structures in my current design to "less intuitive" structures because of the cache hits incurred when designing for a more intuitive implementation (processor doesn't care about what *seems* related -- it only cares about what *is* related, in terms of execution).
>>> Then you only need at most 40000 test vectors to take each branch of every
>>> binary if statement (60000 if it is Fortran with 3 way branches all used).
>>> That is a rather more tractable number (although still large).
>>>
>>> Any routine with too high a CCI count is practically certain to contain
>>> latent bugs - which makes it worth looking at more carefully.
>>
>> "A 'program' should fit on a single piece of paper"
>
> The 7 +/- 2 rule for each hierarchical level of a software design is still
> quite a good heuristic unless there are special circumstances.
Makes you wonder how folks can still think things like monolithic
kernels "make sense": how many crania do you need to gather to have
even an APPROXIMATE understanding of what's happening in the machine,
*now*? With multi-threaded kernels, are you sure you know everything
that *might* be happening WHILE "this" is happening?

*SO* much easier when you're trying to get some trivial piece of code
to work than something CONSIDERABLY more complex!

[It has been an eye-opener dealing with odd concepts like transport
delays in designing and evaluating performance when you always took
the cost of comms INSIDE a device to be zero! Think of all the
critical regions that are inherently exposed that you never had to
worry about: "What are the chances of <something> happening *between*
the execution of statements 23 and 24?" Ans: 100% if the user is
running the device! Now, how do you TEST for that inevitability?]
On 9/6/2023 4:42 AM, Don Y wrote:
> [I've got a Dragon cassette deck. It "autoreverses" by switching to
> a separate pair of tracks in the head that are in-line with the
> "back side" stereo channels on the tape -- so the cassette
> doesn't have to be mechanically flipped over. It's "tape counter"
> is a simple revolutions counter. So, if it starts at 0000 at
> the start of side A and reaches XXXX at the end of side A, it
> will count *backwards* back down to 0000 while it is playing side B.
> But, there is a race in the system logic that can repeatably
> cause it to count FORWARDS while *moving* BACKWARDS (there are
> several MCUs in the product and comms take finite time! :> )
No, I think the flaw has it counting backwards while moving forwards
(I haven't tickled that bug recently; I know it's there, so why give
it a chance to manifest?)

But, only if you act in a small (1-2 second) window of time. So,
there is no way to claim that it's an intentional feature, as the
deck shouldn't care if you waited 1 second or 4!
> It's a genuine defect because I have two of them and they both
> exhibit the same behavior. For a $2K (30+ years ago) device
> you would think they'd pay a bit more attention to detail!]
On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown
<'''newspam'''@nonad.co.uk> wrote:

>On 05/09/2023 23:14, Joe Gwinn wrote:
>> I must say that I fail to see how this can overcome 10^6021 paths,
>> even if it is wondrously effective, reducing the space to be tested by
>> a trillion to one (10^-12) - only 10^6009 paths to explore.
>
>It is a divide and conquer strategy. It guarantees to execute every path
>at least once and common core paths many times with different data. The
>number of test vectors required scales as log2 of the number of paths.
>
>[snip worked example for N = 5]
>
>The example is topologically equivalent to real* code you merely have to
>construct input data that will force execution down each binary choice
>in turn at every level. Getting the absolute minimum number of test
>vectors for full coverage is a much harder problem but a good enough
>solution is possible in most practical cases.
In practice, this is certainly pretty effective, but the proposed
requirement did not allow for such shortcuts, rendering the requirement
intractable - the Sun will blow up first.

Also, in practice we do a combination of random probing and fuzzing.

.<https://en.wikipedia.org/wiki/Fuzzing>

Joe Gwinn
On 06/09/2023 19:20, Joe Gwinn wrote:
> On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown
> <'''newspam'''@nonad.co.uk> wrote:
>
>> The example is topologically equivalent to real* code you merely have to
>> construct input data that will force execution down each binary choice
>> in turn at every level. Getting the absolute minimum number of test
>> vectors for full coverage is a much harder problem but a good enough
>> solution is possible in most practical cases.
>
> In practice, this is certainly pretty effective, but the proposed
> requirement did not allow for such shortcuts, rendering the
> requirement intractable - the Sun will blow up first.
>
> Also, in practice we do a combination of random probing and fuzzing.
>
> .<https://en.wikipedia.org/wiki/Fuzzing>
One tactic I have adopted for testing numerical code is very similar.
Basically a biased random number generator creates test data for the
routines under test and the answers are then verified. It is extra work
to do both a solver and a verifier, but not that much, and the
verification provides a basis for regression testing.

In most software the computation being done may be very difficult, but
the inverse is often relatively easy by comparison. Finding all real
roots of a function f(x) = 0 to maximum accuracy is quite tricky, but
given a supposed root x0, computing the value of the equation f(x0) and
its derivative f'(x0) is easy. Then you can use NR to see if the
correction is acceptably small; if not, rinse and repeat.

I found a new bug in a cubic solver that is as robust as any on the
planet quite recently. It required a very specific, near exact
combination of three 64 bit parameters to create a catastrophic numeric
cancellation down a seldom trodden path where the cubic equation has
three real roots and you want the one that it can't compute accurately.
For most of these problems we try very hard to have only one real
root...

My initial reaction was that it was tested library code so it must be
my problem -- until I traced into it and saw how it failed. It gives 8
instead of 16 sig figs in double precision for these pathological data.

--
Martin Brown
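[A minimal sketch of the solve-then-verify idea in C (hypothetical helper names; the solver under test is not shown): build a cubic from known roots, then accept a reported root only if the Newton-Raphson correction f(x0)/f'(x0) is tiny relative to the root.]

#include <float.h>
#include <math.h>
#include <stdio.h>

/* Cubic x^3 + b*x^2 + c*x + d, evaluated in Horner form. */
static double f(double x, double b, double c, double d)
{
    return ((x + b) * x + c) * x + d;
}

static double fprime(double x, double b, double c)
{
    return (3.0 * x + 2.0 * b) * x + c;
}

/* Accept x0 only if the NR correction is tiny relative to the root
   (rough tolerance; a near-zero f'(x0), e.g. at a double root, would
   need separate handling). */
static int verify_root(double x0, double b, double c, double d)
{
    double correction = f(x0, b, c, d) / fprime(x0, b, c);
    return fabs(correction) <= 64.0 * DBL_EPSILON * (fabs(x0) + 1.0);
}

int main(void)
{
    /* Choose the roots, expand to coefficients, then check what a
       solver (not shown) reports against them. */
    double r1 = 1.0, r2 = 2.0, r3 = 1e-7;   /* awkward spread on purpose */
    double b = -(r1 + r2 + r3);
    double c = r1 * r2 + r1 * r3 + r2 * r3;
    double d = -(r1 * r2 * r3);

    printf("r1 verified: %d\n", verify_root(r1, b, c, d));
    printf("r3 verified: %d\n", verify_root(r3, b, c, d));
    return 0;
}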