Tracking bug report frequency

Started by Don Y September 4, 2023
On Thu, 7 Sep 2023 17:31:42 +0100, Martin Brown
<newspam@nonad.co.uk> wrote:

>On 06/09/2023 19:20, Joe Gwinn wrote: >> On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> >>> The example is topologically equivalent to real* code you merely have to >>> construct input data that will force execution down each binary choice >>> in turn at every level. Getting the absolute minimum number of test >>> vectors for full coverage is a much harder problem but a good enough >>> solution is possible in most practical cases. >> >> In practice, this is certainly pretty effective, but the proposed >> requirement did not allow for such shortcuts, rendering the >> requirement intractable - the Sun will blow up first. >> >> Also, in practice we do a combination of random probing and fuzzing. >> >> .<https://en.wikipedia.org/wiki/Fuzzing> > >One tactic I have adopted for testing numerical code is very similar. >Basically a biased random number generator which creates test data for >the routines under test and then verifies the answers.
What do you mean by biased?
>It is extra work to do both a solver and a verifier but not that much >and the verification of such provides a basis for regression testing. > >Most software the computation being done may be very difficult but the >inverse is often relatively easy by comparison. Finding all real roots >of a function f(x)=0 to maximum accuracy is quite tricky but given a >supposed root x0 then computing the value of the equation f(x0) and its >derivative f'(x0) is easy. Then you can use NR to see if the correction >is acceptably small enough, if not rinse and repeat.
I often do that. Another trick is that many problems can be solved two different ways, so I solve twice and compare. One can also check that invariants are respected.

NR = Newton Raphson?

And then there is co-plotting and staring - the Mark I eyeball is a very good pattern and anomaly detector, especially of concentricity or linearity.
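A sketch of that residual check in C - f and f' passed in by the caller, tolerance illustrative:

#include <math.h>

/* Accept a supposed root x0 of f if a single Newton-Raphson
   correction |f(x0)/f'(x0)| is negligible relative to x0.
   Assumes fp(x0) is not vanishingly small. */
static int verify_root(double (*f)(double), double (*fp)(double),
                       double x0, double tol)
{
    double step = f(x0) / fp(x0);   /* the NR correction */
    return fabs(step) <= tol * (1.0 + fabs(x0));
}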
>I found a new bug in a cubic solver that is as robust as any on the >planet quite recently. It required a very specific near exact >combination of 3 64 bit parameters to create a catastrophic numeric >cancellation down a seldom trodden path where the cubic equation has >three real roots and you want the one that it can't compute accurately. > >Most of these problems we try very hard to only have one real root... > >My initial reaction was that it was tested library code so it must be my >problem - until I traced into it and saw how it failed. It gives 8 >instead of 16 sig fig in double precision for these pathological data.
That's a good one.

A sort-of parallel of my experience was in the late 1990s, when we were looking for failure patterns in how a network was composed and connected, and needed test cases that could be used to debug the fault-tolerance algorithm. We used a form of Monte Carlo test, where we described the network contents and topology using a random number, and then tried the FT algorithm on it. Only if the algorithm failed (anything other than graceful behavior) did we record the random number, and then proceed, looking. Letting the random poking continue day and night for a few days reduced the algorithm failure rate sufficiently.

Key to this was to write the algorithm and test harness in plain C code, not using any commercial statistical test application - these were large interpreters, and were orders of magnitude too slow. Not to mention expensive and hard to use. Sometimes, it's best to reinvent the wheel.

Nowadays, I'd probably use some kind of genetic programming to generate test cases. But totally random had the advantage of making no assumptions, and so caught hazards that nobody could guess or suspect.
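The skeleton of such a harness is tiny in plain C - a sketch, with run_ft_case() standing in for "build the network from the seed and run the FT algorithm", and the PRNG chosen so a failure can be replayed from its seed alone:

#include <stdint.h>
#include <stdio.h>

/* xorshift64: cheap, reproducible pseudo-random generator. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return *s = x;
}

/* Stand-in: decode seed into a network, run the fault-tolerance
   algorithm, return nonzero on anything but graceful behavior. */
static int run_ft_case(uint64_t seed) { (void)seed; return 0; }

int main(void)
{
    uint64_t state = 88172645463325252ULL;   /* arbitrary nonzero seed */
    for (;;) {                               /* day and night... */
        uint64_t seed = xorshift64(&state);
        if (run_ft_case(seed) != 0)
            printf("FAIL seed=%llu\n", (unsigned long long)seed);
    }
}

Joe Gwinn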
On 07/09/2023 23:20, Joe Gwinn wrote:
> On Thu, 7 Sep 2023 17:31:42 +0100, Martin Brown > <'''newspam'''@nonad.co.uk> wrote: > >> On 06/09/2023 19:20, Joe Gwinn wrote: >>> On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown >>> <'''newspam'''@nonad.co.uk> wrote: >>> >>>> The example is topologically equivalent to real* code you merely have to >>>> construct input data that will force execution down each binary choice >>>> in turn at every level. Getting the absolute minimum number of test >>>> vectors for full coverage is a much harder problem but a good enough >>>> solution is possible in most practical cases. >>> >>> In practice, this is certainly pretty effective, but the proposed >>> requirement did not allow for such shortcuts, rendering the >>> requirement intractable - the Sun will blow up first. >>> >>> Also, in practice we do a combination of random probing and fuzzing. >>> >>> .<https://en.wikipedia.org/wiki/Fuzzing> >> >> One tactic I have adopted for testing numerical code is very similar. >> Basically a biased random number generator which creates test data for >> the routines under test and then verifies the answers. > > What do you mean by biased?
Instead of a linear random number on 0-1 it generates numbers that are focussed on regions that may cause trouble for the algorithm under test.

Usually but not always in the vicinity of zero or very large values. A few bits of the original seed are used to determine how to convert it. Typical mappings from x, a double precision uniform on 0-1, are:

x
x^2
1-x^2
x^4
x^8
x^16

with some scale factor K multiplying it up to beyond the range that the parameter is supposed to take. Or, more generally but a bit sparse:

x/(1+eps-x)
(x-1/2)/(1+eps-x)

(sacrifices 3 bits of the original 64 bit seed)
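As a rough C sketch of the shape of it (the seed handling, K and eps here are illustrative, not my production values):

#include <math.h>
#include <stdint.h>

/* Map a 64-bit seed to a biased test value: the top bits become a
   uniform x on [0,1), the bottom 3 bits select one of the mappings
   above, and K scales the result past the legal parameter range. */
static double biased(uint64_t seed, double K)
{
    double x   = (double)(seed >> 11) * 0x1.0p-53;  /* uniform on [0,1) */
    double eps = 1e-12;
    switch (seed & 7) {                             /* sacrifice 3 bits */
    case 0:                                   break;
    case 1: x = x * x;                        break;
    case 2: x = 1.0 - x * x;                  break;
    case 3: x = pow(x, 4.0);                  break;
    case 4: x = pow(x, 8.0);                  break;
    case 5: x = pow(x, 16.0);                 break;
    case 6: x = x / (1.0 + eps - x);          break;
    default: x = (x - 0.5) / (1.0 + eps - x); break;
    }
    return K * x;
}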
>> It is extra work to do both a solver and a verifier but not that much >> and the verification of such provides a basis for regression testing. >> >> Most software the computation being done may be very difficult but the >> inverse is often relatively easy by comparison. Finding all real roots >> of a function f(x)=0 to maximum accuracy is quite tricky but given a >> supposed root x0 then computing the value of the equation f(x0) and its >> derivative f'(x0) is easy. Then you can use NR to see if the correction >> is acceptably small enough, if not rinse and repeat. > > I often do that. Another trick is that many problems can be solved > two different ways, so I solve twice and compare. > > One can also check that invariant's are respected.
Invariants are invaluable in critical code: you get to find out sooner rather than later if one of your assumptions is invalid, something has gone wrong, or the routine has been fed garbage inputs.
> NR = Newton Raphson?
Yes. Although these days my production code actually uses Halley's method in preference since it is almost the same speed to execute on modern pipeline CPUs and gives cubic rather than quadratic convergence.

Some practitioners prefer what they call modified NR which is a sort of hybrid of NR with Halley with slightly better convergence than Halley. However it involves a sqrt and isn't always better. Equation (4) in this paper (which includes the derivation) gives the method, originally due to Laguerre but now seeing a revival in some codes:

https://academic.oup.com/mnras/article/467/2/1702/2929272

The gotcha is that although as written with abs() it never actually fails - a sufficiently bad guess will still give poor answers, but it is much less likely to explode or oscillate in the way that NR does.

I'm not convinced that it is worth the cost of a sqrt but with CPUs improving all the time it is now very close. The latest i5-12600 I have can do a sqrt in about the same time as it takes to do a divide!
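For reference, the two update formulas side by side - a sketch with f, f' and f'' evaluated by the caller:

/* Newton-Raphson step: quadratic convergence. */
static double nr_step(double x, double f, double fp)
{
    return x - f / fp;
}

/* Halley step: cubic convergence for one extra derivative,
   and no sqrt (unlike the modified NR in the paper cited above). */
static double halley_step(double x, double f, double fp, double fpp)
{
    return x - (2.0 * f * fp) / (2.0 * fp * fp - f * fpp);
}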
> And then there is co-plotting and staring - the Mark I eyeball is a > very good pattern and anomaly detector, especially of concentricity or > linearity.
Looking at the plot of residual errors can be very enlightening. So can looking at the histogram of the error distribution over many test cases of solve, test, and then measure the error in the resulting answer. It is remarkably easy to home in on any weak spots and edge cases.

My normal reporting rule is to print exceptions whenever the worst case high or low error is exceeded. Verbose mode shows when it is equalled.
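That rule is a few lines in C - a sketch, with a single running record standing in for the separate high/low bounds:

#include <math.h>
#include <stdio.h>

static double worst;   /* running worst-case |error| seen so far */

/* Print only when the record is exceeded; verbose mode also
   reports when it is merely equalled. */
static void report_error(double err, int verbose)
{
    double a = fabs(err);
    if (a > worst) {
        worst = a;
        printf("new worst-case error %.17g\n", err);
    } else if (verbose && a == worst) {
        printf("worst-case error equalled: %.17g\n", err);
    }
}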
>> I found a new bug in a cubic solver that is as robust as any on the >> planet quite recently. It required a very specific near exact >> combination of 3 64 bit parameters to create a catastrophic numeric >> cancellation down a seldom trodden path where the cubic equation has >> three real roots and you want the one that it can't compute accurately. >> >> Most of these problems we try very hard to only have one real root... >> >> My initial reaction was that it was tested library code so it must be my >> problem - until I traced into it and saw how it failed. It gives 8 >> instead of 16 sig fig in double precision for these pathological data. > > That's a good one.
I was a bit surprised. It required an astonishingly exact combination of factors for it to trigger and one of my real problems hit it square on.

--
Martin Brown
On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
<newspam@nonad.co.uk> wrote:

>On 07/09/2023 23:20, Joe Gwinn wrote: >> On Thu, 7 Sep 2023 17:31:42 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> >>> On 06/09/2023 19:20, Joe Gwinn wrote: >>>> On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown >>>> <'''newspam'''@nonad.co.uk> wrote: >>>> >>>>> The example is topologically equivalent to real* code you merely have to >>>>> construct input data that will force execution down each binary choice >>>>> in turn at every level. Getting the absolute minimum number of test >>>>> vectors for full coverage is a much harder problem but a good enough >>>>> solution is possible in most practical cases. >>>> >>>> In practice, this is certainly pretty effective, but the proposed >>>> requirement did not allow for such shortcuts, rendering the >>>> requirement intractable - the Sun will blow up first. >>>> >>>> Also, in practice we do a combination of random probing and fuzzing. >>>> >>>> .<https://en.wikipedia.org/wiki/Fuzzing> >>> >>> One tactic I have adopted for testing numerical code is very similar. >>> Basically a biased random number generator which creates test data for >>> the routines under test and then verifies the answers. >> >> What do you mean by biased? > >Instead of a linear random number on 0-1 it generates numbers that are >focussed on regions that may cause trouble for the algorithm under test. > >Usually but not always in the vicinity of zero or very large values. A >few bits of the original seed are used to determine how to convert it. > >Typical mappings from x a double precision uniform on 0-1 are >x >x^2 >1-x^2 >x^4 >x^8 >x^16 > >with some scale factor K multiplying it up to beyond the range that the >parameter is supposed to take. Or more generally but a bit sparse:
This reminds me of how one tests memory hardware.
>x/(1+eps-x) >(x-1/2)/(1+eps-x)
I'd be tempted to fuzz the eps parameter.
>(sacrifices 3 bits of the original 64 bit seed) > >>> It is extra work to do both a solver and a verifier but not that much >>> and the verification of such provides a basis for regression testing. >>> >>> Most software the computation being done may be very difficult but the >>> inverse is often relatively easy by comparison. Finding all real roots >>> of a function f(x)=0 to maximum accuracy is quite tricky but given a >>> supposed root x0 then computing the value of the equation f(x0) and its >>> derivative f'(x0) is easy. Then you can use NR to see if the correction >>> is acceptably small enough, if not rinse and repeat. >> >> I often do that. Another trick is that many problems can be solved >> two different ways, so I solve twice and compare. >> >> One can also check that invariant's are respected. > >Invariants are invaluable in critical code you get to find out sooner >rather than later if one of your assumptions is invalid, something has >gone wrong or the routine has been fed garbage inputs.
Oh yes. I also code invariants into in-line assertions, for runtime detection.
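In C that's just assert() at every entry point - a trivial sketch, using the bracketing invariant of a root finder as the example:

#include <assert.h>

/* Invariant: the interval [lo,hi] must still bracket the root.
   Violations abort immediately in debug builds; the check
   compiles away under -DNDEBUG. */
static double bisect_step(double (*f)(double), double lo, double hi)
{
    assert(f(lo) * f(hi) <= 0.0);
    return 0.5 * (lo + hi);
}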
>> NR = Newton Raphson? > >Yes. Although these days my production code actually uses Halley's >method in preference since it is almost the same speed to execute on >modern pipeline CPUs and gives cubic rather than quadratic convergence.
Interesting. I had not heard of Halley's Method, and cubic convergence is worthwhile.
>Some practitioners prefer what they call modified NR which is a sort of >hybrid of NR with Halley with slightly better convergence than Halley. >However it involves a sqrt and isn't always better. > >Equation (4) in this paper (which includes the derivation) - method >originally due to Laguerre but now seeing a revival in some codes. > ><https://academic.oup.com/mnras/article/467/2/1702/2929272> > >The gotcha is that although as written with abs() it never actually >fails - a sufficiently bad guess will still give poor answers, but is >much less likely to explode or oscillate in the way that NR does.
So NR is better because it always warns?
> >I'm not convinced that it is worth the cost of a sqrt but with CPUs >improving all the time it is now very close. The latest i5-12600 I have >can do a sqrt in about the same time as it takes to do a divide!
Wonder what algorithm it uses?
>> And then there is co-plotting and staring - the Mark I eyeball is a >> very good pattern and anomaly detector, especially of concentricity or >> linearity. > >Looking at the plot of residual errors can be very enlightening. So can >looking at the histogram for many test cases of the error distribution >of solve, test and then measure error in the resulting answer. > >It is remarkably easy to home in on any weak spots and edge cases. My >normal reporting rule is printout exceptions whenever the worst case >high or low error is exceeded. Verbose mode shows when it is equalled.
Yes. But I always make plots and stare at them. Although I have found it useful to program a distinctive chord that sounds if, for instance, two measurement datafiles are identical or a file was not found, so I don't have to stare at the screen while the code is grinding away.
>>> I found a new bug in a cubic solver that is as robust as any on the >>> planet quite recently. It required a very specific near exact >>> combination of 3 64 bit parameters to create a catastrophic numeric >>> cancellation down a seldom trodden path where the cubic equation has >>> three real roots and you want the one that it can't compute accurately. >>> >>> Most of these problems we try very hard to only have one real root... >>> >>> My initial reaction was that it was tested library code so it must be my >>> problem - until I traced into it and saw how it failed. It gives 8 >>> instead of 16 sig fig in double precision for these pathological data. >> >> That's a good one. > >I was a bit surprised. It required an astonishingly exact combination of >factors for it to trigger and one of my real problems hit it square on.
So biased probing paid off here? Or only for diagnosis?

In the 1980s I hit a similar problem, but it wasn't hard to find. In scaled integer arithmetic (no floating point then), it's very common to multiply two 32-bit integers together, yielding a 64-bit product, and then divide that by a 64-bit scaling factor, yielding a 32-bit quotient. In the Fortran of the day, if one just wrote this as (a*b)/c, the compiler would truncate the product (a*b) to 32 bits before dividing by c, yielding a wildly wrong answer, without any warnings or drama. The solution was to define a 64-bit intermediate variable and do the operation in two lines: p=a*b, result = p/c.
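The same trap is easy to show in C terms, where the cast, not the declared type of the result, is what forces the wide product:

#include <stdint.h>

/* Wrong: a*b is evaluated in 32 bits (formally undefined on
   overflow in C; in practice truncated, just as the old Fortran
   compilers did) before the divide ever sees it. */
static int32_t muldiv_bad(int32_t a, int32_t b, int64_t c)
{
    return (int32_t)((a * b) / c);
}

/* Right: widen first, so the full 64-bit product reaches the divide. */
static int32_t muldiv(int32_t a, int32_t b, int64_t c)
{
    int64_t p = (int64_t)a * b;
    return (int32_t)(p / c);
}

Joe Gwinn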
On 08/09/2023 22:14, Joe Gwinn wrote:
> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
> <newspam@nonad.co.uk> wrote:
>
>>> NR = Newton Raphson? >> >> Yes. Although these days my production code actually uses Halley's >> method in preference since it is almost the same speed to execute on >> modern pipeline CPUs and gives cubic rather than quadratic convergence. > > Interesting. I had not heard of Halley's Method, and cubic > convergence is worthwhile.
Indeed. Halley was pretty clever but forever overshadowed by Newton.

Some of the latest codes use Delta_4 and Delta_5 corrections provided that the function is sufficiently differentiable. They have to be computed very carefully with considerable attention to detail!

One curiosity I have seen with them is that for me D_5 is anomalously fast despite containing D_4! On certain problems D5 is *faster* than NR (I suspect it is some sort of pipeline stall slowing the others down).

In the past they all got slower because there was a division for each step down the chain. I have a new computational scheme which avoids that (at an increased risk of under/over flow in some edge cases).

The codes all run as expected on simple starting guesses but when the fancy cubic ones are used they show this peculiar speed variation. It is most noticeable with the Intel compiler (less so on MSC or GCC).

The reason these high order methods are fashionable right now is that the iterative methods require extra function evaluations, and for that you have to crystallise the next value of x1 and recompute f(x1), f'(x1) etc. Historically they would iterate 3-5 times to solve it. Having to wait for that new input value is an unavoidable hard pipeline stall.

But if you can get the initial guess with a relative error below 10^(-16/N) then in an ideal world the Nth order difference corrector gets the answer in a single step and so the next value can be computed.
>> Some practitioners prefer what they call modified NR which is a sort of >> hybrid of NR with Halley with slightly better convergence than Halley. >> However it involves a sqrt and isn't always better. >> >> Equation (4) in this paper (which includes the derivation) - method >> originally due to Laguerre but now seeing a revival in some codes. >> >> <https://academic.oup.com/mnras/article/467/2/1702/2929272> >> >> The gotcha is that although as written with abs() it never actually >> fails - a sufficiently bad guess will still give poor answers, but is >> much less likely to explode or oscillate in the way that NR does. > > So NR is better because it always warns?
No. NR can go completely AWOL far too easily and diverge to infinity or oscillate chaotically between a set of values. Halley's method is the best of the three for stability, but MNR will get you slightly better precision if you can be sure your guesses are always close enough.
>> I'm not convinced that it is worth the cost of a sqrt but with CPUs >> improving all the time it is now very close. The latest i5-12600 I have >> can do a sqrt in about the same time as it takes to do a divide! > > Wonder what algorithm it uses?
My guess is some sort of cunning hardware shift, multiply and add. The latest generation of chips stand out in this respect. If only they would implement a hardware cube root then I would be very happy.
>>> And then there is co-plotting and staring - the Mark I eyeball is a >>> very good pattern and anomaly detector, especially of concentricity or >>> linearity. >> >> Looking at the plot of residual errors can be very enlightening. So can >> looking at the histogram for many test cases of the error distribution >> of solve, test and then measure error in the resulting answer. >> >> It is remarkably easy to home in on any weak spots and edge cases. My >> normal reporting rule is printout exceptions whenever the worst case >> high or low error is exceeded. Verbose mode shows when it is equalled. > > Yes. But I always make plots and stare at them.
I only do that if something catches my attention as potentially odd.
>>>> My initial reaction was that it was tested library code so it must be my >>>> problem - until I traced into it and saw how it failed. It gives 8 >>>> instead of 16 sig fig in double precision for these pathological data. >>> >>> That's a good one. >> >> I was a bit surprised. It required an astonishingly exact combination of >> factors for it to trigger and one of my real problems hit it square on. > > So biased probing paid off here? Or only for diagnosis?
It confirmed that given the right awkward parameters the solver would fail, but also that it was a very low risk event - it took an overnight run just to find a single example using random data. I just happened to be unlucky in my exact choice of problem.

Expansion around pi/4 was absolutely fine but the expansion around 3pi/4 went completely haywire, even though it differed only in a handful of sign changes to the coefficients. I was utterly convinced it was my user error.
> > In the 1980s I hit a similar problem, but it wasn't hard to find. In > scaled integer arithmetic (no floating point then), it's very common > to multiply two 32-bit integers together, yielding a 64-bit product, > and then divide that by a 64-bit scaling factor, yielding a 32-bit > quotient. In the Fortran of the day, if one just wrote this as > (a*b)/c, the compiler would truncate the product (a*b) to 32 bits > before dividing by c, yielding a wildly wrong answer, without any > warnings or drama. The solution was define a 64-bit intermediate > variable, and do the operation in two lines: p=a*b, result = p/c.
ISTR there being a library function MULDIV(a,b,c) for that, at least on the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).

--
Martin Brown
On 2023-09-11 04:05, Martin Brown wrote:
> On 08/09/2023 22:14, Joe Gwinn wrote: >> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> > >>>> NR = Newton Raphson? >>> >>> Yes. Although these days my production code actually uses >>> Halley's method in preference since it is almost the same speed >>> to execute on modern pipeline CPUs and gives cubic rather than >>> quadratic convergence. >> >> Interesting. I had not heard of Halley's Method, and cubic >> convergence is worthwhile. > > Indeed. Halley was pretty clever but forever overshadowed by Newton. > > Some of the latest codes use Delta_4 and Delta_5 corrections > provided that the function is sufficiently differentiable. They have > to be computed very carefully with considerable attention to detail!
> > One curiosity I have seen with them is that for me D_5 is > anomalously fast despite containing D_4 ! On certain problems D5 is > *faster* than NR (I suspect it is some sort of pipeline stall slowing > the others down). > > In the past they all got slower because there was a division for > each step down the chain. I have a new computational scheme which > avoids that (at an increased risk of under/over flow in some edge > cases) > > The codes all run as expected on simple starting guesses but when > the fancy cubic ones are used they show this peculiar speed > variation. It is most noticeable with the Intel compiler (less so on > MSC or GCC). > > The reason these high order methods are fashionable right now is > that the iterative methods require extra function evaluations and for > that you have to crystallise the next value of x1 and recompute > f(x1),f'(x1) etc. Historically they would iterate 3-5 times to solve > it. Having to wait for that new input value is an unavoidable hard > pipeline stall. > > But if you can get the initial guess with a relative error below > 10^(-16/N) then in an ideal world the Nth order difference corrector > gets the answer in a single step and so the next value can be > computed. > >>> Some practitioners prefer what they call modified NR which is a >>> sort of hybrid of NR with Halley with slightly better >>> convergence than Halley. However it involves a sqrt and isn't >>> always better. >>> >>> Equation (4) in this paper (which includes the derivation) - >>> method originally due to Laguerre but now seeing a revival in >>> some codes. >>> >>> <https://academic.oup.com/mnras/article/467/2/1702/2929272> >>> >>> The gotcha is that although as written with abs() it never >>> actually fails - a sufficiently bad guess will still give poor >>> answers, but is much less likely to explode or oscillate in the >>> way that NR does. >> >> So NR is better because it always warns? > > No. NR can go completely AWOL far too easily and diverge to infinity > or oscillate chaotically between a set of values. Halley's method is > the best of the three for stability, but MNR will get you slightly > better precision if you can be sure your guesses are always close > enough. > >>> I'm not convinced that it is worth the cost of a sqrt but with >>> CPUs improving all the time it is now very close. The latest >>> i5-12600 I have can do a sqrt in about the same time as it takes >>> to do a divide! >> >> Wonder what algorithm it uses? > > My guess is some sort of cunning hardware shift, multiply and add. > The latest generation of chips stand out in this respect. If only > they would implement a hardware cube root then I would be very > happy. > >>>> And then there is co-plotting and staring - the Mark I eyeball >>>> is a very good pattern and anomaly detector, especially of >>>> concentricity or linearity. >>> >>> Looking at the plot of residual errors can be very enlightening. >>> So can looking at the histogram for many test cases of the error >>> distribution of solve, test and then measure error in the >>> resulting answer. >>> >>> It is remarkably easy to home in on any weak spots and edge >>> cases. My normal reporting rule is printout exceptions whenever >>> the worst case high or low error is exceeded. Verbose mode shows >>> when it is equalled. >> >> Yes. But I always make plots and stare at them. > > I only do that if something catches my attention as potentially odd. >
>
It's been quite a while since I looked into the problem (and I was never close to your level of expertise there, Martin). However, this being Usenet, I am undeterred. ;)

Cubic methods are great as long as computing the second derivative is cheap. BITD the "Illinois algorithm" and its relatives got a fair amount of attention (in select circles). They're regula-falsi based, but with hyperlinear convergence. The best that I know is a Bjorck & Anderson variant whose efficiency index (basically the average order of convergence per function evaluation) gets up to 8**(1/4) ~ 1.68. This compares with 1.0 for bisection or vanilla Regula Falsi, and sqrt(2) for the secant method.

Secant is too squirrelly to use by itself, because of its tendency to hit a flat spot and jump straight to East Limbo, never to return. Newton has that problem too, so you have to find some finite interval where f_1 * f_2 < 0, and reject steps that fall outside it.

Besides that, there are two cases with Mr. Newton: if computing the derivative is comparatively cheap, as with trig functions, he gets an efficiency index of 2, but if not, he's stuck down at sqrt(2) with secant. (Sometimes the derivative is harder to compute than the value, e.g. with special functions or ratios of trig polynomials, in which case Newton is even worse off.) On the other hand, if the derivative can be computed without nasty significance loss due to cancellation, N. is likely to work better at high precision.

I've nearly always used derivative-free methods, because I'm usually not too concerned with ultimate accuracy, and the best ones can reach an efficiency index of very nearly 2.0, which is pretty cool. (Besides, I'm too lazy to want to manage all of those derivatives.)
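For concreteness, an Illinois sketch in C - the classical false-position update plus the one-line endpoint-halving trick (loop cap and convergence test illustrative):

#include <math.h>

/* Illinois variant of regula falsi. Caller guarantees f(a)*f(b) < 0.
   When the same endpoint is retained twice running, its stored
   function value is halved so the iteration cannot get stuck. */
static double illinois(double (*f)(double), double a, double b, double tol)
{
    double fa = f(a), fb = f(b);
    for (int i = 0; i < 200 && fabs(b - a) > tol; i++) {
        double c  = (a * fb - b * fa) / (fb - fa);  /* false position */
        double fc = f(c);
        if (fc * fb < 0.0) { a = b; fa = fb; }      /* bracket flipped */
        else               { fa *= 0.5; }           /* the Illinois trick */
        b = c; fb = fc;
        if (fc == 0.0) break;
    }
    return b;
}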
>>>>> My initial reaction was that it was tested library code so >>>>> it must be my problem - until I traced into it and saw how >>>>> it failed. It gives 8 instead of 16 sig fig in double >>>>> precision for these pathological data. >>>> >>>> That's a good one. >>> >>> I was a bit surprised. It required an astonishingly exact >>> combination of factors for it to trigger and one of my real >>> problems hit it square on. >> >> So biased probing paid off here? Or only for diagnosis? > > It confirmed that given the right awkward parameters the solver would > fail but also that it was a a very low risk event. Taking an > overnight run just to find a single example using random data. > > I just happened to be unlucky in my exact choice of problem. > Expansion around pi/4 was absolutely fine but the expansion around > 3pi/4 went completely haywire even though it differed only in a > handful of sign changes to coefficients. I was utterly convinced it > was my user error. >> >> In the 1980s I hit a similar problem, but it wasn't hard to find. >> In scaled integer arithmetic (no floating point then), it's very >> common to multiply two 32-bit integers together, yielding a 64-bit >> product, and then divide that by a 64-bit scaling factor, yielding >> a 32-bit quotient. In the Fortran of the day, if one just wrote >> this as (a*b)/c, the compiler would truncate the product (a*b) to >> 32 bits before dividing by c, yielding a wildly wrong answer, >> without any warnings or drama. The solution was define a 64-bit >> intermediate variable, and do the operation in two lines: p=a*b, >> result = p/c. > > ISTR there being a library function MULDIV(a,b,c) for that at least > on the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 > but I think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late > 70's).
Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510
http://electrooptical.net
http://hobbs-eo.com
On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown
<newspam@nonad.co.uk> wrote:

>On 08/09/2023 22:14, Joe Gwinn wrote: >> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> > >>>> NR = Newton Raphson? >>> >>> Yes. Although these days my production code actually uses Halley's >>> method in preference since it is almost the same speed to execute on >>> modern pipeline CPUs and gives cubic rather than quadratic convergence. >> >> Interesting. I had not heard of Halley's Method, and cubic >> convergence is worthwhile. > >Indeed. Halley was pretty clever but forever overshadowed by Newton. > >Some of the latest codes use Delta_4 and Delta_5 corrections provided >that the function is sufficiently differentiable. They have to be >computed very carefully with considerable attention to detail!
And differentiation amplifies noise.
>One curiosity I have seen with them is that for me D_5 is anomalously >fast despite containing D_4 ! On certain problems D5 is *faster* than NR >(I suspect it is some sort of pipeline stall slowing the others down).
In the old days, one would see a bunch of NOPs in the generated assembly, but that won't work for this. I would code a test case in assembly code and time it. Maybe by toggling a wire before and after, and looking with a scope.
>In the past they all got slower because there was a division for each >step down the chain. I have a new computational scheme which avoids that >(at an increased risk of under/over flow in some edge cases)
The old dodge was to multiply by the precomputed reciprocal, if it could be computed less often than every division.
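For example (a sketch - worthwhile only when the divisor is reused many times and the slight rounding difference is acceptable):

/* One divide hoisted out of the loop; only multiplies inside. */
static double scale_sum(const double *v, int n, double d)
{
    double r = 1.0 / d;
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i] * r;
    return s;
}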
>The codes all run as expected on simple starting guesses but when the >fancy cubic ones are used they show this peculiar speed variation. It is >most noticeable with the Intel compiler (less so on MSC or GCC). > >The reason these high order methods are fashionable right now is that >the iterative methods require extra function evaluations and for that >you have to crystallise the next value of x1 and recompute f(x1),f'(x1) >etc. Historically they would iterate 3-5 times to solve it. Having to >wait for that new input value is an unavoidable hard pipeline stall.
Use staggered parallel pipelines?
>But if you can get the initial guess with a relative error below >10^(-16/N) then in an ideal world the Nth order difference corrector >gets the answer in a single step and so the next value can be computed.
Yes. I used to do this level of optimizing, but in operating-system kernels. High-level interrupt routines warranted personal instruction-by-instruction hand optimization, with a drawing for each and every step. It might have been 100 lines of assembly code. Multiplied by the interrupt rate and overhead of this or that bit of hardware. Eliminating even one clock cycle was worthwhile.
>>> Some practitioners prefer what they call modified NR which is a sort of >>> hybrid of NR with Halley with slightly better convergence than Halley. >>> However it involves a sqrt and isn't always better. >>> >>> Equation (4) in this paper (which includes the derivation) - method >>> originally due to Laguerre but now seeing a revival in some codes. >>> >>> <https://academic.oup.com/mnras/article/467/2/1702/2929272> >>> >>> The gotcha is that although as written with abs() it never actually >>> fails - a sufficiently bad guess will still give poor answers, but is >>> much less likely to explode or oscillate in the way that NR does. >> >> So NR is better because it always warns? > >No. NR can go completely AWOL far too easily and diverge to infinity or >oscillate chaotically between a set of values. Halley's method is the >best of the three for stability, but MNR will get you slightly better >precision if you can be sure your guesses are always close enough. > >>> I'm not convinced that it is worth the cost of a sqrt but with CPUs >>> improving all the time it is now very close. The latest i5-12600 I have >>> can do a sqrt in about the same time as it takes to do a divide! >> >> Wonder what algorithm it uses? > >My guess is some sort of cunning hardware shift, multiply and add. The >latest generation of chips stand out in this respect. If only they would >implement a hardware cube root then I would be very happy.
That makes sense, as many have a barrel shifter in hardware.
> >>>> And then there is co-plotting and staring - the Mark I eyeball is a >>>> very good pattern and anomaly detector, especially of concentricity or >>>> linearity. >>> >>> Looking at the plot of residual errors can be very enlightening. So can >>> looking at the histogram for many test cases of the error distribution >>> of solve, test and then measure error in the resulting answer. >>> >>> It is remarkably easy to home in on any weak spots and edge cases. My >>> normal reporting rule is printout exceptions whenever the worst case >>> high or low error is exceeded. Verbose mode shows when it is equalled. >> >> Yes. But I always make plots and stare at them. > >I only do that if something catches my attention as potentially odd.
I always do it, precisely to improve my chances of noticing something odd.
>>>>> My initial reaction was that it was tested library code so it must be my >>>>> problem - until I traced into it and saw how it failed. It gives 8 >>>>> instead of 16 sig fig in double precision for these pathological data. >>>> >>>> That's a good one. >>> >>> I was a bit surprised. It required an astonishingly exact combination of >>> factors for it to trigger and one of my real problems hit it square on. >> >> So biased probing paid off here? Or only for diagnosis? > >It confirmed that given the right awkward parameters the solver would >fail but also that it was a a very low risk event. Taking an overnight >run just to find a single example using random data. > >I just happened to be unlucky in my exact choice of problem. Expansion >around pi/4 was absolutely fine but the expansion around 3pi/4 went >completely haywire even though it differed only in a handful of sign >changes to coefficients. I was utterly convinced it was my user error. >> >> In the 1980s I hit a similar problem, but it wasn't hard to find. In >> scaled integer arithmetic (no floating point then), it's very common >> to multiply two 32-bit integers together, yielding a 64-bit product, >> and then divide that by a 64-bit scaling factor, yielding a 32-bit >> quotient. In the Fortran of the day, if one just wrote this as >> (a*b)/c, the compiler would truncate the product (a*b) to 32 bits >> before dividing by c, yielding a wildly wrong answer, without any >> warnings or drama. The solution was define a 64-bit intermediate >> variable, and do the operation in two lines: p=a*b, result = p/c. > >ISTR there being a library function MULDIV(a,b,c) for that at least on >the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I >think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).
Yes, it either didn't exist yet or didn't cover the specific case we had.

A parallel story from the late 1970s comes from building a C-141 maintenance trainer (a mockup cockpit animated by a midi-computer of the day) in Fortran, where we were forced to implement parts in assembler - specifically a collection of assembly-coded Fortran functions providing 32-bit bitwise logic. When I told the application programmers that they would be writing assembly, there was much wailing and rending of garments. But I provided a template and an annotated example, and they soon got the hang of it.

Thirty or forty years later, I got a call out of the blue from a company I'd never heard of. I don't know how they got my name. They had just won a contract to port the now-legacy trainer code to C, and didn't know what to do with all that assembler. I told him the reason, and that this had been just before the ISA Extensions were released, but that all that assembler did was bitwise logic, which could be done directly in C. The caller was very happy to hear this.

Joe Gwinn
On 11/09/2023 13:36, Phil Hobbs wrote:
> On 2023-09-11 04:05, Martin Brown wrote:
>> Indeed. Halley was pretty clever but forever overshadowed by Newton.
>>
>> Some of the latest codes use Delta_4 and Delta_5 corrections
>> provided that the function is sufficiently differentiable. They have
>> to be computed very carefully with considerable attention to detail!
>
> It's been quite a while since I looked into the problem (and I was never
> close to your level of expertise there, Martin). However, this being
> Usenet, I am undeterred. ;)
>
> Cubic methods are great as long as computing the second derivative is
> cheap. BITD the "Illinois algorithm" and its relatives got a fair
> amount of attention (in select circles). They're regula-falsi based,
> but with hyperlinear convergence. The best that I know is a Bjorck &
> Anderson variant whose efficiency index (basically the average order of
> convergence per function evaluation) gets up to 8**(1/4) ~ 1.68.
> This compares with 1.0 for bisection or vanilla Regula Falsi, and
> sqrt(2) for the secant method.
That's a blast from the past. It used to be popular in some of these problems until a way was found to tame potential divergence (basically by making sure that the starting guess is always within the range where the sequence is absolutely convergent). Some analytical sleight of hand is needed to get it spot on or there will be some insanely small but still non-zero value where it goes crazy as eccentricity e -> 1. There has been a recent paper using bisection but it has various flaws - not least that it determines an accurate answer to the wrong question.
> Secant is too squirrelly to use by itself, because of its tendency to > hit a flat spot and jump straight to East Limbo, never to return. > Newton has that problem too, so you have to find some finite interval > where f_1 * f_2 < 0, and reject steps that fall outside it.
One of the cute features of Kepler's problem E = M + e*sin(E) is that the simplest naive starting guess, E_0 = M + e, is an upper bound for which the iteration is absolutely convergent, so if you clip on that you remain numerically stable.
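A sketch of that in C - NR on Kepler's equation started from, and clipped against, the E_0 = M + e bound (iteration cap and tolerance illustrative):

#include <math.h>

/* Solve E - e*sin(E) = M for the eccentric anomaly E (0 <= e < 1). */
static double kepler_E(double M, double e)
{
    double hi = M + e;     /* naive upper bound; stable starting guess */
    double E  = hi;
    for (int i = 0; i < 50; i++) {
        double d = (E - e * sin(E) - M) / (1.0 - e * cos(E));
        E -= d;
        if (E > hi) E = hi;            /* clip on the bound */
        if (fabs(d) < 1e-15) break;
    }
    return E;
}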
> Besides that, there are two cases with Mr. Newton: if computing the
> derivative is comparatively cheap, as with trig functions, he gets an
> efficiency index of 2, but if not, he's stuck down at sqrt(2) with
> secant. (Sometimes the derivative is harder to compute than the value,
> e.g. with special functions or ratios of trig polynomials, in which case
> Newton is even worse off.)
The trend at the moment is to avoid system trig functions and have 5th order polynomial expansions around a table of equispaced fixed sin, cos values. Right now the trick for speed is to avoid divides where possible.
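In outline it looks like this (table size and coefficients are illustrative; a production version reduces the argument range properly and tunes the polynomials much more carefully):

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define TRIG_N 256
static double sin_tab[TRIG_N + 1], cos_tab[TRIG_N + 1];
static const double TRIG_H = (2.0 * M_PI) / TRIG_N;   /* node spacing */

static void trig_init(void)
{
    for (int k = 0; k <= TRIG_N; k++) {
        sin_tab[k] = sin(k * TRIG_H);
        cos_tab[k] = cos(k * TRIG_H);
    }
}

/* sin(x) for x in [0, 2*pi): nearest table node plus short series
   for the small remainder d, with |d| <= TRIG_H/2. */
static double fast_sin(double x)
{
    int    k  = (int)(x / TRIG_H + 0.5);
    double d  = x - k * TRIG_H;
    double d2 = d * d;
    double sd = d * (1.0 - d2 / 6.0 * (1.0 - d2 / 20.0));  /* sin(d), 5th order */
    double cd = 1.0 - d2 / 2.0 * (1.0 - d2 / 12.0);        /* cos(d), 4th order */
    return sin_tab[k] * cd + cos_tab[k] * sd;
}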
> On the other hand, if the derivative can be computed without nasty > significance loss due to cancellation, N. is likely to work better at > high precision.
We always have analytical derivatives available (although they have some unpleasant properties in the singular corner).
> I've nearly always used derivative-free methods, because I'm usually not
> too concerned with ultimate accuracy, and the best ones can reach an
> efficiency index of very nearly 2.0, which is pretty cool. (Besides,
> I'm too lazy to want to manage all of those derivatives.)
If I only have the function available numerically as a black box and no analytical derivative I tend to favour Golden ratio search for a maximum and then take a punt on computing the derivative numerically and seeing where NR actually lands. It either polishes the answer nicely or goes AWOL. A few extra function evaluations is usually neither here nor there.

I did some work a long time ago on computing accurate numerical derivatives from equally spaced function evaluations. It still comes in handy sometimes (a bit of a lost art from the old days of manual computors sat at desks).

What surprised me most was that a flaw like that could be lurking in almost every cubic solver (I bet NAGlib got it right) just waiting for the right set of unplayable data to come along.
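The golden-section part is short - a sketch for a maximum of a unimodal f on [a,b]:

#include <math.h>

/* Golden-section search for a maximum: derivative-free, one new
   function evaluation per step, interval shrinks by ~0.618 each time. */
static double golden_max(double (*f)(double), double a, double b, double tol)
{
    const double g = (sqrt(5.0) - 1.0) / 2.0;
    double x1 = b - g * (b - a), x2 = a + g * (b - a);
    double f1 = f(x1), f2 = f(x2);
    while (b - a > tol) {
        if (f1 < f2) {     /* maximum lies in [x1, b] */
            a = x1; x1 = x2; f1 = f2;
            x2 = a + g * (b - a); f2 = f(x2);
        } else {           /* maximum lies in [a, x2] */
            b = x2; x2 = x1; f2 = f1;
            x1 = b - g * (b - a); f1 = f(x1);
        }
    }
    return 0.5 * (a + b);
}

--
Martin Brown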
Martin Brown <newspam@nonad.co.uk> wrote:
> On 11/09/2023 13:36, Phil Hobbs wrote:
>> On 2023-09-11 04:05, Martin Brown wrote:
>
>>> Indeed. Halley was pretty clever but forever overshadowed by Newton.
>>>
>>> Some of the latest codes use Delta_4 and Delta_5 corrections
>>> provided that the function is sufficiently differentiable. They have
>>> to be computed very carefully with considerable attention to detail!
>>
>> It's been quite a while since I looked into the problem (and I was never
>> close to your level of expertise there, Martin). However, this being
>> Usenet, I am undeterred. ;)
>>
>> Cubic methods are great as long as computing the second derivative is
>> cheap. BITD the "Illinois algorithm" and its relatives got a fair
>> amount of attention (in select circles). They're regula-falsi based,
>> but with hyperlinear convergence. The best that I know is a Bjorck &
>> Anderson variant whose efficiency index (basically the average order of
>> convergence per function evaluation) gets up to 8**(1/4) ~ 1.68.
>> This compares with 1.0 for bisection or vanilla Regula Falsi, and
>> sqrt(2) for the secant method.
>
> That's a blast from the past. It used to be popular in some of these
> problems until a way was found to tame potential divergence (basically
> by making sure that the starting guess is always within the range where
> the sequence is absolutely convergent). Some analytical sleight of hand
> is needed to get it spot on or there will be some insanely small but
> still non-zero value where it goes crazy as eccentricity e -> 1.
>
> There has been a recent paper using bisection but it has various flaws -
> not least that it determines an accurate answer to the wrong question.
Could be a problem.
> >> Secant is too squirrelly to use by itself, because of its tendency to >> hit a flat spot and jump straight to East Limbo, never to return. >> Newton has that problem too, so you have to find some finite interval >> where f_1 * f_2 < 0, and reject steps that fall outside it. > > One of the cute features of Kepler's problem E = M + e*sin(E) is that > one of the simplest naive upper bounds of E_0 = M+e is absolutely > convergent so if you clip on that you remain numerically stable.
Fun. Last time I used Kepler's equation was on a problem set in about 1980. I recall being too lazy to use Newton, so since it was a planetary orbit with epsilon << 1, I just rearranged terms and iterated on epsilon. (Prof Ovenden was obviously bored with grading Newtonian solutions, because he gave me an extra point.) ;)

The best thing about it is the names - M is "mean anomaly" and E is "eccentric anomaly".
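That rearrange-and-iterate scheme is a one-liner in C - a sketch; for epsilon << 1 each pass gains roughly a factor of epsilon in accuracy:

#include <math.h>

/* Fixed-point iteration on Kepler's equation, fine for small e:
   E <- M + e*sin(E), converging linearly with ratio ~e. */
static double kepler_fixed_point(double M, double e)
{
    double E = M;                    /* the e = 0 solution as the start */
    for (int i = 0; i < 30; i++)     /* iteration cap illustrative */
        E = M + e * sin(E);
    return E;
}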
>
>> Besides that, there are two cases with Mr. Newton: if computing the
>> derivative is comparatively cheap, as with trig functions, he gets an
>> efficiency index of 2, but if not, he's stuck down at sqrt(2) with
>> secant. (Sometimes the derivative is harder to compute than the value,
>> e.g. with special functions or ratios of trig polynomials, in which case
>> Newton is even worse off.)
>
> The trend at the moment is to avoid system trig functions and have 5th
> order polynomial expansions around a table of equispaced fixed sin, cos
> values. Right now the trick for speed is to avoid divides where possible.
>
>> On the other hand, if the derivative can be computed without nasty
>> significance loss due to cancellation, N. is likely to work better at
>> high precision.
>
> We always have analytical derivatives available (although they have some
> unpleasant properties in the singular corner).
>
>> I've nearly always used derivative-free methods, because I'm usually not
>> too concerned with ultimate accuracy, and the best ones can reach an
>> efficiency index of very nearly 2.0, which is pretty cool. (Besides,
>> I'm too lazy to want to manage all of those derivatives.)
>
> If I only have the function available numerically as a black box and no
> analytical derivative I tend to favour Golden ratio search for a maximum
> and then take a punt on computing the derivative numerically and seeing
> where NR actually lands. It either polishes the answer nicely or goes
> AWOL. A few extra function evaluations is usually neither here nor there.
>
> I did some work a long time ago on computing accurate numerical
> derivatives from equally spaced function evaluations. It still comes in
> handy sometimes (a bit of a lost art from the old days of manual
> computors sat at desks).
Long ago I did one of those on an HP41C, along the way to an Adams-Bashforth-Moulton predictor-corrector ODE solver. (It's easy to remember which is which - "Bashforth" is an excellent name for an extrapolator.) ;)
> > What surprised me most was that a flaw like that could be lurking in > almost every cubic solver (I bet NAGlib got it right) just waiting for > the right set of unplayable data to come along.
Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
On 11/09/2023 21:58, Joe Gwinn wrote:
> On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown > <'''newspam'''@nonad.co.uk> wrote: > >> On 08/09/2023 22:14, Joe Gwinn wrote: >>> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown >>> <'''newspam'''@nonad.co.uk> wrote: >>> >> >>>>> NR = Newton Raphson? >>>> >>>> Yes. Although these days my production code actually uses Halley's >>>> method in preference since it is almost the same speed to execute on >>>> modern pipeline CPUs and gives cubic rather than quadratic convergence. >>> >>> Interesting. I had not heard of Halley's Method, and cubic >>> convergence is worthwhile. >> >> Indeed. Halley was pretty clever but forever overshadowed by Newton. >> >> Some of the latest codes use Delta_4 and Delta_5 corrections provided >> that the function is sufficiently differentiable. They have to be >> computed very carefully with considerable attention to detail! > > And differentiation amplifies noise.
Symbolic differentiation doesn't. We have analytical expressions for the problem to solve and most of them are differentiable several times apart from where there are poles. Those regions need extra care.
>> One curiosity I have seen with them is that for me D_5 is anomalously >> fast despite containing D_4 ! On certain problems D5 is *faster* than NR >> (I suspect it is some sort of pipeline stall slowing the others down). > > In the old days, one would see a bunch of NOPs in the generated > assembly, but that won't work for this. I would code a test case in > assembly code and time it. Maybe by toggling a wire before and after, > and looking with a scope.
I tend to use RDTSC; although it is deprecated for this sort of thing, it works fairly well. You just have to be aware of its limitations. The ultimate way to do it is to run a shim that lets you access the machine specific registers and directly measure pipeline stalls and other modes of failure. I haven't done that for this code (yet).
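A bare-bones sketch of the RDTSC timing (GCC/Clang intrinsics assumed; rdtscp is used for its partial serialization, and the usual caveats about frequency scaling and core migration apply):

#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp; MSVC has it in <intrin.h> */

/* Cycle-count a function call; returns elapsed reference cycles. */
static uint64_t cycles(void (*fn)(void))
{
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);   /* waits for earlier instructions */
    fn();
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}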
>> In the past they all got slower because there was a division for each >> step down the chain. I have a new computational scheme which avoids that >> (at an increased risk of under/over flow in some edge cases) > > The old dodge was to multiply by the pre computed reciprocal, if this > could be computed less often than every division.
These days you can pretty much rely on any decent compiler to precompute static constants into whatever form is best. The strength reduction rules work well. You can write x/3.0 confident in the knowledge that it will compile to x*(1.0/3) with the 1.0/3 computed at compile time (although it is as well to check this on new compilers).

Even if you divide several expressions in a row by x, most of the latest compilers will compute 1/x once and then multiply it out. Trying to micromanage such things can actually make it worse, since the compiler might store the intermediate value in memory if you code it explicitly.
>> The codes all run as expected on simple starting guesses but when the >> fancy cubic ones are used they show this peculiar speed variation. It is >> most noticeable with the Intel compiler (less so on MSC or GCC). >> >> The reason these high order methods are fashionable right now is that >> the iterative methods require extra function evaluations and for that >> you have to crystallise the next value of x1 and recompute f(x1),f'(x1) >> etc. Historically they would iterate 3-5 times to solve it. Having to >> wait for that new input value is an unavoidable hard pipeline stall. > > Use staggered parallel pipelines?
There is something about the odd numbered differences that makes them unusually fast.
>> But if you can get the initial guess with a relative error below >> 10^(-16/N) then in an ideal world the Nth order difference corrector >> gets the answer in a single step and so the next value can be computed. > > Yes.
It is only being done because of the requirement for speed and accuracy at the same time. Every cycle counts. Someone is even working on a dedicated hardware solution using a variant of CORDIC for this problem. https://arxiv.org/abs/2008.02894 They are getting quite close to having working hardware now.
>>>> I'm not convinced that it is worth the cost of a sqrt but with CPUs >>>> improving all the time it is now very close. The latest i5-12600 I have >>>> can do a sqrt in about the same time as it takes to do a divide! >>> >>> Wonder what algorithm it uses? >> >> My guess is some sort of cunning hardware shift, multiply and add. The >> latest generation of chips stand out in this respect. If only they would >> implement a hardware cube root then I would be very happy. > > That makes sense, as many have a barrel shifter in hardware. >> ISTR there being a library function MULDIV(a,b,c) for that at least on >> the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I >> think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's). > > Yes, it either didn't exist yet or didn't cover the specific case we > had.
I'm pretty sure it was in one of the locally written system libraries. I also had a routine to mask out certain harmless underflows in an astrophysics FLIC code. It was scaled in SI units which were something like h/c^2, which didn't leave a lot of room in 32-bit floats. The original code spent most of its time in underflow recovery routines. I got an immediate 5x speed improvement with no loss of accuracy with just a few lines of assembler added to the project.

Eventually the problem was rescaled to put things into the middle ground so that underflows were much less likely.

--
Martin Brown
On Fri, 15 Sep 2023 10:55:34 +0100, Martin Brown
<newspam@nonad.co.uk> wrote:

>On 11/09/2023 21:58, Joe Gwinn wrote: >> On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> >>> On 08/09/2023 22:14, Joe Gwinn wrote: >>>> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown >>>> <'''newspam'''@nonad.co.uk> wrote: >>>> >>> >>>>>> NR = Newton Raphson? >>>>> >>>>> Yes. Although these days my production code actually uses Halley's >>>>> method in preference since it is almost the same speed to execute on >>>>> modern pipeline CPUs and gives cubic rather than quadratic convergence. >>>> >>>> Interesting. I had not heard of Halley's Method, and cubic >>>> convergence is worthwhile. >>> >>> Indeed. Halley was pretty clever but forever overshadowed by Newton. >>> >>> Some of the latest codes use Delta_4 and Delta_5 corrections provided >>> that the function is sufficiently differentiable. They have to be >>> computed very carefully with considerable attention to detail! >> >> And differentiation amplifies noise. > >Symbolic differentiation doesn't. We have analytical expressions for the >problem to solve and most of them are differentiable several times apart >from where there are poles. Those regions need extra care. > >>> One curiosity I have seen with them is that for me D_5 is anomalously >>> fast despite containing D_4 ! On certain problems D5 is *faster* than NR >>> (I suspect it is some sort of pipeline stall slowing the others down). >> >> In the old days, one would see a bunch of NOPs in the generated >> assembly, but that won't work for this. I would code a test case in >> assembly code and time it. Maybe by toggling a wire before and after, >> and looking with a scope. > >I tend to use RDTSC although it is deprecated for this sort of thing it >works fairly well. You just have to be aware of its limitations. The >ultimate way to do it is run a shim that lets you access the machine >specific registers and directly measure pipeline stalls and other modes >of failure. I haven't done that for this code (yet).
Well, the old EE's approach always involves an oscilloscope and an electrical signal from the device under test, simple and physical, so it cannot so easily lie. Well, before the advent of digital scopes, but still...
> >>> In the past they all got slower because there was a division for each >>> step down the chain. I have a new computational scheme which avoids that >>> (at an increased risk of under/over flow in some edge cases) >> >> The old dodge was to multiply by the pre computed reciprocal, if this >> could be computed less often than every division. > >These days you can pretty much rely on any decent compiler to precompute >static constants into whatever form is best. The strength reduction >rules work well. You can write x/3.0 confident in the knowledge that it >will compile to x*(1.0/3) with the 1.0/3 computed at compile time. >(although it is as well to check this on new compilers) > >Even if you do something like several expressions in a row /x most of >the latest compilers will compute 1/x and then multiply it out. Trying >to micromanage such things can actually make it worse since the compiler >might store the intermediate value in memory if you code it explicitly.
Yes.
>>> The codes all run as expected on simple starting guesses but when the >>> fancy cubic ones are used they show this peculiar speed variation. It is >>> most noticeable with the Intel compiler (less so on MSC or GCC). >>> >>> The reason these high order methods are fashionable right now is that >>> the iterative methods require extra function evaluations and for that >>> you have to crystallise the next value of x1 and recompute f(x1),f'(x1) >>> etc. Historically they would iterate 3-5 times to solve it. Having to >>> wait for that new input value is an unavoidable hard pipeline stall. >> >> Use staggered parallel pipelines? > >There is something about the odd numbered differences that makes them >unusually fast.
I'd be looking at hardware details like how this all maps onto physical memory, especially cache lines and the like.
>>> But if you can get the initial guess with a relative error below >>> 10^(-16/N) then in an ideal world the Nth order difference corrector >>> gets the answer in a single step and so the next value can be computed. >> >> Yes. > >It is only being done because of the requirement for speed and accuracy >at the same time. Every cycle counts. Someone is even working on a >dedicated hardware solution using a variant of CORDIC for this problem. > ><https://arxiv.org/abs/2008.02894> > >They are getting quite close to having working hardware now.
I've seen some CORDIC variations, typically for ellipses, but not this one.
>>>>> I'm not convinced that it is worth the cost of a sqrt but with CPUs >>>>> improving all the time it is now very close. The latest i5-12600 I have >>>>> can do a sqrt in about the same time as it takes to do a divide! >>>> >>>> Wonder what algorithm it uses? >>> >>> My guess is some sort of cunning hardware shift, multiply and add. The >>> latest generation of chips stand out in this respect. If only they would >>> implement a hardware cube root then I would be very happy. >> >> That makes sense, as many have a barrel shifter in hardware. >>> ISTR there being a library function MULDIV(a,b,c) for that at least on >>> the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I >>> think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's). >> >> Yes, it either didn't exist yet or didn't cover the specific case we >> had. > >I'm pretty sure it was in one of the locally written system libraries. I >also had a routine to mask out certain harmless underflows in an >astrophysics FLIC code. It was scaled in SI units which were something >like h/c^2 which didn't leave a lot of room in 32bit floats.
Sounds like Planck Scale?
>The original code spent most of its time in underflow recovery routines. >I got an immediate 5x speed improvement with no loss of accuracy with >just a few lines of assembler added to the project. >
Been there done that. One standard tool is a sampler tied to a hardware timer interrupt programmed for random intervals between interrupts, collecting PC addresses. The "random interval" must at least be mutually prime with respect to all periodic processes in the system under test. The histogram of PC addresses shows where the program was spending its time, often in a place nobody ever heard of, usually deep in some obscure part of the runtime system. Like underflow handling.
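A rough POSIX sketch of the same idea, with SIGPROF standing in for the hardware timer (Linux/x86-64 register name assumed; a real tool would bin the addresses into a histogram rather than call printf, which is not async-signal-safe):

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <ucontext.h>

/* Record the interrupted program counter on every profiling tick. */
static void on_prof(int sig, siginfo_t *si, void *uc_)
{
    (void)sig; (void)si;
#ifdef REG_RIP
    ucontext_t *uc = uc_;
    fprintf(stderr, "pc=%#llx\n",
            (unsigned long long)uc->uc_mcontext.gregs[REG_RIP]);
#else
    (void)uc_;
#endif
}

static void start_sampling(void)
{
    struct sigaction sa;
    sa.sa_sigaction = on_prof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPROF, &sa, NULL);

    struct itimerval it;
    it.it_interval.tv_sec  = 0;
    it.it_interval.tv_usec = 997;   /* a prime interval, per the text */
    it.it_value = it.it_interval;
    setitimer(ITIMER_PROF, &it, NULL);
}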
>Eventually the problem was rescaled to put things into the middle ground >so that underflows were much less likely.
What I've seen used a lot is straight SI units with double precision (64-bit) IEEE floating point arithmetic, with all conversions to and from legacy units performed only at the interfaces.

Joe Gwinn