Tracking bug report frequency

Started by Don Y September 4, 2023
On Thu, 7 Sep 2023 17:31:42 +0100, Martin Brown
<newspam@nonad.co.uk> wrote:

>On 06/09/2023 19:20, Joe Gwinn wrote: >> On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> >>> The example is topologically equivalent to real* code you merely have to >>> construct input data that will force execution down each binary choice >>> in turn at every level. Getting the absolute minimum number of test >>> vectors for full coverage is a much harder problem but a good enough >>> solution is possible in most practical cases. >> >> In practice, this is certainly pretty effective, but the proposed >> requirement did not allow for such shortcuts, rendering the >> requirement intractable - the Sun will blow up first. >> >> Also, in practice we do a combination of random probing and fuzzing. >> >> .<https://en.wikipedia.org/wiki/Fuzzing> > >One tactic I have adopted for testing numerical code is very similar. >Basically a biased random number generator which creates test data for >the routines under test and then verifies the answers.
What do you mean by biased?
>It is extra work to do both a solver and a verifier but not that much >and the verification of such provides a basis for regression testing. > >Most software the computation being done may be very difficult but the >inverse is often relatively easy by comparison. Finding all real roots >of a function f(x)=0 to maximum accuracy is quite tricky but given a >supposed root x0 then computing the value of the equation f(x0) and its >derivative f'(x0) is easy. Then you can use NR to see if the correction >is acceptably small enough, if not rinse and repeat.
I often do that. Another trick is that many problems can be solved two different ways, so I solve twice and compare. One can also check that invariants are respected.

NR = Newton Raphson?

And then there is co-plotting and staring - the Mark I eyeball is a very good pattern and anomaly detector, especially of concentricity or linearity.
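A sketch of that residual check in C - f and f' passed in by the caller, tolerance illustrative:

#include <math.h>

/* Accept a supposed root x0 of f if a single Newton-Raphson
   correction |f(x0)/f'(x0)| is negligible relative to x0.
   Assumes fp(x0) is not vanishingly small. */
static int verify_root(double (*f)(double), double (*fp)(double),
                       double x0, double tol)
{
    double step = f(x0) / fp(x0);   /* the NR correction */
    return fabs(step) <= tol * (1.0 + fabs(x0));
}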
>I found a new bug in a cubic solver that is as robust as any on the >planet quite recently. It required a very specific near exact >combination of 3 64 bit parameters to create a catastrophic numeric >cancellation down a seldom trodden path where the cubic equation has >three real roots and you want the one that it can't compute accurately. > >Most of these problems we try very hard to only have one real root... > >My initial reaction was that it was tested library code so it must be my >problem - until I traced into it and saw how it failed. It gives 8 >instead of 16 sig fig in double precision for these pathological data.
That's a good one.

A sort-of parallel of my experience was in the late 1990s, when we were looking for failure patterns in how a network was composed and connected, and needed test cases that could be used to debug the fault-tolerance algorithm. We used a form of Monte Carlo test, where we described the network contents and topology using a random number, and then tried the FT algorithm on it. Only if the algorithm failed (anything other than graceful behavior) did we record the random number, and then proceed, looking. Letting the random poking continue day and night for a few days reduced the algorithm failure rate sufficiently.

Key to this was to write the algorithm and test harness in plain C code, not using any commercial statistical test application - these were large interpreters, and were orders of magnitude too slow. Not to mention expensive and hard to use. Sometimes, it's best to reinvent the wheel.

Nowadays, I'd probably use some kind of genetic programming to generate test cases. But totally random had the advantage of making no assumptions, and so caught hazards that nobody could guess or suspect.
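The skeleton of such a harness is tiny in plain C - a sketch, with run_ft_case() standing in for "build the network from the seed and run the FT algorithm", and the PRNG chosen so a failure can be replayed from its seed alone:

#include <stdint.h>
#include <stdio.h>

/* xorshift64: cheap, reproducible pseudo-random generator. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return *s = x;
}

/* Stand-in: decode seed into a network, run the fault-tolerance
   algorithm, return nonzero on anything but graceful behavior. */
static int run_ft_case(uint64_t seed) { (void)seed; return 0; }

int main(void)
{
    uint64_t state = 88172645463325252ULL;   /* arbitrary nonzero seed */
    for (;;) {                               /* day and night... */
        uint64_t seed = xorshift64(&state);
        if (run_ft_case(seed) != 0)
            printf("FAIL seed=%llu\n", (unsigned long long)seed);
    }
}

Joe Gwinn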
On 07/09/2023 23:20, Joe Gwinn wrote:
> On Thu, 7 Sep 2023 17:31:42 +0100, Martin Brown > <'''newspam'''@nonad.co.uk> wrote: > >> On 06/09/2023 19:20, Joe Gwinn wrote: >>> On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown >>> <'''newspam'''@nonad.co.uk> wrote: >>> >>>> The example is topologically equivalent to real* code you merely have to >>>> construct input data that will force execution down each binary choice >>>> in turn at every level. Getting the absolute minimum number of test >>>> vectors for full coverage is a much harder problem but a good enough >>>> solution is possible in most practical cases. >>> >>> In practice, this is certainly pretty effective, but the proposed >>> requirement did not allow for such shortcuts, rendering the >>> requirement intractable - the Sun will blow up first. >>> >>> Also, in practice we do a combination of random probing and fuzzing. >>> >>> .<https://en.wikipedia.org/wiki/Fuzzing> >> >> One tactic I have adopted for testing numerical code is very similar. >> Basically a biased random number generator which creates test data for >> the routines under test and then verifies the answers. > > What do you mean by biased?
Instead of a linear random number on 0-1 it generates numbers that are focussed on regions that may cause trouble for the algorithm under test.

Usually but not always in the vicinity of zero or very large values. A few bits of the original seed are used to determine how to convert it. Typical mappings from x, a double precision uniform on 0-1, are:

x
x^2
1-x^2
x^4
x^8
x^16

with some scale factor K multiplying it up to beyond the range that the parameter is supposed to take. Or, more generally but a bit sparse:

x/(1+eps-x)
(x-1/2)/(1+eps-x)

(sacrifices 3 bits of the original 64 bit seed)
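As a rough C sketch of the shape of it (the seed handling, K and eps here are illustrative, not my production values):

#include <math.h>
#include <stdint.h>

/* Map a 64-bit seed to a biased test value: the top bits become a
   uniform x on [0,1), the bottom 3 bits select one of the mappings
   above, and K scales the result past the legal parameter range. */
static double biased(uint64_t seed, double K)
{
    double x   = (double)(seed >> 11) * 0x1.0p-53;  /* uniform on [0,1) */
    double eps = 1e-12;
    switch (seed & 7) {                             /* sacrifice 3 bits */
    case 0:                                   break;
    case 1: x = x * x;                        break;
    case 2: x = 1.0 - x * x;                  break;
    case 3: x = pow(x, 4.0);                  break;
    case 4: x = pow(x, 8.0);                  break;
    case 5: x = pow(x, 16.0);                 break;
    case 6: x = x / (1.0 + eps - x);          break;
    default: x = (x - 0.5) / (1.0 + eps - x); break;
    }
    return K * x;
}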
>> It is extra work to do both a solver and a verifier but not that much >> and the verification of such provides a basis for regression testing. >> >> Most software the computation being done may be very difficult but the >> inverse is often relatively easy by comparison. Finding all real roots >> of a function f(x)=0 to maximum accuracy is quite tricky but given a >> supposed root x0 then computing the value of the equation f(x0) and its >> derivative f'(x0) is easy. Then you can use NR to see if the correction >> is acceptably small enough, if not rinse and repeat. > > I often do that. Another trick is that many problems can be solved > two different ways, so I solve twice and compare. > > One can also check that invariant's are respected.
Invariants are invaluable in critical code: you get to find out sooner rather than later if one of your assumptions is invalid, something has gone wrong, or the routine has been fed garbage inputs.
> NR = Newton Raphson?
Yes. Although these days my production code actually uses Halley's method in preference since it is almost the same speed to execute on modern pipeline CPUs and gives cubic rather than quadratic convergence.

Some practitioners prefer what they call modified NR which is a sort of hybrid of NR with Halley with slightly better convergence than Halley. However it involves a sqrt and isn't always better. Equation (4) in this paper (which includes the derivation) gives the method, originally due to Laguerre but now seeing a revival in some codes:

https://academic.oup.com/mnras/article/467/2/1702/2929272

The gotcha is that although as written with abs() it never actually fails - a sufficiently bad guess will still give poor answers, but it is much less likely to explode or oscillate in the way that NR does.

I'm not convinced that it is worth the cost of a sqrt but with CPUs improving all the time it is now very close. The latest i5-12600 I have can do a sqrt in about the same time as it takes to do a divide!
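For reference, the two update formulas side by side - a sketch with f, f' and f'' evaluated by the caller:

/* Newton-Raphson step: quadratic convergence. */
static double nr_step(double x, double f, double fp)
{
    return x - f / fp;
}

/* Halley step: cubic convergence for one extra derivative,
   and no sqrt (unlike the modified NR in the paper cited above). */
static double halley_step(double x, double f, double fp, double fpp)
{
    return x - (2.0 * f * fp) / (2.0 * fp * fp - f * fpp);
}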
> And then there is co-plotting and staring - the Mark I eyeball is a > very good pattern and anomaly detector, especially of concentricity or > linearity.
Looking at the plot of residual errors can be very enlightening. So can looking at the histogram of the error distribution over many test cases of solve, test, and then measure the error in the resulting answer. It is remarkably easy to home in on any weak spots and edge cases.

My normal reporting rule is to print exceptions whenever the worst case high or low error is exceeded. Verbose mode shows when it is equalled.
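That rule is a few lines in C - a sketch, with a single running record standing in for the separate high/low bounds:

#include <math.h>
#include <stdio.h>

static double worst;   /* running worst-case |error| seen so far */

/* Print only when the record is exceeded; verbose mode also
   reports when it is merely equalled. */
static void report_error(double err, int verbose)
{
    double a = fabs(err);
    if (a > worst) {
        worst = a;
        printf("new worst-case error %.17g\n", err);
    } else if (verbose && a == worst) {
        printf("worst-case error equalled: %.17g\n", err);
    }
}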
>> I found a new bug in a cubic solver that is as robust as any on the >> planet quite recently. It required a very specific near exact >> combination of 3 64 bit parameters to create a catastrophic numeric >> cancellation down a seldom trodden path where the cubic equation has >> three real roots and you want the one that it can't compute accurately. >> >> Most of these problems we try very hard to only have one real root... >> >> My initial reaction was that it was tested library code so it must be my >> problem - until I traced into it and saw how it failed. It gives 8 >> instead of 16 sig fig in double precision for these pathological data. > > That's a good one.
I was a bit surprised. It required an astonishingly exact combination of factors for it to trigger and one of my real problems hit it square on.

--
Martin Brown
On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
<newspam@nonad.co.uk> wrote:

>On 07/09/2023 23:20, Joe Gwinn wrote: >> On Thu, 7 Sep 2023 17:31:42 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> >>> On 06/09/2023 19:20, Joe Gwinn wrote: >>>> On Wed, 6 Sep 2023 09:49:48 +0100, Martin Brown >>>> <'''newspam'''@nonad.co.uk> wrote: >>>> >>>>> The example is topologically equivalent to real* code you merely have to >>>>> construct input data that will force execution down each binary choice >>>>> in turn at every level. Getting the absolute minimum number of test >>>>> vectors for full coverage is a much harder problem but a good enough >>>>> solution is possible in most practical cases. >>>> >>>> In practice, this is certainly pretty effective, but the proposed >>>> requirement did not allow for such shortcuts, rendering the >>>> requirement intractable - the Sun will blow up first. >>>> >>>> Also, in practice we do a combination of random probing and fuzzing. >>>> >>>> .<https://en.wikipedia.org/wiki/Fuzzing> >>> >>> One tactic I have adopted for testing numerical code is very similar. >>> Basically a biased random number generator which creates test data for >>> the routines under test and then verifies the answers. >> >> What do you mean by biased? > >Instead of a linear random number on 0-1 it generates numbers that are >focussed on regions that may cause trouble for the algorithm under test. > >Usually but not always in the vicinity of zero or very large values. A >few bits of the original seed are used to determine how to convert it. > >Typical mappings from x a double precision uniform on 0-1 are >x >x^2 >1-x^2 >x^4 >x^8 >x^16 > >with some scale factor K multiplying it up to beyond the range that the >parameter is supposed to take. Or more generally but a bit sparse:
This reminds me of how one tests memory hardware.
>x/(1+eps-x) >(x-1/2)/(1+eps-x)
I'd be tempted to fuzz the eps parameter.
>(sacrifices 3 bits of the original 64 bit seed) > >>> It is extra work to do both a solver and a verifier but not that much >>> and the verification of such provides a basis for regression testing. >>> >>> Most software the computation being done may be very difficult but the >>> inverse is often relatively easy by comparison. Finding all real roots >>> of a function f(x)=0 to maximum accuracy is quite tricky but given a >>> supposed root x0 then computing the value of the equation f(x0) and its >>> derivative f'(x0) is easy. Then you can use NR to see if the correction >>> is acceptably small enough, if not rinse and repeat. >> >> I often do that. Another trick is that many problems can be solved >> two different ways, so I solve twice and compare. >> >> One can also check that invariant's are respected. > >Invariants are invaluable in critical code you get to find out sooner >rather than later if one of your assumptions is invalid, something has >gone wrong or the routine has been fed garbage inputs.
Oh yes. I also code invariants into in-line assertions, for runtime detection.
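In C that's just assert() at every entry point - a trivial sketch, using the bracketing invariant of a root finder as the example:

#include <assert.h>

/* Invariant: the interval [lo,hi] must still bracket the root.
   Violations abort immediately in debug builds; the check
   compiles away under -DNDEBUG. */
static double bisect_step(double (*f)(double), double lo, double hi)
{
    assert(f(lo) * f(hi) <= 0.0);
    return 0.5 * (lo + hi);
}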
>> NR = Newton Raphson? > >Yes. Although these days my production code actually uses Halley's >method in preference since it is almost the same speed to execute on >modern pipeline CPUs and gives cubic rather than quadratic convergence.
Interesting. I had not heard of Halley's Method, and cubic convergence is worthwhile.
>Some practitioners prefer what they call modified NR which is a sort of >hybrid of NR with Halley with slightly better convergence than Halley. >However it involves a sqrt and isn't always better. > >Equation (4) in this paper (which includes the derivation) - method >originally due to Laguerre but now seeing a revival in some codes. > ><https://academic.oup.com/mnras/article/467/2/1702/2929272> > >The gotcha is that although as written with abs() it never actually >fails - a sufficiently bad guess will still give poor answers, but is >much less likely to explode or oscillate in the way that NR does.
So NR is better because it always warns?
> >I'm not convinced that it is worth the cost of a sqrt but with CPUs >improving all the time it is now very close. The latest i5-12600 I have >can do a sqrt in about the same time as it takes to do a divide!
Wonder what algorithm it uses?
>> And then there is co-plotting and staring - the Mark I eyeball is a >> very good pattern and anomaly detector, especially of concentricity or >> linearity. > >Looking at the plot of residual errors can be very enlightening. So can >looking at the histogram for many test cases of the error distribution >of solve, test and then measure error in the resulting answer. > >It is remarkably easy to home in on any weak spots and edge cases. My >normal reporting rule is printout exceptions whenever the worst case >high or low error is exceeded. Verbose mode shows when it is equalled.
Yes. But I always make plots and stare at them. Although I have found it useful to program a distinctive chord that sounds if, for instance, two measurement datafiles are identical or a file was not found, so I don't have to stare at the screen while the code is grinding away.
>>> I found a new bug in a cubic solver that is as robust as any on the >>> planet quite recently. It required a very specific near exact >>> combination of 3 64 bit parameters to create a catastrophic numeric >>> cancellation down a seldom trodden path where the cubic equation has >>> three real roots and you want the one that it can't compute accurately. >>> >>> Most of these problems we try very hard to only have one real root... >>> >>> My initial reaction was that it was tested library code so it must be my >>> problem - until I traced into it and saw how it failed. It gives 8 >>> instead of 16 sig fig in double precision for these pathological data. >> >> That's a good one. > >I was a bit surprised. It required an astonishingly exact combination of >factors for it to trigger and one of my real problems hit it square on.
So biased probing paid off here? Or only for diagnosis?

In the 1980s I hit a similar problem, but it wasn't hard to find. In scaled integer arithmetic (no floating point then), it's very common to multiply two 32-bit integers together, yielding a 64-bit product, and then divide that by a 64-bit scaling factor, yielding a 32-bit quotient. In the Fortran of the day, if one just wrote this as (a*b)/c, the compiler would truncate the product (a*b) to 32 bits before dividing by c, yielding a wildly wrong answer, without any warnings or drama. The solution was to define a 64-bit intermediate variable and do the operation in two lines: p=a*b, result = p/c.
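The same trap is easy to show in C terms, where the cast, not the declared type of the result, is what forces the wide product:

#include <stdint.h>

/* Wrong: a*b is evaluated in 32 bits (formally undefined on
   overflow in C; in practice truncated, just as the old Fortran
   compilers did) before the divide ever sees it. */
static int32_t muldiv_bad(int32_t a, int32_t b, int64_t c)
{
    return (int32_t)((a * b) / c);
}

/* Right: widen first, so the full 64-bit product reaches the divide. */
static int32_t muldiv(int32_t a, int32_t b, int64_t c)
{
    int64_t p = (int64_t)a * b;
    return (int32_t)(p / c);
}

Joe Gwinn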
On 08/09/2023 22:14, Joe Gwinn wrote:
> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
> <newspam@nonad.co.uk> wrote:
>
>>> NR = Newton Raphson? >> >> Yes. Although these days my production code actually uses Halley's >> method in preference since it is almost the same speed to execute on >> modern pipeline CPUs and gives cubic rather than quadratic convergence. > > Interesting. I had not heard of Halley's Method, and cubic > convergence is worthwhile.
Indeed. Halley was pretty clever but forever overshadowed by Newton.

Some of the latest codes use Delta_4 and Delta_5 corrections provided that the function is sufficiently differentiable. They have to be computed very carefully with considerable attention to detail!

One curiosity I have seen with them is that for me D_5 is anomalously fast despite containing D_4! On certain problems D5 is *faster* than NR (I suspect it is some sort of pipeline stall slowing the others down).

In the past they all got slower because there was a division for each step down the chain. I have a new computational scheme which avoids that (at an increased risk of under/over flow in some edge cases).

The codes all run as expected on simple starting guesses but when the fancy cubic ones are used they show this peculiar speed variation. It is most noticeable with the Intel compiler (less so on MSC or GCC).

The reason these high order methods are fashionable right now is that the iterative methods require extra function evaluations, and for that you have to crystallise the next value of x1 and recompute f(x1), f'(x1) etc. Historically they would iterate 3-5 times to solve it. Having to wait for that new input value is an unavoidable hard pipeline stall.

But if you can get the initial guess with a relative error below 10^(-16/N) then in an ideal world the Nth order difference corrector gets the answer in a single step and so the next value can be computed.
>> Some practitioners prefer what they call modified NR which is a sort of >> hybrid of NR with Halley with slightly better convergence than Halley. >> However it involves a sqrt and isn't always better. >> >> Equation (4) in this paper (which includes the derivation) - method >> originally due to Laguerre but now seeing a revival in some codes. >> >> <https://academic.oup.com/mnras/article/467/2/1702/2929272> >> >> The gotcha is that although as written with abs() it never actually >> fails - a sufficiently bad guess will still give poor answers, but is >> much less likely to explode or oscillate in the way that NR does. > > So NR is better because it always warns?
No. NR can go completely AWOL far too easily and diverge to infinity or oscillate chaotically between a set of values. Halley's method is the best of the three for stability, but MNR will get you slightly better precision if you can be sure your guesses are always close enough.
>> I'm not convinced that it is worth the cost of a sqrt but with CPUs >> improving all the time it is now very close. The latest i5-12600 I have >> can do a sqrt in about the same time as it takes to do a divide! > > Wonder what algorithm it uses?
My guess is some sort of cunning hardware shift, multiply and add. The latest generation of chips stand out in this respect. If only they would implement a hardware cube root then I would be very happy.
>>> And then there is co-plotting and staring - the Mark I eyeball is a >>> very good pattern and anomaly detector, especially of concentricity or >>> linearity. >> >> Looking at the plot of residual errors can be very enlightening. So can >> looking at the histogram for many test cases of the error distribution >> of solve, test and then measure error in the resulting answer. >> >> It is remarkably easy to home in on any weak spots and edge cases. My >> normal reporting rule is printout exceptions whenever the worst case >> high or low error is exceeded. Verbose mode shows when it is equalled. > > Yes. But I always make plots and stare at them.
I only do that if something catches my attention as potentially odd.
>>>> My initial reaction was that it was tested library code so it must be my >>>> problem - until I traced into it and saw how it failed. It gives 8 >>>> instead of 16 sig fig in double precision for these pathological data. >>> >>> That's a good one. >> >> I was a bit surprised. It required an astonishingly exact combination of >> factors for it to trigger and one of my real problems hit it square on. > > So biased probing paid off here? Or only for diagnosis?
It confirmed that given the right awkward parameters the solver would fail, but also that it was a very low risk event - it took an overnight run just to find a single example using random data. I just happened to be unlucky in my exact choice of problem.

Expansion around pi/4 was absolutely fine but the expansion around 3pi/4 went completely haywire, even though it differed only in a handful of sign changes to the coefficients. I was utterly convinced it was my user error.
> > In the 1980s I hit a similar problem, but it wasn't hard to find. In > scaled integer arithmetic (no floating point then), it's very common > to multiply two 32-bit integers together, yielding a 64-bit product, > and then divide that by a 64-bit scaling factor, yielding a 32-bit > quotient. In the Fortran of the day, if one just wrote this as > (a*b)/c, the compiler would truncate the product (a*b) to 32 bits > before dividing by c, yielding a wildly wrong answer, without any > warnings or drama. The solution was define a 64-bit intermediate > variable, and do the operation in two lines: p=a*b, result = p/c.
ISTR there being a library function MULDIV(a,b,c) for that, at least on the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).

--
Martin Brown
On 2023-09-11 04:05, Martin Brown wrote:
> On 08/09/2023 22:14, Joe Gwinn wrote: >> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> > >>>> NR = Newton Raphson? >>> >>> Yes. Although these days my production code actually uses >>> Halley's method in preference since it is almost the same speed >>> to execute on modern pipeline CPUs and gives cubic rather than >>> quadratic convergence. >> >> Interesting. I had not heard of Halley's Method, and cubic >> convergence is worthwhile. > > Indeed. Halley was pretty clever but forever overshadowed by Newton. > > Some of the latest codes use Delta_4 and Delta_5 corrections > provided that the function is sufficiently differentiable. They have > to be computed very carefully with considerable attention to detail!
> > One curiosity I have seen with them is that for me D_5 is > anomalously fast despite containing D_4 ! On certain problems D5 is > *faster* than NR (I suspect it is some sort of pipeline stall slowing > the others down). > > In the past they all got slower because there was a division for > each step down the chain. I have a new computational scheme which > avoids that (at an increased risk of under/over flow in some edge > cases) > > The codes all run as expected on simple starting guesses but when > the fancy cubic ones are used they show this peculiar speed > variation. It is most noticeable with the Intel compiler (less so on > MSC or GCC). > > The reason these high order methods are fashionable right now is > that the iterative methods require extra function evaluations and for > that you have to crystallise the next value of x1 and recompute > f(x1),f'(x1) etc. Historically they would iterate 3-5 times to solve > it. Having to wait for that new input value is an unavoidable hard > pipeline stall. > > But if you can get the initial guess with a relative error below > 10^(-16/N) then in an ideal world the Nth order difference corrector > gets the answer in a single step and so the next value can be > computed. > >>> Some practitioners prefer what they call modified NR which is a >>> sort of hybrid of NR with Halley with slightly better >>> convergence than Halley. However it involves a sqrt and isn't >>> always better. >>> >>> Equation (4) in this paper (which includes the derivation) - >>> method originally due to Laguerre but now seeing a revival in >>> some codes. >>> >>> <https://academic.oup.com/mnras/article/467/2/1702/2929272> >>> >>> The gotcha is that although as written with abs() it never >>> actually fails - a sufficiently bad guess will still give poor >>> answers, but is much less likely to explode or oscillate in the >>> way that NR does. >> >> So NR is better because it always warns? > > No. NR can go completely AWOL far too easily and diverge to infinity > or oscillate chaotically between a set of values. Halley's method is > the best of the three for stability, but MNR will get you slightly > better precision if you can be sure your guesses are always close > enough. > >>> I'm not convinced that it is worth the cost of a sqrt but with >>> CPUs improving all the time it is now very close. The latest >>> i5-12600 I have can do a sqrt in about the same time as it takes >>> to do a divide! >> >> Wonder what algorithm it uses? > > My guess is some sort of cunning hardware shift, multiply and add. > The latest generation of chips stand out in this respect. If only > they would implement a hardware cube root then I would be very > happy. > >>>> And then there is co-plotting and staring - the Mark I eyeball >>>> is a very good pattern and anomaly detector, especially of >>>> concentricity or linearity. >>> >>> Looking at the plot of residual errors can be very enlightening. >>> So can looking at the histogram for many test cases of the error >>> distribution of solve, test and then measure error in the >>> resulting answer. >>> >>> It is remarkably easy to home in on any weak spots and edge >>> cases. My normal reporting rule is printout exceptions whenever >>> the worst case high or low error is exceeded. Verbose mode shows >>> when it is equalled. >> >> Yes. But I always make plots and stare at them. > > I only do that if something catches my attention as potentially odd. >
>
It's been quite a while since I looked into the problem (and I was never close to your level of expertise there, Martin). However, this being Usenet, I am undeterred. ;)

Cubic methods are great as long as computing the second derivative is cheap. BITD the "Illinois algorithm" and its relatives got a fair amount of attention (in select circles). They're regula-falsi based, but with hyperlinear convergence. The best that I know is a Bjorck & Anderson variant whose efficiency index (basically the average order of convergence per function evaluation) gets up to 8**(1/4) ~ 1.68. This compares with 1.0 for bisection or vanilla Regula Falsi, and sqrt(2) for the secant method.

Secant is too squirrelly to use by itself, because of its tendency to hit a flat spot and jump straight to East Limbo, never to return. Newton has that problem too, so you have to find some finite interval where f_1 * f_2 < 0, and reject steps that fall outside it.

Besides that, there are two cases with Mr. Newton: if computing the derivative is comparatively cheap, as with trig functions, he gets an efficiency index of 2, but if not, he's stuck down at sqrt(2) with secant. (Sometimes the derivative is harder to compute than the value, e.g. with special functions or ratios of trig polynomials, in which case Newton is even worse off.) On the other hand, if the derivative can be computed without nasty significance loss due to cancellation, N. is likely to work better at high precision.

I've nearly always used derivative-free methods, because I'm usually not too concerned with ultimate accuracy, and the best ones can reach an efficiency index of very nearly 2.0, which is pretty cool. (Besides, I'm too lazy to want to manage all of those derivatives.)
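For concreteness, an Illinois sketch in C - the classical false-position update plus the one-line endpoint-halving trick (loop cap and convergence test illustrative):

#include <math.h>

/* Illinois variant of regula falsi. Caller guarantees f(a)*f(b) < 0.
   When the same endpoint is retained twice running, its stored
   function value is halved so the iteration cannot get stuck. */
static double illinois(double (*f)(double), double a, double b, double tol)
{
    double fa = f(a), fb = f(b);
    for (int i = 0; i < 200 && fabs(b - a) > tol; i++) {
        double c  = (a * fb - b * fa) / (fb - fa);  /* false position */
        double fc = f(c);
        if (fc * fb < 0.0) { a = b; fa = fb; }      /* bracket flipped */
        else               { fa *= 0.5; }           /* the Illinois trick */
        b = c; fb = fc;
        if (fc == 0.0) break;
    }
    return b;
}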
>>>>> My initial reaction was that it was tested library code so >>>>> it must be my problem - until I traced into it and saw how >>>>> it failed. It gives 8 instead of 16 sig fig in double >>>>> precision for these pathological data. >>>> >>>> That's a good one. >>> >>> I was a bit surprised. It required an astonishingly exact >>> combination of factors for it to trigger and one of my real >>> problems hit it square on. >> >> So biased probing paid off here? Or only for diagnosis? > > It confirmed that given the right awkward parameters the solver would > fail but also that it was a a very low risk event. Taking an > overnight run just to find a single example using random data. > > I just happened to be unlucky in my exact choice of problem. > Expansion around pi/4 was absolutely fine but the expansion around > 3pi/4 went completely haywire even though it differed only in a > handful of sign changes to coefficients. I was utterly convinced it > was my user error. >> >> In the 1980s I hit a similar problem, but it wasn't hard to find. >> In scaled integer arithmetic (no floating point then), it's very >> common to multiply two 32-bit integers together, yielding a 64-bit >> product, and then divide that by a 64-bit scaling factor, yielding >> a 32-bit quotient. In the Fortran of the day, if one just wrote >> this as (a*b)/c, the compiler would truncate the product (a*b) to >> 32 bits before dividing by c, yielding a wildly wrong answer, >> without any warnings or drama. The solution was define a 64-bit >> intermediate variable, and do the operation in two lines: p=a*b, >> result = p/c. > > ISTR there being a library function MULDIV(a,b,c) for that at least > on the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 > but I think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late > 70's).
Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510
http://electrooptical.net
http://hobbs-eo.com
On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown
<newspam@nonad.co.uk> wrote:

>On 08/09/2023 22:14, Joe Gwinn wrote: >> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> > >>>> NR = Newton Raphson? >>> >>> Yes. Although these days my production code actually uses Halley's >>> method in preference since it is almost the same speed to execute on >>> modern pipeline CPUs and gives cubic rather than quadratic convergence. >> >> Interesting. I had not heard of Halley's Method, and cubic >> convergence is worthwhile. > >Indeed. Halley was pretty clever but forever overshadowed by Newton. > >Some of the latest codes use Delta_4 and Delta_5 corrections provided >that the function is sufficiently differentiable. They have to be >computed very carefully with considerable attention to detail!
And differentiation amplifies noise.
>One curiosity I have seen with them is that for me D_5 is anomalously >fast despite containing D_4 ! On certain problems D5 is *faster* than NR >(I suspect it is some sort of pipeline stall slowing the others down).
In the old days, one would see a bunch of NOPs in the generated assembly, but that won't work for this. I would code a test case in assembly code and time it. Maybe by toggling a wire before and after, and looking with a scope.
>In the past they all got slower because there was a division for each >step down the chain. I have a new computational scheme which avoids that >(at an increased risk of under/over flow in some edge cases)
The old dodge was to multiply by the precomputed reciprocal, if it could be computed less often than every division.
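For example (a sketch - worthwhile only when the divisor is reused many times and the slight rounding difference is acceptable):

/* One divide hoisted out of the loop; only multiplies inside. */
static double scale_sum(const double *v, int n, double d)
{
    double r = 1.0 / d;
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i] * r;
    return s;
}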
>The codes all run as expected on simple starting guesses but when the >fancy cubic ones are used they show this peculiar speed variation. It is >most noticeable with the Intel compiler (less so on MSC or GCC). > >The reason these high order methods are fashionable right now is that >the iterative methods require extra function evaluations and for that >you have to crystallise the next value of x1 and recompute f(x1),f'(x1) >etc. Historically they would iterate 3-5 times to solve it. Having to >wait for that new input value is an unavoidable hard pipeline stall.
Use staggered parallel pipelines?
>But if you can get the initial guess with a relative error below >10^(-16/N) then in an ideal world the Nth order difference corrector >gets the answer in a single step and so the next value can be computed.
Yes. I used to do this level of optimizing, but in operating-system kernels. High-level interrupt routines warranted personal instruction-by-instruction hand optimization, with a drawing for each and every step. It might have been 100 lines of assembly code. Multiplied by the interrupt rate and overhead of this or that bit of hardware. Eliminating even one clock cycle was worthwhile.
>>> Some practitioners prefer what they call modified NR which is a sort of >>> hybrid of NR with Halley with slightly better convergence than Halley. >>> However it involves a sqrt and isn't always better. >>> >>> Equation (4) in this paper (which includes the derivation) - method >>> originally due to Laguerre but now seeing a revival in some codes. >>> >>> <https://academic.oup.com/mnras/article/467/2/1702/2929272> >>> >>> The gotcha is that although as written with abs() it never actually >>> fails - a sufficiently bad guess will still give poor answers, but is >>> much less likely to explode or oscillate in the way that NR does. >> >> So NR is better because it always warns? > >No. NR can go completely AWOL far too easily and diverge to infinity or >oscillate chaotically between a set of values. Halley's method is the >best of the three for stability, but MNR will get you slightly better >precision if you can be sure your guesses are always close enough. > >>> I'm not convinced that it is worth the cost of a sqrt but with CPUs >>> improving all the time it is now very close. The latest i5-12600 I have >>> can do a sqrt in about the same time as it takes to do a divide! >> >> Wonder what algorithm it uses? > >My guess is some sort of cunning hardware shift, multiply and add. The >latest generation of chips stand out in this respect. If only they would >implement a hardware cube root then I would be very happy.
That makes sense, as many have a barrel shifter in hardware.
> >>>> And then there is co-plotting and staring - the Mark I eyeball is a >>>> very good pattern and anomaly detector, especially of concentricity or >>>> linearity. >>> >>> Looking at the plot of residual errors can be very enlightening. So can >>> looking at the histogram for many test cases of the error distribution >>> of solve, test and then measure error in the resulting answer. >>> >>> It is remarkably easy to home in on any weak spots and edge cases. My >>> normal reporting rule is printout exceptions whenever the worst case >>> high or low error is exceeded. Verbose mode shows when it is equalled. >> >> Yes. But I always make plots and stare at them. > >I only do that if something catches my attention as potentially odd.
I always do it, precisely to improve my chances of noticing something odd.
>>>>> My initial reaction was that it was tested library code so it must be my >>>>> problem - until I traced into it and saw how it failed. It gives 8 >>>>> instead of 16 sig fig in double precision for these pathological data. >>>> >>>> That's a good one. >>> >>> I was a bit surprised. It required an astonishingly exact combination of >>> factors for it to trigger and one of my real problems hit it square on. >> >> So biased probing paid off here? Or only for diagnosis? > >It confirmed that given the right awkward parameters the solver would >fail but also that it was a a very low risk event. Taking an overnight >run just to find a single example using random data. > >I just happened to be unlucky in my exact choice of problem. Expansion >around pi/4 was absolutely fine but the expansion around 3pi/4 went >completely haywire even though it differed only in a handful of sign >changes to coefficients. I was utterly convinced it was my user error. >> >> In the 1980s I hit a similar problem, but it wasn't hard to find. In >> scaled integer arithmetic (no floating point then), it's very common >> to multiply two 32-bit integers together, yielding a 64-bit product, >> and then divide that by a 64-bit scaling factor, yielding a 32-bit >> quotient. In the Fortran of the day, if one just wrote this as >> (a*b)/c, the compiler would truncate the product (a*b) to 32 bits >> before dividing by c, yielding a wildly wrong answer, without any >> warnings or drama. The solution was define a 64-bit intermediate >> variable, and do the operation in two lines: p=a*b, result = p/c. > >ISTR there being a library function MULDIV(a,b,c) for that at least on >the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I >think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).
Yes, it either didn't exist yet or didn't cover the specific case we had.

A parallel story from the late 1970s comes from building a C-141 maintenance trainer (a mockup cockpit animated by a midi-computer of the day) in Fortran, where we were forced to implement parts in assembler - specifically a collection of assembly-coded Fortran functions providing 32-bit bitwise logic. When I told the application programmers that they would be writing assembly, there was much wailing and rending of garments. But I provided a template and an annotated example, and they soon got the hang of it.

Thirty or forty years later, I got a call out of the blue from a company I'd never heard of. I don't know how they got my name. They had just won a contract to port the now-legacy trainer code to C, and didn't know what to do with all that assembler. I told him the reason, and that this had been just before the ISA Extensions were released, but that all that assembler did was bitwise logic, which could be done directly in C. The caller was very happy to hear this.

Joe Gwinn
On 11/09/2023 13:36, Phil Hobbs wrote:
> On 2023-09-11 04:05, Martin Brown wrote:
>> Indeed. Halley was pretty clever but forever overshadowed by Newton.
>>
>> Some of the latest codes use Delta_4 and Delta_5 corrections
>> provided that the function is sufficiently differentiable. They have
>> to be computed very carefully with considerable attention to detail!
>
> It's been quite a while since I looked into the problem (and I was never
> close to your level of expertise there, Martin). However, this being
> Usenet, I am undeterred. ;)
>
> Cubic methods are great as long as computing the second derivative is
> cheap. BITD the "Illinois algorithm" and its relatives got a fair
> amount of attention (in select circles). They're regula-falsi based,
> but with hyperlinear convergence. The best that I know is a Bjorck &
> Anderson variant whose efficiency index (basically the average order of
> convergence per function evaluation) gets up to 8**(1/4) ~ 1.68.
> This compares with 1.0 for bisection or vanilla Regula Falsi, and
> sqrt(2) for the secant method.
That's a blast from the past. It used to be popular in some of these problems until a way was found to tame potential divergence (basically by making sure that the starting guess is always within the range where the sequence is absolutely convergent). Some analytical sleight of hand is needed to get it spot on or there will be some insanely small but still non-zero value where it goes crazy as eccentricity e -> 1. There has been a recent paper using bisection but it has various flaws - not least that it determines an accurate answer to the wrong question.
> Secant is too squirrelly to use by itself, because of its tendency to > hit a flat spot and jump straight to East Limbo, never to return. > Newton has that problem too, so you have to find some finite interval > where f_1 * f_2 < 0, and reject steps that fall outside it.
One of the cute features of Kepler's problem E = M + e*sin(E) is that the simplest naive starting guess, E_0 = M + e, is an upper bound for which the iteration is absolutely convergent, so if you clip on that you remain numerically stable.
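A sketch of that in C - NR on Kepler's equation started from, and clipped against, the E_0 = M + e bound (iteration cap and tolerance illustrative):

#include <math.h>

/* Solve E - e*sin(E) = M for the eccentric anomaly E (0 <= e < 1). */
static double kepler_E(double M, double e)
{
    double hi = M + e;     /* naive upper bound; stable starting guess */
    double E  = hi;
    for (int i = 0; i < 50; i++) {
        double d = (E - e * sin(E) - M) / (1.0 - e * cos(E));
        E -= d;
        if (E > hi) E = hi;            /* clip on the bound */
        if (fabs(d) < 1e-15) break;
    }
    return E;
}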
> Besides that, there are two cases with Mr. Newton: if computing the
> derivative is comparatively cheap, as with trig functions, he gets an
> efficiency index of 2, but if not, he's stuck down at sqrt(2) with
> secant. (Sometimes the derivative is harder to compute than the value,
> e.g. with special functions or ratios of trig polynomials, in which case
> Newton is even worse off.)
The trend at the moment is to avoid system trig functions and have 5th order polynomial expansions around a table of equispaced fixed sin, cos values. Right now the trick for speed is to avoid divides where possible.
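In outline it looks like this (table size and coefficients are illustrative; a production version reduces the argument range properly and tunes the polynomials much more carefully):

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define TRIG_N 256
static double sin_tab[TRIG_N + 1], cos_tab[TRIG_N + 1];
static const double TRIG_H = (2.0 * M_PI) / TRIG_N;   /* node spacing */

static void trig_init(void)
{
    for (int k = 0; k <= TRIG_N; k++) {
        sin_tab[k] = sin(k * TRIG_H);
        cos_tab[k] = cos(k * TRIG_H);
    }
}

/* sin(x) for x in [0, 2*pi): nearest table node plus short series
   for the small remainder d, with |d| <= TRIG_H/2. */
static double fast_sin(double x)
{
    int    k  = (int)(x / TRIG_H + 0.5);
    double d  = x - k * TRIG_H;
    double d2 = d * d;
    double sd = d * (1.0 - d2 / 6.0 * (1.0 - d2 / 20.0));  /* sin(d), 5th order */
    double cd = 1.0 - d2 / 2.0 * (1.0 - d2 / 12.0);        /* cos(d), 4th order */
    return sin_tab[k] * cd + cos_tab[k] * sd;
}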
> On the other hand, if the derivative can be computed without nasty > significance loss due to cancellation, N. is likely to work better at > high precision.
We always have analytical derivatives available (although they have some unpleasant properties in the singular corner).
> I've nearly always used derivative-free methods, because I'm usually not
> too concerned with ultimate accuracy, and the best ones can reach an
> efficiency index of very nearly 2.0, which is pretty cool. (Besides,
> I'm too lazy to want to manage all of those derivatives.)
If I only have the function available numerically as a black box and no analytical derivative I tend to favour Golden ratio search for a maximum and then take a punt on computing the derivative numerically and seeing where NR actually lands. It either polishes the answer nicely or goes AWOL. A few extra function evaluations is usually neither here nor there.

I did some work a long time ago on computing accurate numerical derivatives from equally spaced function evaluations. It still comes in handy sometimes (a bit of a lost art from the old days of manual computors sat at desks).

What surprised me most was that a flaw like that could be lurking in almost every cubic solver (I bet NAGlib got it right) just waiting for the right set of unplayable data to come along.
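The golden-section part is short - a sketch for a maximum of a unimodal f on [a,b]:

#include <math.h>

/* Golden-section search for a maximum: derivative-free, one new
   function evaluation per step, interval shrinks by ~0.618 each time. */
static double golden_max(double (*f)(double), double a, double b, double tol)
{
    const double g = (sqrt(5.0) - 1.0) / 2.0;
    double x1 = b - g * (b - a), x2 = a + g * (b - a);
    double f1 = f(x1), f2 = f(x2);
    while (b - a > tol) {
        if (f1 < f2) {     /* maximum lies in [x1, b] */
            a = x1; x1 = x2; f1 = f2;
            x2 = a + g * (b - a); f2 = f(x2);
        } else {           /* maximum lies in [a, x2] */
            b = x2; x2 = x1; f2 = f1;
            x1 = b - g * (b - a); f1 = f(x1);
        }
    }
    return 0.5 * (a + b);
}

--
Martin Brown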
Martin Brown <newspam@nonad.co.uk> wrote:
> On 11/09/2023 13:36, Phil Hobbs wrote:
>> On 2023-09-11 04:05, Martin Brown wrote:
>
>>> Indeed. Halley was pretty clever but forever overshadowed by Newton.
>>>
>>> Some of the latest codes use Delta_4 and Delta_5 corrections
>>> provided that the function is sufficiently differentiable. They have
>>> to be computed very carefully with considerable attention to detail!
>>
>> It's been quite a while since I looked into the problem (and I was never
>> close to your level of expertise there, Martin). However, this being
>> Usenet, I am undeterred. ;)
>>
>> Cubic methods are great as long as computing the second derivative is
>> cheap. BITD the "Illinois algorithm" and its relatives got a fair
>> amount of attention (in select circles). They're regula-falsi based,
>> but with hyperlinear convergence. The best that I know is a Bjorck &
>> Anderson variant whose efficiency index (basically the average order of
>> convergence per function evaluation) gets up to 8**(1/4) ~ 1.68.
>> This compares with 1.0 for bisection or vanilla Regula Falsi, and
>> sqrt(2) for the secant method.
>
> That's a blast from the past. It used to be popular in some of these
> problems until a way was found to tame potential divergence (basically
> by making sure that the starting guess is always within the range where
> the sequence is absolutely convergent). Some analytical sleight of hand
> is needed to get it spot on or there will be some insanely small but
> still non-zero value where it goes crazy as eccentricity e -> 1.
>
> There has been a recent paper using bisection but it has various flaws -
> not least that it determines an accurate answer to the wrong question.
Could be a problem.
> >> Secant is too squirrelly to use by itself, because of its tendency to >> hit a flat spot and jump straight to East Limbo, never to return. >> Newton has that problem too, so you have to find some finite interval >> where f_1 * f_2 < 0, and reject steps that fall outside it. > > One of the cute features of Kepler's problem E = M + e*sin(E) is that > one of the simplest naive upper bounds of E_0 = M+e is absolutely > convergent so if you clip on that you remain numerically stable.
Fun. Last time I used Kepler's equation was on a problem set in about 1980. I recall being too lazy to use Newton, so since it was a planetary orbit with epsilon << 1, I just rearranged terms and iterated on epsilon. (Prof Ovenden was obviously bored with grading Newtonian solutions, because he gave me an extra point.) ;)

The best thing about it is the names - M is "mean anomaly" and E is "eccentric anomaly".
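That rearrange-and-iterate scheme is a one-liner in C - a sketch; for epsilon << 1 each pass gains roughly a factor of epsilon in accuracy:

#include <math.h>

/* Fixed-point iteration on Kepler's equation, fine for small e:
   E <- M + e*sin(E), converging linearly with ratio ~e. */
static double kepler_fixed_point(double M, double e)
{
    double E = M;                    /* the e = 0 solution as the start */
    for (int i = 0; i < 30; i++)     /* iteration cap illustrative */
        E = M + e * sin(E);
    return E;
}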
>
>> Besides that, there are two cases with Mr. Newton: if computing the
>> derivative is comparatively cheap, as with trig functions, he gets an
>> efficiency index of 2, but if not, he's stuck down at sqrt(2) with
>> secant. (Sometimes the derivative is harder to compute than the value,
>> e.g. with special functions or ratios of trig polynomials, in which case
>> Newton is even worse off.)
>
> The trend at the moment is to avoid system trig functions and have 5th
> order polynomial expansions around a table of equispaced fixed sin, cos
> values. Right now the trick for speed is to avoid divides where possible.
>
>> On the other hand, if the derivative can be computed without nasty
>> significance loss due to cancellation, N. is likely to work better at
>> high precision.
>
> We always have analytical derivatives available (although they have some
> unpleasant properties in the singular corner).
>
>> I've nearly always used derivative-free methods, because I'm usually not
>> too concerned with ultimate accuracy, and the best ones can reach an
>> efficiency index of very nearly 2.0, which is pretty cool. (Besides,
>> I'm too lazy to want to manage all of those derivatives.)
>
> If I only have the function available numerically as a black box and no
> analytical derivative I tend to favour Golden ratio search for a maximum
> and then take a punt on computing the derivative numerically and seeing
> where NR actually lands. It either polishes the answer nicely or goes
> AWOL. A few extra function evaluations is usually neither here nor there.
>
> I did some work a long time ago on computing accurate numerical
> derivatives from equally spaced function evaluations. It still comes in
> handy sometimes (a bit of a lost art from the old days of manual
> computors sat at desks).
Long ago I did one of those on an HP41C, along the way to an Adams-Bashforth-Moulton predictor-corrector ODE solver. (It's easy to remember which is which - "Bashforth" is an excellent name for an extrapolator.) ;)
> > What surprised me most was that a flaw like that could be lurking in > almost every cubic solver (I bet NAGlib got it right) just waiting for > the right set of unplayable data to come along.
Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
On 11/09/2023 21:58, Joe Gwinn wrote:
> On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown > <'''newspam'''@nonad.co.uk> wrote: > >> On 08/09/2023 22:14, Joe Gwinn wrote: >>> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown >>> <'''newspam'''@nonad.co.uk> wrote: >>> >> >>>>> NR = Newton Raphson? >>>> >>>> Yes. Although these days my production code actually uses Halley's >>>> method in preference since it is almost the same speed to execute on >>>> modern pipeline CPUs and gives cubic rather than quadratic convergence. >>> >>> Interesting. I had not heard of Halley's Method, and cubic >>> convergence is worthwhile. >> >> Indeed. Halley was pretty clever but forever overshadowed by Newton. >> >> Some of the latest codes use Delta_4 and Delta_5 corrections provided >> that the function is sufficiently differentiable. They have to be >> computed very carefully with considerable attention to detail! > > And differentiation amplifies noise.
Symbolic differentiation doesn't. We have analytical expressions for the problem to solve and most of them are differentiable several times apart from where there are poles. Those regions need extra care.
>> One curiosity I have seen with them is that for me D_5 is anomalously >> fast despite containing D_4 ! On certain problems D5 is *faster* than NR >> (I suspect it is some sort of pipeline stall slowing the others down). > > In the old days, one would see a bunch of NOPs in the generated > assembly, but that won't work for this. I would code a test case in > assembly code and time it. Maybe by toggling a wire before and after, > and looking with a scope.
I tend to use RDTSC; although it is deprecated for this sort of thing, it works fairly well. You just have to be aware of its limitations. The ultimate way to do it is to run a shim that lets you access the machine specific registers and directly measure pipeline stalls and other modes of failure. I haven't done that for this code (yet).
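A bare-bones sketch of the RDTSC timing (GCC/Clang intrinsics assumed; rdtscp is used for its partial serialization, and the usual caveats about frequency scaling and core migration apply):

#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp; MSVC has it in <intrin.h> */

/* Cycle-count a function call; returns elapsed reference cycles. */
static uint64_t cycles(void (*fn)(void))
{
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);   /* waits for earlier instructions */
    fn();
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}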
>> In the past they all got slower because there was a division for each >> step down the chain. I have a new computational scheme which avoids that >> (at an increased risk of under/over flow in some edge cases) > > The old dodge was to multiply by the pre computed reciprocal, if this > could be computed less often than every division.
These days you can pretty much rely on any decent compiler to precompute static constants into whatever form is best. The strength reduction rules work well. You can write x/3.0 confident in the knowledge that it will compile to x*(1.0/3) with the 1.0/3 computed at compile time (although it is as well to check this on new compilers).

Even if you divide several expressions in a row by x, most of the latest compilers will compute 1/x once and then multiply it out. Trying to micromanage such things can actually make it worse, since the compiler might store the intermediate value in memory if you code it explicitly.
>> The codes all run as expected on simple starting guesses but when the >> fancy cubic ones are used they show this peculiar speed variation. It is >> most noticeable with the Intel compiler (less so on MSC or GCC). >> >> The reason these high order methods are fashionable right now is that >> the iterative methods require extra function evaluations and for that >> you have to crystallise the next value of x1 and recompute f(x1),f'(x1) >> etc. Historically they would iterate 3-5 times to solve it. Having to >> wait for that new input value is an unavoidable hard pipeline stall. > > Use staggered parallel pipelines?
There is something about the odd numbered differences that makes them unusually fast.
>> But if you can get the initial guess with a relative error below >> 10^(-16/N) then in an ideal world the Nth order difference corrector >> gets the answer in a single step and so the next value can be computed. > > Yes.
It is only being done because of the requirement for speed and accuracy at the same time. Every cycle counts. Someone is even working on a dedicated hardware solution using a variant of CORDIC for this problem. https://arxiv.org/abs/2008.02894 They are getting quite close to having working hardware now.
>>>> I'm not convinced that it is worth the cost of a sqrt but with CPUs >>>> improving all the time it is now very close. The latest i5-12600 I have >>>> can do a sqrt in about the same time as it takes to do a divide! >>> >>> Wonder what algorithm it uses? >> >> My guess is some sort of cunning hardware shift, multiply and add. The >> latest generation of chips stand out in this respect. If only they would >> implement a hardware cube root then I would be very happy. > > That makes sense, as many have a barrel shifter in hardware. >> ISTR there being a library function MULDIV(a,b,c) for that at least on >> the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I >> think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's). > > Yes, it either didn't exist yet or didn't cover the specific case we > had.
I'm pretty sure it was in one of the locally written system libraries. I also had a routine to mask out certain harmless underflows in an astrophysics FLIC code. It was scaled in SI units which were something like h/c^2, which didn't leave a lot of room in 32-bit floats. The original code spent most of its time in underflow recovery routines. I got an immediate 5x speed improvement with no loss of accuracy with just a few lines of assembler added to the project.

Eventually the problem was rescaled to put things into the middle ground so that underflows were much less likely.

--
Martin Brown
On Fri, 15 Sep 2023 10:55:34 +0100, Martin Brown
<newspam@nonad.co.uk> wrote:

>On 11/09/2023 21:58, Joe Gwinn wrote: >> On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown >> <'''newspam'''@nonad.co.uk> wrote: >> >>> On 08/09/2023 22:14, Joe Gwinn wrote: >>>> On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown >>>> <'''newspam'''@nonad.co.uk> wrote: >>>> >>> >>>>>> NR = Newton Raphson? >>>>> >>>>> Yes. Although these days my production code actually uses Halley's >>>>> method in preference since it is almost the same speed to execute on >>>>> modern pipeline CPUs and gives cubic rather than quadratic convergence. >>>> >>>> Interesting. I had not heard of Halley's Method, and cubic >>>> convergence is worthwhile. >>> >>> Indeed. Halley was pretty clever but forever overshadowed by Newton. >>> >>> Some of the latest codes use Delta_4 and Delta_5 corrections provided >>> that the function is sufficiently differentiable. They have to be >>> computed very carefully with considerable attention to detail! >> >> And differentiation amplifies noise. > >Symbolic differentiation doesn't. We have analytical expressions for the >problem to solve and most of them are differentiable several times apart >from where there are poles. Those regions need extra care. > >>> One curiosity I have seen with them is that for me D_5 is anomalously >>> fast despite containing D_4 ! On certain problems D5 is *faster* than NR >>> (I suspect it is some sort of pipeline stall slowing the others down). >> >> In the old days, one would see a bunch of NOPs in the generated >> assembly, but that won't work for this. I would code a test case in >> assembly code and time it. Maybe by toggling a wire before and after, >> and looking with a scope. > >I tend to use RDTSC although it is deprecated for this sort of thing it >works fairly well. You just have to be aware of its limitations. The >ultimate way to do it is run a shim that lets you access the machine >specific registers and directly measure pipeline stalls and other modes >of failure. I haven't done that for this code (yet).
Well, the old EE's approach always involves an oscilloscope and an electrical signal from the device under test, simple and physical, so it cannot so easily lie. Well, before the advent of digital scopes, but still...
> >>> In the past they all got slower because there was a division for each >>> step down the chain. I have a new computational scheme which avoids that >>> (at an increased risk of under/over flow in some edge cases) >> >> The old dodge was to multiply by the pre computed reciprocal, if this >> could be computed less often than every division. > >These days you can pretty much rely on any decent compiler to precompute >static constants into whatever form is best. The strength reduction >rules work well. You can write x/3.0 confident in the knowledge that it >will compile to x*(1.0/3) with the 1.0/3 computed at compile time. >(although it is as well to check this on new compilers) > >Even if you do something like several expressions in a row /x most of >the latest compilers will compute 1/x and then multiply it out. Trying >to micromanage such things can actually make it worse since the compiler >might store the intermediate value in memory if you code it explicitly.
Yes.
>>> The codes all run as expected on simple starting guesses but when the >>> fancy cubic ones are used they show this peculiar speed variation. It is >>> most noticeable with the Intel compiler (less so on MSC or GCC). >>> >>> The reason these high order methods are fashionable right now is that >>> the iterative methods require extra function evaluations and for that >>> you have to crystallise the next value of x1 and recompute f(x1),f'(x1) >>> etc. Historically they would iterate 3-5 times to solve it. Having to >>> wait for that new input value is an unavoidable hard pipeline stall. >> >> Use staggered parallel pipelines? > >There is something about the odd numbered differences that makes them >unusually fast.
I'd be looking at hardware details like how this all maps onto physical memory, especially cache lines and the like.
>>> But if you can get the initial guess with a relative error below >>> 10^(-16/N) then in an ideal world the Nth order difference corrector >>> gets the answer in a single step and so the next value can be computed. >> >> Yes. > >It is only being done because of the requirement for speed and accuracy >at the same time. Every cycle counts. Someone is even working on a >dedicated hardware solution using a variant of CORDIC for this problem. > ><https://arxiv.org/abs/2008.02894> > >They are getting quite close to having working hardware now.
I've seen some CORDIC variations, typically for ellipses, but not this one.
>>>>> I'm not convinced that it is worth the cost of a sqrt but with CPUs >>>>> improving all the time it is now very close. The latest i5-12600 I have >>>>> can do a sqrt in about the same time as it takes to do a divide! >>>> >>>> Wonder what algorithm it uses? >>> >>> My guess is some sort of cunning hardware shift, multiply and add. The >>> latest generation of chips stand out in this respect. If only they would >>> implement a hardware cube root then I would be very happy. >> >> That makes sense, as many have a barrel shifter in hardware. >>> ISTR there being a library function MULDIV(a,b,c) for that at least on >>> the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I >>> think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's). >> >> Yes, it either didn't exist yet or didn't cover the specific case we >> had. > >I'm pretty sure it was in one of the locally written system libraries. I >also had a routine to mask out certain harmless underflows in an >astrophysics FLIC code. It was scaled in SI units which were something >like h/c^2 which didn't leave a lot of room in 32bit floats.
Sounds like Planck Scale?
>The original code spent most of its time in underflow recovery routines. >I got an immediate 5x speed improvement with no loss of accuracy with >just a few lines of assembler added to the project. >
Been there done that. One standard tool is a sampler tied to a hardware timer interrupt programmed for random intervals between interrupts, collecting PC addresses. The "random interval" must at least be mutually prime with respect to all periodic processes in the system under test. The histogram of PC addresses shows where the program was spending its time, often in a place nobody ever heard of, usually deep in some obscure part of the runtime system. Like underflow handling.
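A rough POSIX sketch of the same idea, with SIGPROF standing in for the hardware timer (Linux/x86-64 register name assumed; a real tool would bin the addresses into a histogram rather than call printf, which is not async-signal-safe):

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <ucontext.h>

/* Record the interrupted program counter on every profiling tick. */
static void on_prof(int sig, siginfo_t *si, void *uc_)
{
    (void)sig; (void)si;
#ifdef REG_RIP
    ucontext_t *uc = uc_;
    fprintf(stderr, "pc=%#llx\n",
            (unsigned long long)uc->uc_mcontext.gregs[REG_RIP]);
#else
    (void)uc_;
#endif
}

static void start_sampling(void)
{
    struct sigaction sa;
    sa.sa_sigaction = on_prof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPROF, &sa, NULL);

    struct itimerval it;
    it.it_interval.tv_sec  = 0;
    it.it_interval.tv_usec = 997;   /* a prime interval, per the text */
    it.it_value = it.it_interval;
    setitimer(ITIMER_PROF, &it, NULL);
}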
>Eventually the problem was rescaled to put things into the middle ground >so that underflows were much less likely.
What I've seen used a lot is straight SI units with double precision (64-bit) IEEE floating point arithmetic, with all conversions to and from legacy units performed only at the interfaces.

Joe Gwinn