sci.electronics.design | highest frequency periodic interrupt?| page 3

Reply by Lasse Langwadt Christensen ● January 14, 2023

On Saturday, 14 January 2023 at 22:33:29 UTC+1, John Larkin wrote:
> On Sat, 14 Jan 2023 12:20:24 -0800 (PST), Lasse Langwadt Christensen 
> <lang...@fonz.dk> wrote: 
> 
> >On Saturday, 14 January 2023 at 21:09:04 UTC+1, John Larkin wrote: 
> >> On Sat, 14 Jan 2023 12:20:07 -0700, Don Y 
> >> <blocked...@foo.invalid> wrote: 
> >> 
> >> >On 1/14/2023 8:52 AM, Martin Brown wrote: 
> >> >> ISR code is generally very short and best done in assembler if you want it as 
> >> >> quick as possible. Examining the code generation of GCC is worthwhile since it 
> >> >> sucks compared to Intel(better) and MS (best). 
> >> > 
> >> >I always code ISRs in a HLL -- if only to act as pseudo-code 
> >> >illustrating what the (ASM) code is actually doing. IME, people 
> >> >miss details in ASM so having those expressed in a HLL makes 
> >> >it easier for them to understand the *goal* of the code. 
> >> > 
> >> >Looking at a .S is a great starting point *if* you have to 
> >> >hand-tweak the code. Remembering that the code that gets 
> >> >executed will change as the compiler is revised; ASM won't 
> >> >(which can be A Good Thing as well as A Bad Thing). 
> >> > 
> >> >> In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++ when 
> >> >> generating Intel CPU specific SIMD code with maximum optimisation. 
> >> > 
> >> >I'd be less worried about quality of code generator (compiler vs. human ASM) 
> >> >than the effects of cache, core affinity, *which* bus(es) are called on 
> >> >for each instruction, other contenders for those resources, etc. 
> >> The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16 
> >> Kbyte cache. Cache misses will be *very* slow. So code will need to be 
> >> very tight bare-metal. The entire ISR should fit in cache. 
> > 
> >you can copy some (or all) of the code to ram instead of using execute-in-place from flash
> That's a good idea. A typical ISR could be pretty small, and let the 
> mainline program thrash all it likes.
> > 
> >I think you can even turn off the cache to get an additional 16k ram
> Yikes, execute out of SPI flash?

No, copy all the code to RAM at boot, so nothing executes from flash and the cache is not needed.

Reply by John Larkin ● January 14, 2023

On Sat, 14 Jan 2023 13:42:54 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>On Saturday, 14 January 2023 at 22:33:29 UTC+1, John Larkin wrote:
>> On Sat, 14 Jan 2023 12:20:24 -0800 (PST), Lasse Langwadt Christensen 
>> <lang...@fonz.dk> wrote: 
>> 
>> >On Saturday, 14 January 2023 at 21:09:04 UTC+1, John Larkin wrote: 
>> >> On Sat, 14 Jan 2023 12:20:07 -0700, Don Y 
>> >> <blocked...@foo.invalid> wrote: 
>> >> 
>> >> >On 1/14/2023 8:52 AM, Martin Brown wrote: 
>> >> >> ISR code is generally very short and best done in assembler if you want it as 
>> >> >> quick as possible. Examining the code generation of GCC is worthwhile since it 
>> >> >> sucks compared to Intel(better) and MS (best). 
>> >> > 
>> >> >I always code ISRs in a HLL -- if only to act as pseudo-code 
>> >> >illustrating what the (ASM) code is actually doing. IME, people 
>> >> >miss details in ASM so having those expressed in a HLL makes 
>> >> >it easier for them to understand the *goal* of the code. 
>> >> > 
>> >> >Looking at a .S is a great starting point *if* you have to 
>> >> >hand-tweak the code. Remembering that the code that gets 
>> >> >executed will change as the compiler is revised; ASM won't 
>> >> >(which can be A Good Thing as well as A Bad Thing). 
>> >> > 
>> >> >> In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++ when 
>> >> >> generating Intel CPU specific SIMD code with maximum optimisation. 
>> >> > 
>> >> >I'd be less worried about quality of code generator (compiler vs. human ASM) 
>> >> >than the effects of cache, core affinity, *which* bus(es) are called on 
>> >> >for each instruction, other contenders for those resources, etc. 
>> >> The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16 
>> >> Kbyte cache. Cache misses will be *very* slow. So code will need to be 
>> >> very tight bare-metal. The entire ISR should fit in cache. 
>> > 
>> >you can copy some (or all) of the code to ram instead of using execute-in-place from flash
>> That's a good idea. A typical ISR could be pretty small, and let the 
>> mainline program thrash all it likes.
>> > 
>> >I think you can even turn off the cache to get an additional 16k ram
>> Yikes, execute out of SPI flash?
>
>no, copy all the code to ram on boot

OK, OK, then the entire app, plus variables, stacks, and buffers, would
have to fit in 256K. Might work.

Reply by Clive Arthur ● January 14, 2023

On 13/01/2023 23:46, John Larkin wrote:
> What's the fastest periodic IRQ that you have ever run?

<snip>

I've got a 3 us interrupt servicing an ADC, in assembler of course.  It's 
only a 40 MIPS processor, and the 3 us period has a scattering of 3.025 us 
and 2.975 us intervals as needed to maintain synchronisation to a remote 
transmitter, with no possibility of a common clock.

Works fine at Gas Mark 4, aka 180 °C.
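
For scale: at 40 MIPS one instruction takes 25 ns, so a 3 us period leaves about 120 instruction times per interrupt, and the 3.025 us and 2.975 us intervals differ from nominal by exactly one instruction time (assuming one instruction per cycle, which the post does not state). The sketch below is one hypothetical way to choose long and short intervals so that the average period tracks a remote rate. The function names and the error-accumulator scheme are assumptions for illustration, not the method used in the post.

```python
def instructions_per_interrupt(instr_per_second, period_s):
    """Instruction budget between two interrupts (idealised, no stalls)."""
    return round(instr_per_second * period_s)


def schedule_intervals(target_ns, count, nominal_ns=3000, step_ns=25):
    """Pick each interval from nominal-step, nominal, nominal+step so that
    the running total stays within half a step of count * target_ns.

    Assumes |target_ns - nominal_ns| is smaller than step_ns.
    """
    intervals = []
    error_ns = 0.0  # positive: local timing is running ahead of the target
    for _ in range(count):
        error_ns += target_ns - nominal_ns
        if error_ns >= step_ns / 2:
            adjust = step_ns       # stretch this interval by one step
        elif error_ns <= -step_ns / 2:
            adjust = -step_ns      # shorten this interval by one step
        else:
            adjust = 0
        error_ns -= adjust
        intervals.append(nominal_ns + adjust)
    return intervals
```

For example, `schedule_intervals(3001.0, 1000)` returns mostly 3000 ns intervals with an occasional 3025 ns one, and the accumulated error never exceeds half a step.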

-- 
Cheers
Clive

Reply by Jan Panteltje ● January 15, 2023

On a sunny day (Sat, 14 Jan 2023 10:50:30 -0800) it happened John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote in
<nmt5shp49hqugndric000iqtgprikikub9@4ax.com>:

>On Sat, 14 Jan 2023 17:57:08 GMT, Jan Panteltje
><pNaonStpealmtje@yahoo.com> wrote:
>
>>On a sunny day (Sat, 14 Jan 2023 08:31:33 -0800) it happened John Larkin
>><jlarkin@highlandSNIPMEtechnology.com> wrote in
>><tol5shtb7chchpkq63hnb1mfsveolk1tib@4ax.com>:
>>
>>>On Sat, 14 Jan 2023 06:27:45 GMT, Jan Panteltje
>>><pNaonStpealmtje@yahoo.com> wrote:
>>
>>>>Paid about 100 USD for my Pi4 4 GB and my Pi4 8 GB just 2 years ago, December 2020,
>>>>including SDcard, RapiOS, plastic housing, cables, cooling fins and supply.
>>>>
>>>>No fan, it does run hot, about 70 C.
>>>>But I use that one for web browsing.
>>>>The older one with 4 GB memory has an ebay metal housing and a fan.
>>>>After lubricating that fan with vaseline it now has run quiet for 4 years?
>>>>The metal housing also stops any WiFi, as that one is part of the security
>>>>system and no WiFi allowed there.
>>>>It runs 24/7 recording 6 cameras, 2 audio channels, weather sensors (temp, air pressure, humidity),
>>>>air traffic, ship traffic, radiation etc. (from an even older Raspberry Pi that works as server) ..
>>>> http://panteltje.com/panteltje/xgpspc/index.html
>>>>Each Pi4 has a 4 TB Toshiba USB harddisk connected to it.
>>>>
>>>>
>>>>
>>>>
>>>>>The enclosure is a nightmare so I threw that away. Just run the board.
>>>>>It doesn't seem to need the fan.
>>>>
>>>>Type this in a terminal to see the current temperature:
>>>> vcgencmd measure_temp
>>>
>>>
>>>Fingers are easier.
>>
>>This is from google:
>> For Raspberry Pi 3+, a 'soft' temperature limit of 60 °C has been introduced.
>> This means that even before reaching the hard limit at 85 °C, the clock speed is reduced from 1.4GHz to lower frequencies,
>> reducing the temperatu
>>and
>> That is the so-called throttling. The Raspberry Pi monitors the temperature continuously.
>> Above 82 °C (180 °F), the clock frequency is automatically lowered, regardless of which flag is set. This action will reduce
>> heat 
>>
>>So better use vcgencmd and it saves your finger too from getting fried.
>
>It might get hotter when it's compiling or something, but it's not
>very warm. It would be easy to add the fan if it got necessary. The
>kit did come with three stick-on heat sinks.
>
>There are also LCD monitor things that the 4B mounts on the back of.
>They have a fan.
>
>
>>I should actually get a better housing with fan for my Pi4 8 GB like I have for my Pi4 4 GB that runs at about 46 Degrees C.
>>Of course maybe bringing your own fried finger to a restaurant ?? ..Discount?
>
>My finger is calibrated. I can touch 50C forever and 60C for about
>half a second. Touching 100C briefly hurts but does no harm. Baking a
>real pie is more dangerous.
>
>I've had interns that refused to touch chips to see if they are hot.
>They were afraid of being electrocuted by 3.3 volts.

Yeah, OK.
The thing about vcgencmd is that you can use it from a script or program,
for example to reduce priority, or to temporarily slow down or halt some non-essential code,
so that the Pi does not lower the clock on your important things, or to give alarms, etc.
You can change processor use too.
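
A minimal sketch of that idea in Python, assuming `vcgencmd measure_temp` prints a line of the form `temp=46.0'C`. The helper names and the 75 C back-off threshold are hypothetical, chosen only for illustration; pick a limit that suits your own throttling margin.

```python
import re
import subprocess


def parse_temp(text):
    """Extract degrees Celsius from output such as "temp=46.0'C"."""
    match = re.search(r"temp=([0-9.]+)'C", text)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {text!r}")
    return float(match.group(1))


def read_temp():
    """Run vcgencmd (Raspberry Pi only) and return the SoC temperature."""
    result = subprocess.run(
        ["vcgencmd", "measure_temp"],
        capture_output=True, text=True, check=True,
    )
    return parse_temp(result.stdout)


def should_back_off(temp_c, limit_c=75.0):
    """True when non-essential work should pause (limit_c is an assumed value)."""
    return temp_c >= limit_c
```

A program would call `read_temp()` every few seconds and pause or deprioritise its non-essential work whenever `should_back_off()` returns True.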

Reply by Jan Panteltje ● January 15, 2023

On a sunny day (Sat, 14 Jan 2023 10:21:59 -0800) it happened John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote in
<epr5sh59k5q62qkapubhkfk8ubf9r0vnng@4ax.com>:

>On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown
><'''newspam'''@nonad.co.uk> wrote:
>
>>On 13/01/2023 23:46, John Larkin wrote:
>>> What's the fastest periodic IRQ that you have ever run?
>>
>>Usually try to avoid having fast periodic IRQs in favour of offloading 
>>them onto some dedicated hardware. But CPUs were slower then than now.
>>> 
>>> We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
>>> by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
>>> PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
>>> down some to save power, so the ISR runs for about 7 usec max.
>>> 
>>> I ask because if I use a Pi Pico on some new projects, it has a
>>> dual-core 133 MHz CPU, and one core may have enough compute power that
>>> we wouldn't need an FPGA in a lot of cases. Might even do DDS in
>>> software.
>>> 
>>> RP2040 floating point is tempting but probably too slow for control
>>> use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
>>> I guess.
>>
>>It might be worth benchmarking how fast the FPU really is on that device 
>>(for representative sample code). The Intel i5 & i7 can do all except 
>>divide in a single cycle these days - I don't know what Arm is like in 
>>this respect. You get some +*- for free close to every divide too.
>
>The RP2040 chip has FP routines in the rom, apparently code with some
>sorts of hardware assist, but it's callable subroutines and not native
>instructions to a hardware FP engine. When it returns it's done.
>
>Various web sites seem to confuse microseconds and nanoseconds. 150 us
>does seem slow for a "fast" fp operation. We'll have to do
>experiments.
>
>I wrote one math package for the 68K, with the format signed 32.32.
>That behaved just like floating point in real life, but was small and
>fast and avoided drecky scaled integers.
>
>>
>>*BIG* time penalty for having two divides or branches too close 
>>together. Worth playing around to find patterns the CPU does well.
>
>Without true hardware FP, call locations probably don't matter.
>
>>
>>Beware that what you measure gets controlled but for polynomials up to 5 
>>term or rationals up to about 5,2 call overhead may dominate the 
>>execution time (particularly if the stupid compiler puts a 16byte 
>>structure across a cache boundary on the stack).
>
>We occasionally use polynomials, but 2nd order and rarely 3rd is
>enough to get analog i/o close enough. 
>
>>
>>Forcing inlining of small code sections can help. DO it to excess and it 
>>will slow things down - there is a sweet spot. Loop unrolling is much 
>>less useful these days now that branch prediction is so good.
>>
>>> I was also thinking that we could make a 2 or 3-bit DAC with a few
>>> resistors. The IRQ could load that at various places and a scope would
>>> trace execution. That would look cool. On the 1758 thing we brought
>>> out a single bit to a test point and raised that during the ISR so we
>>> could see ISR execution time on a scope. My c guy didn't believe that
>>> a useful ISR could run at 100K and had no idea what execution time
>>> might be.
>>
>>ISR code is generally very short and best done in assembler if you want 
>>it as quick as possible. Examining the code generation of GCC is 
>>worthwhile since it sucks compared to Intel(better) and MS (best).
>>
>>In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++ 
>>when generating Intel CPU specific SIMD code with maximum optimisation.
>>
>>MS compiler still does pretty stupid things like internal compiler 
>>generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 byte) and 
>>having them crossing a cache line boundary.
>
>Nobody has answered my question. Generalizations about software timing
>abound but hard numbers are rare. Programmers don't seem to use
>oscilloscopes much.

That is silly
 http://panteltje.com/panteltje/pic/scope_pic/index.html

Try reading the asm, it is well commented.
:-)

And if you are talking Linux or other multi-taskers there is a lot more involved.

I was amazed about the other thread about logic analyzers.
Why did I never need one for my code / projects?
All you need is a scope... especially an analog one, digital ones are liars!

If you have no clue then having a hall full of equipment does not give you one!

mm
did I use a scope for any of this?
 http://panteltje.com/panteltje/newsflex/download.html
I only have a 10 MHz analog dual trace one!
Well, it shows 25 MHz too, but attenuated.
But I DO have rtl_sdr sticks that show spectrum from 25 MHz to 1.5 GHz.
It is so simple, all of it...
maybe not for a mathematician, but then...

Reply by ● January 15, 2023

On Sat, 14 Jan 2023 04:47:22 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

>On a sunny day (Fri, 13 Jan 2023 15:46:16 -0800) it happened John Larkin
><jlarkin@highlandSNIPMEtechnology.com> wrote in
><q5p3shh8f34tt34ka767750oc2ou8p7vl8@4ax.com>:
>
>>What's the fastest periodic IRQ that you have ever run?
>>
>>We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
>>by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
>>PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
>>down some to save power, so the ISR runs for about 7 usec max.
>>
>>I ask because if I use a Pi Pico on some new projects, it has a
>>dual-core 133 MHz CPU, and one core may have enough compute power that
>>we wouldn't need an FPGA in a lot of cases. Might even do DDS in
>>software.
>>
>>RP2040 floating point is tempting but probably too slow for control
>>use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
>>I guess.
>>
>>I was also thinking that we could make a 2 or 3-bit DAC with a few
>>resistors. The IRQ could load that at various places and a scope would
>>trace execution. That would look cool. On the 1758 thing we brought
>>out a single bit to a test point and raised that during the ISR so we
>>could see ISR execution time on a scope. My c guy didn't believe that
>>a useful ISR could run at 100K and had no idea what execution time
>>might be.
>
>Well in that sort of thing you need to think in asm, instruction times,
>but I have no experience with the RP2040, and little with ASM on ARM.
>Should be simple to test how long the C code takes, do you have an RP2040?
>Playing with one would be a good starting point.
>Should I get one? Was thinking just for fun...

In the past coding ISRs in assembly was the way to go, but the
complexity of current processors (cache, pipelining) makes it hard to
beat a _good_ compiler.

The main principle still is to minimize the number of registers saved
at interrupt entry (and restored at exit). On a primitive processor only
the processor status word and program counter need to be saved (and
restored). Additional registers may need to be saved/restored if the
ISR uses them.

If the processor has separate FP registers and/or separate FP status
words, avoid using FP registers in ISRs.

Some compilers may have "interrupt" keywords or similar extensions, so
the compiler knows which registers need to be saved in the ISR. To
help the compiler, include all functions that are called by the ISR in
the same module (preferably inlined) prior to the ISR, so that the
compiler knows what needs to be saved. Do not call external library
routines from an ISR, since the compiler doesn't know which registers
they use and so saves them all.

Reply by Don Y ● January 15, 2023

On 1/14/2023 10:10 PM, upsidedown@downunder.com wrote:
> In the past coding ISRs in assembly was the way to go, but the
> complexity of current processors (cache, pipelining) makes it hard to
> beat a _good_ compiler.

Exactly.  And, it's usually easier to see what you are trying
to do in a HLL vs. ASM (and heaven forbid you want to port
the application to a different processor!)

The problem with using an HLL is making sure you actually
understand what some "line of code" translates into when it comes
to actual opcodes/memory accesses (not just which instructions
but, rather, the *cost* of those instructions)

And, this can change, based on *how* the compiler is invoked
(how aggressive the code generator is)

> The main principle still is to minimize the number of registers saved
> at interrupt entry (and restored at exit). On a primitive processor only
> the processor status word and program counter need to be saved (and
> restored). Additional registers may need to be saved/restored if the
> ISR uses them.

Some "advanced" processors still support a "Fast IRQ" that saves
just an accumulator and PSW.  A tacit acknowledgement that you
don't want to have to save the *entire* processor state (as
you likely don't know what portions of it the compiler *might*
call on).

> If the processor has separate FP registers and/or separate FP status
> words, avoid using FP registers in ISRs.

As with everything, *how* you use them can make a difference.
E.g., if your ISR reenables interrupts (prior to completion), it
can make sense to use "expensive" instruction sequences (assuming
the ISR doesn't interrupt itself).

[Degenerate example: the scheduler being invoked!]

> Some compilers may have "interrupt" keywords or similar extensions and
> the compiler knows which registers need to be saved in the ISR. To
> help the compiler, include all functions that are called by the ISR in
> the same module(preferably in-lined) prior to the ISR, so that the
> compiler knows what needs to be saved. Do not call external library
> routines from ISR, since the compiler doesn't know which registers
> need to be saved and saves all.

Reply by Martin Brown ● January 15, 2023

On 14/01/2023 18:21, John Larkin wrote:
> On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown
> <'''newspam'''@nonad.co.uk> wrote:

>> In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++
>> when generating Intel CPU specific SIMD code with maximum optimisation.
>>
>> MS compiler still does pretty stupid things like internal compiler
>> generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 byte) and
>> having them crossing a cache line boundary.
> 
> Nobody has answered my question. Generalizations about software timing
> abound but hard numbers are rare. Programmers don't seem to use
> oscilloscopes much.

The guardians of big iron won't let you poke around its internals with a 
scope although I do recall them having an AM radio on the console so 
that you could listen in on the RFI to see if it was stuck in a loop.

I prefer to use RDTSC for my Intel timings anyway.

On many modern CPUs there is a free-running 64-bit counter incremented 
once per cycle. Intel deprecates using it for such purposes but I 
have never found it a problem provided that you bracket it before and 
after with CPUID to force all the pipelines into an empty state.

The equivalent DWT_CYCCNT on the Arm CPUs that support it is described here:

https://stackoverflow.com/questions/32610019/arm-m4-instructions-per-cycle-ipc-counters

I prefer hard numbers to a vague scope trace.

If I'm really serious about finding out why something is unusually slow 
I run a dangerous system level driver that allows me full access to the 
model specific registers to monitor cache misses and pipeline stalls.

One of my recent tests shows that in the MS SSE2 library, whilst sin and 
cos are both properly rounded to acceptable machine precision tolerance, 
the results from the combined sincos have worst case behaviour 4x eps.

This makes answers change when the optimisation level is increased to 
maximum in code which uses both sin(x) and cos(x) and mine does.
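
As a rough illustration of how an error like "4x eps" can be measured, here is a sketch in Python that compares a library result against an exact rational reference and reports the gap in units in the last place (ulps). It is not the test harness described above. It checks the host's `math.sin`, not the MS SSE2 sincos, the helper names are hypothetical, and the Taylor reference is only suitable for small arguments. It needs Python 3.9 or later for `math.ulp`.

```python
import math
from fractions import Fraction


def sin_reference(x, terms=30):
    """Exact-rational Taylor series for sin(x); adequate for small |x|."""
    xr = Fraction(x)  # the float's exact value as a rational number
    term = xr
    total = xr
    for n in range(1, terms):
        # Next term: multiply by -x^2 / ((2n)(2n+1))
        term = -term * xr * xr / ((2 * n) * (2 * n + 1))
        total += term
    return total


def error_in_ulps(computed, reference):
    """Gap between a float result and an exact reference, in ulps of the result."""
    return abs(Fraction(computed) - reference) / Fraction(math.ulp(computed))


def worst_case_ulps(points):
    """Largest observed error of math.sin over the given sample points."""
    return max(error_in_ulps(math.sin(x), sin_reference(x)) for x in points)
```

Sweeping `worst_case_ulps` over a dense set of points, once per optimisation level or library variant, would show whether the worst-case error moves when the build options change.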

-- 
Regards,
Martin Brown

Reply by Don Y ● January 15, 2023

On 1/15/2023 2:48 AM, Martin Brown wrote:
> I prefer to use RDTSC for my Intel timings anyway.
> 
> On many of the modern CPUs there is a freerunning 64 bit counter clocked at 
> once per cycle. Intel deprecates using it for such purposes but I have never 
> found it a problem provided that you bracket it before and after with CPUID to 
> force all the pipelines into an empty state.
> 
> The equivalent DWT_CYCCNT on the Arm CPUs that support it is described here:
> 
> https://stackoverflow.com/questions/32610019/arm-m4-instructions-per-cycle-ipc-counters
> 
> I prefer hard numbers to a vague scope trace.

Two downsides:
- you have to instrument your code (but, if you're concerned with performance,
   you've already done this as a matter of course)
- it doesn't tell you about anything that happens *before* the code runs
   (e.g., latency between event and recognition thereof)

> If I'm really serious about finding out why something is unusually slow I run a 
> dangerous system level driver that allows me full access to the model specific 
> registers to monitor cache misses and pipeline stalls.

But, those results can change from instance to instance (as can latency,
execution time, etc.).  So, you need to look at the *distribution* of
values and then think about whether that truly represents "typical"
and/or *worst* case.

Relying on exact timings is sort of naive; it ignores how much
things can vary with the running system (is the software in a
critical region when the ISR is invoked?) and the running
*hardware* (multilevel caches, etc.)
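
A small host-side sketch of looking at the distribution instead of a single figure. It uses Python's `time.perf_counter_ns` as a stand-in for a cycle counter; the function names are hypothetical, and timings taken this way on a desktop OS are far noisier than a bare-metal ISR would be.

```python
import statistics
import time


def collect_samples(func, runs=1000):
    """Time func() repeatedly and return the elapsed nanoseconds per call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    return samples


def summarize(samples):
    """Report the spread of the samples, not just one number."""
    ordered = sorted(samples)
    p99_index = (99 * (len(ordered) - 1)) // 100
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }
```

The gap between "median" and "max" is the part a single measurement hides, and it is the maximum (plus margin) that a hard deadline has to be budgeted against.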

Do you have a way of KNOWING when your expectations (which you
have now decided are REQUIREMENTS!) are NOT being met?  And, if so,
what do you do (at runtime) with that information?   ("I'm sorry,
one of my basic assumptions is proving to be false and I am not
equipped to deal with that...")

Especially given that your implementation will likely evolve and
folks doing that work may not be as focused as you were on
this specific issue...
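
One way to make that check explicit is to record every measured duration against its budget and count the overruns, so a violated assumption is at least visible at runtime. The sketch below is a hypothetical monitor written in Python for illustration. The class name is an assumption, and the 10,000 ns budget in the usage note is simply the period of the 100 kHz interrupt mentioned earlier in the thread.

```python
class DeadlineMonitor:
    """Count how often a measured duration exceeds its budget."""

    def __init__(self, budget_ns):
        self.budget_ns = budget_ns
        self.count = 0       # measurements seen so far
        self.overruns = 0    # measurements that exceeded the budget
        self.worst_ns = 0    # longest duration observed

    def record(self, elapsed_ns):
        """Log one measurement; return True if it met the budget."""
        self.count += 1
        self.worst_ns = max(self.worst_ns, elapsed_ns)
        if elapsed_ns > self.budget_ns:
            self.overruns += 1
            return False
        return True
```

With `monitor = DeadlineMonitor(10_000)`, each call to `monitor.record(elapsed)` returns False on an overrun, which is the point at which the application has to decide what to do: raise an alarm, shed load, or fall back to a safe state.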

> One of my recent tests shows that in the MS SSE2 library whilst sin and cos are 
> both properly rounded to acceptable machine precision tolerance the results 
> from the combined sincos have worst case behaviour 4x eps.
> 
> This makes answers change when the optimisation level is increased to maximum 
> in code which uses both sin(x) and cos(x) and mine does.
