
highest frequency periodic interrupt?

Started by John Larkin January 13, 2023
On 14.01.23 at 19:21, John Larkin wrote:

> Nobody has answered my question. Generalizations about software timing
> abound but hard numbers are rare. Programmers don't seem to use
> oscilloscopes much.
I did it on the BeagleBoneBlack. It has an ARM CPU to run Debian Linux
etc. and two I/O processors that are 200 MHz RISCs without pipeline
stalls and without an operating system. The TI C compiler is on the
BBB. I can do I/O with 5 ns resolution & rate. No jitter, and directly
from a C program.

volatile int i;
myportbit = 0;
....
myportbit = 1;
i = 0;
i = 0;
myportbit = 0;

would create a 15 ns wide pulse on myportbit.

Cheers, Gerhard
https://forums.raspberrypi.com/viewtopic.php?t=308794#p1848188
On 1/14/2023 8:52 AM, Martin Brown wrote:
> ISR code is generally very short and best done in assembler if you want it as
> quick as possible. Examining the code generation of GCC is worthwhile since it
> sucks compared to Intel(better) and MS (best).
I always code ISRs in a HLL -- if only to act as pseudo-code illustrating
what the (ASM) code is actually doing. IME, people miss details in ASM so
having those expressed in a HLL makes it easier for them to understand the
*goal* of the code.

Looking at a .S is a great starting point *if* you have to hand-tweak the
code. Remembering that the code that gets executed will change as the
compiler is revised; ASM won't (which can be A Good Thing as well as A
Bad Thing).
> In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++ when
> generating Intel CPU specific SIMD code with maximum optimisation.
I'd be less worried about quality of code generator (compiler vs. human ASM)
than the effects of cache, core affinity, *which* bus(es) are called on
for each instruction, other contenders for those resources, etc.

I wrote a MT driver for ISA (1600bpi @ 100ips). Doesn't allow much time
to actually talk to the *I/O* with the throughput available on that bus!

Better approaches (barring committing hardware to a task -- boo, hiss!)
are to decouple the time constraint from the code's execution. E.g.,
let loops run "as fast as they can" and adjust the code to compensate,
dynamically (if there is any VARIABLE latency in an ISR, then you
likely have overlooked this, already!)
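To make that "run as fast as you can and compensate" idea concrete, here is a
minimal sketch. Everything hardware-facing in it is an assumption:
read_cycle_counter(), read_setpoint(), read_feedback(), write_output() and the
133 MHz clock stand in for whatever the actual part provides; the only point
is that the integral step is scaled by the measured dt rather than by an
assumed loop period.

#include <stdint.h>

/* Hypothetical helpers -- substitute whatever the hardware provides.
 * read_cycle_counter() is assumed to return a free-running CPU cycle count
 * (e.g. a DWT- or PRU-style counter). */
extern uint32_t read_cycle_counter(void);
extern float    read_setpoint(void);
extern float    read_feedback(void);
extern void     write_output(float u);

#define CPU_HZ 133000000u            /* assumed core clock */

static const float kp = 1.0f;        /* illustrative gains */
static const float ki = 100.0f;
static float integrator;             /* controller state */

/* Free-running loop: no fixed period, so each pass measures how long the
 * previous pass took and scales the integral step by that measured dt. */
void control_loop(void)
{
    uint32_t last = read_cycle_counter();

    for (;;) {
        uint32_t now = read_cycle_counter();
        /* Unsigned subtraction handles counter wrap-around. */
        float dt = (float)(now - last) / (float)CPU_HZ;
        last = now;

        float error = read_setpoint() - read_feedback();
        integrator += ki * error * dt;
        write_output(kp * error + integrator);
    }
}

The obvious objection, which comes up later in the thread, is that a control
loop usually wants a constant sample rate; measuring dt compensates for
variable latency in the math, but it doesn't remove the jitter seen by the
plant.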
> MS compiler still does pretty stupid things like internal compiler generated
> SIMD objects of 128, 256 or 512 bits (16, 32 or 64 byte) and having them
> crossing a cache line boundary.
Advantage: ASM.

But, only if the programmer actually understands the hardware at a level above
the programming model. (programmers are often pretty lousy at understanding
the implications of a hardware design; engineers/coders equally so when trying
to map their knowledge onto an algorithm: "Why doesn't it ALWAYS work?")
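On the straddling-a-cache-line point: this is something you can also force
from C, without dropping to ASM, by pinning the alignment yourself. A sketch,
assuming 64-byte cache lines: a 32-byte (256-bit) object aligned on a 32-byte
boundary can never cross a line.

#include <stdalign.h>
#include <stdint.h>

/* A 256-bit (32-byte) SIMD-sized object.  alignas(32) guarantees it starts
 * on a 32-byte boundary; since a 64-byte cache line is a multiple of that,
 * the whole object always lands inside a single line. */
typedef struct {
    alignas(32) uint8_t bytes[32];
} vec256_t;

/* Drop-in check for a test build: confirms a given object really is
 * 32-byte aligned (and therefore cannot straddle a 64-byte line). */
int is_line_safe(const vec256_t *v)
{
    return ((uintptr_t)v & 31u) == 0;
}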
On 1/14/2023 11:55 AM, Gerhard Hoffmann wrote:
> I did it on the BeagleBoneBlack. It has an ARM CPU to run Debian Linux
> etc. and two I/O processors that are 200 MHz RISCs without pipeline
> stalls and without an operating system. The TI C compiler is on the
> BBB. I can do I/O with 5 ns resolution & rate. No jitter, and directly
> from a C program.
>
> volatile int i;
> myportbit = 0;
> ....
> myportbit = 1;
> i = 0;
> i = 0;
> myportbit = 0;
>
> would create a 15 ns wide pulse on myportbit.
At the very least, you want to annotate this to indicate that the "i"
assignments are being used SOLELY for their side-effects (else someone may
erroneously remove one -- OR BOTH -- of them in a misguided attempt to
"improve" the code; or change the type of i, etc.). Likewise, you want to
make the ordering of the myportbit assignments more explicit -- to ensure
they aren't reordered or removed (by an overzealous maintainer). Of course,
if no one ever sees your code but you (and you have a perfect memory), ...
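One way to encode those intentions so both the compiler and a future
maintainer can see them is sketched below. The register address is made up,
and the per-store timing in the comments is just Gerhard's 200 MHz figure
carried over, not something this code guarantees; the point is that volatile
accesses cannot be merged, removed, or reordered with respect to each other
by the compiler.

#include <stdint.h>

/* Assumed: myportbit is a memory-mapped GPIO register at a made-up address.
 * The volatile-qualified pointer makes every store an observable side effect,
 * so the compiler may not merge, drop, or reorder these accesses. */
#define MYPORT_ADDR  0x12340000u
#define myportbit    (*(volatile uint32_t *)MYPORT_ADDR)

/* These writes exist ONLY for their side effect of burning one instruction
 * slot each; volatile is what stops the optimizer from folding them away. */
static volatile int pad;

void pulse(void)
{
    myportbit = 1;   /* rising edge                                   */
    pad = 0;         /* delay slot (~5 ns each on a 200 MHz PRU-class */
    pad = 0;         /*   core, per the post above -- measure yours)  */
    myportbit = 0;   /* falling edge: ~15 ns high time                */
}

A comment block saying the same thing would satisfy the "annotate it" point;
the volatile qualifiers are what keep an optimizer, rather than a human, from
"improving" the code. Note this only constrains the compiler -- a write buffer
or bus bridge can still stretch the timing, which is why I/O processors like
the PRUs are attractive here.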
On Sat, 14 Jan 2023 12:20:07 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

[...]

>I'd be less worried about quality of code generator (compiler vs. human ASM)
>than the effects of cache, core affinity, *which* bus(es) are called on
>for each instruction, other contenders for those resources, etc.
The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16 Kbyte cache. Cache misses will be *very* slow. So code will need to be very tight bare-metal. The entire ISR should fit in cache. When that gets dicey, we'll have to add an FPGA.
>
>I wrote a MT driver for ISA (1600bpi @ 100ips). Doesn't allow much time
>to actually talk to the *I/O* with the throughput available on that bus!
>
>Better approaches (barring committing hardware to a task -- boo, hiss!)
>are to decouple the time constraint from the code's execution. E.g.,
>let loops run "as fast as they can" and adjust the code to compensate,
>dynamically (if there is any VARIABLE latency in an ISR, then you
>likely have overlooked this, already!)
Control loops need to run at a constant rate, with at most a modest amount
of jitter.
On Sat, 14 Jan 2023 11:07:11 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>https://forums.raspberrypi.com/viewtopic.php?t=308794#p1848188
If I understand that, a floating add would take about 500 ns with a
133 MHz clock. That's not as bad as software float, but I wouldn't be
able to do much fp math in a 100 kHz IRQ.

So, scaled integers or FPGA.
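For the scaled-integer route, here is a minimal Q16.16 fixed-point sketch of
the kind of arithmetic that fits comfortably in a 100 kHz IRQ on an FPU-less
core. The Q format, the low-pass filter, and the constants are illustrative
choices, not anything from a particular SDK.

#include <stdint.h>

/* Q16.16 fixed point: value = raw / 65536.  Adds and subtracts are single
 * integer instructions; the one widening multiply is a short integer-only
 * sequence -- no soft-float library calls anywhere. */
typedef int32_t q16_t;

#define Q16_ONE          (1 << 16)
#define Q16_FROM_INT(x)  ((q16_t)((x) << 16))

static inline q16_t q16_mul(q16_t a, q16_t b)
{
    /* 64-bit intermediate keeps the full product before rescaling. */
    return (q16_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* Example workload: first-order low-pass filter, y += alpha * (x - y). */
static q16_t y;                                 /* filter state          */
static const q16_t alpha = Q16_ONE / 16;        /* 1/16 in Q16.16        */

void sample_irq_handler(q16_t x)                /* x: new sample, Q16.16 */
{
    y += q16_mul(alpha, x - y);
}

Overflow and scaling have to be budgeted by hand, which is the usual price
for skipping the FPU.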
On Saturday, 14 January 2023 at 21:09:04 UTC+1, John Larkin wrote:
[...]

> The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16
> Kbyte cache. Cache misses will be *very* slow. So code will need to be
> very tight bare-metal. The entire ISR should fit in cache.
you can copy some (or all) of the code to ram instead of using execute-in-place from flash

I think you can even turn off the cache to get an additional 16k ram
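For the copy-to-RAM suggestion, the Pico SDK already provides a decorator for
this (as I understand the SDK: __not_in_flash_func() in pico/platform.h links
the function into SRAM). A sketch only -- the pin choice is arbitrary and the
GPIO/IRQ setup that would point a vector at this handler is omitted:

#include "pico/stdlib.h"   /* pulls in pico/platform.h and hardware/gpio.h */

#define OUT_PIN 2u         /* arbitrary pin, toggled so period and jitter
                              can be checked on an oscilloscope */

/* The handler body is linked into SRAM, so it never stalls on an XIP
 * flash cache miss even if the mainline code has thrashed the cache. */
void __not_in_flash_func(fast_isr)(void)
{
    gpio_xor_mask(1u << OUT_PIN);
    /* ... acknowledge the interrupt source and do the real work ... */
}

Reclaiming the 16 kB XIP cache as plain SRAM, as mentioned above, is a
separate configuration option; check the current RP2040 datasheet and SDK
docs before relying on either detail.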
On Saturday, 14 January 2023 at 21:15:02 UTC+1, John Larkin wrote:
> On Sat, 14 Jan 2023 11:07:11 -0800 (PST), Lasse Langwadt Christensen
> <lang...@fonz.dk> wrote:
>
> >https://forums.raspberrypi.com/viewtopic.php?t=308794#p1848188
>
> If I understand that, a floating add would take about 500 ns with a
> 133 MHz clock. That's not as bad as software float, but I wouldn't be
> able to do much fp math in a 100 kHz IRQ.
>
> So, scaled integers or FPGA.
or a different MCU with an FPU; "blackpills" are a similar form factor and have a Cortex-M4:
https://www.amazon.com/Alinan-STM32F401CCU6-Development-MicroPython-Programming/dp/B0B96YMQQP/
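If the Cortex-M4's FPU is the reason to switch parts, remember it has to be
enabled before the first float instruction executes. This is the standard
CMSIS sequence (ST's generated SystemInit normally already does it when the
project is built for hard-float); it is shown here only because with the FPU
on, a single-precision add takes a few cycles, which changes the 100 kHz IRQ
budget entirely.

#include "stm32f4xx.h"   /* CMSIS device header: provides SCB, __DSB, __ISB */

/* Grant full access to coprocessors CP10 and CP11 (the single-precision
 * FPU on the Cortex-M4F).  Without this, the first FP instruction faults. */
void fpu_enable(void)
{
    SCB->CPACR |= (3UL << 20) | (3UL << 22);   /* CP10, CP11: full access  */
    __DSB();                                   /* complete the CPACR write */
    __ISB();                                   /* flush pipeline before FP */
}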
On Sat, 14 Jan 2023 12:20:24 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>On Saturday, 14 January 2023 at 21:09:04 UTC+1, John Larkin wrote:

[...]

>> The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16
>> Kbyte cache. Cache misses will be *very* slow. So code will need to be
>> very tight bare-metal. The entire ISR should fit in cache.
>
>you can copy some (or all) of the code to ram instead of using execute-in-place from flash
That's a good idea. A typical ISR could be pretty small; the mainline program
can thrash all it likes.
>
>I think you can even turn off the cache to get an additional 16k ram
Yikes, execute out of SPI flash?
On Sat, 14 Jan 2023 12:27:03 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>On Saturday, 14 January 2023 at 21:15:02 UTC+1, John Larkin wrote:

[...]

>or a different MCU with an FPU; "blackpills" are a similar form factor and have a Cortex-M4:
>https://www.amazon.com/Alinan-STM32F401CCU6-Development-MicroPython-Programming/dp/B0B96YMQQP/
We use the STM32F207IGT6 on some existing products, but they are hard to get
and hence expensive. The Pi Pico for $4 is very appealing.