
highest frequency periodic interrupt?

Started by John Larkin January 13, 2023
On 14.01.23 at 19:21, John Larkin wrote:

> Nobody has answered my question. Generalizations about software timing
> abound but hard numbers are rare. Programmers don't seem to use
> oscilloscopes much.
I did it on the BeagleBoneBlack. It has an ARM CPU to run Debian Linux
etc. and two I/O processors that are 200 MHz RISCs without pipeline
stalls and without an operating system. The TI C compiler is on the
BBB. I can do I/O with 5 ns resolution & rate. No jitter, and directly
from a C program.

volatile int i;
myportbit = 0;
....
myportbit = 1;
i = 0;
i = 0;
myportbit = 0;

would create a 15 ns wide pulse on myportbit.

Cheers, Gerhard
https://forums.raspberrypi.com/viewtopic.php?t=308794#p1848188
On 1/14/2023 8:52 AM, Martin Brown wrote:
> ISR code is generally very short and best done in assembler if you want it as
> quick as possible. Examining the code generation of GCC is worthwhile since it
> sucks compared to Intel(better) and MS (best).
I always code ISRs in a HLL -- if only to act as pseudo-code illustrating
what the (ASM) code is actually doing. IME, people miss details in ASM so
having those expressed in a HLL makes it easier for them to understand the
*goal* of the code.

Looking at a .S is a great starting point *if* you have to hand-tweak the
code. Remembering that the code that gets executed will change as the
compiler is revised; ASM won't (which can be A Good Thing as well as A
Bad Thing).
> In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++ when
> generating Intel CPU specific SIMD code with maximum optimisation.
I'd be less worried about quality of code generator (compiler vs. human ASM)
than the effects of cache, core affinity, *which* bus(es) are called on
for each instruction, other contenders for those resources, etc.

I wrote a MT driver for ISA (1600bpi @ 100ips). Doesn't allow much time
to actually talk to the *I/O* with the throughput available on that bus!

Better approaches (barring committing hardware to a task -- boo, hiss!)
are to decouple the time constraint from the code's execution. E.g.,
let loops run "as fast as they can" and adjust the code to compensate,
dynamically (if there is any VARIABLE latency in an ISR, then you
likely have overlooked this, already!)
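To make that "run as fast as you can and compensate" idea concrete, here is a
minimal sketch. Everything hardware-facing in it is an assumption:
read_cycle_counter(), read_setpoint(), read_feedback(), write_output() and the
133 MHz clock stand in for whatever the actual part provides; the only point
is that the integral step is scaled by the measured dt rather than by an
assumed loop period.

#include <stdint.h>

/* Hypothetical helpers -- substitute whatever the hardware provides.
 * read_cycle_counter() is assumed to return a free-running CPU cycle count
 * (e.g. a DWT- or PRU-style counter). */
extern uint32_t read_cycle_counter(void);
extern float    read_setpoint(void);
extern float    read_feedback(void);
extern void     write_output(float u);

#define CPU_HZ 133000000u            /* assumed core clock */

static const float kp = 1.0f;        /* illustrative gains */
static const float ki = 100.0f;
static float integrator;             /* controller state */

/* Free-running loop: no fixed period, so each pass measures how long the
 * previous pass took and scales the integral step by that measured dt. */
void control_loop(void)
{
    uint32_t last = read_cycle_counter();

    for (;;) {
        uint32_t now = read_cycle_counter();
        /* Unsigned subtraction handles counter wrap-around. */
        float dt = (float)(now - last) / (float)CPU_HZ;
        last = now;

        float error = read_setpoint() - read_feedback();
        integrator += ki * error * dt;
        write_output(kp * error + integrator);
    }
}

The obvious objection, which comes up later in the thread, is that a control
loop usually wants a constant sample rate; measuring dt compensates for
variable latency in the math, but it doesn't remove the jitter seen by the
plant.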
> MS compiler still does pretty stupid things like internal compiler generated
> SIMD objects of 128, 256 or 512 bits (16, 32 or 64 byte) and having them
> crossing a cache line boundary.
Advantage: ASM.

But, only if the programmer actually understands the hardware at a level above
the programming model. (programmers are often pretty lousy at understanding
the implications of a hardware design; engineers/coders equally so when trying
to map their knowledge onto an algorithm: "Why doesn't it ALWAYS work?")
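On the straddling-a-cache-line point: this is something you can also force
from C, without dropping to ASM, by pinning the alignment yourself. A sketch,
assuming 64-byte cache lines: a 32-byte (256-bit) object aligned on a 32-byte
boundary can never cross a line.

#include <stdalign.h>
#include <stdint.h>

/* A 256-bit (32-byte) SIMD-sized object.  alignas(32) guarantees it starts
 * on a 32-byte boundary; since a 64-byte cache line is a multiple of that,
 * the whole object always lands inside a single line. */
typedef struct {
    alignas(32) uint8_t bytes[32];
} vec256_t;

/* Drop-in check for a test build: confirms a given object really is
 * 32-byte aligned (and therefore cannot straddle a 64-byte line). */
int is_line_safe(const vec256_t *v)
{
    return ((uintptr_t)v & 31u) == 0;
}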
On 1/14/2023 11:55 AM, Gerhard Hoffmann wrote:
> I did it on the BeagleBoneBlack. It has an ARM CPU to run Debian Linux
> etc. and two I/O processors that are 200 MHz RISCs without pipeline
> stalls and without an operating system. The TI C compiler is on the
> BBB. I can do I/O with 5 ns resolution & rate. No jitter, and directly
> from a C program.
>
> volatile int i;
> myportbit = 0;
> ....
> myportbit = 1;
> i = 0;
> i = 0;
> myportbit = 0;
>
> would create a 15 ns wide pulse on myportbit.
At the very least, you want to annotate this to indicate that the "i"
assignments are being used SOLELY for their side-effects (else someone may
erroneously remove one -- OR BOTH -- of them in a misguided attempt to
"improve" the code; or change the type of i, etc.). Likewise, you want to
make the ordering of the myportbit assignments more explicit -- to ensure
they aren't reordered or removed (by an overzealous maintainer). Of course,
if no one ever sees your code but you (and you have a perfect memory), ...
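One way to encode those intentions so both the compiler and a future
maintainer can see them is sketched below. The register address is made up,
and the per-store timing in the comments is just Gerhard's 200 MHz figure
carried over, not something this code guarantees; the point is that volatile
accesses cannot be merged, removed, or reordered with respect to each other
by the compiler.

#include <stdint.h>

/* Assumed: myportbit is a memory-mapped GPIO register at a made-up address.
 * The volatile-qualified pointer makes every store an observable side effect,
 * so the compiler may not merge, drop, or reorder these accesses. */
#define MYPORT_ADDR  0x12340000u
#define myportbit    (*(volatile uint32_t *)MYPORT_ADDR)

/* These writes exist ONLY for their side effect of burning one instruction
 * slot each; volatile is what stops the optimizer from folding them away. */
static volatile int pad;

void pulse(void)
{
    myportbit = 1;   /* rising edge                                   */
    pad = 0;         /* delay slot (~5 ns each on a 200 MHz PRU-class */
    pad = 0;         /*   core, per the post above -- measure yours)  */
    myportbit = 0;   /* falling edge: ~15 ns high time                */
}

A comment block saying the same thing would satisfy the "annotate it" point;
the volatile qualifiers are what keep an optimizer, rather than a human, from
"improving" the code. Note this only constrains the compiler -- a write buffer
or bus bridge can still stretch the timing, which is why I/O processors like
the PRUs are attractive here.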
On Sat, 14 Jan 2023 12:20:07 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

[...]

>I'd be less worried about quality of code generator (compiler vs. human ASM)
>than the effects of cache, core affinity, *which* bus(es) are called on
>for each instruction, other contenders for those resources, etc.
The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16 Kbyte cache. Cache misses will be *very* slow. So code will need to be very tight bare-metal. The entire ISR should fit in cache. When that gets dicey, we'll have to add an FPGA.
>
>I wrote a MT driver for ISA (1600bpi @ 100ips). Doesn't allow much time
>to actually talk to the *I/O* with the throughput available on that bus!
>
>Better approaches (barring committing hardware to a task -- boo, hiss!)
>are to decouple the time constraint from the code's execution. E.g.,
>let loops run "as fast as they can" and adjust the code to compensate,
>dynamically (if there is any VARIABLE latency in an ISR, then you
>likely have overlooked this, already!)
Control loops need to run at a constant rate, with at most a modest amount
of jitter.
On Sat, 14 Jan 2023 11:07:11 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>https://forums.raspberrypi.com/viewtopic.php?t=308794#p1848188
If I understand that, a floating add would take about 500 ns with a
133 MHz clock. That's not as bad as software float, but I wouldn't be
able to do much fp math in a 100 kHz IRQ.

So, scaled integers or FPGA.
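For the scaled-integer route, here is a minimal Q16.16 fixed-point sketch of
the kind of arithmetic that fits comfortably in a 100 kHz IRQ on an FPU-less
core. The Q format, the low-pass filter, and the constants are illustrative
choices, not anything from a particular SDK.

#include <stdint.h>

/* Q16.16 fixed point: value = raw / 65536.  Adds and subtracts are single
 * integer instructions; the one widening multiply is a short integer-only
 * sequence -- no soft-float library calls anywhere. */
typedef int32_t q16_t;

#define Q16_ONE          (1 << 16)
#define Q16_FROM_INT(x)  ((q16_t)((x) << 16))

static inline q16_t q16_mul(q16_t a, q16_t b)
{
    /* 64-bit intermediate keeps the full product before rescaling. */
    return (q16_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* Example workload: first-order low-pass filter, y += alpha * (x - y). */
static q16_t y;                                 /* filter state          */
static const q16_t alpha = Q16_ONE / 16;        /* 1/16 in Q16.16        */

void sample_irq_handler(q16_t x)                /* x: new sample, Q16.16 */
{
    y += q16_mul(alpha, x - y);
}

Overflow and scaling have to be budgeted by hand, which is the usual price
for skipping the FPU.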
On Saturday, 14 January 2023 at 21:09:04 UTC+1, John Larkin wrote:
[...]

> The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16
> Kbyte cache. Cache misses will be *very* slow. So code will need to be
> very tight bare-metal. The entire ISR should fit in cache.
you can copy some (or all) of the code to ram instead of using execute-in-place from flash

I think you can even turn off the cache to get an additional 16k ram
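For the copy-to-RAM suggestion, the Pico SDK already provides a decorator for
this (as I understand the SDK: __not_in_flash_func() in pico/platform.h links
the function into SRAM). A sketch only -- the pin choice is arbitrary and the
GPIO/IRQ setup that would point a vector at this handler is omitted:

#include "pico/stdlib.h"   /* pulls in pico/platform.h and hardware/gpio.h */

#define OUT_PIN 2u         /* arbitrary pin, toggled so period and jitter
                              can be checked on an oscilloscope */

/* The handler body is linked into SRAM, so it never stalls on an XIP
 * flash cache miss even if the mainline code has thrashed the cache. */
void __not_in_flash_func(fast_isr)(void)
{
    gpio_xor_mask(1u << OUT_PIN);
    /* ... acknowledge the interrupt source and do the real work ... */
}

Reclaiming the 16 kB XIP cache as plain SRAM, as mentioned above, is a
separate configuration option; check the current RP2040 datasheet and SDK
docs before relying on either detail.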
On Saturday, 14 January 2023 at 21:15:02 UTC+1, John Larkin wrote:
> On Sat, 14 Jan 2023 11:07:11 -0800 (PST), Lasse Langwadt Christensen
> <lang...@fonz.dk> wrote:
>
> >https://forums.raspberrypi.com/viewtopic.php?t=308794#p1848188
>
> If I understand that, a floating add would take about 500 ns with a
> 133 MHz clock. That's not as bad as software float, but I wouldn't be
> able to do much fp math in a 100 kHz IRQ.
>
> So, scaled integers or FPGA.
or a different MCU with an FPU; "blackpills" are a similar form factor and have a Cortex-M4:
https://www.amazon.com/Alinan-STM32F401CCU6-Development-MicroPython-Programming/dp/B0B96YMQQP/
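If the Cortex-M4's FPU is the reason to switch parts, remember it has to be
enabled before the first float instruction executes. This is the standard
CMSIS sequence (ST's generated SystemInit normally already does it when the
project is built for hard-float); it is shown here only because with the FPU
on, a single-precision add takes a few cycles, which changes the 100 kHz IRQ
budget entirely.

#include "stm32f4xx.h"   /* CMSIS device header: provides SCB, __DSB, __ISB */

/* Grant full access to coprocessors CP10 and CP11 (the single-precision
 * FPU on the Cortex-M4F).  Without this, the first FP instruction faults. */
void fpu_enable(void)
{
    SCB->CPACR |= (3UL << 20) | (3UL << 22);   /* CP10, CP11: full access  */
    __DSB();                                   /* complete the CPACR write */
    __ISB();                                   /* flush pipeline before FP */
}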
On Sat, 14 Jan 2023 12:20:24 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>On Saturday, 14 January 2023 at 21:09:04 UTC+1, John Larkin wrote:

[...]

>> The Pi Pico executes code out of the 2 Mbyte SPI flash, with a 16
>> Kbyte cache. Cache misses will be *very* slow. So code will need to be
>> very tight bare-metal. The entire ISR should fit in cache.
>
>you can copy some (or all) of the code to ram instead of using execute-in-place from flash
That's a good idea. A typical ISR could be pretty small; the mainline program
can thrash all it likes.
>
>I think you can even turn off the cache to get an additional 16k ram
Yikes, execute out of SPI flash?
On Sat, 14 Jan 2023 12:27:03 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>On Saturday, 14 January 2023 at 21:15:02 UTC+1, John Larkin wrote:

[...]

>or a different MCU with an FPU; "blackpills" are a similar form factor and have a Cortex-M4:
>https://www.amazon.com/Alinan-STM32F401CCU6-Development-MicroPython-Programming/dp/B0B96YMQQP/
We use the STM32F207IGT6 on some existing products, but they are hard to get
and hence expensive. The Pi Pico for $4 is very appealing.