Electronics-Related.com
Forums

highest frequency periodic interrupt?

Started by John Larkin January 13, 2023
On Sunday, January 15, 2023 at 3:10:17 PM UTC+1, Dimiter Popoff wrote:
> On 1/15/2023 12:48, Lasse Langwadt Christensen wrote:
> > On Sunday, January 15, 2023 at 6:10:24 AM UTC+1, upsid...@downunder.com wrote:
> >> On Sat, 14 Jan 2023 04:47:22 GMT, Jan Panteltje
> >> <pNaonSt...@yahoo.com> wrote:
> >>
> >>> On a sunny day (Fri, 13 Jan 2023 15:46:16 -0800) it happened John Larkin
> >>> <jla...@highlandSNIPMEtechnology.com> wrote in
> >>> <q5p3shh8f34tt34ka...@4ax.com>:
> >>>
> >>>> What's the fastest periodic IRQ that you have ever run?
> >>>>
> >>>> We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
> >>>> by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
> >>>> PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
> >>>> down some to save power, so the ISR runs for about 7 usec max.
> >>>>
> >>>> I ask because if I use a Pi Pico on some new projects, it has a
> >>>> dual-core 133 MHz CPU, and one core may have enough compute power that
> >>>> we wouldn't need an FPGA in a lot of cases. Might even do DDS in
> >>>> software.
> >>>>
> >>>> RP2040 floating point is tempting but probably too slow for control
> >>>> use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
> >>>> I guess.
> >>>>
> >>>> I was also thinking that we could make a 2 or 3-bit DAC with a few
> >>>> resistors. The IRQ could load that at various places and a scope would
> >>>> trace execution. That would look cool. On the 1758 thing we brought
> >>>> out a single bit to a test point and raised that during the ISR so we
> >>>> could see ISR execution time on a scope. My c guy didn't believe that
> >>>> a useful ISR could run at 100K and had no idea what execution time
> >>>> might be.
> >>>
> >>> Well in that sort of thing you need to think in asm, instruction times,
> >>> but I have no experience with the RP2040, and little with ASM on ARM.
> >>> Should be simple to test how long the C code takes, do you have an RP2040?
> >>> Playing with one would be a good starting point.
> >>> Should I get one? Was thinking just for fun...
> >>
> >> In the past coding ISRs in assembly was the way to go, but the
> >> complexity of current processors (cache, pipelining) makes it hard to
> >> beat a _good_ compiler.
> >>
> >> The main principle still is to minimize the number of registers saved
> >> at interrupt entry (and restored at exit). On a primitive processor only
> >> the processor status word and program counter need to be saved (and
> >> restored). Additional registers may need to be saved (restored) if the
> >> ISR uses them.
> >>
> >> If the processor has separate FP registers and/or separate FP status
> >> words, avoid using FP registers in ISRs.
> >>
> >> Some compilers may have "interrupt" keywords or similar extensions, and
> >> the compiler knows which registers need to be saved in the ISR. To
> >> help the compiler, include all functions that are called by the ISR in
> >> the same module (preferably in-lined) prior to the ISR, so that the
> >> compiler knows what needs to be saved. Do not call external library
> >> routines from an ISR, since the compiler doesn't know which registers
> >> need to be saved and saves all.
> >
> > cortex-m automatically stacks the registers needed to call a regular C function,
> > and if it has an FPU it supports "lazy stacking", which means it keeps track of
> > whether the FPU is used and only stacks/un-stacks the FP registers when they are used
> >
> > it also knows that if another interrupt is pending at ISR exit it doesn't need
> > to un-stack/stack before calling the other interrupt
> >
> How many registers does it stack automatically?
Eight, and usually in parallel with the fetch of the ISR address and instructions, so with overhead it is about 12 cycles from the interrupt until the first instruction of the ISR executes.
> I knew the HLL nonsense
> would catch up with CPU design eventually. Good CPU design still means
> load/store machines, stacking *nothing* at IRQ, just saving PC and CCR
> to special purpose regs which can be stacked as needed by the IRQ
> routine, along with registers to be used in it.
"Good" depends on what your objective is; automatic stacking saves code space and the time it takes to fetch that code.
> Memory accesses are
> the bottleneck, and with HLL code being bloated as it is, chances
> are some cache will have to be flushed to make room for stacking.
> Some processors *really* well designed for control applications allow
> you to lock a part of the cache, but I doubt ARM has that; they seem to
> have gone the way of "make programming a two-click job" to target a
> wider audience.
We are talking Cortex-M with no real caches. The Pico has a special cache to run code directly from slow external serial flash at reasonable speed, but you can tell the compiler to copy a function to RAM and keep it there.
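With the pico-sdk, pinning a function in SRAM is done with the `__not_in_flash_func` macro (on target it comes from `pico/platform.h`). A minimal sketch — the macro is stubbed so it also builds on a host compiler, and the filter body is just a made-up placeholder for the real ISR work:

```c
#include <stdint.h>

/* pico-sdk places the function in SRAM via this macro;
   stubbed here so the sketch also builds on a host compiler. */
#ifndef __not_in_flash_func
#define __not_in_flash_func(func) func
#endif

/* hypothetical fast path: kept in RAM so the XIP flash cache
   cannot add fetch jitter to the control loop */
static int32_t filter_state;

int32_t __not_in_flash_func(fast_update)(int32_t adc_sample)
{
    /* simple one-pole low-pass, alpha = 1/8, integer arithmetic */
    filter_state += (adc_sample - filter_state) >> 3;
    return filter_state;
}
```

On the RP2040 the same effect can also be had wholesale with the SDK's copy-to-RAM binary option; the per-function macro is the finer-grained tool.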
On Sunday, January 15, 2023 at 3:29:08 PM UTC+1, Dimiter Popoff wrote:
> On 1/14/2023 1:46, John Larkin wrote:
> > What's the fastest periodic IRQ that you have ever run?
> > [...]
>
> 10 us for a 100+ MHz CPU should be doable; I don't know about ARM
> though, they keep on surprising me with this or that nonsense (never
> used one, just by chance stumbling on that sort of thing).
> What you might need to consider is that on modern-day CPUs you
> don't have the nice prioritized IRQ scheme you must be used to from
> the CPU32; once in an interrupt you are just masked for all interrupts.
> They have some priority resolver which only resolves which interrupt
> will come next *after* you get unmasked.
Not on a Cortex-M. The interrupt controller (NVIC) has a programmable priority for each interrupt; higher-priority interrupts preempt lower-priority ones. A separate sub-priority determines which interrupt runs first if two or more interrupts of the same priority are pending.
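The arbitration rule is simple enough to model in a few lines. This is an illustrative host-runnable model of the behavior just described, not CMSIS code: lower group-priority number wins (and may preempt), sub-priority breaks ties, and the lower exception number breaks remaining ties:

```c
#include <stdint.h>

/* toy model of NVIC pending-interrupt arbitration */
struct pending_irq {
    uint8_t group_prio;  /* lower number = higher priority, may preempt */
    uint8_t sub_prio;    /* tie-break among equal group priorities      */
    uint8_t irq_num;     /* final tie-break: lowest exception number    */
};

/* returns the index of the interrupt serviced first */
static int nvic_pick(const struct pending_irq *p, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (p[i].group_prio < p[best].group_prio ||
            (p[i].group_prio == p[best].group_prio &&
             (p[i].sub_prio < p[best].sub_prio ||
              (p[i].sub_prio == p[best].sub_prio &&
               p[i].irq_num < p[best].irq_num))))
            best = i;
    }
    return best;
}
```

On real hardware the split between group and sub-priority bits is set by the priority grouping field; only the group priority decides preemption, the sub-priority only orders pending interrupts.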
On Sun, 15 Jan 2023 13:53:12 -0800 (PST), Lasse Langwadt Christensen
<langwadt@fonz.dk> wrote:

>[...]
>
>we are talking cortex-m with no real caches. The pico has a special cache to
>run code directly from slow external serial flash at reasonable speed, but you
>can tell the compiler to copy and keep a function in ram
That's the thing to do: run the fast control loop on one CPU in RAM and let the other CPU do the slow stuff and thrash the cache.
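If that fast loop sticks to scaled integers rather than floating point, the signed 32.32 format mentioned in the thread maps naturally onto a 64-bit integer, with a 128-bit intermediate for multiply and divide. A sketch assuming a compiler with `__int128` (GCC/Clang):

```c
#include <stdint.h>

typedef int64_t fix64;                    /* signed 32.32 fixed point */
#define FIX_ONE ((fix64)1 << 32)

static fix64 fix_from_int(int32_t i) { return (fix64)i << 32; }

static fix64 fix_mul(fix64 a, fix64 b)
{
    /* full 128-bit product, then drop the extra 32 fraction bits */
    return (fix64)(((__int128)a * b) >> 32);
}

static fix64 fix_div(fix64 a, fix64 b)
{
    /* pre-shift the dividend to keep 32 fraction bits in the quotient */
    return (fix64)(((__int128)a << 32) / b);
}
```

Add, subtract, and comparison are the plain 64-bit integer operations, which is what makes the format behave "just like floating point" for control-loop arithmetic while staying deterministic in time.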
On Sun, 15 Jan 2023 16:13:21 -0500, Joe Gwinn <joegwinn@comcast.net>
wrote:

>On Sun, 15 Jan 2023 11:16:45 -0800, John Larkin
><jlarkin@highlandSNIPMEtechnology.com> wrote:
>
>>On Sun, 15 Jan 2023 12:16:36 -0500, Joe Gwinn <joegwinn@comcast.net>
>>wrote:
>>
>>>On Sun, 15 Jan 2023 08:00:39 -0800, John Larkin
>>><jlarkin@highlandSNIPMEtechnology.com> wrote:
>>>
>>>>On Sun, 15 Jan 2023 04:39:22 GMT, Jan Panteltje
>>>><pNaonStpealmtje@yahoo.com> wrote:
>>>>
>>>>>On a sunny day (Sat, 14 Jan 2023 10:21:59 -0800) it happened John Larkin
>>>>><jlarkin@highlandSNIPMEtechnology.com> wrote in
>>>>><epr5sh59k5q62qkapubhkfk8ubf9r0vnng@4ax.com>:
>>>>>
>>>>>>On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown
>>>>>><'''newspam'''@nonad.co.uk> wrote:
>>>>>>
>>>>>>>On 13/01/2023 23:46, John Larkin wrote:
>>>>>>>> What's the fastest periodic IRQ that you have ever run?
>>>>>>>
>>>>>>>Usually try to avoid having fast periodic IRQs in favour of offloading
>>>>>>>them onto some dedicated hardware. But CPUs were slower then than now.
>>>>>>>
>>>>>>>> [...]
>>>>>>>> RP2040 floating point is tempting but probably too slow for control
>>>>>>>> use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
>>>>>>>> I guess.
>>>>>>>
>>>>>>>It might be worth benchmarking how fast the FPU really is on that device
>>>>>>>(for representative sample code). The Intel i5 & i7 can do all except
>>>>>>>divide in a single cycle these days - I don't know what Arm is like in
>>>>>>>this respect. You get some +*- for free close to every divide too.
>>>>>>
>>>>>>The RP2040 chip has FP routines in the rom, apparently code with some
>>>>>>sorts of hardware assist, but it's callable subroutines and not native
>>>>>>instructions to a hardware FP engine. When it returns it's done.
>>>>>>
>>>>>>Various web sites seem to confuse microseconds and nanoseconds. 150 us
>>>>>>does seem slow for a "fast" fp operation. We'll have to do
>>>>>>experiments.
>>>>>>
>>>>>>I wrote one math package for the 68K, with the format signed 32.32.
>>>>>>That behaved just like floating point in real life, but was small and
>>>>>>fast and avoided drecky scaled integers.
>>>>>>
>>>>>>>*BIG* time penalty for having two divides or branches too close
>>>>>>>together. Worth playing around to find patterns the CPU does well.
>>>>>>
>>>>>>Without true hardware FP, call locations probably don't matter.
>>>>>>
>>>>>>>Beware that what you measure gets controlled but for polynomials up to 5
>>>>>>>term or rationals up to about 5,2 call overhead may dominate the
>>>>>>>execution time (particularly if the stupid compiler puts a 16-byte
>>>>>>>structure across a cache boundary on the stack).
>>>>>>
>>>>>>We occasionally use polynomials, but 2nd order and rarely 3rd is
>>>>>>enough to get analog i/o close enough.
>>>>>>
>>>>>>>Forcing inlining of small code sections can help. Do it to excess and it
>>>>>>>will slow things down - there is a sweet spot. Loop unrolling is much
>>>>>>>less useful these days now that branch prediction is so good.
>>>>>>>
>>>>>>>> I was also thinking that we could make a 2 or 3-bit DAC with a few
>>>>>>>> resistors. [...]
>>>>>>>> My c guy didn't believe that
>>>>>>>> a useful ISR could run at 100K and had no idea what execution time
>>>>>>>> might be.
>>>>>>>
>>>>>>>ISR code is generally very short and best done in assembler if you want
>>>>>>>it as quick as possible. Examining the code generation of GCC is
>>>>>>>worthwhile since it sucks compared to Intel (better) and MS (best).
>>>>>>>
>>>>>>>In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++
>>>>>>>when generating Intel CPU specific SIMD code with maximum optimisation.
>>>>>>>
>>>>>>>MS compiler still does pretty stupid things like internal compiler
>>>>>>>generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 byte) and
>>>>>>>having them crossing a cache line boundary.
>>>>>>
>>>>>>Nobody has answered my question. Generalizations about software timing
>>>>>>abound but hard numbers are rare. Programmers don't seem to use
>>>>>>oscilloscopes much.
>>>>>
>>>>>That is silly
>>>>> http://panteltje.com/panteltje/pic/scope_pic/index.html
>>>>>
>>>>>Try reading the asm, it is well commented.
>>>>>:-)
>>>>>
>>>>>And if you are talking Linux or other multi-taskers there is a lot more involved.
>>>>
>>>>I was thinking about doing closed-loop control, switching power
>>>>supplies and dummy loads and such, using one core of an RP2040 instead
>>>>of an FPGA. That would be coded hard-metal, no OS or RTOS.
>>>>
>>>>I guess I don't really need interrupts. I could run a single
>>>>persistent loop that waits on a timer until it's time to compute
>>>>again, to run at for instance 100 KHz. If execution time is reasonably
>>>>constant, it could just loop as fast as it can; even simpler. I like
>>>>that one.
>>>
>>>This is a very common approach, being pioneered by Bell Labs when
>>>designing the first digital telephone switch, the 1ESS:
>>>
>>><https://en.wikipedia.org/wiki/Number_One_Electronic_Switching_System>
>>>
>>>The approach endures in such things as missile autopilots, but always
>>>with some way to gracefully handle when the control code occasionally
>>>runs too long and isn't done in time for the next frame to start.
>>
>>I was thinking of an endless loop that just runs compute bound as hard
>>as it can. The "next frame" is the top of the loop. The control loop
>>time base is whatever the average loop execution time is.
>>
>>As you say, no interrupt overhead.
>
>To be more specific, the frames effectively run at interrupt priority,
>triggered by a timer interrupt, but we also run various background
>tasks at user level utilizing whatever CPU is left over, if any. The
>sample rate is set by controller dynamics, and going faster does not
>help. Especially if FFTs are being performed over a moving window of
>samples.
>
>Joe Gwinn
No, just one control loop running full-blast on one of the CPUs,
running in SRAM, and no interrupts.

I don't think a power supply needs FFTs. Maybe a little lowpass
filtering, but that's just a few lines of code. Or one line.

The actual control loop might be a page of code.
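The shape of that loop — one-line low-pass plus a PI update — might look like the sketch below. It is host-runnable for illustration: `read_adc`/`write_dac` are stand-ins for the real peripheral access and here simulate a first-order plant, and all gains are made-up examples, not a tuned design:

```c
#include <stdint.h>

/* stand-ins for the RP2040 peripherals: a simulated first-order plant */
static float plant = 0.0f;
static float read_adc(void)     { return plant; }
static void  write_dac(float u) { plant += 0.1f * (u - plant); }

static float filt, integ;       /* filter and integrator state */

/* one full control step: filter, PI, output; returns filtered reading */
static float control_step(float setpoint)
{
    const float kp = 0.5f, ki = 0.05f, alpha = 0.25f;

    float y = read_adc();
    filt += alpha * (y - filt);       /* the one-line low-pass        */
    float err = setpoint - filt;
    integ += ki * err;                /* integrator (no clamp shown)  */
    float u = kp * err + integ;
    write_dac(u);
    return filt;
}
```

In the real thing this body sits in an endless loop in SRAM, either free-running or gated on a hardware timer at, say, 100 kHz, with no interrupts anywhere.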
On Sun, 15 Jan 2023 14:33:50 -0800, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

>[...]
>
>No, just one control loop running full-blast on one of the CPUs,
>running in sram, and no interrupts.
>
>I don't think a power supply needs FFTs. Maybe a little lowpass
>filtering, but that's just a few lines of code. Or one line.
Probably so. I was thinking radars and missile autopilots.

Generally, the FFTs (or anything lengthy) are not done at interrupt
level. The interrupt code grabs and stores the data in RAM, sets a flag
to release the user-level code doing the signal processing, and then
exits the interrupt. Whereupon the user-level code commences running
the signal-processing code. Otherwise, the system could not respond to
important but rare interrupts.
>The actual control loop might be a page of code.
Could be. What I've seen the power-supply folk do is use SPICE to tweak
the PS's control law, which is generally implemented as a FIR filter.
IIR filters are feared because they can become unstable, especially in
the somewhat wild environment of a power supply.

Joe Gwinn
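A direct-form FIR is just a dot product per sample with no feedback path to go unstable. A generic sketch (the coefficients are an arbitrary example, not any particular supply's control law):

```c
#include <stddef.h>

#define NTAPS 5

/* example coefficients; a real design comes out of the filter tools */
static const float b[NTAPS] = { 0.1f, 0.2f, 0.4f, 0.2f, 0.1f };
static float x_hist[NTAPS];          /* delay line, newest sample first */

float fir_step(float x)
{
    /* shift the delay line and insert the new sample */
    for (size_t i = NTAPS - 1; i > 0; i--)
        x_hist[i] = x_hist[i - 1];
    x_hist[0] = x;

    float y = 0.0f;
    for (size_t i = 0; i < NTAPS; i++)
        y += b[i] * x_hist[i];       /* y[n] = sum over k of b[k]*x[n-k] */
    return y;
}
```

Because the output depends only on a finite input history, a bounded input can never produce an unbounded output — which is exactly the property that makes FIR the safe choice here.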
John Larkin <jlarkin@highlandsnipmetechnology.com> wrote:
> What's the fastest periodic IRQ that you have ever run?
>
> [...]
Not exactly periodic, but I did 2 Mb/s interrupt-driven bi-directional
serial communication. That is about 5 us between characters, and there
were 2 interrupts per character (one to receive, the other to transmit
the answer). In other words, about a 400 kHz interrupt rate. That was on
an STM32F103 running at 72 MHz (that is a Cortex-M3). I also tried
3 Mb/s, but apparently that was too much for the USB bus in the PC
(standard 12 Mb/s port).

Concerning interrupt overhead, for an STM32F030 running code from RAM
the overhead seems to be between 26-28 clocks. More precisely, I had a
very simple interrupt handler that just increments a variable
(millisecond counter). The "work" part of the interrupt handler should
execute in 7 clocks. When I timed a busy loop, the interrupt increased
execution time of the loop by 33-35 clocks. That agrees reasonably well
with cycle counts for Cortex-M0 published in ARM forums: 16 clocks to
enter the interrupt handler and 12 clocks to get back to the main
program. The processor in the Pi Pico is a Cortex-M0+, which is supposed
to take 15 clocks to enter the interrupt handler, so you can expect 1
clock less overhead than for the Cortex-M0.

Concerning useful procedures, there are a lot of things which can slow
down the code. For example, a read-modify-write cycle on an I/O port is
likely to insert some extra wait states. Most MCUs execute code from
flash, and usually flash can not run at max CPU speed, so there are
extra wait states. For example, a Cortex-M4 running from one RAM bank
and having its stack in a separate RAM bank can do an interrupt like the
above in 27-28 clocks, so overhead probably is 20-21 cycles (I write
"probably" because the Cortex-M4 has complex rules concerning
instruction times, so I am not sure the interrupt handler takes 7
clocks). But a different configuration can bring the time up to 42-48
clocks. A Cortex-M3 (which should have very close times to the
Cortex-M4) running from flash with 0 wait states (8 MHz clock) needs 24
clocks to execute the interrupt handler, but with 2 wait states (needed
to run at 72 MHz) needs 29 to 31 clocks, and more when there are more
wait states.

The RP2040 in the Pi Pico normally runs from RAM, so it should be free
from slowdown due to flash. But with two cores and several DMA channels
there may be bus contention. Still, interrupt rates on the order of
1M/s should not be a problem.

-- 
Waldek Hebisch
On Sun, 15 Jan 2023 18:21:12 -0500, Joe Gwinn <joegwinn@comcast.net>
wrote:

>On Sun, 15 Jan 2023 14:33:50 -0800, John Larkin ><jlarkin@highlandSNIPMEtechnology.com> wrote: > >>On Sun, 15 Jan 2023 16:13:21 -0500, Joe Gwinn <joegwinn@comcast.net> >>wrote: >> >>>On Sun, 15 Jan 2023 11:16:45 -0800, John Larkin >>><jlarkin@highlandSNIPMEtechnology.com> wrote: >>> >>>>On Sun, 15 Jan 2023 12:16:36 -0500, Joe Gwinn <joegwinn@comcast.net> >>>>wrote: >>>> >>>>>On Sun, 15 Jan 2023 08:00:39 -0800, John Larkin >>>>><jlarkin@highlandSNIPMEtechnology.com> wrote: >>>>> >>>>>>On Sun, 15 Jan 2023 04:39:22 GMT, Jan Panteltje >>>>>><pNaonStpealmtje@yahoo.com> wrote: >>>>>> >>>>>>>On a sunny day (Sat, 14 Jan 2023 10:21:59 -0800) it happened John Larkin >>>>>>><jlarkin@highlandSNIPMEtechnology.com> wrote in >>>>>>><epr5sh59k5q62qkapubhkfk8ubf9r0vnng@4ax.com>: >>>>>>> >>>>>>>>On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown >>>>>>>><'''newspam'''@nonad.co.uk> wrote: >>>>>>>> >>>>>>>>>On 13/01/2023 23:46, John Larkin wrote: >>>>>>>>>> What's the fastest periodic IRQ that you have ever run? >>>>>>>>> >>>>>>>>>Usually try to avoid having fast periodic IRQs in favour of offloading >>>>>>>>>them onto some dedicated hardware. But CPUs were slower then than now. >>>>>>>>>> >>>>>>>>>> We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted >>>>>>>>>> by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a >>>>>>>>>> PID loop, which outputs to the on-chip DAC. We cranked the CPU clock >>>>>>>>>> down some to save power, so the ISR runs for about 7 usec max. >>>>>>>>>> >>>>>>>>>> I ask because if I use a Pi Pico on some new projects, it has a >>>>>>>>>> dual-core 133 MHz CPU, and one core may have enough compute power that >>>>>>>>>> we wouldn't need an FPGA in a lot of cases. Might even do DDS in >>>>>>>>>> software. >>>>>>>>>> >>>>>>>>>> RP2040 floating point is tempting but probably too slow for control >>>>>>>>>> use. Things seem to take 50 or maybe 100 us. Back to scaled integers, >>>>>>>>>> I guess. 
>>>>>>>>>
>>>>>>>>>It might be worth benchmarking how fast the FPU really is on that device
>>>>>>>>>(for representative sample code). The Intel i5 & i7 can do all except
>>>>>>>>>divide in a single cycle these days - I don't know what Arm is like in
>>>>>>>>>this respect. You get some +*- for free close to every divide too.
>>>>>>>>
>>>>>>>>The RP2040 chip has FP routines in the rom, apparently code with some
>>>>>>>>sorts of hardware assist, but it's callable subroutines and not native
>>>>>>>>instructions to a hardware FP engine. When it returns it's done.
>>>>>>>>
>>>>>>>>Various web sites seem to confuse microseconds and nanoseconds. 150 us
>>>>>>>>does seem slow for a "fast" fp operation. We'll have to do
>>>>>>>>experiments.
>>>>>>>>
>>>>>>>>I wrote one math package for the 68K, with the format signed 32.32.
>>>>>>>>That behaved just like floating point in real life, but was small and
>>>>>>>>fast and avoided drecky scaled integers.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>*BIG* time penalty for having two divides or branches too close
>>>>>>>>>together. Worth playing around to find patterns the CPU does well.
>>>>>>>>
>>>>>>>>Without true hardware FP, call locations probably don't matter.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Beware that what you measure gets controlled but for polynomials up to 5
>>>>>>>>>term or rationals up to about 5,2 call overhead may dominate the
>>>>>>>>>execution time (particularly if the stupid compiler puts a 16byte
>>>>>>>>>structure across a cache boundary on the stack).
>>>>>>>>
>>>>>>>>We occasionally use polynomials, but 2nd order and rarely 3rd is
>>>>>>>>enough to get analog i/o close enough.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Forcing inlining of small code sections can help. Do it to excess and it
>>>>>>>>>will slow things down - there is a sweet spot. Loop unrolling is much
>>>>>>>>>less useful these days now that branch prediction is so good.
>>>>>>>>>
>>>>>>>>>> I was also thinking that we could make a 2 or 3-bit DAC with a few
>>>>>>>>>> resistors. The IRQ could load that at various places and a scope would
>>>>>>>>>> trace execution. That would look cool. On the 1758 thing we brought
>>>>>>>>>> out a single bit to a test point and raised that during the ISR so we
>>>>>>>>>> could see ISR execution time on a scope. My c guy didn't believe that
>>>>>>>>>> a useful ISR could run at 100K and had no idea what execution time
>>>>>>>>>> might be.
>>>>>>>>>
>>>>>>>>>ISR code is generally very short and best done in assembler if you want
>>>>>>>>>it as quick as possible. Examining the code generation of GCC is
>>>>>>>>>worthwhile since it sucks compared to Intel (better) and MS (best).
>>>>>>>>>
>>>>>>>>>In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++
>>>>>>>>>when generating Intel CPU specific SIMD code with maximum optimisation.
>>>>>>>>>
>>>>>>>>>MS compiler still does pretty stupid things like internal compiler
>>>>>>>>>generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 byte) and
>>>>>>>>>having them crossing a cache line boundary.
>>>>>>>>
>>>>>>>>Nobody has answered my question. Generalizations about software timing
>>>>>>>>abound but hard numbers are rare. Programmers don't seem to use
>>>>>>>>oscilloscopes much.
>>>>>>>
>>>>>>>That is silly
>>>>>>>http://panteltje.com/panteltje/pic/scope_pic/index.html
>>>>>>>
>>>>>>>Try reading the asm, it is well commented.
>>>>>>>:-)
>>>>>>>
>>>>>>>And if you are talking Linux or other multi-taskers there is a lot more involved.
>>>>>>
>>>>>>I was thinking about doing closed-loop control, switching power
>>>>>>supplies and dummy loads and such, using one core of an RP2040 instead
>>>>>>of an FPGA. That would be coded hard-metal, no OS or RTOS.
>>>>>>
>>>>>>I guess I don't really need interrupts. I could run a single
>>>>>>persistent loop that waits on a timer until it's time to compute
>>>>>>again, to run at for instance 100 KHz. If execution time is reasonably
>>>>>>constant, it could just loop as fast as it can; even simpler. I like
>>>>>>that one.
>>>>>
>>>>>This is a very common approach, pioneered by Bell Labs when
>>>>>designing the first digital telephone switch, the 1ESS:
>>>>>
>>>>><https://en.wikipedia.org/wiki/Number_One_Electronic_Switching_System>
>>>>>
>>>>>The approach endures in such things as missile autopilots, but always
>>>>>with some way to gracefully handle when the control code occasionally
>>>>>runs too long and isn't done in time for the next frame to start.
>>>>
>>>>I was thinking of an endless loop that just runs compute bound as hard
>>>>as it can. The "next frame" is the top of the loop. The control loop
>>>>time base is whatever the average loop execution time is.
>>>>
>>>>As you say, no interrupt overhead.
>>>
>>>To be more specific, the frames effectively run at interrupt priority,
>>>triggered by a timer interrupt, but we also run various background
>>>tasks at user level utilizing whatever CPU is left over, if any. The
>>>sample rate is set by controller dynamics, and going faster does not
>>>help. Especially if FFTs are being performed over a moving window of
>>>samples.
>>>
>>>Joe Gwinn
>>
>>No, just one control loop running full-blast on one of the CPUs,
>>running in sram, and no interrupts.
>>
>>I don't think a power supply needs FFTs. Maybe a little lowpass
>>filtering, but that's just a few lines of code. Or one line.
>
>Probably so. I was thinking radars and missile autopilots.
>
>Generally, the FFTs (or anything lengthy) are not done at interrupt
>level. The interrupt code grabs and stores the data in ram, sets a
>flag to release the user level code doing the signal processing, and
>then exits the interrupt. Whereupon the user level code commences
>running the signal processing code. Otherwise, the system could not
>respond to important but rare interrupts.
>
>
>>The actual control loop might be a page of code.
>
>Could be. What I've seen the power-supply folk do is to use SPICE to
>tweak the PS's control law, which is generally implemented in a FIR
>filter. IIR filters are feared because they can become unstable,
>especially in the somewhat wild environment of a power supply.
>
>Joe Gwinn
A proportional+integral error amplifier is all that most power supplies
need. That is easily Spiced and then easily turned into a few lines of
code.

An integrator is of course IIR. A FIR filter has a finite gain hence
some DC error.

I'm not afraid of integrators!
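For what it's worth, here is a hedged sketch of what those "few lines of
code" might look like: a fixed-point PI step with a simple anti-windup
clamp. The Q16.16 gains and the 12-bit DAC range are made-up placeholders,
not values from any actual supply.

```c
#include <stdint.h>

/* Placeholder gains in Q16.16 fixed point -- illustrative only. */
#define KP       ((int32_t)(0.8  * 65536))  /* proportional gain        */
#define KI       ((int32_t)(0.05 * 65536))  /* integral gain per sample */
#define OUT_MAX  4095                       /* 12-bit DAC full scale    */
#define OUT_MIN  0

typedef struct {
    int64_t integ;   /* integrator accumulator, Q16.16 -- the IIR part */
} pi_state_t;

/* One 100 kHz control step: error in ADC counts in, DAC counts out. */
static int32_t pi_step(pi_state_t *s, int32_t error)
{
    s->integ += (int64_t)KI * error;                 /* integrate */
    int64_t u = ((int64_t)KP * error + s->integ) >> 16;

    /* Anti-windup: if the output saturates, back out the last
       integration so the integrator doesn't run away. */
    if (u > OUT_MAX)      { u = OUT_MAX; s->integ -= (int64_t)KI * error; }
    else if (u < OUT_MIN) { u = OUT_MIN; s->integ -= (int64_t)KI * error; }
    return (int32_t)u;
}
```

With a constant positive error the integrator term grows each call, which
is exactly the DC-error-free behavior a FIR filter can't give you.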
On Sun, 15 Jan 2023 23:37:38 -0000 (UTC), antispam@math.uni.wroc.pl
wrote:

>John Larkin <jlarkin@highlandsnipmetechnology.com> wrote:
>> What's the fastest periodic IRQ that you have ever run?
>>
>> We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
>> by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
>> PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
>> down some to save power, so the ISR runs for about 7 usec max.
>>
>> I ask because if I use a Pi Pico on some new projects, it has a
>> dual-core 133 MHz CPU, and one core may have enough compute power that
>> we wouldn't need an FPGA in a lot of cases. Might even do DDS in
>> software.
>>
>> RP2040 floating point is tempting but probably too slow for control
>> use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
>> I guess.
>>
>> I was also thinking that we could make a 2 or 3-bit DAC with a few
>> resistors. The IRQ could load that at various places and a scope would
>> trace execution. That would look cool. On the 1758 thing we brought
>> out a single bit to a test point and raised that during the ISR so we
>> could see ISR execution time on a scope. My c guy didn't believe that
>> a useful ISR could run at 100K and had no idea what execution time
>> might be.
>
>Not exactly periodic, but I did 2 Mb/s interrupt-driven bi-directional
>serial communication. That is about 5 us between characters, and
>there were 2 interrupts per character (one to receive, the other
>to transmit the answer). In other words, about a 400 kHz interrupt
>rate. That was on an STM32F103 running at 72 MHz (that is, Cortex-M3).
>I also tried 3 Mb/s, but apparently that was too much for the USB bus
>in the PC (standard 12 Mb/s port).
>
>Concerning interrupt overhead, for an STM32F030 running code from
>RAM the overhead seems to be between 26-28 clocks. More precisely,
>I had a very simple interrupt handler that just increments a variable
>(millisecond counter). The "work" part of the interrupt handler
>should execute in 7 clocks. When I timed a busy loop, each interrupt
>increased execution time of the loop by 33-35 clocks. That
>agrees reasonably well with cycle counts for Cortex-M0 published
>in ARM forums: 16 clocks to enter the interrupt handler and 12
>clocks to get back to the main program. The processor in the Pi Pico
>is a Cortex-M0+, which is supposed to take 15 clocks to enter the
>interrupt handler, so you can expect 1 clock less overhead than for
>Cortex-M0.
>
>Concerning useful procedures, there are a lot of things which can
>slow down the code. For example, a read-modify-write cycle on an I/O
>port is likely to insert some extra wait states. Most MCUs
>execute code from flash, and usually flash can not run at max
>CPU speed, so there are extra wait states. For example, a Cortex-M4
>running from one RAM bank and having its stack in a separate RAM bank
>can do an interrupt like the above in 27-28 clocks, so the overhead
>probably is 20-21 cycles (I write probably because the Cortex-M4 has
>complex rules concerning instruction times, so I am not sure the
>interrupt handler takes 7 clocks). But a different configuration can
>bring the time up to 42-48 clocks. A Cortex-M3 (which should have
>timings very close to the Cortex-M4) running from flash with 0 wait
>states (8 MHz clock) needs 24 clocks to execute the interrupt handler,
>but with 2 wait states (needed to run at 72 MHz) needs 29 to 31
>clocks, and more when there are more wait states.
>
>The RP2040 in the Pi Pico normally runs from RAM, so it should be free
>from slowdown due to flash. But with two cores and several
>DMA channels there may be bus contention. Still, interrupt
>rates on the order of 1M/s should not be a problem.
Sounds like roughly 200 ns of overhead, interrupt entry and exit, on
the 133 MHz Pico. That's not bad for a 100 KHz interrupt.

I probably don't even need 100 KHz for a power supply control loop. A
1 ms step response would be fine.

It would be fun to do a DDS in software, for an AC supply.
On Monday, January 16, 2023 at 1:26:03 PM UTC+11, John Larkin wrote:
> On Sun, 15 Jan 2023 23:37:38 -0000 (UTC), anti...@math.uni.wroc.pl
> wrote:
> >John Larkin <jla...@highlandsnipmetechnology.com> wrote:
<snip>
> >The RP2040 in the Pi Pico normally runs from RAM, so it should be free
> >from slowdown due to flash. But with two cores and several
> >DMA channels there may be bus contention. Still, interrupt
> >rates on the order of 1M/s should not be a problem.
>
> Sounds like roughly 200 ns of overhead, interrupt entry and exit, on
> the 133 MHz Pico. That's not bad for a 100 KHz interrupt.
>
> I probably don't even need 100 KHz for a power supply control loop. A
> 1 ms step response would be fine.
>
> It would be fun to do a DDS in software, for an AC supply.
Sort of off the point, though. An AC supply has to deliver a sine-wave
voltage while looking like a low-impedance source to the load.

You can use software to calculate what that voltage ought to be (which
is what direct digital synthesis is all about), but the switching
arrangements that connect a more or less stable DC voltage source to the
load and deliver the desired voltage, and the currents required to
sustain that voltage, might require quite a lot of fast processing
capacity to create the desired effect without wasting a lot of power in
the process.

It wouldn't look much like a regular DDS source.

--
Bill Sloman, Sydney
On 1/15/2023 7:29 AM, Dimiter_Popoff wrote:
> 10 us for a 100+ MHz CPU should be doable;
I was doing 6us on a 386 running with I/O on the ISA bus (where I/O accesses were dreadfully slow!)
> I don't know about ARM > though, they keep on surprising me with this or that nonsense. (never > used one, just by chance stumbling on that sort of thing). > What you might need to consider is that on modern day CPUs you > don't have the nice prioritized IRQ scheme you must be used to from > the CPU32; once in an interrupt you are just masked for all interrupts, > they have some priority resolver which only resolves which interrupt > will come next *after* you get unmasked. Some I have used have a
No, there is an NVIC (nested vectored interrupt controller) that allows
the software to decide what priorities to assign to each interrupt
source. It then decides if it can preempt a "lower" priority interrupt
being serviced at the present time, returning to that ISR after the
higher priority ISR executes.

This is almost a requirement, as many of the ARMs have *scores* (50+) of
onboard interrupt sources; you wouldn't want to have to poll them *in*
an ISR to decide if you wanted to reenable the interrupt. (Of course, if
you map two sources to the same priority level, then you will have to
resolve the conflict at run-time.) And, of course, some sources have
fixed, immutable priorities.

You can also configure the NVIC to wake the processor (if in a "sleep"
mode) when one of these events is detected. And, to return to sleep
after servicing the IRQ. [And, of course, you can wait FOR an
interrupt...]

The bigger problem is knowing, at design time, how these IRQs can stack
up, as time spent in the (nested/cascaded) ISRs is time that isn't spent
running in "Thread Mode".

[I designed a product that could spend 100.0% of its time in an ISR, in
response to (unconstrainable) user actions. But, when the user
eventually got tired of trying to f*ck the system up, it would silently
recover where it had left off.]
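For reference, assigning those NVIC priorities is a few calls through the
CMSIS core API. This fragment is only a configuration sketch: the IRQ
names and priority values are placeholders, not taken from any specific
part's header.

```c
/* CMSIS NVIC setup sketch -- ADC_IRQn / UART0_IRQn and the priority
   numbers are illustrative.  On Cortex-M, a LOWER number means a
   HIGHER priority, so the ADC here preempts the UART handler. */
NVIC_SetPriority(ADC_IRQn,   1);   /* control-loop sampling: urgent   */
NVIC_SetPriority(UART0_IRQn, 3);   /* console traffic can wait        */
NVIC_EnableIRQ(ADC_IRQn);
NVIC_EnableIRQ(UART0_IRQn);
```

Two sources given the same number land at the same preemption level and
simply queue behind one another, which is the run-time conflict mentioned
above.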
> second, higher priority IRQ (like the 6809 FIRQ) but on the core I have > used they differ from the 6809-s FIRQ in that the errata sheet says > they don't work. > On load/store machines latency should be less of an issue for the > jitter you will get as long as you don't do division in your code to > be interrupted.
Latency matters to the current ISR but is defined/influenced by factors OUTSIDE that ISR. E.g., if a higher priority ISR is executing (or, the processor is in a critical region), then servicing THIS IRQ will be delayed. This delay may not be predictable.
> Make sure you look into the FPU you'd consider deep enough, none > will get you your 32.32 bit accuracy. 64 bit FP numbers have a 52 or > so (can't remember exactly now) mantissa, the rest goes on the > exponent. I have found 32 bit FP numbers convenient to store some > constants (on the core I use the load is 1 cycle, expanding > automatically to 64 bit), did not find any other use for those. > > Finally, to give you some numbers :). Back during the 80-s I wrote > a floppy disk controller for the 765 on a 1 MHz 6809. It had about > 10 us per byte IIRC; doing IRQ was out of question. But the 6809 > had a "sync" opcode, if IRQs were masked it would stop and wait > for an IRQ; and would just resume execution once the line was pulled. > This worked for the fastest of floppies (5" HD), so perhaps you > can use a 6809 :D. (I may have one or two somewhere here, 2 MHz > ones at that - in DIP40....).
A lot depends on what you have to *do* in the ISR.

If all you are doing is grabbing a value from an I/O port and stuffing
it in a FIFO, then you have very little code to execute *in* the ISR and
your ISR can be really short and responsive.

(My MT driver was like that; read the available byte from the i/f, stuff
it in a FIFO and done -- 160 KHz. It would be silly to try to do the ECC
in the ISR; just let the background process handle that at its leisure
-- and repeat the operation, if needed -- e.g., READ REVERSE or
BACKSPACE, READ FORWARD, depending on the next scheduled transport
operation in the queue.)
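The grab-and-stuff pattern described here is usually a single-producer/
single-consumer ring: the ISR is the only writer, background code the
only reader, so no locking is needed. A minimal sketch (buffer size and
types are arbitrary choices, and a real port may want memory barriers):

```c
#include <stdint.h>

#define FIFO_SIZE 256u                    /* must be a power of two */

static volatile uint8_t  fifo_buf[FIFO_SIZE];
static volatile uint32_t fifo_wr, fifo_rd;   /* free-running indices */

/* Called from the ISR: stash one byte, return 0 if the FIFO was full. */
static int fifo_put(uint8_t b)
{
    if (fifo_wr - fifo_rd == FIFO_SIZE)
        return 0;                          /* full: drop, count overruns */
    fifo_buf[fifo_wr & (FIFO_SIZE - 1)] = b;
    fifo_wr++;                             /* publish after the store */
    return 1;
}

/* Called from background code: next byte, or -1 if empty. */
static int fifo_get(void)
{
    if (fifo_rd == fifo_wr)
        return -1;
    uint8_t b = fifo_buf[fifo_rd & (FIFO_SIZE - 1)];
    fifo_rd++;
    return b;
}
```

Because each index is written by exactly one side, the full/empty tests
stay consistent without disabling interrupts around them.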