Electronics-Related.com

highest frequency periodic interrupt?

Started by John Larkin January 13, 2023
On Sunday, 15 January 2023 at 06:10:24 UTC+1, upsid...@downunder.com wrote:
> On Sat, 14 Jan 2023 04:47:22 GMT, Jan Panteltje
> <pNaonSt...@yahoo.com> wrote:
>
>>On a sunny day (Fri, 13 Jan 2023 15:46:16 -0800) it happened John Larkin
>><jla...@highlandSNIPMEtechnology.com> wrote in
>><q5p3shh8f34tt34ka...@4ax.com>:
>>
>>>What's the fastest periodic IRQ that you have ever run?
>>>
>>>We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
>>>by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
>>>PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
>>>down some to save power, so the ISR runs for about 7 usec max.
>>>
>>>I ask because if I use a Pi Pico on some new projects, it has a
>>>dual-core 133 MHz CPU, and one core may have enough compute power that
>>>we wouldn't need an FPGA in a lot of cases. Might even do DDS in
>>>software.
>>>
>>>RP2040 floating point is tempting but probably too slow for control
>>>use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
>>>I guess.
>>>
>>>I was also thinking that we could make a 2 or 3-bit DAC with a few
>>>resistors. The IRQ could load that at various places and a scope would
>>>trace execution. That would look cool. On the 1758 thing we brought
>>>out a single bit to a test point and raised that during the ISR so we
>>>could see ISR execution time on a scope. My c guy didn't believe that
>>>a useful ISR could run at 100K and had no idea what execution time
>>>might be.
>>
>>Well in that sort of thing you need to think in asm, instruction times,
>>but I have no experience with the RP2040, and little with ASM on ARM.
>>Should be simple to test how long the C code takes, do you have an RP2040?
>>Playing with one would be a good starting point.
>>Should I get one? Was thinking just for fun...
>
> In the past coding ISRs in assembly was the way to go, but the
> complexity of current processors (cache, pipelining) makes it hard to
> beat a _good_ compiler.
> The main principle still is to minimize the number of registers saved
> at interrupt entry (and restored at exit). On a primitive processor only
> the processor status word and program counter need to be saved (and
> restored). Additional registers may need to be saved (and restored) if
> the ISR uses them.
>
> If the processor has separate FP registers and/or a separate FP status
> word, avoid using FP registers in ISRs.
>
> Some compilers may have "interrupt" keywords or similar extensions, and
> the compiler knows which registers need to be saved in the ISR. To
> help the compiler, include all functions that are called by the ISR in
> the same module (preferably in-lined) prior to the ISR, so that the
> compiler knows what needs to be saved. Do not call external library
> routines from an ISR, since the compiler doesn't know which registers
> need to be saved and saves all of them.
Cortex-M cores automatically stack the registers needed to call a regular C function, and if the part has an FPU it supports "lazy stacking": the hardware keeps track of whether the FPU was actually used and only stacks/un-stacks the FP registers when it was.

It also knows that if another interrupt is pending at ISR exit, it doesn't need to un-stack and re-stack before calling the other handler (tail-chaining).
On 1/15/2023 12:48, Lasse Langwadt Christensen wrote:
>> [...] Some compilers may have "interrupt" keywords or similar extensions and
>> the compiler knows which registers need to be saved in the ISR. [...]
>
> cortex-m automatically stack the registers needed to call a regular C function
> and if it has an FPU it supports "lazy stacking" which means it keeps track of
> whether the FPU is used and only stack/un-stack them when they are used
>
> it also knows that if another interrupt is pending at ISR exit it doesn't need
> to un-stack/stack before calling the other interrupt
How many registers does it stack automatically? I knew the HLL nonsense would catch up with CPU design eventually.

Good CPU design still means load/store machines, stacking *nothing* at IRQ, just saving PC and CCR to special-purpose regs which can be stacked as needed by the IRQ routine, along with the registers to be used in it. Memory accesses are the bottleneck, and with HLL code being as bloated as it is, chances are some cache will have to be flushed to make room for stacking.

Some processors *really* well designed for control applications allow you to lock a part of the cache, but I doubt ARM have that; they seem to have gone the way of "make programming a two-click job" to target a wider audience.
On 1/14/2023 1:46, John Larkin wrote:
> What's the fastest periodic IRQ that you have ever run?
>
> We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
> by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
> PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
> down some to save power, so the ISR runs for about 7 usec max.
>
> [...]
>
> RP2040 floating point is tempting but probably too slow for control
> use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
> I guess.
>
> [...]
10 us for a 100+ MHz CPU should be doable; I don't know about ARM though, they keep on surprising me with this or that nonsense (never used one, just by chance stumbling on that sort of thing).

What you might need to consider is that on modern-day CPUs you don't have the nice prioritized IRQ scheme you must be used to from the CPU32; once in an interrupt you are just masked for all interrupts. They have some priority resolver which only resolves which interrupt will come next *after* you get unmasked. Some I have used have a second, higher-priority IRQ (like the 6809 FIRQ), but on the core I have used they differ from the 6809's FIRQ in that the errata sheet says they don't work.

On load/store machines latency should be less of an issue for the jitter you will get, as long as you don't do division in the code to be interrupted.

Make sure you look into any FPU you'd consider deeply enough; none will get you your 32.32-bit accuracy. 64-bit FP numbers have a 52-bit or so (can't remember exactly now) mantissa, the rest goes on the exponent. I have found 32-bit FP numbers convenient for storing some constants (on the core I use the load is 1 cycle, expanding automatically to 64 bit), but did not find any other use for them.

Finally, to give you some numbers :). Back during the 80s I wrote a floppy disk controller for the 765 on a 1 MHz 6809. It had about 10 us per byte IIRC; doing IRQ was out of the question. But the 6809 had a "sync" opcode: if IRQs were masked it would stop and wait for an IRQ, and would just resume execution once the line was pulled. This worked for the fastest of floppies (5" HD), so perhaps you can use a 6809 :D. (I may have one or two somewhere here, 2 MHz ones at that - in DIP40....)

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
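[Editor's note: the half-remembered FP sizes above check out. An IEEE-754 double carries 53 significant bits (52 stored fraction bits plus the implicit leading 1), so it genuinely cannot hold every signed 32.32 value, which needs 64 significant bits. A small self-contained illustration; the helper name is made up for this sketch:]

```c
#include <float.h>
#include <stdint.h>

/* A 32.32 value carries up to 64 significant bits, but a double carries
   only DBL_MANT_DIG = 53, so round-tripping a full-precision 32.32 value
   through a double can lose the bottom bits. */
static int double_preserves(int64_t q)   /* q: a raw signed 32.32 value */
{
    return (int64_t)(double)q == q;
}
```

Values needing 53 bits or fewer survive the round trip; a value like `(1LL << 62) + 1`, which needs 63 significant bits, does not.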
On Sun, 15 Jan 2023 04:39:22 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

>On a sunny day (Sat, 14 Jan 2023 10:21:59 -0800) it happened John Larkin
><jlarkin@highlandSNIPMEtechnology.com> wrote in
><epr5sh59k5q62qkapubhkfk8ubf9r0vnng@4ax.com>:
>
>>On Sat, 14 Jan 2023 15:52:49 +0000, Martin Brown
>><'''newspam'''@nonad.co.uk> wrote:
>>
>>>On 13/01/2023 23:46, John Larkin wrote:
>>>> What's the fastest periodic IRQ that you have ever run?
>>>
>>>Usually try to avoid having fast periodic IRQs in favour of offloading
>>>them onto some dedicated hardware. But CPUs were slower then than now.
>>>
>>>> [...]
>>>> RP2040 floating point is tempting but probably too slow for control
>>>> use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
>>>> I guess.
>>>
>>>It might be worth benchmarking how fast the FPU really is on that device
>>>(for representative sample code). The Intel i5 & i7 can do all except
>>>divide in a single cycle these days - I don't know what Arm is like in
>>>this respect. You get some +*- for free close to every divide too.
>>
>>The RP2040 chip has FP routines in the rom, apparently code with some
>>sorts of hardware assist, but it's callable subroutines and not native
>>instructions to a hardware FP engine. When it returns it's done.
>>
>>Various web sites seem to confuse microseconds and nanoseconds. 150 us
>>does seem slow for a "fast" fp operation. We'll have to do
>>experiments.
>>
>>I wrote one math package for the 68K, with the format signed 32.32.
>>That behaved just like floating point in real life, but was small and
>>fast and avoided drecky scaled integers.
>>
>>>*BIG* time penalty for having two divides or branches too close
>>>together. Worth playing around to find patterns the CPU does well.
>>
>>Without true hardware FP, call locations probably don't matter.
>>
>>>Beware that what you measure gets controlled but for polynomials up to 5
>>>term or rationals up to about 5,2 call overhead may dominate the
>>>execution time (particularly if the stupid compiler puts a 16byte
>>>structure across a cache boundary on the stack).
>>
>>We occasionally use polynomials, but 2nd order and rarely 3rd is
>>enough to get analog i/o close enough.
>>
>>>Forcing inlining of small code sections can help. Do it to excess and it
>>>will slow things down - there is a sweet spot. Loop unrolling is much
>>>less useful these days now that branch prediction is so good.
>>>
>>>> [...] My c guy didn't believe that
>>>> a useful ISR could run at 100K and had no idea what execution time
>>>> might be.
>>>
>>>ISR code is generally very short and best done in assembler if you want
>>>it as quick as possible. Examining the code generation of GCC is
>>>worthwhile since it sucks compared to Intel (better) and MS (best).
>>>
>>>In my tests GCC is between 30% and 3x slower than Intel or MS for C/C++
>>>when generating Intel CPU specific SIMD code with maximum optimisation.
>>>
>>>MS compiler still does pretty stupid things like internal compiler
>>>generated SIMD objects of 128, 256 or 512 bits (16, 32 or 64 byte) and
>>>having them crossing a cache line boundary.
>>
>>Nobody has answered my question. Generalizations about software timing
>>abound but hard numbers are rare. Programmers don't seem to use
>>oscilloscopes much.
>
>That is silly
> http://panteltje.com/panteltje/pic/scope_pic/index.html
>
>Try reading the asm, it is well commented.
>:-)
>
>And if you are talking Linux or other multi-taskers there is a lot more involved.
I was thinking about doing closed-loop control, switching power supplies and dummy loads and such, using one core of an RP2040 instead of an FPGA. That would be coded hard-metal, no OS or RTOS.

I guess I don't really need interrupts. I could run a single persistent loop that waits on a timer until it's time to compute again, to run at for instance 100 KHz. If execution time is reasonably constant, it could just loop as fast as it can; even simpler. I like that one.
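[Editor's note: a minimal sketch of that timer-paced loop. `time_us_32()` is the RP2040 SDK's free-running 32-bit microsecond counter read; a trivial stub stands in for it here so the sketch is self-contained, and the frame body is just a placeholder. The signed-difference compare keeps the deadline test correct across counter wraparound:]

```c
#include <stdint.h>

/* Wraparound-safe deadline test for a free-running 32-bit microsecond
   counter: the signed difference is >= 0 once 'now' has passed
   'deadline', even across counter rollover. */
static int deadline_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* Stub for the RP2040 SDK's time_us_32(); on hardware you'd call the
   real timer read instead. */
static uint32_t time_us_32(void) { return 0; }

static void control_frame(void) { /* read ADC, run loop, write PWM */ }

/* Interrupt-free 100 kHz frame loop: spin until the deadline, then
   advance it by a fixed period so the cadence doesn't drift. */
void run_control_loop(void)
{
    const uint32_t period_us = 10;   /* 10 us frame = 100 kHz */
    uint32_t deadline = time_us_32() + period_us;

    for (;;) {
        while (!deadline_reached(time_us_32(), deadline))
            ;                        /* spin until frame start */
        deadline += period_us;       /* fixed cadence, no drift */
        control_frame();
    }
}
```

Adding the period to the previous deadline, rather than to "now", keeps long-term timing exact even when a frame occasionally starts a little late.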
>I was amazed about the other thread about logic analyzers.
>Why did I never need one for my code / projects?
>All you need is a scope... especially an analog one, digital ones are liars!
I've never used a logic analyzer; they look hard to connect, especially into a single-chip uP. But color digital scopes rock.
>If you have no clue then having a hall full of equipment does not give you one!
>
>mm
>did I use a scope for any of this?
> http://panteltje.com/panteltje/newsflex/download.html
>I only have a 10 MHz analog dual trace one!
My usual scope is a 500 MHz 4-channel Rigol. And an old Tek 11802 sampler for the fast stuff and TDR. I have a 40 GHz plugin.
On Sun, 15 Jan 2023 16:29:00 +0200, Dimiter_Popoff <dp@tgi-sci.com>
wrote:

>On 1/14/2023 1:46, John Larkin wrote:
>> What's the fastest periodic IRQ that you have ever run?
>> [...]
>
>10 us for a 100+ MHz CPU should be doable; I don't know about ARM
>though, they keep on surprising me with this or that nonsense. (never
>used one, just by chance stumbling on that sort of thing).
>What you might need to consider is that on modern day CPUs you
>don't have the nice prioritized IRQ scheme you must be used to from
>the CPU32; once in an interrupt you are just masked for all interrupts,
>they have some priority resolver which only resolves which interrupt
>will come next *after* you get unmasked. Some I have used have a
>second, higher priority IRQ (like the 6809 FIRQ) but on the core I have
>used they differ from the 6809-s FIRQ in that the errata sheet says
>they don't work.
I'll be doing single-function bare-metal control, like a power supply for example, on a dedicated CPU core. The only interrupt will be a periodic timer, or maybe an ADC that digitizes a few channels and then interrupts.

I'd like the power supply to be a mosfet half-bridge and an ADC to digitize output voltage and current, and code to close the voltage and current-limit loops. I could use a uP timer to make the PWM into the half-bridge. Possibly go full-bridge and have a bipolar supply.

I'm just considering new product possibilities now; none of this may ever happen. Raspberry Pi Pico is sort of a solution looking for a problem, bottom-up design.
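[Editor's note: one frame of such a voltage loop in scaled integers might look like this sketch. Everything here is an assumption for illustration, not anything from the thread beyond "back to scaled integers": the names, the microvolt error scaling, Q16.16 gains, and a 12-bit PWM compare range.]

```c
#include <stdint.h>

/* Hypothetical scaled-integer PID frame: errors in microvolts
   (LSB = 1 uV), gains in Q16.16 fixed point, output clamped to an
   assumed 12-bit PWM compare range. No floating point anywhere, so
   the worst-case execution time is easy to bound. */
typedef struct {
    int32_t kp_q16, ki_q16, kd_q16;   /* gains, Q16.16 */
    int32_t integ;                    /* integrator state */
    int32_t prev_err;                 /* previous error, for the D term */
} pid_state;

static int32_t clamp_i64(int64_t v, int32_t lo, int32_t hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return (int32_t)v;
}

/* One control frame: returns the new PWM compare value. */
int32_t pid_step(pid_state *s, int32_t setpoint_uv, int32_t measured_uv)
{
    int32_t err = setpoint_uv - measured_uv;

    /* clamp the integrator so it can't wind up or overflow */
    s->integ = clamp_i64((int64_t)s->integ + err, -(1 << 30), 1 << 30);

    int64_t out_q16 = (int64_t)s->kp_q16 * err
                    + (int64_t)s->ki_q16 * s->integ
                    + (int64_t)s->kd_q16 * (err - s->prev_err);
    s->prev_err = err;

    return clamp_i64(out_q16 >> 16, 0, 4095);   /* 12-bit PWM assumed */
}
```

The 64-bit intermediates keep the Q16.16 multiplies exact; only the final shift back to integer units rounds.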
>On load/store machines latency should be less of an issue for the
>jitter you will get as long as you don't do division in your code to
>be interrupted.
>Make sure you look into the FPU you'd consider deep enough, none
>will get you your 32.32 bit accuracy. 64 bit FP numbers have a 52 or
>so (can't remember exactly now) mantissa, the rest goes on the
>exponent. I have found 32 bit FP numbers convenient to store some
>constants (on the core I use the load is 1 cycle, expanding
>automatically to 64 bit), did not find any other use for those.
The RP2040 doesn't have an FPU, and its semi-hardware FP calls look too slow to run a decent control loop. The barbaric way to do this is with signed 32-bit ints where the LSB is 1 microvolt.
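[Editor's note: the signed 32.32 format mentioned earlier for the 68K package can be sketched in a few lines of C. This is an illustrative reconstruction, not the original package: `__int128` (a GCC/Clang extension) stands in for whatever multi-precision tricks the 68K code used, and all names are made up.]

```c
#include <stdint.h>

/* Signed 32.32 fixed point in an int64_t: high 32 bits integer part,
   low 32 bits fraction. Behaves like floating point over a wide range
   but costs only integer ops. */
typedef int64_t q32_32;

#define Q32_ONE ((q32_32)1 << 32)

static q32_32 q_from_int(int32_t i) { return (q32_32)i * Q32_ONE; }

static q32_32 q_mul(q32_32 a, q32_32 b)
{
    return (q32_32)(((__int128)a * b) >> 32);   /* keep the middle 64 bits */
}

static q32_32 q_div(q32_32 a, q32_32 b)
{
    return (q32_32)((((__int128)a) << 32) / b); /* pre-scale the dividend */
}
```

The full 128-bit intermediate is what makes multiply and divide exact; truncating to 64 bits first would throw away either range or fraction bits.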
>Finally, to give you some numbers :). Back during the 80-s I wrote
>a floppy disk controller for the 765 on a 1 MHz 6809. [...] This worked
>for the fastest of floppies (5" HD), so perhaps you can use a 6809 :D.
>(I may have one or two somewhere here, 2 MHz ones at that - in DIP40....).
I wrote an RTOS for the MC6800! Longhand, in Juneau, Alaska! That was fairly awful. I mean the RTOS; Juneau was great. The 6800 wouldn't even push the index register onto the stack. I did some 6802 and 6803 products too, but skipped the 6809 and went to 68K. You can still buy 68332's !!!!!!
On Sun, 15 Jan 2023 08:00:39 -0800, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

>On Sun, 15 Jan 2023 04:39:22 GMT, Jan Panteltje
><pNaonStpealmtje@yahoo.com> wrote:
>
>[...]
>
>I was thinking about doing closed-loop control, switching power
>supplies and dummy loads and such, using one core of an RP2040 instead
>of an FPGA. That would be coded hard-metal, no OS or RTOS.
>
>I guess I don't really need interrupts. I could run a single
>persistent loop that waits on a timer until it's time to compute
>again, to run at for instance 100 KHz. If execution time is reasonably
>constant, it could just loop as fast as it can; even simpler. I like
>that one.
This is a very common approach, pioneered by Bell Labs when designing the first digital telephone switch, the 1ESS:

.<https://en.wikipedia.org/wiki/Number_One_Electronic_Switching_System>

The approach endures in such things as missile autopilots, but always with some way to gracefully handle the case where the control code occasionally runs too long and isn't done in time for the next frame to start.

Typically, the frames are started by arrival of a clock interrupt, and there are no data interrupts. The problem is that interrupts (including the hardware to monitor 10,000 lines) are expensive in both overhead and hardware cost, and so are not worthwhile for something like scanning 10,000 phone lines for new activity (like a phone having been picked up), where individual lines change only rarely.
>>I was amazed about the other thread about logic analyzers. [...]
>
>I've never used a logic analyzer; they look hard to connect,
>especially into a single-chip uP. But color digital scopes rock.
Logic analyzers are useful for a board full of logic, but not for things like power supplies. One does need to design the board to accept the test leads. This gets interesting with GHz clocks.

Joe Gwinn
On a sunny day (Sun, 15 Jan 2023 08:00:39 -0800) it happened John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote in
<6088shtd32gc5r7t4cksj9oqiviq5udjmr@4ax.com>:

>I was thinking about doing closed-loop control, switching power
>supplies and dummy loads and such, using one core of an RP2040 instead
>of an FPGA. That would be coded hard-metal, no OS or RTOS.
Power supplies work great for me with a Microchip PIC 18F14K22. It has all the PWM, 2 hardware comparators, a voltage reference, a multi-channel ADC, and is fast enough to do cycle-by-cycle current limiting; one of its hardware comparators is hardwired to the PWM generator and resets it in a few ns if needed.
>I guess I don't really need interrupts. I could run a single
>persistent loop that waits on a timer until it's time to compute
>again, to run at for instance 100 KHz. If execution time is reasonably
>constant, it could just loop as fast as it can; even simpler. I like
>that one.
Yes,
>>I was amazed about the other thread about logic analyzers. [...]
>
>I've never used a logic analyzer; they look hard to connect,
>especially into a single-chip uP. But color digital scopes rock.
I started looking into building one once, but for showing the i2c bytes? The software I use to create i2c is so good and has been running for decades that it is much easier to look at the code... And the scope for the waveforms; same for the other serial protocols.
>>did I use a scope for any of this?
>> http://panteltje.com/panteltje/newsflex/download.html
>>I only have a 10 MHz analog dual trace one!
>
>My usual scope is a 500 MHz 4-channel Rigol. And an old Tek 11802
>sampler for the fast stuff and TDR. I have a 40 GHz plugin.
If I wanted I could buy a Rigol or Tek... Most giga-Hertz stuff I play with is done via an RTL-SDR stick with a converter for 2.4 GHz or 10 GHz (satellite).
 http://panteltje.com/panteltje/xpsa/index.html
That's an old version; the latest has many more functions.

Now I want a 1 TB (1000 GB) USB stick, found a cheap one for 75 USD online here... Tomshardware just did a test:
 https://www.tomshardware.com/best-picks/best-flash-drives
All your movies and stuff in your pocket when traveling... Security? Maybe encrypt it with something simple. Would still be more secure than storage in the cloud.
On Sun, 15 Jan 2023 12:16:36 -0500, Joe Gwinn <joegwinn@comcast.net>
wrote:

>On Sun, 15 Jan 2023 08:00:39 -0800, John Larkin
><jlarkin@highlandSNIPMEtechnology.com> wrote:
>
>>[...]
>>
>>I guess I don't really need interrupts. I could run a single
>>persistent loop that waits on a timer until it's time to compute
>>again, to run at for instance 100 KHz. If execution time is reasonably
>>constant, it could just loop as fast as it can; even simpler. I like
>>that one.
>
>This is a very common approach, being pioneered by Bell Labs when
>designing the first digital telephone switch, the 1ESS:
>
>.<https://en.wikipedia.org/wiki/Number_One_Electronic_Switching_System>
>
>The approach endures in such things as missile autopilots, but always
>with some way to gracefully handle when the control code occasionally
>runs too long and isn't done in time for the next frame to start.
I was thinking of an endless loop that just runs compute-bound as hard as it can. The "next frame" is the top of the loop. The control-loop time base is whatever the average loop execution time is. As you say, no interrupt overhead.
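A free-running loop of that shape can be sketched as below. The Q8 gains, the anti-windup limit, and the adc_read/pwm_write stand-ins are illustrative assumptions, not anything from the thread; on real hardware the stubs become peripheral register accesses.

```c
#include <stdint.h>

#define PWM_MAX   999     /* assumed ~10-bit PWM range */
#define SETPOINT  1000    /* target in ADC counts */

typedef struct {
    int32_t kp, ki;       /* gains in Q8, i.e. value * 256 */
    int64_t integ;        /* running error sum */
} pid_state_t;

/* One PID update in scaled integers: 64-bit intermediates, Q8 gains. */
static int32_t pid_step(pid_state_t *s, int32_t setpoint, int32_t meas)
{
    int32_t err = setpoint - meas;
    s->integ += err;
    if (s->integ >  (1 << 24)) s->integ =  (1 << 24);   /* anti-windup */
    if (s->integ < -(1 << 24)) s->integ = -(1 << 24);
    return (int32_t)(((int64_t)s->kp * err + (int64_t)s->ki * s->integ) >> 8);
}

static int32_t clamp(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Stand-ins for the real peripherals (ADC result and PWM compare regs). */
static volatile int32_t fake_adc_counts, fake_pwm_duty;
static int32_t adc_read(void)       { return fake_adc_counts; }
static void    pwm_write(int32_t d) { fake_pwm_duty = d; }

/* The endless loop: the "next frame" is the top of the loop, and the
 * sample rate is whatever the code path takes. Never returns. */
void control_loop(void)
{
    pid_state_t ctl = { .kp = 64, .ki = 2, .integ = 0 };
    for (;;) {
        int32_t meas = adc_read();                     /* sample  */
        int32_t duty = pid_step(&ctl, SETPOINT, meas); /* compute */
        pwm_write(clamp(duty, 0, PWM_MAX));            /* actuate */
    }
}
```

Because the integrator sums raw counts and the gains are Q8, the 64-bit intermediate keeps the multiply from overflowing even at full-scale error.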
On 1/15/2023 18:20, John Larkin wrote:
> On Sun, 15 Jan 2023 16:29:00 +0200, Dimiter_Popoff <dp@tgi-sci.com>
> wrote:
>
>> On 1/14/2023 1:46, John Larkin wrote:
>>> What's the fastest periodic IRQ that you have ever run?
>>>
>>> We have one board with 12 isolated LPC1758 ARMs. Each gets interrupted
>>> by its on-chip ADC at 100 KHz and does a bunch of filtering and runs a
>>> PID loop, which outputs to the on-chip DAC. We cranked the CPU clock
>>> down some to save power, so the ISR runs for about 7 usec max.
>>>
>>> I ask because if I use a Pi Pico on some new projects, it has a
>>> dual-core 133 MHz CPU, and one core may have enough compute power that
>>> we wouldn't need an FPGA in a lot of cases. Might even do DDS in
>>> software.
>>>
>>> RP2040 floating point is tempting but probably too slow for control
>>> use. Things seem to take 50 or maybe 100 us. Back to scaled integers,
>>> I guess.
>>>
>>> I was also thinking that we could make a 2 or 3-bit DAC with a few
>>> resistors. The IRQ could load that at various places and a scope would
>>> trace execution. That would look cool. On the 1758 thing we brought
>>> out a single bit to a test point and raised that during the ISR so we
>>> could see ISR execution time on a scope. My c guy didn't believe that
>>> a useful ISR could run at 100K and had no idea what execution time
>>> might be.
>>>
>>
>> 10 us for a 100+ MHz CPU should be doable; I don't know about ARM
>> though, they keep on surprising me with this or that nonsense (never
>> used one, just by chance stumbling on that sort of thing).
>> What you might need to consider is that on modern-day CPUs you
>> don't have the nice prioritized IRQ scheme you must be used to from
>> the CPU32; once in an interrupt you are just masked for all interrupts.
>> They have some priority resolver which only resolves which interrupt
>> will come next *after* you get unmasked. Some I have used have a
>> second, higher-priority IRQ (like the 6809 FIRQ), but on the core I
>> have used they differ from the 6809's FIRQ in that the errata sheet
>> says they don't work.
>
> I'll be doing single-function bare-metal control, like a power supply
> for example, on a dedicated CPU core. The only interrupt will be a
> periodic timer, or maybe an ADC that digitizes a few channels and then
> interrupts.
>
> I'd like the power supply to be a mosfet half-bridge and an ADC to
> digitize output voltage and current, and code to close the voltage and
> current-limit loops. I could use a uP timer to make the PWM into the
> half-bridge. Possibly go full-bridge and have a bipolar supply.
>
> I'm just considering new product possibilities now; none of this may
> ever happen. The Raspberry Pi Pico is sort of a solution looking for a
> problem: bottom-up design.
Do you know whether it is documented well enough to let you throw away all the code that comes with it and write your own bare-metal code? At 100 kHz you would likely need to do so.
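If the loop is paced from a free-running 32-bit microsecond counter (the RP2040 exposes one in its TIMER block; treating that detail as an assumption here), the one subtlety at bare metal is comparing against a deadline across counter wrap. A minimal sketch:

```c
#include <stdint.h>
#include <stdbool.h>

#define PERIOD_US 10u   /* 100 kHz frame rate */

/* Wrap-safe comparison on a free-running 32-bit microsecond counter:
 * the signed difference stays correct across the 2^32 us rollover
 * (roughly every 71.6 minutes). */
static bool deadline_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* On hardware the pacing loop would look something like:
 *
 *   uint32_t deadline = read_timer_us() + PERIOD_US;
 *   for (;;) {
 *       while (!deadline_reached(read_timer_us(), deadline))
 *           ;                        // or a WFE/sleep here
 *       deadline += PERIOD_US;       // fixed step: no drift accumulation
 *       run_control_frame();
 *   }
 *
 * read_timer_us() and run_control_frame() are hypothetical names. */
```

Advancing the deadline by a fixed step (rather than "now + period") keeps long-term timing exact even when a frame occasionally starts late.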
>
>
>> On load/store machines latency should be less of an issue for the
>> jitter you will get, as long as you don't do division in the code to
>> be interrupted.
>> Make sure you look into any FPU you'd consider deeply enough; none
>> will get you your 32.32-bit accuracy. 64-bit FP numbers have a 52-bit
>> or so mantissa (can't remember exactly now); the rest goes on the
>> exponent. I have found 32-bit FP numbers convenient for storing some
>> constants (on the core I use the load is 1 cycle, expanding
>> automatically to 64 bits), but did not find any other use for them.
>
> The RP2040 doesn't have an FPU, and its semi-hardware FP calls look
> too slow to run a decent control loop. The barbaric way to do this is
> with signed 32-bit ints where the LSB is 1 microvolt.
Nothing I'd call barbaric about that; you have the 32 bits, so why not use them.
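A sketch of that convention, with an assumed 12-bit ADC on a 3.3 V reference (both illustrative): int32 microvolts give about +/-2147 V of range, and the only real trap is widening intermediate products to 64 bits.

```c
#include <stdint.h>

typedef int32_t uvolt_t;   /* signed microvolts: about +/-2147 V of range */

/* ADC counts -> microvolts for an assumed 12-bit ADC, 3.3 V reference. */
static uvolt_t adc_to_uv(int32_t counts)
{
    return (uvolt_t)(((int64_t)counts * 3300000) / 4096);
}

/* Scale a microvolt value by a gain in Q16 (gain * 65536). The product of
 * microvolts and a Q16 gain overflows 32 bits immediately, hence int64_t
 * for the intermediate. */
static uvolt_t uv_scale(uvolt_t v, int32_t gain_q16)
{
    return (uvolt_t)(((int64_t)v * gain_q16) >> 16);
}
```

The same pattern covers currents (LSB = 1 microamp) and the PID arithmetic; only the conversion constants change.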
>
>>
>> Finally, to give you some numbers :). Back during the '80s I wrote
>> a floppy-disk controller for the 765 on a 1 MHz 6809. It had about
>> 10 us per byte IIRC; doing IRQ was out of the question. But the 6809
>> had a "sync" opcode: if IRQs were masked it would stop and wait
>> for an IRQ, and would just resume execution once the line was pulled.
>> This worked for the fastest of floppies (5" HD), so perhaps you
>> can use a 6809 :D. (I may have one or two somewhere here, 2 MHz
>> ones at that - in DIP40....)
>
> I wrote an RTOS for the MC6800! Longhand in Juneau, Alaska! That was
> fairly awful. I mean the RTOS; Juneau was great. The 6800 wouldn't
> even push the index register onto the stack.
My first board was a 6809 one, but I had no terminal to talk to it, so in order to make one I built a clone of Motorola's D5 kit (clone meaning the debug monitor written by Herve Tireford worked on it; its source was public). Then I designed a terminal board, 6800 based, programmed it on an Exorciser clone I had access to, and debugged the code with the 6800 on the board emulated by the kit: a 40-pin DIP plug run from the kit's CPU socket via a flat cable (maybe there were buffers to drive the cable, I don't remember). So I am also used to pushing the X register via an swi call, doing tsx, etc.; it taught us to be grateful for what the 68k gave us.
>
> I did some 6802 and 6803 products too, but skipped the 6809 and went to
> the 68K. You can still buy 68332's !!!!!!
Some years ago (10, maybe a few more) I tamed the MCF52211 in my working environment; it is still available, too. You would feel quite familiar with it, though it will probably be EOL-ed soon. At 66 MHz you could do a lot; the ADC is true 12-bit, and it has PWMs (clocked at 33 MHz though, and that resolution can be quite an enemy, especially at higher PWM frequencies). I have done some auxiliary HV sources with it for our netMCA, but they don't run at 100 kHz (IIRC something like 5), and the stepwise change in pulse width was still something I had to deal with. Probably not a great idea to start a new product with it though; only if you feel it will be best for you to write it in 68k assembler.
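The resolution complaint above is just arithmetic: a counter-based PWM gets f_clk / f_pwm distinct duty steps per period, so a PWM clocked at 33 MHz and switching at 100 kHz has only 330 steps, roughly 8 bits, and every doubling of switching frequency costs a bit. Two tiny helpers make the tradeoff explicit:

```c
#include <stdint.h>

/* Duty-cycle resolution of a counter-based PWM: distinct steps per period. */
static uint32_t pwm_steps(uint32_t f_clk_hz, uint32_t f_pwm_hz)
{
    return f_clk_hz / f_pwm_hz;
}

/* Equivalent bits of resolution: floor(log2(steps)). */
static uint32_t pwm_bits(uint32_t steps)
{
    uint32_t b = 0;
    while (steps > 1) { steps >>= 1; b++; }
    return b;
}
```

By the same arithmetic, an RP2040-class part counting at 133 MHz would get 1330 steps (about 10 bits) at 100 kHz, which is why dithering or sigma-delta tricks often get layered on top.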
On Sun, 15 Jan 2023 11:16:45 -0800, John Larkin
<jlarkin@highlandSNIPMEtechnology.com> wrote:

>On Sun, 15 Jan 2023 12:16:36 -0500, Joe Gwinn <joegwinn@comcast.net>
>wrote:
>
>[earlier quoting snipped]
>
>>This is a very common approach, being pioneered by Bell Labs when
>>designing the first digital telephone switch, the 1ESS:
>>
>>.<https://en.wikipedia.org/wiki/Number_One_Electronic_Switching_System>
>>
>>The approach endures in such things as missile autopilots, but always
>>with some way to gracefully handle when the control code occasionally
>>runs too long and isn't done in time for the next frame to start.
>
>I was thinking of an endless loop that just runs compute-bound as hard
>as it can. The "next frame" is the top of the loop. The control-loop
>time base is whatever the average loop execution time is.
>
>As you say, no interrupt overhead.
To be more specific, the frames effectively run at interrupt priority, triggered by a timer interrupt, but we also run various background tasks at user level, utilizing whatever CPU is left over, if any. The sample rate is set by controller dynamics, and going faster does not help, especially if FFTs are being performed over a moving window of samples.

Joe Gwinn
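One way to sketch that frame discipline, with overruns counted rather than silently stacked (all names here are illustrative, not from any particular system):

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t period_us;
    uint32_t overruns;      /* frames that failed to finish in time */
    bool     in_frame;
} frame_sched_t;

/* Called from the (hypothetical) periodic timer interrupt. Returns true
 * if a new frame may start; false means the previous frame is still
 * running and this frame is skipped instead of nested. */
static bool frame_tick(frame_sched_t *s)
{
    if (s->in_frame) {
        s->overruns++;      /* graceful degradation: count it and carry on */
        return false;
    }
    s->in_frame = true;
    return true;
}

/* Called when the control work for a frame completes; background tasks
 * then run at user level until the next timer tick. */
static void frame_done(frame_sched_t *s)
{
    s->in_frame = false;
}
```

The overrun counter is the cheap insurance: a scope on a spare pin shows a single late frame, but the counter shows whether it ever happens in the field.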