
new spice

Started by John Larkin September 28, 2021
On Fri, 1 Oct 2021 12:13:40 -0400, Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

> jlarkin@highlandsniptechnology.com wrote:
>> On Fri, 1 Oct 2021 11:07:48 -0400, Phil Hobbs
>> <pcdhSpamMeSenseless@electrooptical.net> wrote:
>>
>>> jlarkin@highlandsniptechnology.com wrote:
>>>> On Fri, 1 Oct 2021 08:55:01 -0400, Phil Hobbs
>>>> <pcdhSpamMeSenseless@electrooptical.net> wrote:
>>>>
>>>>> Gerhard Hoffmann wrote:
>>>>>> On 30.09.21 at 21:24, John Larkin wrote:
>>>>>>> On Thu, 30 Sep 2021 19:58:50 +0100, "Kevin Aylward"
>>>>>>> <kevinRemoveandReplaceATkevinaylward.co.uk> wrote:
>>>>>>>
>>>>>>>>> "John Larkin" wrote in message
>>>>>>>>> news:bdc7lgdap8o66j8m92ullph1nojbg9c5ni@4ax.com...
>>>>>>>>
>>>>>>>>> https://www.linkedin.com/in/mike-engelhardt-a788a822
>>>>>>>>
>>>>>>>> ...but why ......?
>>>>>>>>
>>>>>>> Maybe he enjoys it. I'm sure he's enormously wealthy and could
>>>>>>> do anything he wants.
>>>>>>>
>>>>>>> I want a Spice that uses an nvidia board to speed it up 1000:1.
>>>>>>>
>>>>>> Hopeless. That has already been tried in IBM AT times with those
>>>>>> Weitek coprocessors and NS 32032 processor boards; it never
>>>>>> lived up to expectations.
>>>>>
>>>>> Parallelizing sparse matrix computation is an area of active
>>>>> research. The best approach ATM seems to be runtime profiling
>>>>> that generates a JIT-compiled FPGA image for the actual
>>>>> crunching, sort of like the way FFTW generates optimum FFT
>>>>> routines.
>>>>>
>>>>>> That's no wonder. When in time-domain integration the next
>>>>>> result depends on the current value and maybe a few in the
>>>>>> past, you cannot compute more future timesteps in parallel.
>>>>>> Maybe some speculative versions in parallel and then selecting
>>>>>> the best. But that is no work for 1000 processors.
>>>>>
>>>>> SPICE isn't a bad application for parallelism, if you can figure
>>>>> out how to do it--you wouldn't bother for trivial things, where
>>>>> the run times are less than 30 s or so, but for longer
>>>>> calculations the profiling would be a small part of the work.
>>>>> The inner loop is time-stepping the same matrix topology all the
>>>>> time (though the coefficients change with the time step).
>>>>>
>>>>> Since all that horsepower would be spending most of its time
>>>>> waiting for us to dork the circuit and dink the fonts, it could
>>>>> be running the profiling in the background during editing. You
>>>>> might get 100x speedup that way, ISTM.
>>>>>
>>>>>> The inversion of the nodal matrix might use some improvement
>>>>>> since it is NP-complete, like almost everything that is
>>>>>> interesting.
>>>>>
>>>>> Matrix inversion is NP-complete? Since when? It's actually not
>>>>> even cubic, asymptotically--the lowest known complexity bound is
>>>>> less than O(N**2.4).
>>>>>
>>>>>> Its size grows with the number of nodes, and the matrix is
>>>>>> sparse since most nodes have no interaction. Dividing the
>>>>>> circuit into subcircuits, solving these separately, and
>>>>>> combining the results could provide a speedup for problems with
>>>>>> many nodes. That would be a MAJOR change.
>>>>>>
>>>>>> Spice has not made much progress since Berkeley is no longer
>>>>>> involved. Some people make local improvements, and when they
>>>>>> lose interest after 15 years their improvements die. There is
>>>>>> no one to integrate that stuff into one open official version.
>>>>>> Maybe NGspice comes closest.
>>>>>>
>>>>>> Keysight ADS has an option to run it on a bunch of
>>>>>> workstations, but that helps probably most for
>>>>>> electromagnetics, which has not much in common with spice.
>>>>>>
>>>>> It has more than you might think. EM simulators basically have
>>>>> to loop over all of main memory twice per time step, and all the
>>>>> computational boundaries have to be kept time-coherent. With
>>>>> low-latency interconnects and an OS with a thread scheduler that
>>>>> isn't completely brain-dead (i.e. anything except Linux AFAICT),
>>>>> my EM code scales within 20% or so of linearly up to 15 compute
>>>>> nodes, which is as far as I've tried.
>>>>>
>>>>> So I'm more optimistic than you, if still rather less than JL. ;)
>>>>>
>>>>> Cheers
>>>>>
>>>>> Phil Hobbs
>>>>
>>>> Since a schematic has a finite number of nodes, why not have one
>>>> CPU per node?
>>>
>>> Doing what, exactly?
>>
>> Computing the node voltage for the next time step.
>
> Right, but exactly how?
>
>>> Given that the circuit topology forms an irregular sparse matrix,
>>> there would be a gigantic communication bottleneck in general.
>>
>> Shared ram. Most nodes only need to see a few neighbors, plainly
>> visible on the schematic.
>
> "Shared ram" is all virtual, though--you don't have N-port memory
> really.
FPGAs do.
> It has to be connected somehow, and all the caches kept
> coherent. That causes communications traffic that grows very rapidly
> with the number of cores--about N**4 if it's done in symmetrical (SMP)
> fashion.
Then don't cache the node voltages; put them in sram. Mux the ram accesses cleverly.
>> The ram memory map could be clever that way.
>
> Sure, that's what the JIT FPGA approach does, but the memory layout
> doesn't solve the communications bottleneck with a normal CPU or GPU.
>
>>> Somebody has to decide on the size of the next time step, for
>>> instance, which is a global property that has to be properly
>>> disseminated after computation.
>>
>> Step when the slowest CPU is done processing its node.
>
> But then it has to decide what to do next. The coefficients of the
> next iteration depend on the global time step, so there's no purely
> node-local method for doing adaptive step size.
Proceed when all the nodes have finished their computation. Then each
reads the global node ram to get its inputs for the next step.

This would all work in an FPGA that had a lot of CPUs on chip. Let
the FPGA hardware do the node ram and access paths.

I've advocated for such a chip as a general OS host. One CPU per
process, with absolute hardware protections.

You could about do that now, with klunky soft-core CPUs.

LT Spice sometimes runs a billion to one behind real time. We can do
better.

--
Father Brown's figure remained quite dark and still; but in that
instant he had lost his head. His head was always most valuable when
he had lost it.
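In software terms, the lock-step scheme proposed above is a barrier:
no node advances until every node has published its result to the
shared node ram. A minimal sketch in C, assuming POSIX thread
barriers, with a shared array standing in for the global node ram
(all names and the per-node update are illustrative, not from any
real simulator):

/* "One CPU per node; proceed when all are done," as a pthread
 * barrier.  Two barrier waits per step give a race-free
 * compute/publish double-buffer cycle. */
#include <pthread.h>
#include <stdio.h>

#define NNODES 4
#define NSTEPS 1000

static double v[NNODES];            /* shared "node ram" */
static double v_next[NNODES];
static pthread_barrier_t tick;

static double update_node(int n)
{
    /* Placeholder for the real per-node equation: here, the average
     * of the two neighbors, read from the shared array only. */
    double left  = v[(n + NNODES - 1) % NNODES];
    double right = v[(n + 1) % NNODES];
    return 0.5 * (left + right);
}

static void *node_worker(void *arg)
{
    int n = (int)(long)arg;
    for (int step = 0; step < NSTEPS; step++) {
        v_next[n] = update_node(n);
        pthread_barrier_wait(&tick);   /* all nodes done computing  */
        v[n] = v_next[n];              /* publish to the node ram   */
        pthread_barrier_wait(&tick);   /* all nodes done publishing */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NNODES];
    pthread_barrier_init(&tick, NULL, NNODES);
    v[0] = 1.0;                        /* some initial condition */
    for (long n = 0; n < NNODES; n++)
        pthread_create(&t[n], NULL, node_worker, (void *)n);
    for (int n = 0; n < NNODES; n++)
        pthread_join(t[n], NULL);
    printf("v[0] = %g\n", v[0]);
    return 0;
}

In an FPGA the barrier would be the global DONE line and the publish
phase would be wiring, but the synchronization structure is the same.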
jlarkin@highlandsniptechnology.com wrote:
> On Fri, 1 Oct 2021 12:13:40 -0400, Phil Hobbs
> <pcdhSpamMeSenseless@electrooptical.net> wrote:
>
[...]
>
>> But then it has to decide what to do next. The coefficients of the
>> next iteration depend on the global time step, so there's no purely
>> node-local method for doing adaptive step size.
>
> Proceed when all the nodes have finished their computation. Then each
> reads the global node ram to get its inputs for the next step.
>
> This would all work in an FPGA that had a lot of CPUs on chip. Let
> the FPGA hardware do the node ram and access paths.
Yeah, the key is that the circuit topology gets handled by the FPGA,
which is more or less my point. (Not that I'm the one doing it.)

Large sparse matrices don't map well onto purely general-purpose
hardware.
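For concreteness: in the usual compressed-sparse-row (CSR) storage,
every row's work involves an indirect gather through a column-index
array, and that data-dependent access pattern is what defeats caches
and prefetchers on general-purpose hardware. A minimal sketch
(illustrative names, not from any actual SPICE kernel):

/* Sparse matrix-vector multiply, CSR storage.  The indirection
 * through col_idx[] is the irregular, data-dependent memory access
 * that general-purpose hardware handles poorly. */
#include <stddef.h>

void csr_matvec(size_t nrows,
                const size_t *row_ptr,   /* nrows+1 entries */
                const size_t *col_idx,   /* one per nonzero */
                const double *val,       /* one per nonzero */
                const double *x,         /* input vector    */
                double *y)               /* output vector   */
{
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];  /* gather: irregular read */
        y[i] = sum;
    }
}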
> I've advocated for such a chip as a general OS host. One CPU per
> process, with absolute hardware protections.
Still has the SMP problem if the processes need to know about each other at all.
> You could about do that now, with klunky soft-core CPUs.
>
> LT Spice sometimes runs a billion to one behind real time. We can do
> better.
Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510
http://electrooptical.net
http://hobbs-eo.com
On Fri, 1 Oct 2021 12:34:25 -0400, Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

> jlarkin@highlandsniptechnology.com wrote:
>
[...]
>
>> This would all work in an FPGA that had a lot of CPUs on chip. Let
>> the FPGA hardware do the node ram and access paths.
>
> Yeah, the key is that the circuit topology gets handled by the FPGA,
> which is more or less my point. (Not that I'm the one doing it.)
>
> Large sparse matrices don't map well onto purely general-purpose
> hardware.
Then stop thinking of circuit simulation in terms of matrix math.
>> I've advocated for such a chip as a general OS host. One CPU per
>> process, with absolute hardware protections.
>
> Still has the SMP problem if the processes need to know about each
> other at all.
There needs to be the clever multiport common SRAM, and one global
DONE line.

One shared register could be readable by all CPUs. It could have some
management bits.

I sure hope we're not still running Windows on x86 and Linux on ARM a
hundred years from now.

--
Father Brown's figure remained quite dark and still; but in that
instant he had lost his head. His head was always most valuable when
he had lost it.
jlarkin@highlandsniptechnology.com wrote:
> On Fri, 1 Oct 2021 12:34:25 -0400, Phil Hobbs
> <pcdhSpamMeSenseless@electrooptical.net> wrote:
>
[...]
>
>> Large sparse matrices don't map well onto purely general-purpose
>> hardware.
>
> Then stop thinking of circuit simulation in terms of matrix math.
Oh, come _on_. The problem is a large sparse system of nonlinear ODEs
with some bags hung onto the side for Tlines and such. How you write
it out doesn't change what has to be done--the main issue is the
irregularity and unpredictability of the circuit topology.
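Whatever the notation, each time step ends up solving the same kind
of nonlinear system F(v) = 0. A toy single-node illustration, assuming
a capacitor and a diode to ground driven by a current source, stepped
with backward Euler and solved by Newton iteration (all component
values are made up for the sketch):

/* One-node transient: C dv/dt + Is*(exp(v/Vt) - 1) = i_in.
 * Backward Euler gives
 *   F(v) = C*(v - v_prev)/h + Is*(exp(v/Vt) - 1) - i_in = 0,
 * solved at each step by Newton's method. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double C = 1e-9, Is = 1e-14, Vt = 0.02585, i_in = 1e-3;
    const double h = 1e-9;             /* time step */
    double v_prev = 0.0, v = 0.0;

    for (int step = 0; step < 5; step++) {
        v = v_prev;                    /* initial Newton guess */
        for (int it = 0; it < 50; it++) {
            double e  = exp(v / Vt);
            double F  = C * (v - v_prev) / h + Is * (e - 1.0) - i_in;
            double dF = C / h + (Is / Vt) * e;    /* dF/dv */
            double dv = F / dF;
            v -= dv;
            if (fabs(dv) < 1e-12) break;          /* converged */
        }
        v_prev = v;
        printf("t = %g s  v = %g V\n", (step + 1) * h, v);
    }
    return 0;
}

A whole circuit is this same loop with v a vector and dF/dv the sparse
Jacobian--matrix math or not, that linear solve is the inner kernel.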
>>> I've advocated for such a chip as a general OS host. One CPU per
>>> process, with absolute hardware protections.
>>
>> Still has the SMP problem if the processes need to know about each
>> other at all.
>
> There needs to be the clever multiport common SRAM, and one global
> DONE line.
Yes, "clever" in the sense of "magic happens here."
> One shared register could be readable by all CPUs. It could have some
> management bits.
For some reasonable number of "all CPUs", sure. Not an unlimited number.
> I sure hope we're not still running Windows on x86 and Linux on ARM a
> hundred years from now.
I certainly won't be. ;)

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510
http://electrooptical.net
http://hobbs-eo.com
On a sunny day (Fri, 1 Oct 2021 16:15:23 +0100) it happened Tom Gardner
<spamjunk@blueyonder.co.uk> wrote in <sj78mb$h21$2@dont-email.me>:

> If being really snarky I'd attribute that to the C/C++
> community having to allocate their brainpower to problems
> that are a consequence of their tools' characteristics and
> historic choices.
Mmm, if I read this right...

C is a simple language. There are methods to program in C that are
very fast. I really do not know of a single example that cannot be
done in C.

Lots of blah blah about object-oriented languages... When I use linked
lists in C I basically already have all of that.

Not a day goes by without some new back door or hack being found in
all those applications written in high-level languages--languages that
DID choose to protect the programmer from exactly that. If you cannot
drive a car, then driving a bus does not make it safer...

C has libraries available for just about anything; some are good and
have been debugged to a large extent, so a C programmer can use those
if needed. No need to reference other people's silly ideas of how to
do X in bloat-55.9.

And--after all--a computah program is just like a step-by-step
explanation of how to find the way to some address in a big city. One
mistake and you end up nowhere. Writing a novel that shows your hero's
adventures recovering from ever again losing his/her/its way, and
calling it a new language, does not help.

I really do not see where the problem is, other than that those who
get lost have no clue about the basics.

Maybe that is why I like asm; most things can be done in asm with a
32-bit math library, in a fraction of the code size, a fraction of
the power consumption, and a fraction of the hardware. And open
source: read it, use it, learn from it. And publish yours.
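A minimal sketch of the "linked lists in C already give you that"
claim: a list node carrying a function pointer provides dynamic
dispatch--the core OOP mechanism--with no language support. Names are
illustrative only:

/* Poor man's polymorphism: each node carries its own "method". */
#include <stdio.h>
#include <stdlib.h>

struct node {
    struct node *next;
    void (*print)(const struct node *self);   /* "virtual method" */
    int value;
};

static void print_dec(const struct node *n) { printf("%d\n", n->value); }
static void print_hex(const struct node *n) { printf("0x%x\n", n->value); }

static struct node *push(struct node *head,
                         void (*print)(const struct node *), int value)
{
    struct node *n = malloc(sizeof *n);
    n->next = head;
    n->print = print;
    n->value = value;
    return n;
}

int main(void)
{
    struct node *list = NULL;
    list = push(list, print_dec, 42);
    list = push(list, print_hex, 255);
    for (struct node *n = list; n; n = n->next)
        n->print(n);                           /* dynamic dispatch */
    while (list) {                             /* manual cleanup */
        struct node *next = list->next;
        free(list);
        list = next;
    }
    return 0;
}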
On 10/1/2021 13:56, Don Y wrote:
> On 10/1/2021 3:04 AM, Jeroen Belleman wrote:
>> There's also Wirth's observation: "Software gets slower faster
>> than hardware gets faster."
>
> You've got to have a bit of sympathy for folks who write desktop
> applications; they have virtually no control over the environment
> in which their code executes.
Given what today's popular OS-s look like - and the amount of space
they *need* to work - I have more than sympathy for them. So much
human resource wasted on coping with legacies built on half-baked
ideas etc. But this is how life works, I suppose.
> They can "recommend" a particular hardware/OS configuration.
> But, few actually *enforce* those minimum requirements.
And what have they to choose from? Windows or Linux, x86 and ARM...
Soon RISC-V will come into the picture; same thing, of course. Power -
by far the most advanced and "fully baked" architecture humans have
produced - is well hidden from the wider public. Fortunately I can
still get some processors to do what I want, but what if it dies? The
only processors left will be this or that version of some
little-endian, half-baked "shite".

And don't get me started on programming languages, which went totally
C - which most if not all kids nowadays deem "low level"... How this
came to be is known, so what.

I have been going my own path for nearly 30 years now. I have VPA (to
be renamed MIA, for Machine Independent Assembly, once I have its
64-bit version) and the DPS environment with its object maintenance
system etc. etc., a world of its own, incomparably more efficient than
the popular bloats - but I am unlikely to live long enough to be able
to win with it against the rest of the world... Not that I care much
about it lately.

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On 01/10/21 16:35, Phil Hobbs wrote:
> Tom Gardner wrote:
>> On 01/10/21 15:46, Phil Hobbs wrote:
>>> Tom Gardner wrote:
>>>> On 01/10/21 14:05, Phil Hobbs wrote:
>>>>> Jeroen Belleman wrote:
>>>>>> On 2021-10-01 14:11, Gerhard Hoffmann wrote:
>>>>>>> On 01.10.21 at 12:04, Jeroen Belleman wrote:
>>>>>>>
>>>>>>>> There's also Wirth's observation: "Software gets slower
>>>>>>>> faster than hardware gets faster."
>>>>>>>
>>>>>>> When asked how his name should be pronounced he said:
>>>>>>>
>>>>>>> "You can call me by name: that's Wirth, or you can call me by
>>>>>>> value: that's worth."
>>>>>>>
>>>>>>> Cheers, Gerhard
>>>>>>
>>>>>> I still don't get why C++ had to add call by reference. Big
>>>>>> mistake, in my view.
>>>>>>
>>>>>> Jeroen Belleman
>>>>>
>>>>> Why? Smart pointers didn't exist in 1998 iirc, and reducing the
>>>>> number of bare pointers getting passed around has to be a good
>>>>> thing, surely?
>>>>
>>>> "Smart pointer" is a much older concept given a new name.
>>>
>>> Hmm, interesting. Got a link? You sort of need destructors to make
>>> smart pointers work properly.
>>
>> No, I don't have a specific link.
>>
>> I'm not thinking of anything that had specific built-in
>> compiler support, but which was implemented by library
>> or environment support. That implied it was application
>> specific, based on a programming convention.
>
> Okay, it _could_ have been implemented way back, sure. Cfront had
> destructors--it's all a matter of bookkeeping,
That's the key point. I was thinking of the way people tended to
(naively) implement reference-counting GC, usually in C.
> and could have been done in Algol, I expect.
My experience is limited to Elliott Algol-60, my first HLL, when I was 16. I later understood the compiler was written by a hero of mine, CAR (Tony) Hoare, of quicksort fame and "billion dollar mistake" (his description) infamy - the null reference :)
> I'm not aware of anybody having actually done it, and
> apparently you aren't either.
My view is more like "it is an obvious technique that nobody bothered to crow about, I remember seeing it done, can't be bothered to find a link".
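That kind of convention-based "smart pointer"--the naive reference
counting mentioned above--might have looked like this minimal C
sketch, where the retain/release bookkeeping that destructors later
automated is done by hand (illustrative, not from any historical
codebase):

/* Naive reference counting in C, maintained purely by convention:
 * every new reference must be balanced by a release().  Forgetting
 * one is the classic failure mode destructors automated away. */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

struct refstr {
    int refs;
    char *text;
};

static struct refstr *refstr_new(const char *s)
{
    struct refstr *r = malloc(sizeof *r);
    r->refs = 1;
    r->text = strdup(s);
    return r;
}

static struct refstr *retain(struct refstr *r)
{
    r->refs++;
    return r;
}

static void release(struct refstr *r)
{
    if (--r->refs == 0) {       /* last reference gone: free it */
        free(r->text);
        free(r);
    }
}

int main(void)
{
    struct refstr *a = refstr_new("hello");
    struct refstr *b = retain(a);   /* second reference, by hand */
    release(a);                     /* still alive: b holds it   */
    printf("%s\n", b->text);
    release(b);                     /* now actually freed        */
    return 0;
}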
>> In a similar vein, I found I had triumphantly invented
>> a primitive form of OOP for a realtime project. When I
>> later used OOP terminology to explain the concepts, people
>> readily understood it (and its limitations).
>
> Sure, OOP is a tool, like a Crescent wrench (adjustable spanner
> OYSOTP). It's a particularly well suited tool for most of the
> programming I do--simulators, embedded stuff, and instrument
> control--so I like it a lot.
Yup. Simulators was my first use of OOP, although I was attracted to
OOP concepts that addressed clients' statements:
 - "I'd like another of those" and
 - "I'd like one just like that except..."
> (Other stuff is mostly scripts, for which I generally use Rexx. Rexx
> isn't fussy about tabs v. spaces and doesn't change the meaning of a
> program when it's reformatted. Could use more library support,
> though.)
ISTR briefly looking at Rexx, and thinking it didn't offer /me/ enough benefit for the learning curve.
>> I'll add a standard dig, which is overstated but contains
>> more than a little validity....
>>
>> If you look at academic papers on C++ (and to a lesser
>> extent C), they refer to other C++/C papers. If you look
>> at academic papers on other languages, they refer to many
>> different languages. In other words, the C/C++ academic
>> community has less awareness of other communities and their
>> progress.
>>
>> If being really snarky I'd attribute that to the C/C++
>> community having to allocate their brainpower to problems
>> that are a consequence of their tools' characteristics and
>> historic choices.
>
> But you wouldn't do such a thing, of course. ;)
Perish the thought :)
On a sunny day (Fri, 1 Oct 2021 11:51:51 -0400) it happened Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote in
<bb70982a-9bfe-8fbd-9874-444f16eca48f@electrooptical.net>:

> I sort of doubt that you or anyone on this group actually knows "what
> really happens" when a 2021-vintage compiler maps your source code
> onto 2010s or 2020s-vintage hardware. See Chisnall's classic 2018
> paper, "C is not a low-level language. Your computer is not a fast
> PDP-11."
> <https://dl.acm.org/doi/abs/10.1145/3212477.3212479>
Of course C is not a low-level language; neither is asm. Asm, for me
at least (sigh), is also a high-level language. The problem people
have, it seems, is that an asm may have hundreds of instructions, so
the learning curve is a bit higher than for C or BASIC or whatever
snake language. I programmed in binary to make my first EPROM
programmer; even then there is no difference.
> but what we were talking about was references vs. pointers. ;)
Really. I am neural net, I not use spice, beep. :-)
But everything always works. :-)
Motivation counts.
You earthlings confused.
On 10/1/2021 10:05 AM, Jan Panteltje wrote:
> On a sunny day (Fri, 1 Oct 2021 16:15:23 +0100) it happened Tom Gardner
> <spamjunk@blueyonder.co.uk> wrote in <sj78mb$h21$2@dont-email.me>:
>
>> If being really snarky I'd attribute that to the C/C++
>> community having to allocate their brainpower to problems
>> that are a consequence of their tools' characteristics and
>> historic choices.
>
> Mmm, if I read this right...
>
> C is a simple language.
ASM is a simple language.
> There are methods to program in C that are very fast,
There are methods to program in ASM that are very fast.
> I really do not know of a singe example that cannot be done in C.
I really don't know of a single example that cannot be done in ASM.

The appeal of HLLs is that of abstraction -- freeing the developer
from dealing with the minutiae of "running the CPU". And, of providing
more information to the toolchain to enable it to ensure you are doing
what you *should* be doing ("No, you probably don't want to multiply
"Fred" by 9.7302, even if that is what you wrote!")

The problem with HLLs is that they don't always conveniently map onto
our thought processes. Just like developing parallel algorithms
requires a "special effort" of synchronizing multiple actions, etc.

[When asked to explain how to do something, we tend to think in terms
of sequential operations -- and thinking about what *could* be done
simultaneously requires a special effort. Even deciding what *order*
is "required" can be a challenge (think Petri net).]
On Friday, October 1, 2021 at 9:05:42 AM UTC-4, Phil Hobbs wrote:
> Jeroen Belleman wrote:
>> I still don't get why C++ had to add call by reference. Big
>> mistake, in my view.
> Why? Smart pointers didn't exist in 1998 iirc, and reducing the number
> of bare pointers getting passed around has to be a good thing, surely?
Apple's MacOS used 'handles' even back in the eighties. <https://en.wikipedia.org/wiki/Classic_Mac_OS_memory_management>
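A handle is just a pointer to a master pointer: the memory manager can
move the underlying block during heap compaction and patch a single
location, and every handle stays valid. A toy sketch of the idea in C
(the real Toolbox API was NewHandle()/HLock()/DisposeHandle(); this is
illustrative only):

/* Toy version of the classic Mac "handle": double indirection lets
 * the block relocate while only the master pointer changes. */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef void **Handle;

static Handle toy_new_handle(size_t size)
{
    Handle h = malloc(sizeof *h);   /* the master pointer slot */
    *h = calloc(1, size);           /* the relocatable block   */
    return h;
}

/* Simulate the memory manager compacting the heap: the block moves,
 * but every Handle still dereferences to the right place. */
static void toy_compact(Handle h, size_t size)
{
    void *moved = malloc(size);
    memcpy(moved, *h, size);
    free(*h);
    *h = moved;                     /* only the master pointer changes */
}

int main(void)
{
    Handle h = toy_new_handle(32);
    strcpy((char *)*h, "still reachable");
    toy_compact(h, 32);             /* block relocates */
    printf("%s\n", (char *)*h);     /* handle still valid */
    free(*h);
    free(h);
    return 0;
}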