
supercomputer progress

Started by Unknown April 26, 2022
On Friday, April 29, 2022 at 4:39:05 AM UTC-4, Martin Brown wrote:
> On 28/04/2022 18:47, Jeroen Belleman wrote:
>> On 2022-04-28 18:26, boB wrote:
>> [...]
>>> I would love to have a super computer to run LTspice.
>>>
>>> boB
>>
>> In fact, what you have on your desk *is* a super computer,
>> in the 1970's meaning of the words. It's just that it's
>> bogged down running bloatware.
>
> Indeed. The Cray X-MP in its 4 CPU configuration with a 105MHz clock
> and a whopping-for-the-time 128MB of fast core memory with 40GB of
> disk. The one I used had an amazing-for-the-time 1TB tape cassette
> backing store. It did 600 MFLOPS with the right sort of parallel
> vector code.
>
> That was back in the day when you needed special permission to use
> more than 4MB of core on the timesharing IBM 3081 (approx 7 MIPS).
>
> Current Intel 12th-gen CPU desktops are ~4GHz, 16GB RAM and >1TB of
> disk (and the upper limits are even higher). That combo does ~66,000
> MFLOPS.
>
> Spice simulation doesn't scale particularly well to large-scale
> multiprocessor environments due to its many long-range interactions.
The Crays were nice if you had a few million dollars to spend. I worked
for a startup building more affordable supercomputers in the same
ballpark of performance. The Star Technologies ST-100 supported 100
MFLOPS and up to 32 MB of memory; at around $200,000 for a 256 KB
configuration, it was a fraction of the cost of the only slightly
faster Cray X-MP available at the same time.

--
Rick C.
+ Get 1,000 miles of free Supercharging
+ Tesla referral code - https://ts.la/richard11209
Mike Monett wrote:
> Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> wrote:
>
>> John Larkin wrote:
>>> On Thu, 28 Apr 2022 12:01:59 -0500, Dennis <dennis@none.none> wrote:
>>>
>>>> On 4/28/22 11:26, boB wrote:
>>>>
>>>>> I would love to have a super computer to run LTspice.
>>>>>
>>>> I thought one of the problems with LTspice (and spice in general)
>>>> performance is that the algorithms don't parallelize very well.
>>>
>>> LT runs on multiple cores now. I'd love the next gen LT Spice to run
>>> on an Nvidia card. 100x at least.
>>>
>>
>> The "number of threads" setting doesn't do anything very dramatic,
>> though, at least last time I tried. Splitting up the calculation
>> between cores would require all of them to communicate a couple of
>> times per time step, but lots of other simulation codes do that.
>>
>> The main trouble is that the matrix defining the connectivity between
>> nodes is highly irregular in general.
>>
>> Parallelizing that efficiently might well need a special-purpose
>> compiler, sort of similar to the profile-guided optimizer in the guts
>> of the FFTW code for computing DFTs. Probably not at all impossible,
>> but not that straightforward to implement.
>>
>> Cheers
>>
>> Phil Hobbs
>
> Supercomputers have thousands or hundreds of thousands of cores.
>
> Quote:
>
> "Cerebras Systems has unveiled its new Wafer Scale Engine 2 processor
> with a record-setting 2.6 trillion transistors and 850,000
> AI-optimized cores. It's built for supercomputing tasks, and it's the
> second time since 2019 that Los Altos, California-based Cerebras has
> unveiled a chip that is basically an entire wafer."
>
> https://venturebeat.com/2021/04/20/cerebras-systems-launches-new-ai-supercomputing-processor-with-2-6-trillion-transistors/
Number of cores isn't the problem. For fairly tightly-coupled tasks
such as simulations, the issue is interconnect latency between cores,
and the required bandwidth grows roughly as the cube of Moore's law,
so it ran out of gas long ago.

One thing that zillions of cores could do for SPICE is to run all the
stepped-parameter runs simultaneously (see the sketch after this
post). At that point all you need is infinite bandwidth to disk.
> Man, I wish I were back living in Los Altos again.
I couldn't get out of there fast enough, and have never looked back.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510
http://electrooptical.net
http://hobbs-eo.com
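A minimal sketch of that embarrassingly parallel stepped-run idea,
assuming ngspice in batch mode and a hypothetical deck template with a
{RVAL} placeholder (both illustrative assumptions, not anything from
the thread):

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <future>
#include <sstream>
#include <string>
#include <vector>

int main() {
    // Read the deck template once. "deck_template.cir" and its {RVAL}
    // placeholder are assumptions for illustration.
    std::ifstream tpl("deck_template.cir");
    std::stringstream ss;
    ss << tpl.rdbuf();
    const std::string deck = ss.str();

    const std::vector<double> rvals = {100, 220, 470, 1000, 2200};
    std::vector<std::future<int>> jobs;

    for (std::size_t i = 0; i < rvals.size(); ++i) {
        // Substitute the swept value and write a per-run deck.
        std::string d = deck;
        auto pos = d.find("{RVAL}");
        if (pos != std::string::npos)
            d.replace(pos, 6, std::to_string(rvals[i]));
        const std::string fname = "run" + std::to_string(i) + ".cir";
        std::ofstream(fname) << d;

        // The runs are independent, so each gets its own core.
        jobs.push_back(std::async(std::launch::async, [fname, i] {
            const std::string cmd = "ngspice -b -r run" +
                std::to_string(i) + ".raw -o run" + std::to_string(i) +
                ".log " + fname;
            return std::system(cmd.c_str());
        }));
    }

    for (auto& j : jobs)
        std::printf("exit status %d\n", j.get());
}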
Martin Brown wrote:
> On 29/04/2022 07:09, Phil Hobbs wrote:
>> John Larkin wrote:
>>> On Thu, 28 Apr 2022 12:01:59 -0500, Dennis <dennis@none.none> wrote:
>>>
>>>> On 4/28/22 11:26, boB wrote:
>>>>
>>>>> I would love to have a super computer to run LTspice.
>>>>>
>>>> I thought one of the problems with LTspice (and spice in general)
>>>> performance is that the algorithms don't parallelize very well.
>>>
>>> LT runs on multiple cores now. I'd love the next gen LT Spice to run
>>> on an Nvidia card. 100x at least.
>>>
>>
>> The "number of threads" setting doesn't do anything very dramatic,
>> though, at least last time I tried. Splitting up the calculation
>> between cores would require all of them to communicate a couple of
>> times per time step, but lots of other simulation codes do that.
>
> If it is anything like chess problems then the memory bandwidth will
> saturate long before all cores+threads are used to optimum effect.
> After that point the additional threads merely cause it to run hotter.
>
> I found setting max threads to about 70% of those notionally available
> produced the most computing power with the least heat. After that the
> performance gain per thread was negligible but the extra heat was not.
>
> Having everything running full bore was actually slower and much
> hotter!
>
>> The main trouble is that the matrix defining the connectivity between
>> nodes is highly irregular in general.
>>
>> Parallelizing that efficiently might well need a special-purpose
>> compiler, sort of similar to the profile-guided optimizer in the guts
>> of the FFTW code for computing DFTs. Probably not at all impossible,
>> but not that straightforward to implement.
>
> I'm less than impressed with profile-guided optimisers in compilers.
> The only time I tried it in anger, the instrumentation code interfered
> with the execution of the algorithms to such an extent as to be
> meaningless.
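A crude way to see the knee Martin describes is to time a memory-bound
kernel at increasing thread counts and watch aggregate throughput
flatten well before all hardware threads are busy. A toy sketch; the
kernel is a stand-in workload, not SPICE:

#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Stand-in memory-bound workload: stream a large buffer repeatedly.
static void kernel(std::vector<double>& v) {
    for (int pass = 0; pass < 50; ++pass)
        for (double& x : v)
            x = x * 1.0000001 + 1e-9;
}

int main() {
    const unsigned hw = std::thread::hardware_concurrency();
    for (unsigned n = 1; n <= hw; ++n) {
        // One 32 MB buffer per thread keeps the memory bus busy.
        std::vector<std::vector<double>> bufs(
            n, std::vector<double>(1 << 22, 1.0));
        const auto t0 = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (unsigned i = 0; i < n; ++i)
            pool.emplace_back(kernel, std::ref(bufs[i]));
        for (auto& t : pool)
            t.join();
        const double s = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0).count();
        // Total work scales with n, so report aggregate throughput;
        // it stops improving once memory bandwidth saturates.
        std::printf("%2u threads: %6.2f buffers/s\n", n, n / s);
    }
}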
It wouldn't need to be as general as that--one could simply sort for
the most-connected nodes, and then sort by weighted graph distance so
as to minimize the number of connections across the chunks of netlist,
then adjust the data structures for communication appropriately.

It also wouldn't parallelize as well as FDTD, say, because there's
less computation going on per time step, so the communication overhead
is proportionately much greater.
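A toy sketch of that reordering idea: order the nodes breadth-first so
neighbours stay close, split the ordering into per-core chunks, and
count the cut edges that would have to be communicated each time step.
The adjacency list is a made-up stand-in for a real netlist:

#include <cstdio>
#include <queue>
#include <vector>

int main() {
    // node -> neighbours (toy netlist connectivity, not a real circuit)
    std::vector<std::vector<int>> adj = {
        {1, 2}, {0, 2, 3}, {0, 1, 3}, {1, 2, 4}, {3, 5}, {4, 6, 7},
        {5, 7}, {5, 6}};
    const int n = (int)adj.size(), nchunks = 2;

    // BFS from node 0 gives an ordering that keeps neighbours close.
    std::vector<int> order, chunk(n, -1);
    std::vector<bool> seen(n, false);
    std::queue<int> q;
    q.push(0);
    seen[0] = true;
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        order.push_back(u);
        for (int v : adj[u])
            if (!seen[v]) { seen[v] = true; q.push(v); }
    }

    // Split the ordering into equal chunks (one per core).
    for (int i = 0; i < n; ++i)
        chunk[order[i]] = i * nchunks / n;

    // Edges crossing chunks = values exchanged between cores per step.
    int cut = 0;
    for (int u = 0; u < n; ++u)
        for (int v : adj[u])
            if (u < v && chunk[u] != chunk[v]) ++cut;
    std::printf("cut edges between chunks: %d\n", cut);
}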
>
> One gotcha I have identified in the latest MSC is that when it uses
> higher order SSE2, AVX, and AVX-512 implicitly in its code generation,
> it does not align them on the stack properly, so that sometimes they
> are split across two cache lines. I see two distinct speeds for each
> benchmark code segment depending on how the cache alignment falls.
>
> Basically the compiler forces stack alignment to 8 bytes and cache
> lines are 64 bytes, but the compiler-generated objects in play are 16
> bytes, 32 bytes or 64 bytes. Alignment failure fractions: 1:4, 2:4
> and 3:4.
>
> If you manually allocate such objects you can use pragmas to force
> optimal alignment, but when the code generator chooses to use them
> internally you have no such control. Even so, the MS compiler does
> generate blisteringly fast code compared to either Intel or GCC.
The FFTW profiler works pretty well IME, but I agree, doing it with
the whole program isn't trivial.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510
http://electrooptical.net
http://hobbs-eo.com
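For what it's worth, Martin's straddling effect is easy to demonstrate:
a 32-byte object with the default 8-byte alignment can cross a 64-byte
cache-line boundary, while one declared alignas(32) never can. A
minimal sketch (the struct names are illustrative):

#include <cstddef>
#include <cstdint>
#include <cstdio>

struct Vec256 { double d[4]; };              // 32 bytes, default 8-byte alignment
struct alignas(32) Vec256A { double d[4]; }; // forced 32-byte alignment

// True if the object at p crosses a 64-byte cache-line boundary.
static bool straddles(const void* p, std::size_t sz) {
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    return (a / 64) != ((a + sz - 1) / 64);
}

int main() {
    char pad = 0;  // perturbs the stack layout a little
    Vec256 u{};    // may straddle two lines, depending on where it lands
    Vec256A v{};   // 32-byte aligned, so it always fits in one line
    (void)pad;
    std::printf("default-aligned straddles: %d\n", straddles(&u, sizeof u));
    std::printf("alignas(32)     straddles: %d\n", straddles(&v, sizeof v));
}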
Jeroen Belleman wrote:
> On 2022-04-28 18:26, boB wrote:
> [...]
>> I would love to have a super computer to run LTspice.
>>
>> boB
>
> In fact, what you have on your desk *is* a super computer,
> in the 1970's meaning of the words. It's just that it's
> bogged down running bloatware.
>
> Jeroen Belleman
In the 1990s meaning of the words, in fact. My 2011-vintage desktop
box runs 250 Gflops peak (2x 12-core Magny Cours, 64G main memory,
RAID5 disks).

My phone is a supercomputer by 1970s standards. ;)

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510
http://electrooptical.net
http://hobbs-eo.com
On Friday, April 29, 2022 at 10:12:30 AM UTC-4, Phil Hobbs wrote:
> Jeroen Belleman wrote:
>> On 2022-04-28 18:26, boB wrote:
>> [...]
>>> I would love to have a super computer to run LTspice.
>>>
>>> boB
>>>
>>
>> In fact, what you have on your desk *is* a super computer,
>> in the 1970's meaning of the words. It's just that it's
>> bogged down running bloatware.
>>
>> Jeroen Belleman
>
> In the 1990s meaning of the words, in fact. My 2011-vintage desktop
> box runs 250 Gflops peak (2x 12-core Magny Cours, 64G main memory,
> RAID5 disks).
>
> My phone is a supercomputer by 1970s standards. ;)
And no more possible to build at that time than in ancient Rome. It's
amazing how rapidly technology changes when spurred by the profit
motive.

--
Rick C.
-- Get 1,000 miles of free Supercharging
-- Tesla referral code - https://ts.la/richard11209
On Fri, 29 Apr 2022 02:09:19 -0400, Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

>John Larkin wrote:
>> On Thu, 28 Apr 2022 12:01:59 -0500, Dennis <dennis@none.none> wrote:
>>
>>> On 4/28/22 11:26, boB wrote:
>>>
>>>> I would love to have a super computer to run LTspice.
>>>>
>>> I thought one of the problems with LTspice (and spice in general)
>>> performance is that the algorithms don't parallelize very well.
>>
>> LT runs on multiple cores now. I'd love the next gen LT Spice to run
>> on an Nvidia card. 100x at least.
>>
>
>The "number of threads" setting doesn't do anything very dramatic,
>though, at least last time I tried. Splitting up the calculation
>between cores would require all of them to communicate a couple of
>times per time step, but lots of other simulation codes do that.
>
>The main trouble is that the matrix defining the connectivity between
>nodes is highly irregular in general.
>
>Parallelizing that efficiently might well need a special-purpose
>compiler, sort of similar to the profile-guided optimizer in the guts
>of the FFTW code for computing DFTs. Probably not at all impossible,
>but not that straightforward to implement.
>
>Cheers
>
>Phil Hobbs
Climate simulation uses enormous multi-CPU supercomputer rigs.

OK, I suppose that makes your point.

--
Anybody can count to one. - Robert Widlar
On Thu, 28 Apr 2022 19:47:03 +0200, Jeroen Belleman
<jeroen@nospam.please> wrote:

>On 2022-04-28 18:26, boB wrote:
>[...]
>> I would love to have a super computer to run LTspice.
>>
>> boB
>>
>
>In fact, what you have on your desk *is* a super computer,
>in the 1970's meaning of the words. It's just that it's
>bogged down running bloatware.
>
>Jeroen Belleman
My phone probably has more compute power than all the computers in the
world in about 1960.

--
Anybody can count to one. - Robert Widlar
On Friday, April 29, 2022 at 10:32:07 AM UTC-4, jla...@highlandsniptechnology.com wrote:
> On Thu, 28 Apr 2022 19:47:03 +0200, Jeroen Belleman
> <jer...@nospam.please> wrote:
>
>> On 2022-04-28 18:26, boB wrote:
>> [...]
>>> I would love to have a super computer to run LTspice.
>>>
>>> boB
>>>
>>
>> In fact, what you have on your desk *is* a super computer,
>> in the 1970's meaning of the words. It's just that it's
>> bogged down running bloatware.
>>
>> Jeroen Belleman
>
> My phone probably has more compute power than all the computers in
> the world in about 1960.
And lets you watch cat videos anywhere you go.

--
Rick C.
-+ Get 1,000 miles of free Supercharging
-+ Tesla referral code - https://ts.la/richard11209
On 29/04/2022 14:46, Ricky wrote:
> On Friday, April 29, 2022 at 4:39:05 AM UTC-4, Martin Brown wrote:
>> On 28/04/2022 18:47, Jeroen Belleman wrote:
>>> On 2022-04-28 18:26, boB wrote:
>>> [...]
>>>> I would love to have a super computer to run LTspice.
>>>>
>>>> boB
>>>
>>> In fact, what you have on your desk *is* a super computer, in the
>>> 1970's meaning of the words. It's just that it's bogged down
>>> running bloatware.
>> Indeed. The Cray X-MP in its 4 CPU configuration with a 105MHz
>> clock and a whopping-for-the-time 128MB of fast core memory with
>> 40GB of disk. The one I used had an amazing-for-the-time 1TB tape
>> cassette backing store. It did 600 MFLOPS with the right sort of
>> parallel vector code.
>>
>> That was back in the day when you needed special permission to use
>> more than 4MB of core on the timesharing IBM 3081 (approx 7 MIPS).
>>
>> Current Intel 12th-gen CPU desktops are ~4GHz, 16GB RAM and >1TB of
>> disk (and the upper limits are even higher). That combo does
>> ~66,000 MFLOPS.
>>
>> Spice simulation doesn't scale particularly well to large-scale
>> multiprocessor environments due to its many long-range interactions.
>
> The Crays were nice if you had a few million dollars to spend. I
> worked for a startup building more affordable supercomputers in the
> same ballpark of performance. The Star Technologies ST-100 supported
> 100 MFLOPS and up to 32 MB of memory; at around $200,000 for a 256 KB
> configuration, it was a fraction of the cost of the only slightly
> faster Cray X-MP available at the same time.
At the time I was doing that stuff, the FPS-120 array processor
attached to a PDP-11 or VAX was the poor man's supercomputer. Provided
you had the right sort of problem, it was very good indeed for
price/performance (though it was still fairly pricey).

I got to port our code to everything from a humble Z80 (where it could
only solve trivial toy problems) up to the high-end Cray. The more
expensive the computer, the more tolerant of IBM extensions its
compiler tended to be. The Z80 FORTRAN IV I remember as being a
stickler for the rules.

--
Regards,
Martin Brown
On 29/04/2022 15:30, jlarkin@highlandsniptechnology.com wrote:
> On Fri, 29 Apr 2022 02:09:19 -0400, Phil Hobbs
> <pcdhSpamMeSenseless@electrooptical.net> wrote:
>
>> John Larkin wrote:
>>> On Thu, 28 Apr 2022 12:01:59 -0500, Dennis <dennis@none.none> wrote:
>>>
>>>> On 4/28/22 11:26, boB wrote:
>>>>
>>>>> I would love to have a super computer to run LTspice.
>>>>>
>>>> I thought one of the problems with LTspice (and spice in general)
>>>> performance is that the algorithms don't parallelize very well.
>>>
>>> LT runs on multiple cores now. I'd love the next gen LT Spice to
>>> run on an Nvidia card. 100x at least.
>>>
>>
>> The "number of threads" setting doesn't do anything very dramatic,
>> though, at least last time I tried. Splitting up the calculation
>> between cores would require all of them to communicate a couple of
>> times per time step, but lots of other simulation codes do that.
>>
>> The main trouble is that the matrix defining the connectivity
>> between nodes is highly irregular in general.
>>
>> Parallelizing that efficiently might well need a special-purpose
>> compiler, sort of similar to the profile-guided optimizer in the
>> guts of the FFTW code for computing DFTs. Probably not at all
>> impossible, but not that straightforward to implement.
>>
>> Cheers
>>
>> Phil Hobbs
>
> Climate simulation uses enormous multi-CPU supercomputer rigs.
They are basically fluid-in-cell (FLIC) models with a fair number of
parameters per cell but, depending on your exact choice of geometry,
only 6 nearest neighbours in a 3D cubic computational grid (worst case
26 cells). That very regular interconnectivity lends itself to vector
processing, which is why we were using such machines, though for
another problem. A handful of FLIC practitioners used tetrahedral or
hexagonal-close-packed grids (4 or 12 nearest neighbours).
> OK, I suppose that makes your point.
When I was involved in such codes for relativistic particle beams, we
used the problem's cylindrical symmetry to make it tractable in 2D.
The results agreed remarkably well with experiments, so I see no need
to ridicule the FLIC models used in weather and climate research.

--
Regards,
Martin Brown
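For reference, the 6-nearest-neighbour cubic grid Martin describes is
the classic 7-point stencil; a toy diffusion step shows the regular
access pattern that makes such codes vectorize so well (illustrative
only, not a real FLIC code):

#include <cstdio>
#include <utility>
#include <vector>

int main() {
    const int N = 32;  // cube edge; only interior cells get updated
    auto idx = [N](int i, int j, int k) { return (i * N + j) * N + k; };
    std::vector<double> a(N * N * N, 0.0), b(a);
    a[idx(N / 2, N / 2, N / 2)] = 1.0;  // point source in the middle

    const double alpha = 0.1;  // diffusivity * dt / dx^2, stable below 1/6
    for (int step = 0; step < 100; ++step) {
        // 7-point stencil: each cell reads itself and six face neighbours.
        for (int i = 1; i < N - 1; ++i)
            for (int j = 1; j < N - 1; ++j)
                for (int k = 1; k < N - 1; ++k)
                    b[idx(i, j, k)] = a[idx(i, j, k)] + alpha *
                        (a[idx(i + 1, j, k)] + a[idx(i - 1, j, k)] +
                         a[idx(i, j + 1, k)] + a[idx(i, j - 1, k)] +
                         a[idx(i, j, k + 1)] + a[idx(i, j, k - 1)] -
                         6.0 * a[idx(i, j, k)]);
        std::swap(a, b);
    }
    std::printf("centre after 100 steps: %g\n",
                a[idx(N / 2, N / 2, N / 2)]);
}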