LTspice speed

Started by dalai lamah September 21, 2023
As you probably know, on many occasions LTspice cannot take advantage of
multiple CPU cores because many operations are not easily parallelizable.
In fact, most simulations I run use less than 20-25% of the CPU (Intel i5,
4 cores/8 threads).

However, running several LTspice processes to execute different simulations
at the same time should overcome this limitation: each simulation is
independent, so they can run fully in parallel. If I run two simulations
that individually would use 20% of the CPU and last 10 minutes, I should
see 40% CPU utilization, but they should still take 10 minutes to complete.
Maybe a little more because of Windows scheduler overhead.

Instead, what I'm seeing in reality is indeed 40% CPU utilization, but
both simulations take almost exactly twice as long to complete: 20
minutes.

I've already tried to manually fiddle with Task Manager and the processor
affinities, for example assigning two cores to a process and two other
cores to the other process. No difference.

Why? Is this some crappy Windows scheduler behavior, or am I missing
something else?

-- 
I flex my muscles and I'm in the void.
On Thu, 21 Sep 2023 14:22:38 +0200, dalai lamah
<antonio12358@hotmail.com> wrote:

>[...]
>Instead, what I'm seeing in reality is indeed a 40% CPU occupation, but
>both simulations would take almost exactly twice as much to complete, 20
>minutes.
>[...]

In theory, a sim could be broken into a bunch of small subsystems connected by a few wires, and each would run faster. Small matrix on a dedicated CPU.
On 9/21/2023 5:22 AM, dalai lamah wrote:
> [...]
> Instead, what I'm seeing in reality is indeed a 40% CPU occupation, but
> both simulations would take almost exactly twice as much to complete, 20
> minutes.
> [...]
> Why? Is this some crappy Windows scheduler behavior, or do I miss something
> else?

My bet: each sim is causing the other's data to be evicted from the cache. If you could disable the cache completely, you could benchmark one process against two and verify this.

[Or you have way too little RAM and the machine is thrashing -- but you would likely notice that.]
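
A rough way to run that one-versus-two comparison without touching the cache is to time the same batch job alone and then two copies side by side. The sketch below is only an illustration, not a tested LTspice harness: `time_concurrent` is a made-up helper name, the demo uses a tiny Python one-liner as a stand-in workload, and the LTspice command shown in the comment (install path, netlist name, and the `-b` batch switch) is an assumption to adapt to the local setup.

```python
import subprocess
import sys
import time


def time_concurrent(commands):
    """Start every command at once, wait for all of them,
    and return (elapsed wall-clock seconds, list of return codes)."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [p.wait() for p in procs]
    return time.perf_counter() - start, codes


if __name__ == "__main__":
    # Stand-in job so the sketch runs anywhere. For a real test, replace it
    # with something like (hypothetical path and file name):
    #   [r"C:\Program Files\LTC\LTspiceXVII\XVIIx64.exe", "-b", "sim.net"]
    # and give each concurrent copy its own netlist file so the outputs
    # do not collide.
    job = [sys.executable, "-c", "sum(range(2_000_000))"]

    alone, _ = time_concurrent([job])
    together, _ = time_concurrent([job, job])
    print(f"one process: {alone:.2f} s, two processes: {together:.2f} s")
```

If the two-process time comes out close to the one-process time, the jobs are not fighting over anything; if it is close to double, some shared resource is already saturated by a single job.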
One fine day Don Y wrote:

>> [...]
>
> My bet: each sim is causing the other's data to be evicted from the cache.

Yes, I think this is it: cache misses and probably also I/O overhead. In absolute terms the disk write speed is moderate (not more than 1 or 2 MB/s) but the I/O operations are in the millions.

Moreover, I've just noticed that every LTspice process uses a lot of threads, even if you limit the "max threads" parameter from the LTspice control panel. At least ten. Right now I'm running three simulations at once, and in total there are 46 LTspice threads running...

I think that LTspice is quite similar to AAA games: the number of cores does not matter much, and clock speed is king.
On Thu, 21 Sep 2023 17:22:10 +0200, dalai lamah
<antonio12358@hotmail.com> wrote:

>[...]
>Yes, I think this is it: cache misses and probably also I/O overhead. In
>absolute terms the disk write speed is moderate (not more than 1 or 2 MB/s)
>but the I/O operations are in the millions.
>[...]

A biggish circuit generates gigabytes of .RAW file and can bog down a slow hard drive. SS drives help, as does limiting the data that is saved.

.SAVE has the disadvantage that you can't freely probe after the sim is done. .SAVE V(*) will save only voltages.

LT Spice doesn't allow a fixed or minimum time step, does it?
One fine day John Larkin wrote:

> [...]
> A biggish circuit generates gigabytes of .RAW file and can bog down a
> slow hard drive. SS drives help, as does limiting the data that is
> saved.

Yes, I have an SSD, and each RAW file grows to around 15 GB. Unfortunately I need all the data and also some precision; I've set the maximum timestep to 10 ns, which is still slightly inadequate, but I need the simulations to finish within a day. :)
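
For a feel of how the file size scales with the number of saved traces, here is a back-of-the-envelope sketch. Everything in it is an assumption made for illustration: a uniform time step, 8 bytes for each time stamp, 4 bytes for each saved value, and no waveform compression. The function names and the 1 s run length are hypothetical and not taken from the simulations discussed here.

```python
def points_for(duration_s, step_s):
    """Number of stored points if every time step had the same length."""
    return round(duration_s / step_s)


def raw_size_bytes(n_points, n_traces, time_bytes=8, value_bytes=4):
    """Rough file size: one time stamp plus n_traces values per point.

    The byte widths are assumptions about the binary layout; header and
    compression are ignored.
    """
    return n_points * (time_bytes + n_traces * value_bytes)


if __name__ == "__main__":
    # Hypothetical run: 1 s of simulated time at a uniform 10 ns step.
    n = points_for(1.0, 10e-9)
    for traces in (5, 20, 100):
        size_gb = raw_size_bytes(n, traces) / 1e9
        print(f"{traces:3d} traces: about {size_gb:.1f} GB")
```

Under these assumptions the size grows almost linearly with the number of saved traces, which is why restricting what gets saved shrinks the file so effectively.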

> .SAVE has the disadvantage that you can't freely probe after the sim
> is done. .SAVE V(*) will save only voltages.
>
> LT Spice doesn't allow a fixed or minimum time step, does it?

There should be a SPICE option called "dtmin", but I don't know whether LTspice supports it; I've never tried it.
On 21/09/2023 13:22, dalai lamah wrote:
> As you probably know, in many occasions LTspice cannot take advantage of
> multiple CPU cores because many operations are not easily parallelizable.
> In fact, most simulations I make use less than 20/25% of CPU (intel i5, 4
> cores/8 threads).

Even with code that is optimised for multiprocessor operation, like chess engines, a rule of thumb is that with about 75% of the fast cores running flat out you saturate memory bandwidth, so allowing more than 6 cores out of 8 to run merely increases power consumption and may even slow down the computation. Chess is even more insidious in that certain pruning techniques don't lend themselves to parallelism, so you lose both ways.

> However, running more processes of LTspice to execute different simulations
> at the same time should overcome this limitation: each simulation is
> distinct, they can be fully paralleled. If I run two simulations that
> individually would use the 20% of CPU and last 10 minutes, I should see a
> 40% CPU occupation but they still should take 10 minutes to complete. Maybe
> a little more for the Windows scheduler overhead.
>
> Instead, what I'm seeing in reality is indeed a 40% CPU occupation, but
> both simulations would take almost exactly twice as much to complete, 20
> minutes.

The computation is almost certainly memory constrained. The matrix solver needs to have plenty of cache to solve the sparse equations and is likely making assumptions about cache lines remaining in cache.

Two processes trying to do the same sort of thing will fight like hell for the available resources. I expect LT Spice is very cache aware even if it is only single processor friendly.

> I've already tried to manually fiddle with Task Manager and the processor
> affinities, for example assigning two cores to a process and two other
> cores to the other process. No difference.
>
> Why? Is this some crappy Windows scheduler behavior, or do I miss something
> else?

Try looking at Resource Monitor; I expect you will find memory access pegged to the maximum. I'm pretty sure it would be the same on any OS.

-- 
Martin Brown
On 9/21/2023 1:31 PM, Martin Brown wrote:
> [...]
> The computation is almost certainly memory constrained. The matrix
> solver needs to have plenty of cache to solve the sparse equations and
> is likely making assumptions about cache lines remaining in cache.
>
> Two processes trying to do the same sort of thing will fight like hell
> for the available resources. I expect LT Spice is very cache aware even
> if it is only single processor friendly.

What about disk access? AFAIK an LTspice instance by default saves its work to disk as it goes along; see e.g. <https://groups.google.com/g/sci.electronics.cad/c/EnqyB0hUSvo/m/QGxt1uTN1AkJ>
On 9/21/2023 8:22 AM, dalai lamah wrote:
> One fine day Don Y wrote:
>
>> [...]
>> My bet: each sim is causing the other's data to be evicted from the cache.
>
> Yes, I think this is it: cache misses and probably also I/O overhead. In
> absolute terms the disk write speed is moderate (not more than 1 or 2 MB/s)
> but the I/O operations are in the millions.

Unless it's flushing the buffers to disk after EVERY write, that's just code-like-any-other-code (i.e., with infinite cache, it would speed up just like any other).

> Moreover, I've just noticed that every LTspice process uses a lot of
> threads, even if you limit the "max threads" parameter from the LTspice
> control panel. At least ten. Right now I'm running three simulations at
> once, and in total there are 46 LTspice threads running...

Same as above. What you are looking for is some "scarce resource" that both processes want and that has a fixed bandwidth available -- the disk (*if* it was being hammered) or the cache are the two that come to mind.

[My bet is on the cache, because SPICE is lousy for locality of data references.]
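
The "scarce resource with fixed bandwidth" argument can be put into a toy model. This is a deliberately crude sketch, not a description of how LTspice or the memory system really behaves: it assumes a single shared resource, equal sharing between processes, and that a process becomes entirely limited by that resource once it is oversubscribed. The name `slowdown` and its parameters are made up for the illustration.

```python
def slowdown(n_procs, demand_fraction):
    """Run-time multiplier for each of n_procs identical processes.

    demand_fraction is the share of the resource's total bandwidth that one
    process uses when it runs alone (1.0 means a single process already
    saturates the resource).
    """
    total_demand = n_procs * demand_fraction
    # Below saturation nobody waits; above it, everyone stretches in
    # proportion to the oversubscription.
    return max(1.0, total_demand)


if __name__ == "__main__":
    for d in (0.3, 0.5, 0.75, 1.0):
        print(f"demand {d:.2f}: two processes run {slowdown(2, d):.2f}x longer")
```

In this model, two runs that each take twice as long as a solo run correspond to `demand_fraction` close to 1.0, that is, one simulation alone already uses essentially all of whatever is being shared.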

> I think that LTspice is quite similar to AAA games: the number of cores
> does not matter much, and clock speed is king.

I wonder why it's not been ported to a GPU; that seems the obvious migration path (not for the parallelism as much as the raw throughput).
On 9/21/2023 10:31 AM, Martin Brown wrote:
> [...]
> Even with code that is optimised for multiprocessor operation like chess
> engines a rule of thumb is that about 75% of fast cores running flat out you
> saturate memory bandwidth and so allowing more than 6 cores out of 8 to run
> merely increases power consumption and may even slow down the computation.
> Chess is even more insidious in that certain pruning techniques don't lend
> themselves to parallelism so you lose both ways.

Didn't Amdahl predict 5X for 8 cores? For well-behaved loads?
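
For reference, Amdahl's law does not give one fixed number for 8 cores; the speedup depends on what fraction of the work can run in parallel. A small sketch (the function names are mine) that evaluates the formula and inverts it to see which parallel fraction a 5X speedup on 8 cores would imply:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def parallel_fraction_for(speedup, n_cores):
    """Invert Amdahl's law: the parallel fraction p needed for a given speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)


if __name__ == "__main__":
    p = parallel_fraction_for(5.0, 8)
    print(f"5X on 8 cores needs a parallel fraction of about {p:.3f}")
    for frac in (0.5, 0.9, 0.95):
        print(f"p = {frac:.2f}: speedup on 8 cores = {amdahl_speedup(frac, 8):.2f}")
```

Note that the law only accounts for the serial part of the program; it says nothing about contention for memory bandwidth or cache, which is the effect being argued about in this thread.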

>> [...]
>
> The computation is almost certainly memory constrained. The matrix solver needs
> to have plenty of cache to solve the sparse equations and is likely making
> assumptions about cache lines remaining in cache.

Exactly. It wants to *eat* all of the cache -- as does its sister process. I suspect turning off the cache and measuring execution time of *1* and then 2 processes would be enlightening.

Amusing that even the large caches that are now available are still not large enough for ALL applications. You get spoiled seeing the speedup on nominal problems and are surprised when that doesn't generalize!

> Two processes trying to do the same sort of thing will fight like hell for the
> available resources. I expect LT Spice is very cache aware even if it is only
> single processor friendly.
>
>> [...]
>
> Try looking at resource manager and I expect you will find memory access pegged
> to the maximum. I'm pretty sure it would be the same on any OS.