Electronics-Related.com
Forums

a couple of LC filter progs

Started by John Larkin April 5, 2019
On Wed, 10 Apr 2019 11:26:48 +0200, Gerhard Hoffmann <dk4xp@arcor.de>
wrote:

>On 10.04.19 at 06:40, upsidedown@downunder.com wrote:
>
>> Yes, TMS 9900 was interesting, but with main memory cycle times in
>> the order of one microsecond it was s_l_o_w. However, these days with
>> modern caches and cache management it would make sense.
>
>No, never ever. That design is bad to the bone.
>
>There is a direct and unbreakable link between the computer's fastest
>and slowest operations.
>
>In the time one can do sub (Rdest,flags), Rsrc1, Rsrc2
>
>one has to decide in addition:
>- Is it in L1 cache
>  if not: is it in L2 cache
>  if not: is it in L3 cache
>  if not: is it at least in L1 page tables...L2 page tables
>  if not: is it swapped out -> trap, needs software to handle
>  if not: does it exist at all -> trap
>
>All that 3 times for a simple instruction. In addition to the normal
>complications for pipelining, multi issue, speculative execution.
While this is true for any register in the register set after switching
the workspace pointer, a cache read usually fetches more than a single
register: it loads a full cache line, containing several or even all of
the registers. After this initial load, all active registers are already
in the cache and no further loads from main memory are needed. Better
yet, loading a new value into the workspace pointer should automatically
load the full register set as one or more cache lines, which would also
be better for cache consistency. Another consistency issue is how to
restore the modified values to main memory: either use "write through"
to immediately save each modified cached register value, or "write back"
the whole register set to main memory just before the workspace pointer
is reloaded.
>Even large register files or a huge number of renaming registers or
>too large caches at a certain level are bad. Not having constant
>replacement pressure means that the resource is too fat and therefore
>too slow.
In this scheme, register renaming is just reloading register blocks, often with two or more workspace pointers live at once.
>Some registers are badly needed to keep the complexity out of EVERY >instruction.
Using bulk load of the whole register set does just that.
>
>cheers,
>Gerhard
On Apr 9, 2019, Tim Williams wrote
(in article <q8it1k$3qi$1@dont-email.me>):

> "Joseph Gwinn" <joegwinn@comcast.net> wrote in message
> news:0001HW.225C3CD800A246FA70000E6E32EF@news.giganews.com...
> > They coded in plain C, inspected the generated assembly code, and
> > tweaked the C code until the assembly code was clean and fast. It turned
> > out that the resulting code was largely portable in that all C compilers
> > generated clean, fast code from the same tweaked C source code, after the
> > source code was tweaked to the first two compilers.
>
> Expanding on this some more --
>
> Just in my humble experience alone, your writing style can massively affect
> code generation.
>
> The optimizer is terribly, terribly far from exhaustive. (It /could/ be
> exhaustive -- but then users would complain of hours or years of compilation
> time for almost no benefit, and that's no good!) If it doesn't figure out
> any simple tricks, it's just going to pick the best, mediocre solution and
> let that be that.
>
> And mind this affects execution time about as much as it does code size.
> Often, compact code runs faster, especially on simpler embedded platforms.
> (Yeah, when pipelines and caches get involved, unrolled and inlined
> operations get more attractive, and the discrepancy between compact and
> fast code can grow.)
>
> Things the optimizer is likely to check can range from modestly unrolling
> or reordering loops, to factoring numerical expressions, to inlining
> functions and operating on the resulting mega-function, and more. All of
> these grow quickly in complexity, however, and the pursuit can become
> self-defeating.
>
> A recent example was a bit-packing function, on an 8-bit platform with
> hardware multiply. I wrote this a few different ways. By far, the worst
> was a mega-expression: between macros and carefully indented and inspected
> sub-expressions, the whole operation can be expressed entirely numerically.
> That's technically fine, but the compiler really throws up its proverbial
> hands and basically ends up writing out the expression long-form without
> any reuse of sub-expressions, or registers even(!).
>
> The next best was using a bunch of variables (which are allocated on the
> stack normally, but these are optimized out quickly when there are enough
> free registers to put them there instead, which was the case here) to hold
> intermediate steps, and repeating common steps in a short loop. But keep in
> mind that, if the variables are allocated in registers... you can't loop
> over them. Doing it with a loop forces it to allocate stack, get the
> pointer, and index the variables. Plus the memory accesses themselves,
> which are slower. It is not without overhead! (This is a much better deal
> on, say, classic 16-bit machines (x86, 68k), and most everything since.) It
> might even try it with the loop and array, then try it unrolled with
> registers, and keep the unrolled version because it's simply better!
>
> I forget what I ended up with; I think I sliced the bit pattern differently,
> still using a loop but getting better reuse. It's still ugly, like hundreds
> of bytes for something nearly trivial if the data were byte aligned.
>
> All of this optimizing is subject to the constraints of the functions
> executed within each expression or statement. C functions can do literally
> anything. Side effects are the bane of optimization. If the optimizer
> can't reason about being able to move a function up or down the expression
> tree, it's simply forced to treat that as a sequence point. (Sure, it could
> reason about the function's contents as if inlining it, but that would be
> extremely costly.)
> As far as I know, the optimizer is bad at guessing what functions do, in
> terms of side effects, so it can help greatly (and this is why they put the
> features in there) to add hints about the nature of the function (e.g.,
> using const with parameters, writing pure functions when possible, etc.).
>
> (FYI, most of my experience centers around GCC. Most of this is motivated
> by my own observations, illuminated by some of the official documentation
> about the optimizing step.)
Yes, this matches my experience as well. Assembly code rules!

Joe Gwinn