On Apr 9, 2019, Tim Williams wrote
(in article <q8it1k$3qi$1@dont-email.me>):
> "Joseph Gwinn" <joegwinn@comcast.net> wrote in message
> news:0001HW.225C3CD800A246FA70000E6E32EF@news.giganews.com...
> > They coded in plain C, inspected the generated assembly code, and
> > tweaked the C code until the assembly code was clean and fast. It turned out
> > that the resulting code was largely portable in that all C compilers
> > generated clean, fast code from the same tweaked C source code, after the
> > source code was tweaked to the first two compilers.
>
> Expanding on this some more --
>
> Just in my humble experience alone, your writing style can massively affect
> code generation.
>
> The optimizer is terribly, terribly far from exhaustive. (It /could/ be
> exhaustive -- but then users would complain of hours or years of compilation
> time for almost no benefit, and that's no good!) If it doesn't spot one of
> the simple tricks, it just picks the best of the mediocre solutions it did
> find and leaves it at that.
>
> And mind this affects execution time about as much as it does code size.
> Often, compact code runs faster, especially on simpler embedded platforms.
> (Yeah, when pipelines and caches get involved, unrolled and inlined
> operations get more attractive, and the discrepancy between compact and fast
> code can grow.)
>
> Things the optimizer is likely to try range from modestly unrolling
> or reordering loops, to factoring numerical expressions, to inlining
> functions and operating on the resulting mega-function, and more. All of
> these grow quickly in complexity, however, and the pursuit can become
> self-defeating.
>
> A recent example was a bit-packing function, on an 8-bit platform with
> hardware multiply. I wrote this a few different ways. By far, the worst
> was a mega-expression: between macros and carefully indented and inspected
> sub-expressions, the whole operation can be expressed entirely numerically.
> That's technically fine, but the compiler really throws up its proverbial
> hands and basically ends up writing out the expression long-form without any
> reuse of sub-expressions, or registers even(!).
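
To make the two styles concrete, here is a minimal sketch. It is not the
actual function described above; the layout (four 6-bit fields packed
MSB-first into three bytes) and the names pack6_expr and pack6_steps are
made up for illustration. Both versions compute the same bytes; the only
difference is whether the masked fields are spelled out inside each
expression or named once and reused.

```c
#include <stdint.h>

/* Hypothetical layout: four 6-bit fields, MSB-first, packed into 3 bytes. */

/* Style 1: every output byte is one self-contained expression.
 * The masking of in[1] and in[2] is written out twice. */
void pack6_expr(const uint8_t in[4], uint8_t out[3])
{
    out[0] = (uint8_t)(((in[0] & 0x3F) << 2) | ((in[1] & 0x3F) >> 4));
    out[1] = (uint8_t)(((in[1] & 0x3F) << 4) | ((in[2] & 0x3F) >> 2));
    out[2] = (uint8_t)(((in[2] & 0x3F) << 6) |  (in[3] & 0x3F));
}

/* Style 2: named intermediates, each masked exactly once.
 * The casts truncate the shifted values back to 8 bits. */
void pack6_steps(const uint8_t in[4], uint8_t out[3])
{
    const uint8_t a = in[0] & 0x3F;
    const uint8_t b = in[1] & 0x3F;
    const uint8_t c = in[2] & 0x3F;
    const uint8_t d = in[3] & 0x3F;

    out[0] = (uint8_t)((a << 2) | (b >> 4));
    out[1] = (uint8_t)((b << 4) | (c >> 2));
    out[2] = (uint8_t)((c << 6) | d);
}
```

Whether the two compile to different code depends on the compiler, target,
and flags, so the generated assembly is the thing to compare.
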
>
> The next best was using a bunch of variables (which are allocated on the
> stack normally, but these are optimized out quickly when there are enough
> free registers to put them there instead, which was the case here) to hold
> intermediate steps, and repeating common steps in a short loop. But keep in
> mind that, if the variables are allocated in registers... you can't loop
> over them. Doing it with a loop forces the compiler to allocate stack, get
> the pointer, and index the variables, on top of the memory accesses
> themselves, which are slower. It is not without overhead! (This is a much
> better deal
> on, say, classic 16-bit machines (x86, 68k), and most everything since.) It
> might even try it with the loop and array, then try it unrolled with
> registers, and keep the unrolled version because it's simply better!
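
The loop-versus-registers tradeoff can be sketched as follows. This is an
illustrative toy, not the code under discussion; mix_loop, mix_unrolled,
and the nibble-folding step are invented names and operations. The loop
version needs an indexable array for its intermediates, while the unrolled
version uses plain scalars that a compiler is free to keep in registers.

```c
#include <stdint.h>

/* Loop form: intermediates live in an array so the loop can index them.
 * On a small target this may mean real stack storage plus pointer math. */
uint8_t mix_loop(const uint8_t in[4])
{
    uint8_t t[4];
    uint8_t acc = 0;

    for (uint8_t i = 0; i < 4; i++) {
        t[i] = (uint8_t)(in[i] ^ (in[i] >> 4)); /* fold high nibble into low */
    }
    for (uint8_t i = 0; i < 4; i++) {
        acc = (uint8_t)((acc << 2) | (t[i] & 0x03)); /* keep 2 bits per input */
    }
    return acc;
}

/* Unrolled form: same result, but every intermediate is a named scalar. */
uint8_t mix_unrolled(const uint8_t in[4])
{
    const uint8_t t0 = (uint8_t)(in[0] ^ (in[0] >> 4));
    const uint8_t t1 = (uint8_t)(in[1] ^ (in[1] >> 4));
    const uint8_t t2 = (uint8_t)(in[2] ^ (in[2] >> 4));
    const uint8_t t3 = (uint8_t)(in[3] ^ (in[3] >> 4));

    return (uint8_t)(((t0 & 0x03) << 6) | ((t1 & 0x03) << 4) |
                     ((t2 & 0x03) << 2) |  (t3 & 0x03));
}
```

With a fixed trip count of four, an optimizing compiler may well unroll
mix_loop by itself and end up with much the same code, which is the "try
both and keep the better one" behaviour mentioned above.
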
>
> I forget what I ended up with; I think I sliced the bit pattern differently,
> still using a loop but getting better reuse. It's still ugly, like hundreds
> of bytes for something nearly trivial if the data were byte aligned.
>
> All of this optimizing is subject to the constraints of the functions
> executed within each expression or statement. C functions can do literally
> anything. Side effects are the bane of optimization. If the optimizer
> can't reason about being able to move a function up or down the expression
> tree, it's simply forced to treat the call as a barrier. (Sure, it could
> reason about the function's contents as if inlining it, but that would be
> extremely costly.)
>
> As far as I know, the optimizer is bad at guessing what functions do, in
> terms of side effects, so it can help greatly (and this is why they put the
> features in there) to add hints about the nature of the function (e.g.,
> using const with parameters, writing pure functions when possible, etc.).
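
As a small illustration of such hints in GCC, the sketch below uses the
`const` and `pure` function attributes. The attributes are real GCC
features; the macro names HINT_CONST and HINT_PURE and the three functions
are hypothetical, and the macros expand to nothing on compilers that do
not define __GNUC__.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC-style attributes; harmless no-ops elsewhere. */
#if defined(__GNUC__)
#define HINT_CONST __attribute__((const))
#define HINT_PURE  __attribute__((pure))
#else
#define HINT_CONST
#define HINT_PURE
#endif

/* const: the result depends only on the argument values;
 * the function reads no memory and has no side effects. */
HINT_CONST uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* pure: may read memory through its pointer argument,
 * but has no observable side effects. */
HINT_PURE uint8_t xor_sum(const uint8_t *buf, size_t len)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc ^= buf[i];
    }
    return acc;
}

/* With the hint, the two identical calls are candidates for merging
 * into one, since nothing is written to memory between them. */
uint8_t twice(const uint8_t *buf, size_t len)
{
    return (uint8_t)(xor_sum(buf, len) + xor_sum(buf, len));
}
```

When the function body is visible in the same translation unit, GCC can
often work this out without help; the attributes matter most on
declarations in headers, where the body is not visible to the caller.
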
>
> (FYI, most of my experience centers around GCC. Most of this is motivated
> by my own observations, illuminated by some of the official documentation
> about the optimizing step.)
Yes, this matches my experience as well. Assembly code rules!
Joe Gwinn