This moves both Gfx::CanonicalCode::write_symbol() and
Compress::CanonicalCode::write_symbol() inline.
It also adds `__attribute__((always_inline))` on the arguments
to visit() in the latter. (ALWAYS_INLINE doesn't work on lambdas.)
Numbers with `ministat`: I ran once:
Build/lagom/bin/image -o test.bmp Base/res/wallpapers/sunset-retro.png
and then ran to bench:
~/src/hack/bench.py -n 20 -o bench_foo1.txt \
Build/lagom/bin/image -o test.webp test.bmp
...and then `ministat bench_foo1.txt bench_foo2.txt` to compare.
The previous commit increased the time for this command by 38% compared
to the before state.
With this, it's an 8.6% regression. So still a regression, but a smaller
one.
Or, in other words, this commit reduces times by 21% compared to the
previous commit.
Numbers with hyperfine are similar -- with this on top of the previous
commit, this is a 7-11% regression, instead of an almost 50% regression.
(A local branch that changes how we compute CanonicalCodes so that we
actually compress a bit is perf-neutral since the image writing code
doesn't change.)
`hyperfine 'image -o test.webp test.bmp'`:
* Before: 23.7 ms ± 0.7 ms (116 runs)
* Previous commit: 33.2 ms ± 0.8 ms (82 runs)
* This commit: 25.5 ms ± 0.7 ms (102 runs)
`hyperfine 'animation -o wow.webp giphy.gif'`:
* Before: 85.5 ms ± 2.0 ms (34 runs)
* Previous commit: 127.7 ms ± 4.4 ms (22 runs)
* This commit: 95.3 ms ± 2.1 ms (31 runs)
`hyperfine 'animation -o wow.webp 7z7c.gif'`:
* Before: 12.6 ms ± 0.6 ms (198 runs)
* Previous commit: 16.5 ms ± 0.9 ms (153 runs)
* This commit: 13.5 ms ± 0.6 ms (186 runs)
We still construct the code length codes manually, and now we also
construct a PrefixCodeGroup manually that assigns 8 bits to all
symbols (except for fully-opaque alpha channels, and for the
unused distance codes, like before). But now we use the CanonicalCodes
from that PrefixCodeGroup for writing.
No behavior change at all, the output is bit-for-bit identical to
before. But this is a step towards actually huffman-coding symbols.
This is however a pretty big perf regression. For
`image -o test.webp test.bmp` (where test.bmp is retro-sunset.png
re-encoded as bmp), time goes from 23.7 ms to 33.2 ms.
`animation -o wow.webp giphy.gif` goes from 85.5 ms to 127.7 ms.
`animation -o wow.webp 7z7c.gif` goes from 12.6 ms to 16.5 ms.
No behavior change. No measurable performance different either.
(I tried `hyperfine 'Build/lagom/bin/image --no-output foo.webp'`
for a few input images before and after this change, and I didn't
see a difference. I also tried if moving both
Gfx::CanonicalCode::read_symbol() and
Compress::CanonicalCode::read_symbol() inline, and that didn't
help either.)