Felix Handte felixhandte @facebook New York, NY felixhandte.com Software Engineer on @facebook's Data Compression Team

felixhandte/draft-handte-hybi-zstd-pmce 5

A Zstandard Per-Message Compression Extension for WebSocket

felixhandte/LaTeX-Grapher 3

Turn CSVs into pretty LaTeX/TiKZ graphs.

felixhandte/lz4 2

Extremely Fast Compression algorithm

felixhandte/Enigma 1

A Real-World Compatible Enigma Machine Simulator

felixhandte/homedir 1

Some dotfiles and such.

felixhandte/zstd 1

Zstandard - Fast real-time compression algorithm

felixhandte/chromium 0

The official GitHub mirror of the Chromium source

pull request comment facebook/zstd

Makefile: sort all wildcard file list expansions

I'll wait until the builds all succeed, but this looks good to me!

kanavin

comment created time in an hour


issue closed facebook/zstd

A possible divide by zero bug?

Hi, in function ZSTD_buildBlockEntropyStats_sequences, we have the following code:

size_t const nbSeq = seqStorePtr->sequences - seqStorePtr->sequencesStart;
...
stats = ZSTD_buildSequencesStatistics(seqStorePtr, nbSeq,...)

If seqStorePtr->sequences == seqStorePtr->sequencesStart, nbSeq will be 0 and will be passed to the function ZSTD_buildSequencesStatistics. Eventually, nbSeq is used as the divisor of a division without being checked.

Here is the full trace for triggering this problem (with links pointing to specific lines in the code): ZSTD_buildBlockEntropyStats_sequences --> ZSTD_buildSequencesStatistics --> ZSTD_selectEncodingType --> ZSTD_entropyCost, finally leading to a division in ZSTD_entropyCost:

unsigned norm = (unsigned)((256 * count[s]) / total); // here total equals nbSeq, which may be zero
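
The concern in this report can be illustrated with a minimal sketch of a guarded division. `norm_count_guarded` is a hypothetical helper written for this example, not zstd's `ZSTD_entropyCost`, and it says nothing about whether zstd's callers can actually reach the division with a zero total. It only shows the check the report has in mind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of zstd: scales count[s] to a 0..256 range
 * relative to total, and returns 0 instead of dividing when total is zero. */
static unsigned norm_count_guarded(const unsigned *count, size_t s, size_t total)
{
    assert(count != NULL);
    if (total == 0) {
        return 0;   /* no sequences: nothing to normalize, so skip the division */
    }
    return (unsigned)((256 * (size_t)count[s]) / total);
}
```

Returning 0 for an empty input is a choice made for this sketch. A real fix might instead reject the case earlier in the call chain.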

closed time in 9 days

yiyuaner

issue comment facebook/zstd

Generic error returned but only in 32 bits and only in the 1.5.0 release

Struct-packing is not standards-compliant. It's hard enough to make sure zstd is correct in all the hairy corner cases of standards-compliant platforms. We're not going to try to support this.

Ben136KBC

comment created time in 9 days

pull request comment facebook/zstd

Fix SPM warning: umbrella header for module 'libzstd' does not include header 'xxx.h'

Is it possible we can just do the following?

module libzstd [extern_c] {
    umbrella "../"
    export *
}

That seems to be how you would use umbrella, from my naive read of the docs. Does swift need a specific root header? And can that be zstd.h?

cntrump

comment created time in 9 days

push event facebook/zstd

W. Felix Handte

commit sha 66079085f06c526e55e3f97cbe17d04d11960698

Determinism: Avoid Mapping Window into Reserved Indices during Reduction

PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It succeeded in addressing that source of non-determinism, but introduced a new one: it was possible, when index reduction occurred, to map indices in the window to the reserved value, which would cause them to be zeroed, potentially altering parsing of the input.

This PR addresses this issue. It makes sure that the bottom of the window is always `>= ZSTD_WINDOW_START_INDEX`.

I'm not sure if this makes #2850 redundant. I think it's probably still valuable to have that protection as well.

Credit to OSS-Fuzz for discovering this issue.

view details

Felix Handte

commit sha c2c6a4ab40fcc327e79d5364f9c2ab1e41e6a7f8

Merge pull request #2869 from felixhandte/oss-fuzz-fix-41005 Determinism: Avoid Mapping Window into Reserved Indices during Reduction

view details

push time in 14 days

PR merged facebook/zstd

Determinism: Avoid Mapping Window into Reserved Indices during Reduction bug CLA Signed

PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It succeeded in addressing that source of non-determinism, but introduced a new one: it was possible, when index reduction occurred, to map indices in the window to the reserved value, which would cause them to be zeroed, potentially altering parsing of the input.

This PR addresses this issue. It makes sure that the bottom of the window is always >= ZSTD_WINDOW_START_INDEX.

I'm not sure if this makes #2850 redundant. I think it's probably still valuable to have that protection as well.

Credit to OSS-Fuzz for discovering this issue.
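
One way to picture the guarantee described in this PR is a clamp applied when a window bound is reduced. The sketch below is an illustration under stated assumptions, not the code from the PR: `reduce_window_bound` is a hypothetical helper, and the value 2 for `WINDOW_START_INDEX` is assumed here as a stand-in for `ZSTD_WINDOW_START_INDEX`.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_START_INDEX 2u  /* assumed value, stands in for ZSTD_WINDOW_START_INDEX */

/* Hypothetical helper: subtracts `correction` from a window bound during index
 * reduction, but never lets the result fall below WINDOW_START_INDEX, so the
 * bound cannot land on a reserved index. */
static uint32_t reduce_window_bound(uint32_t bound, uint32_t correction)
{
    /* keep the comparison below free of unsigned wrap-around */
    assert(correction <= UINT32_MAX - WINDOW_START_INDEX);
    if (bound < correction + WINDOW_START_INDEX) {
        return WINDOW_START_INDEX;
    }
    return bound - correction;
}
```

With a correction of 50, a bound of 100 becomes 50, while bounds of 10 or 51 are both clamped to 2 rather than wrapping or landing on a smaller index.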

+22 -10

0 comment

1 changed file

felixhandte

pr closed time in 14 days

PR opened facebook/zstd

Determinism: Avoid Mapping Window into Reserved Indices during Reduction

PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It succeeded in addressing that source of non-determinism, but introduced a new one: it was possible, when index reduction occurred, to map indices in the window to the reserved value, which would cause them to be zeroed, potentially altering parsing of the input.

This PR addresses this issue. It makes sure that the bottom of the window is always >= ZSTD_WINDOW_START_INDEX.

I'm not sure if this makes #2850 redundant. I think it's probably still valuable to have that protection as well.

Credit to OSS-Fuzz for discovering this issue.

+22 -10

0 comment

1 changed file

pr created time in 15 days

create branch felixhandte/zstd

branch : oss-fuzz-fix-41005

created branch time in 15 days

push event felixhandte/zstd

Ma Lin

commit sha b10357ce65fa7fc908a06713c4f23f69dfc7ba8a

ZSTD_copy16() uses SSE2 instructions This accelerates the decompression speed of MSVC build.

view details

Kevin Svetlitski

commit sha 0665d4c1c2357fb7f90478994f8dedfb31dce164

Display command line parameters with concrete values in verbose mode

view details

W. Felix Handte

commit sha 61765cacd08103d47ce7c6709135f340d1074568

Avoid Reducing Indices to Reserved Values

Previously, if an index was equal to `reducerValue + 1`, it would get remapped during index reduction to 1, i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the parsing of the input slightly, by causing tree nodes to be nullified when they otherwise wouldn't be. This hardly matters from a correctness or efficiency perspective, but it does impact determinism.

So this commit changes index reduction to avoid mapping indices to collide with `ZSTD_DUBT_UNSORTED_MARK`.

view details

W. Felix Handte

commit sha 48572f52b1c6fe2c7e57206b063ea85980e6c00b

Rewrite Fix to Still Auto-Vectorize

view details

binhdvo

commit sha 931778ed9b769db6257f18a74c5b60f591a0aa30

Fix fullbench CI failure (#2851)

view details

Felix Handte

commit sha a071e006964038ab50c2a1b648dfcf0e3e6e2468

Merge pull request #2850 from felixhandte/oss-fuzz-fix-40829-for-real-this-time Fix Determinism Bug: Avoid Reducing Indices to Reserved Values

view details

Kevin Svetlitski

commit sha 63fe6198ed9af6bf30de086238aa7e97009ac52a

Display --zstd= subparameters in command-line ready form in verbose mode

view details

Kevin Svetlitski

commit sha df9b7755cb38799e47c073fc5c03e146d23565fa

Fix const-ness of FIO_displayCompressionParameters

view details

Yann Collet

commit sha 9ba07907c886870c292c5f8d9eed12b4690c8b04

Merge pull request #2836 from animalize/copy16 ZSTD_copy16() uses ZSTD_memcpy()

view details

Kevin Svetlitski

commit sha f6ffd392302df82e348a1ab42db3c1374bb29605

Add test case for detailed compression parameter verbose output

view details

Kevin Svetlitski

commit sha 365c91194ce308650197fa08ecee9d802f7225a1

Ensure print*CParams functions are only defined when used

view details

Kevin Svetlitski

commit sha 375e3aad6c35be2605acbec873d8ad6c54f49434

Ensure formatting directives for displaying size_t are portable

view details

Kevin Svetlitski

commit sha 7fbd126e089a1359ce7c9729be91ee546312af1a

Suppress spurious unused parameter warning

view details

Kevin Svetlitski

commit sha 9b28c26cbff0d43eb72d90ba5ef2fea27659638e

Integrate verbose mode tests into playTests.sh

view details

Yann Collet

commit sha ddae153947beb03b9c9b64dc0ecf43b37e924e4d

Merge pull request #2847 from Svetlitski-FB/improve-verbose-output-2 Display command line parameters with concrete values in verbose mode

view details

push time in 15 days

issue comment facebook/zstd

HTTP custom dictionary auto discovery

Hi @boenrobot,

I'm glad to hear this is something you're passionate about! I too really want to see this happen.

Unfortunately, there are non-trivial challenges that have hindered progress.

  • Security: Dictionary-based compression opens a whole can of worms in terms of security. I've heard from pretty much all of the relevant parties that this is a blocking issue. I've been slowly working on an RFC to get some clarity on the problem and hopefully get agreement on what would make such a scheme compatible with the internet's security goals.
  • Complexity: The mechanisms, especially on the client side, are potentially complex. To name one issue, there are lots of complicated cache interactions.
  • Consensus: Driving consensus in the internet is like herding cats. :)

So we are pursuing an incremental strategy:

Step 1 is to get dictionary-less zstd into browsers. There's been some recent activity on this front in Chrome and at the W3C TPAC, so I'm hoping we see movement on this in the near future.

Step 2 is to ship a set of static dictionaries and standardize a means of using them. I hope to investigate this soon.

Step 3 will be to pursue dynamic/custom dictionaries. While we've deployed a custom scheme at Facebook (you can see some discussion here: mitmproxy/mitmproxy#4394), much work is required to turn it into a viable protocol for the open internet.

But I will take all the help I can get! If you'd like to pitch in, there are a lot of different ways to do that. Probably the simplest is to make your voice heard in these forums (HTTPWG, etc.) and let folks know that this is something you want to see.

boenrobot

comment created time in 21 days

push event facebook/zstd

W. Felix Handte

commit sha 61765cacd08103d47ce7c6709135f340d1074568

Avoid Reducing Indices to Reserved Values

Previously, if an index was equal to `reducerValue + 1`, it would get remapped during index reduction to 1, i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the parsing of the input slightly, by causing tree nodes to be nullified when they otherwise wouldn't be. This hardly matters from a correctness or efficiency perspective, but it does impact determinism.

So this commit changes index reduction to avoid mapping indices to collide with `ZSTD_DUBT_UNSORTED_MARK`.

view details

W. Felix Handte

commit sha 48572f52b1c6fe2c7e57206b063ea85980e6c00b

Rewrite Fix to Still Auto-Vectorize

view details

Felix Handte

commit sha a071e006964038ab50c2a1b648dfcf0e3e6e2468

Merge pull request #2850 from felixhandte/oss-fuzz-fix-40829-for-real-this-time Fix Determinism Bug: Avoid Reducing Indices to Reserved Values

view details

push time in 22 days

PR merged facebook/zstd

Fix Determinism Bug: Avoid Reducing Indices to Reserved Values bug CLA Signed

Previously, if an index was equal to reducerValue + 1, it would get remapped during index reduction to 1 i.e. ZSTD_DUBT_UNSORTED_MARK. This can affect the parsing of the input slightly, by causing tree nodes to be nullified when they otherwise wouldn't be. This hardly matters from a correctness or efficiency perspective, but it does impact determinism.

So this Pull Request changes index reduction to avoid mapping indices to collide with ZSTD_DUBT_UNSORTED_MARK.

I am somewhat concerned that ZSTD_reduceTable_internal() will be slower now. I'm not sure how important the speed of this is in practice, but it looks like it's written to be auto-vectorized, and I'm not sure the new version is similarly vectorizable.

Credit to OSS-Fuzz for discovery.
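
The collision described in this PR can be sketched with a simplified reduction loop. This is a hedged illustration, not the contents of `ZSTD_reduceTable_internal()`: the function name `reduce_table`, the flat table layout, and the `START_INDEX` value of 2 are assumptions made for the example. Comparing against `reducerValue + START_INDEX` rather than `reducerValue` alone is one way to keep an index equal to `reducerValue + 1` from being remapped to 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNSORTED_MARK 1u  /* reserved value, stands in for ZSTD_DUBT_UNSORTED_MARK */
#define START_INDEX   2u  /* assumed: smallest index that is not reserved */

/* Hypothetical sketch: shifts every index down by reducerValue. Any index that
 * would end up below START_INDEX (including on UNSORTED_MARK) is zeroed. */
static void reduce_table(uint32_t *table, size_t size, uint32_t reducerValue)
{
    uint32_t const threshold = reducerValue + START_INDEX;
    size_t i;
    assert(table != NULL || size == 0);
    for (i = 0; i < size; i++) {
        if (table[i] < threshold) {
            table[i] = 0;              /* too old, or would land on a reserved value */
        } else {
            table[i] -= reducerValue;  /* still valid after the shift */
        }
    }
}
```

With `reducerValue` set to 10, the index 11 is zeroed instead of becoming 1, and 12 becomes 2, the first non-reserved index under these assumptions.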

+12 -5

2 comments

1 changed file

felixhandte

pr closed time in 22 days

pull request comment facebook/zstd

Avoid Reducing Indices to Reserved Values

I've put up a new version that does a superfluous write, but that helps the compiler vectorize the loop (https://godbolt.org/z/Kq4oc139n).
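
The idea of a superfluous write can be shown with two loops that produce the same table. This is a sketch under assumptions, not the code behind the godbolt link: both function names are hypothetical, `MARK` stands in for a reserved value that must survive reduction, and no claim is made here about which compilers vectorize either form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MARK 1u  /* stands in for a reserved value that must survive reduction */

/* Skips marked cells entirely: the store happens only on some iterations. */
static void reduce_skip(uint32_t *table, size_t n, uint32_t threshold, uint32_t reducer)
{
    size_t i;
    assert(threshold >= reducer);
    for (i = 0; i < n; i++) {
        if (table[i] == MARK) continue;
        table[i] = (table[i] < threshold) ? 0 : table[i] - reducer;
    }
}

/* Always stores: a marked cell is rewritten with the value it already holds,
 * so every iteration has the same load, select, store shape. */
static void reduce_always_store(uint32_t *table, size_t n, uint32_t threshold, uint32_t reducer)
{
    size_t i;
    assert(threshold >= reducer);
    for (i = 0; i < n; i++) {
        uint32_t newVal;
        if (table[i] == MARK) {
            newVal = MARK;                 /* superfluous: the value is unchanged */
        } else if (table[i] < threshold) {
            newVal = 0;
        } else {
            newVal = table[i] - reducer;
        }
        table[i] = newVal;
    }
}
```

The second form does slightly more memory traffic, but its uniform loop body is the kind of shape that auto-vectorizers tend to handle more readily.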

felixhandte

comment created time in 23 days

push event felixhandte/zstd

W. Felix Handte

commit sha 48572f52b1c6fe2c7e57206b063ea85980e6c00b

Rewrite Fix to Still Auto-Vectorize

view details

push time in 23 days

PR opened facebook/zstd

Avoid Reducing Indices to Reserved Values bug

Previously, if an index was equal to reducerValue + 1, it would get remapped during index reduction to 1 i.e. ZSTD_DUBT_UNSORTED_MARK. This can affect the parsing of the input slightly, by causing tree nodes to be nullified when they otherwise wouldn't be. This hardly matters from a correctness or efficiency perspective, but it does impact determinism.

So this Pull Request changes index reduction to avoid mapping indices to collide with ZSTD_DUBT_UNSORTED_MARK.

I am somewhat concerned that ZSTD_reduceTable_internal() will be slower now. I'm not sure how important the speed of this is in practice, but it looks like it's written to be auto-vectorized, and I'm not sure the new version is similarly vectorizable.

Credit to OSS-Fuzz for discovery.

+7 -5

0 comment

1 changed file

pr created time in 24 days

create branch felixhandte/zstd

branch : oss-fuzz-fix-40829-for-real-this-time

created branch time in 24 days

push event felixhandte/zstd

Ma Lin

commit sha cc22042da0819b37cf69175bd92fe26a2975eea7

Fix a C89 error in msvc Variables (r) must be declared at the beginning of a code block. This causes msvc2012 to fail to compile 64-bit build.

view details

Ma Lin

commit sha 95f492ea17834a6fb21013d2db0d03e0f12db025

Don't initialize the first parameter of _BitScanReverse* functions Like the document example, no need to initialize `r` to 0. https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64

view details

Ma Lin

commit sha e5ba858270929080f03be821c53c93b8a05a42e0

Don't initialize the first parameter of _BitScanForward* functions Like the document example, no need to initialize `r` to 0. https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64

view details

Ma Lin

commit sha ae986fcdb861c44ee4267fc4b3dcf0a4cd2ee724

Use __assume(0) for unreachable code path in msvc msvc will optimize away the condition check.

view details

Ma Lin

commit sha 894f05e88d07dd1c791597540e91f1aa6149b67e

Fix ZSTD_countTrailingZeros() bug `>> 3` is wrong.

view details

W. Felix Handte

commit sha 258c0623e1e4e49a8d6ac3471dda3dd4030a2191

Extract Single-Segment Variant of ZSTD_dfast

view details

W. Felix Handte

commit sha 1bdf0410713b25a5c3419e12f32ab16eac62ba86

Track Step Rather than Recalculating (+0.5% Speed)

view details

W. Felix Handte

commit sha 072ffaad67b67ea4aef50b7370268fe28fa3b7e1

Extract Working Variables

view details

W. Felix Handte

commit sha a1ac7205d031a7f27ad1d5eae9deced4bba938f1

Pull Match Found Stuff Out of the Loop

view details

W. Felix Handte

commit sha db4e1b5479c1df569732aa8f486633ed4098bd8c

Hash Long One Position Ahead (+2.5% Speed) Aside from maybe a latency win in the loop, this means that when we find a short match, we've already done the hash we need to check the next long match.

view details

W. Felix Handte

commit sha 39f2491bfc70db883d45265af0f764d1e80cfa69

Use Look-Ahead Hash for Next Long Check after Short Match (+0.5% Speed) This costs a little ratio, unfortunately.

view details

W. Felix Handte

commit sha 2ddef7c872c123383aa8b998a3b428bf359a8ce0

Write Back Advanced Hash in Long Matches as Well (+Ratio) Since we're now hashing the position ahead even if we find a long match and don't search that next position, we can write it back into the hashtable even in long matches. This seems to cost us no speed, and improves compression ratio slightly!

view details

W. Felix Handte

commit sha 6ae44c0db8d9832dd6d1ac8cdd3435dd686c0d0a

Advance Long Index Lookup (+0.5% Speed) This lookup can be advanced to before the short match check because either way we will use it (in the next loop iter or in `_search_next_long`).

view details

W. Felix Handte

commit sha 2cdfad538c24e8fd108e1c46101f4e7ec645663c

Search One Last Position

view details

W. Felix Handte

commit sha 47fd762eccb3d8273a8e9c20239fd069271c3d0d

Nit: Unnest Blocks that Don't Declare Anything

view details

W. Felix Handte

commit sha fcab4841aa0df69de0dfe867f9a74510ad7a459c

Nit: Rename Function

view details

W. Felix Handte

commit sha 051b473e7ebce26f0e2ea7d2ced994f2ec2bfb59

Fall Back in _extDict to New _noDict Rather than Old Merged Impl

view details

W. Felix Handte

commit sha 62536ef7da6023a1a0d533351b402b5418c4199a

Simplify DMS Implementation by Removing noDict Support

view details

W. Felix Handte

commit sha c2c32839dc8cc16ea8a75c2f552018623def8464

Update results.csv

view details

W. Felix Handte

commit sha 168d0a3c89dd7d74fc682859520e895b6f1b521a

Fix Flaky Test This test depended on `_extDict` and `_noDict` compressing identically, which is not a guarantee we make, AFAIK.

view details

push time in 24 days

create branch felixhandte/zstd

branch : oss-fuzz-fix-40829

created branch time in 24 days

issue comment facebook/zstd

[question] why the zstd don't preserve original file name and modification date?

@cschanzlenist, yes, this is a bug that was introduced in v1.5.0: #2739. We've landed a fix that will go out in the next release: #2742.

Thanks for the report and sorry for the churn!

stokito

comment created time in a month

push event facebook/zstd

W. Felix Handte

commit sha 258c0623e1e4e49a8d6ac3471dda3dd4030a2191

Extract Single-Segment Variant of ZSTD_dfast

view details

W. Felix Handte

commit sha 1bdf0410713b25a5c3419e12f32ab16eac62ba86

Track Step Rather than Recalculating (+0.5% Speed)

view details

W. Felix Handte

commit sha 072ffaad67b67ea4aef50b7370268fe28fa3b7e1

Extract Working Variables

view details

W. Felix Handte

commit sha a1ac7205d031a7f27ad1d5eae9deced4bba938f1

Pull Match Found Stuff Out of the Loop

view details

W. Felix Handte

commit sha db4e1b5479c1df569732aa8f486633ed4098bd8c

Hash Long One Position Ahead (+2.5% Speed) Aside from maybe a latency win in the loop, this means that when we find a short match, we've already done the hash we need to check the next long match.

view details

W. Felix Handte

commit sha 39f2491bfc70db883d45265af0f764d1e80cfa69

Use Look-Ahead Hash for Next Long Check after Short Match (+0.5% Speed) This costs a little ratio, unfortunately.

view details

W. Felix Handte

commit sha 2ddef7c872c123383aa8b998a3b428bf359a8ce0

Write Back Advanced Hash in Long Matches as Well (+Ratio) Since we're now hashing the position ahead even if we find a long match and don't search that next position, we can write it back into the hashtable even in long matches. This seems to cost us no speed, and improves compression ratio slightly!

view details

W. Felix Handte

commit sha 6ae44c0db8d9832dd6d1ac8cdd3435dd686c0d0a

Advance Long Index Lookup (+0.5% Speed) This lookup can be advanced to before the short match check because either way we will use it (in the next loop iter or in `_search_next_long`).

view details

W. Felix Handte

commit sha 2cdfad538c24e8fd108e1c46101f4e7ec645663c

Search One Last Position

view details

W. Felix Handte

commit sha 47fd762eccb3d8273a8e9c20239fd069271c3d0d

Nit: Unnest Blocks that Don't Declare Anything

view details

W. Felix Handte

commit sha fcab4841aa0df69de0dfe867f9a74510ad7a459c

Nit: Rename Function

view details

W. Felix Handte

commit sha 051b473e7ebce26f0e2ea7d2ced994f2ec2bfb59

Fall Back in _extDict to New _noDict Rather than Old Merged Impl

view details

W. Felix Handte

commit sha 62536ef7da6023a1a0d533351b402b5418c4199a

Simplify DMS Implementation by Removing noDict Support

view details

W. Felix Handte

commit sha c2c32839dc8cc16ea8a75c2f552018623def8464

Update results.csv

view details

W. Felix Handte

commit sha 168d0a3c89dd7d74fc682859520e895b6f1b521a

Fix Flaky Test This test depended on `_extDict` and `_noDict` compressing identically, which is not a guarantee we make, AFAIK.

view details

W. Felix Handte

commit sha 79ca83076620d69a419e167b2a49ca01557285f8

Style: Add Comments to Variables and Move a Couple into the Loop

view details

W. Felix Handte

commit sha 0bfc935add6f47630c28b4e9215026e5860cb85e

Convert Outer Control Structure to Loop

view details

Felix Handte

commit sha 23c1a2d260e6b50279fbb25d5e13e52516c6d19b

Merge pull request #2774 from felixhandte/zstd-dfast-pipelined-single Pipelined Implementation of ZSTD_dfast

view details

push time in 2 months

PR merged facebook/zstd

Pipelined Implementation of ZSTD_dfast CLA Signed optimization

This PR takes the ideas from #2749 and applies them to the double-fast implementation.

Description

We start by pulling a single-segment copy out so that we can work on it separately from the DMS implementation.

This implementation makes two changes to how the input is parsed:

  1. Instead of checking ip + 1 when we find a short match, we check ip + step. This is a pretty minimal change to the parsing behavior, since step is almost always 1.
  2. We write back ip + 1 into the hash table even when we take a long match at ip (instead of only in the short match path). It costs us basically nothing to do this because we've already hashed it. This improves compression ratio.

Unlike the fast implementation, whose pipelining includes speculative work that we might throw away, this implementation doesn't do any additional work. It just moves some of it earlier. In particular, the crucial observation is that when we do not take a long match at the current position, we are guaranteed to inspect the next long position, either by taking a short match and checking the next one or by not taking the short match and moving on to the next position. So we can front-load some of that loading work.
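
The notion of moving work earlier without adding any can be illustrated with a toy that is far simpler than double-fast. The sketch below is a stand-in under stated assumptions, not the PR's implementation: it uses a single hash table of 4-byte sequences rather than separate long and short tables, and every name in it (`hash4`, `count_matches_plain`, `count_matches_pipelined`) is hypothetical. The pipelined variant hashes position `i + 1` before it examines position `i`, which is work the next iteration would have done anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LOG  12
#define HASH_SIZE (1u << HASH_LOG)

static uint32_t read32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static uint32_t hash4(const uint8_t *p)
{
    return (read32(p) * 2654435761u) >> (32 - HASH_LOG);
}

/* Baseline: hash each position when it is reached. Table cells hold
 * position + 1 so that 0 can mean "empty". */
static size_t count_matches_plain(const uint8_t *src, size_t len)
{
    uint32_t table[HASH_SIZE] = {0};
    size_t matches = 0;
    size_t i;
    assert(src != NULL);
    for (i = 0; i + 4 <= len; i++) {
        uint32_t const h = hash4(src + i);
        uint32_t const cand = table[h];
        if (cand != 0 && read32(src + cand - 1) == read32(src + i)) matches++;
        table[h] = (uint32_t)(i + 1);
    }
    return matches;
}

/* Pipelined: the hash for position i + 1 is computed at the top of the
 * iteration for position i and carried into the next iteration. */
static size_t count_matches_pipelined(const uint8_t *src, size_t len)
{
    uint32_t table[HASH_SIZE] = {0};
    size_t matches = 0;
    size_t i;
    uint32_t h;
    assert(src != NULL);
    if (len < 4) return 0;
    h = hash4(src);
    for (i = 0; i + 4 <= len; i++) {
        /* only hash ahead while 4 readable bytes remain at i + 1 */
        uint32_t const hNext = (i + 5 <= len) ? hash4(src + i + 1) : 0;
        uint32_t const cand = table[h];
        if (cand != 0 && read32(src + cand - 1) == read32(src + i)) matches++;
        table[h] = (uint32_t)(i + 1);
        h = hNext;
    }
    return matches;
}
```

Both functions perform the same lookups and writes in the same order, so they return the same count. Only the point at which each hash is computed differs.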

Benchmarks

<details> <summary>Silesia Results Table</summary>

file        compiler  level | speed: before, after (change) | ratio: before, after (change)

dickens     gcc-4.8    3 |   99.9  100.3 ( +0.400%) |  2.769  2.779 ( +0.361%)
dickens     gcc-5      3 |   99.1   99.2 ( +0.101%) |  2.769  2.779 ( +0.361%)
dickens     gcc-6      3 |  101.5   99.0 ( -2.463%) |  2.769  2.779 ( +0.361%)
dickens     gcc-7      3 |  101.8   99.7 ( -2.063%) |  2.769  2.779 ( +0.361%)
dickens     gcc-8      3 |   96.5   97.9 ( +1.451%) |  2.769  2.779 ( +0.361%)
dickens     gcc-10     3 |  100.4   99.5 ( -0.896%) |  2.769  2.779 ( +0.361%)
dickens     clang-6.0  3 |  103.7  103.0 ( -0.675%) |  2.769  2.779 ( +0.361%)
dickens     clang-7    3 |  100.4   98.0 ( -2.390%) |  2.769  2.779 ( +0.361%)
dickens     clang-8    3 |  100.7   99.1 ( -1.589%) |  2.769  2.779 ( +0.361%)
dickens     clang-9    3 |  102.3   94.5 ( -7.625%) |  2.769  2.779 ( +0.361%)
dickens     clang-11   3 |  100.8  100.2 ( -0.595%) |  2.769  2.779 ( +0.361%)
dickens     clang-12   3 |  100.7   99.2 ( -1.490%) |  2.769  2.779 ( +0.361%)
dickens     gcc-4.8    4 |  101.9  101.8 ( -0.098%) |  2.827  2.841 ( +0.495%)
dickens     gcc-5      4 |   98.5   95.8 ( -2.741%) |  2.827  2.841 ( +0.495%)
dickens     gcc-6      4 |   98.9   96.9 ( -2.022%) |  2.827  2.841 ( +0.495%)
dickens     gcc-7      4 |   98.8   97.6 ( -1.215%) |  2.827  2.841 ( +0.495%)
dickens     gcc-8      4 |   98.6  101.7 ( +3.144%) |  2.827  2.841 ( +0.495%)
dickens     gcc-10     4 |   95.1  100.7 ( +5.889%) |  2.827  2.841 ( +0.495%)
dickens     clang-6.0  4 |  102.1  100.1 ( -1.959%) |  2.827  2.841 ( +0.495%)
dickens     clang-7    4 |   97.9   97.9 ( +0.000%) |  2.827  2.841 ( +0.495%)
dickens     clang-8    4 |  100.8   98.6 ( -2.183%) |  2.827  2.841 ( +0.495%)
dickens     clang-9    4 |   96.8   97.3 ( +0.517%) |  2.827  2.841 ( +0.495%)
dickens     clang-11   4 |  100.0   97.1 ( -2.900%) |  2.827  2.841 ( +0.495%)
dickens     clang-12   4 |   98.9   98.1 ( -0.809%) |  2.827  2.841 ( +0.495%)
enwik8      gcc-4.8    3 |  109.3  110.5 ( +1.098%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-5      3 |  104.8  106.3 ( +1.431%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-6      3 |  107.1  106.2 ( -0.840%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-7      3 |  105.0  106.1 ( +1.048%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-8      3 |  105.7  108.4 ( +2.554%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-10     3 |  105.1  108.1 ( +2.854%) |  2.809  2.820 ( +0.392%)
enwik8      clang-6.0  3 |  111.7  111.1 ( -0.537%) |  2.809  2.820 ( +0.392%)
enwik8      clang-7    3 |  107.6  109.6 ( +1.859%) |  2.809  2.820 ( +0.392%)
enwik8      clang-8    3 |  109.2  110.2 ( +0.916%) |  2.809  2.820 ( +0.392%)
enwik8      clang-9    3 |  109.0  109.0 ( +0.000%) |  2.809  2.820 ( +0.392%)
enwik8      clang-11   3 |  112.5  111.8 ( -0.622%) |  2.809  2.820 ( +0.392%)
enwik8      clang-12   3 |  107.5  109.9 ( +2.233%) |  2.809  2.820 ( +0.392%)
enwik8      gcc-4.8    4 |  103.1  106.5 ( +3.298%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-5      4 |  100.3   99.7 ( -0.598%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-6      4 |  101.9  104.4 ( +2.453%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-7      4 |  102.8  100.8 ( -1.946%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-8      4 |   99.7  100.9 ( +1.204%) |  2.864  2.877 ( +0.454%)
enwik8      gcc-10     4 |   99.3  104.3 ( +5.035%) |  2.864  2.877 ( +0.454%)
enwik8      clang-6.0  4 |  106.5  105.7 ( -0.751%) |  2.864  2.877 ( +0.454%)
enwik8      clang-7    4 |  102.4  103.5 ( +1.074%) |  2.864  2.877 ( +0.454%)
enwik8      clang-8    4 |  104.2  104.6 ( +0.384%) |  2.864  2.877 ( +0.454%)
enwik8      clang-9    4 |  102.4  106.4 ( +3.906%) |  2.864  2.877 ( +0.454%)
enwik8      clang-11   4 |  106.8  103.6 ( -2.996%) |  2.864  2.877 ( +0.454%)
enwik8      clang-12   4 |  105.2  104.1 ( -1.046%) |  2.864  2.877 ( +0.454%)
enwik9      gcc-4.8    3 |  121.7  117.3 ( -3.615%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-5      3 |  113.0  122.1 ( +8.053%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-6      3 |  120.3  123.2 ( +2.411%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-7      3 |  123.3  125.4 ( +1.703%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-8      3 |  124.2  121.7 ( -2.013%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-10     3 |  123.2  125.1 ( +1.542%) |  3.191  3.203 ( +0.376%)
enwik9      clang-6.0  3 |  124.8  125.1 ( +0.240%) |  3.191  3.203 ( +0.376%)
enwik9      clang-7    3 |  119.1  123.5 ( +3.694%) |  3.191  3.203 ( +0.376%)
enwik9      clang-8    3 |  119.5  119.2 ( -0.251%) |  3.191  3.203 ( +0.376%)
enwik9      clang-9    3 |  120.0  120.2 ( +0.167%) |  3.191  3.203 ( +0.376%)
enwik9      clang-11   3 |  123.0  124.6 ( +1.301%) |  3.191  3.203 ( +0.376%)
enwik9      clang-12   3 |  120.8  123.8 ( +2.483%) |  3.191  3.203 ( +0.376%)
enwik9      gcc-4.8    4 |  114.4  111.9 ( -2.185%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-5      4 |  110.1  115.8 ( +5.177%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-6      4 |  115.5  114.4 ( -0.952%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-7      4 |  117.8  117.3 ( -0.424%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-8      4 |  113.5  116.7 ( +2.819%) |  3.253  3.267 ( +0.430%)
enwik9      gcc-10     4 |  111.1  119.1 ( +7.201%) |  3.253  3.267 ( +0.430%)
enwik9      clang-6.0  4 |  117.7  120.0 ( +1.954%) |  3.253  3.267 ( +0.430%)
enwik9      clang-7    4 |  114.7  115.7 ( +0.872%) |  3.253  3.267 ( +0.430%)
enwik9      clang-8    4 |  113.8  118.9 ( +4.482%) |  3.253  3.267 ( +0.430%)
enwik9      clang-9    4 |  116.2  117.2 ( +0.861%) |  3.253  3.267 ( +0.430%)
enwik9      clang-11   4 |  117.1  118.4 ( +1.110%) |  3.253  3.267 ( +0.430%)
enwik9      clang-12   4 |  110.6  114.9 ( +3.888%) |  3.253  3.267 ( +0.430%)
mozilla     gcc-4.8    3 |  148.5  152.1 ( +2.424%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-5      3 |  147.3  152.5 ( +3.530%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-6      3 |  145.2  151.6 ( +4.408%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-7      3 |  149.7  154.8 ( +3.407%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-8      3 |  150.3  152.4 ( +1.397%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-10     3 |  147.5  154.4 ( +4.678%) |  2.768  2.771 ( +0.108%)
mozilla     clang-6.0  3 |  156.4  150.0 ( -4.092%) |  2.768  2.771 ( +0.108%)
mozilla     clang-7    3 |  147.5  153.0 ( +3.729%) |  2.768  2.771 ( +0.108%)
mozilla     clang-8    3 |  145.8  153.8 ( +5.487%) |  2.768  2.771 ( +0.108%)
mozilla     clang-9    3 |  151.1  149.0 ( -1.390%) |  2.768  2.771 ( +0.108%)
mozilla     clang-11   3 |  146.5  152.5 ( +4.096%) |  2.768  2.771 ( +0.108%)
mozilla     clang-12   3 |  145.4  151.8 ( +4.402%) |  2.768  2.771 ( +0.108%)
mozilla     gcc-4.8    4 |  136.0  139.3 ( +2.426%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-5      4 |  135.2  140.3 ( +3.772%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-6      4 |  129.5  139.3 ( +7.568%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-7      4 |  140.2  142.4 ( +1.569%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-8      4 |  135.2  140.9 ( +4.216%) |  2.798  2.801 ( +0.107%)
mozilla     gcc-10     4 |  137.0  141.9 ( +3.577%) |  2.798  2.801 ( +0.107%)
mozilla     clang-6.0  4 |  140.9  137.1 ( -2.697%) |  2.798  2.801 ( +0.107%)
mozilla     clang-7    4 |  133.5  139.8 ( +4.719%) |  2.798  2.801 ( +0.107%)
mozilla     clang-8    4 |  136.9  139.5 ( +1.899%) |  2.798  2.801 ( +0.107%)
mozilla     clang-9    4 |  137.0  137.6 ( +0.438%) |  2.798  2.801 ( +0.107%)
mozilla     clang-11   4 |  138.7  139.5 ( +0.577%) |  2.798  2.801 ( +0.107%)
mozilla     clang-12   4 |  136.6  144.4 ( +5.710%) |  2.798  2.801 ( +0.107%)
mr          gcc-4.8    3 |  117.3  115.8 ( -1.279%) |  2.811  2.810 ( -0.036%)
mr          gcc-5      3 |  115.4  117.5 ( +1.820%) |  2.811  2.810 ( -0.036%)
mr          gcc-6      3 |  118.3  115.9 ( -2.029%) |  2.811  2.810 ( -0.036%)
mr          gcc-7      3 |  120.4  119.2 ( -0.997%) |  2.811  2.810 ( -0.036%)
mr          gcc-8      3 |  118.4  119.6 ( +1.014%) |  2.811  2.810 ( -0.036%)
mr          gcc-10     3 |  116.3  116.2 ( -0.086%) |  2.811  2.810 ( -0.036%)
mr          clang-6.0  3 |  119.6  115.1 ( -3.763%) |  2.811  2.810 ( -0.036%)
mr          clang-7    3 |  112.6  115.0 ( +2.131%) |  2.811  2.810 ( -0.036%)
mr          clang-8    3 |  114.0  117.1 ( +2.719%) |  2.811  2.810 ( -0.036%)
mr          clang-9    3 |  114.6  114.1 ( -0.436%) |  2.811  2.810 ( -0.036%)
mr          clang-11   3 |  109.2  114.2 ( +4.579%) |  2.811  2.810 ( -0.036%)
mr          clang-12   3 |  115.1  114.2 ( -0.782%) |  2.811  2.810 ( -0.036%)
mr          gcc-4.8    4 |  111.8  108.7 ( -2.773%) |  2.861  2.859 ( -0.070%)
mr          gcc-5      4 |  114.0  112.5 ( -1.316%) |  2.861  2.859 ( -0.070%)
mr          gcc-6      4 |  110.2  114.1 ( +3.539%) |  2.861  2.859 ( -0.070%)
mr          gcc-7      4 |  111.2  110.0 ( -1.079%) |  2.861  2.859 ( -0.070%)
mr          gcc-8      4 |  110.8  115.3 ( +4.061%) |  2.861  2.859 ( -0.070%)
mr          gcc-10     4 |  109.4  107.0 ( -2.194%) |  2.861  2.859 ( -0.070%)
mr          clang-6.0  4 |  115.7  109.3 ( -5.532%) |  2.861  2.859 ( -0.070%)
mr          clang-7    4 |  108.9  109.8 ( +0.826%) |  2.861  2.859 ( -0.070%)
mr          clang-8    4 |  110.1  108.4 ( -1.544%) |  2.861  2.859 ( -0.070%)
mr          clang-9    4 |  109.0  108.3 ( -0.642%) |  2.861  2.859 ( -0.070%)
mr          clang-11   4 |  115.2  107.4 ( -6.771%) |  2.861  2.859 ( -0.070%)
mr          clang-12   4 |  112.0  111.9 ( -0.089%) |  2.861  2.859 ( -0.070%)
nci         gcc-4.8    3 |  419.4  412.4 ( -1.669%) | 11.740 11.800 ( +0.511%)
nci         gcc-5      3 |  409.7  415.5 ( +1.416%) | 11.740 11.800 ( +0.511%)
nci         gcc-6      3 |  413.8  415.2 ( +0.338%) | 11.740 11.800 ( +0.511%)
nci         gcc-7      3 |  417.2  413.6 ( -0.863%) | 11.740 11.800 ( +0.511%)
nci         gcc-8      3 |  410.4  413.1 ( +0.658%) | 11.740 11.800 ( +0.511%)
nci         gcc-10     3 |  416.2  408.7 ( -1.802%) | 11.740 11.800 ( +0.511%)
nci         clang-6.0  3 |  424.2  399.5 ( -5.823%) | 11.740 11.800 ( +0.511%)
nci         clang-7    3 |  419.5  422.3 ( +0.667%) | 11.740 11.800 ( +0.511%)
nci         clang-8    3 |  433.3  413.4 ( -4.593%) | 11.740 11.800 ( +0.511%)
nci         clang-9    3 |  433.1  424.2 ( -2.055%) | 11.740 11.800 ( +0.511%)
nci         clang-11   3 |  438.4  412.1 ( -5.999%) | 11.740 11.800 ( +0.511%)
nci         clang-12   3 |  426.4  423.3 ( -0.727%) | 11.740 11.800 ( +0.511%)
nci         gcc-4.8    4 |  423.8  424.0 ( +0.047%) | 11.750 11.800 ( +0.426%)
nci         gcc-5      4 |  420.2  422.8 ( +0.619%) | 11.750 11.800 ( +0.426%)
nci         gcc-6      4 |  389.7  409.8 ( +5.158%) | 11.750 11.800 ( +0.426%)
nci         gcc-7      4 |  425.4  421.3 ( -0.964%) | 11.750 11.800 ( +0.426%)
nci         gcc-8      4 |  418.1  421.9 ( +0.909%) | 11.750 11.800 ( +0.426%)
nci         gcc-10     4 |  425.0  418.0 ( -1.647%) | 11.750 11.800 ( +0.426%)
nci         clang-6.0  4 |  435.3  394.6 ( -9.350%) | 11.750 11.800 ( +0.426%)
nci         clang-7    4 |  427.3  424.4 ( -0.679%) | 11.750 11.800 ( +0.426%)
nci         clang-8    4 |  444.6  417.7 ( -6.050%) | 11.750 11.800 ( +0.426%)
nci         clang-9    4 |  441.1  419.2 ( -4.965%) | 11.750 11.800 ( +0.426%)
nci         clang-11   4 |  437.2  419.2 ( -4.117%) | 11.750 11.800 ( +0.426%)
nci         clang-12   4 |  426.8  406.0 ( -4.873%) | 11.750 11.800 ( +0.426%)
ooffice     gcc-4.8    3 |   99.6  107.4 ( +7.831%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-5      3 |   98.6  108.6 (+10.142%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-6      3 |  101.1  103.9 ( +2.770%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-7      3 |  101.9  108.6 ( +6.575%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-8      3 |  100.3  107.8 ( +7.478%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-10     3 |  101.0  109.6 ( +8.515%) |  1.956  1.957 ( +0.051%)
ooffice     clang-6.0  3 |  108.1  107.8 ( -0.278%) |  1.956  1.957 ( +0.051%)
ooffice     clang-7    3 |   93.5  105.5 (+12.834%) |  1.956  1.957 ( +0.051%)
ooffice     clang-8    3 |   92.6  104.3 (+12.635%) |  1.956  1.957 ( +0.051%)
ooffice     clang-9    3 |   97.0  107.5 (+10.825%) |  1.956  1.957 ( +0.051%)
ooffice     clang-11   3 |   96.0  108.0 (+12.500%) |  1.956  1.957 ( +0.051%)
ooffice     clang-12   3 |   98.3  104.4 ( +6.205%) |  1.956  1.957 ( +0.051%)
ooffice     gcc-4.8    4 |   94.2   98.5 ( +4.565%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-5      4 |   94.2   98.6 ( +4.671%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-6      4 |   93.1   96.2 ( +3.330%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-7      4 |   92.8   99.7 ( +7.435%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-8      4 |   90.9   98.5 ( +8.361%) |  2.003  2.004 ( +0.050%)
ooffice     gcc-10     4 |   94.1   99.6 ( +5.845%) |  2.003  2.004 ( +0.050%)
ooffice     clang-6.0  4 |   98.0   99.4 ( +1.429%) |  2.003  2.004 ( +0.050%)
ooffice     clang-7    4 |   89.1   97.7 ( +9.652%) |  2.003  2.004 ( +0.050%)
ooffice     clang-8    4 |   90.8   94.9 ( +4.515%) |  2.003  2.004 ( +0.050%)
ooffice     clang-9    4 |   90.2   97.4 ( +7.982%) |  2.003  2.004 ( +0.050%)
ooffice     clang-11   4 |   91.6   99.3 ( +8.406%) |  2.003  2.004 ( +0.050%)
ooffice     clang-12   4 |   91.0  101.3 (+11.319%) |  2.003  2.004 ( +0.050%)
osdb        gcc-4.8    3 |  142.6  145.0 ( +1.683%) |  2.867  2.876 ( +0.314%)
osdb        gcc-5      3 |  134.2  148.3 (+10.507%) |  2.867  2.876 ( +0.314%)
osdb        gcc-6      3 |  140.9  145.1 ( +2.981%) |  2.867  2.876 ( +0.314%)
osdb        gcc-7      3 |  138.7  142.1 ( +2.451%) |  2.867  2.876 ( +0.314%)
osdb        gcc-8      3 |  136.4  143.4 ( +5.132%) |  2.867  2.876 ( +0.314%)
osdb        gcc-10     3 |  136.8  145.7 ( +6.506%) |  2.867  2.876 ( +0.314%)
osdb        clang-6.0  3 |  141.3  145.4 ( +2.902%) |  2.867  2.876 ( +0.314%)
osdb        clang-7    3 |  137.9  150.4 ( +9.065%) |  2.867  2.876 ( +0.314%)
osdb        clang-8    3 |  132.5  147.7 (+11.472%) |  2.867  2.876 ( +0.314%)
osdb        clang-9    3 |  135.6  139.3 ( +2.729%) |  2.867  2.876 ( +0.314%)
osdb        clang-11   3 |  134.9  151.0 (+11.935%) |  2.867  2.876 ( +0.314%)
osdb        clang-12   3 |  129.2  141.1 ( +9.211%) |  2.867  2.876 ( +0.314%)
osdb        gcc-4.8    4 |  127.3  132.4 ( +4.006%) |  2.885  2.895 ( +0.347%)
osdb        gcc-5      4 |  123.3  135.7 (+10.057%) |  2.885  2.895 ( +0.347%)
osdb        gcc-6      4 |  124.5  133.6 ( +7.309%) |  2.885  2.895 ( +0.347%)
osdb        gcc-7      4 |  125.1  133.7 ( +6.875%) |  2.885  2.895 ( +0.347%)
osdb        gcc-8      4 |  121.4  136.8 (+12.685%) |  2.885  2.895 ( +0.347%)
osdb        gcc-10     4 |  124.8  142.6 (+14.263%) |  2.885  2.895 ( +0.347%)
osdb        clang-6.0  4 |  132.6  135.4 ( +2.112%) |  2.885  2.895 ( +0.347%)
osdb        clang-7    4 |  129.4  134.9 ( +4.250%) |  2.885  2.895 ( +0.347%)
osdb        clang-8    4 |  130.9  135.2 ( +3.285%) |  2.885  2.895 ( +0.347%)
osdb        clang-9    4 |  120.0  132.5 (+10.417%) |  2.885  2.895 ( +0.347%)
osdb        clang-11   4 |  129.3  138.6 ( +7.193%) |  2.885  2.895 ( +0.347%)
osdb        clang-12   4 |  122.2  131.8 ( +7.856%) |  2.885  2.895 ( +0.347%)
reymont     gcc-4.8    3 |  127.7  117.1 ( -8.301%) |  3.392  3.413 ( +0.619%)
reymont     gcc-5      3 |  123.5  124.1 ( +0.486%) |  3.392  3.413 ( +0.619%)
reymont     gcc-6      3 |  130.6  131.0 ( +0.306%) |  3.392  3.413 ( +0.619%)
reymont     gcc-7      3 |  127.5  129.9 ( +1.882%) |  3.392  3.413 ( +0.619%)
reymont     gcc-8      3 |  127.1  122.1 ( -3.934%) |  3.392  3.413 ( +0.619%)
reymont     gcc-10     3 |  124.3  126.0 ( +1.368%) |  3.392  3.413 ( +0.619%)
reymont     clang-6.0  3 |  127.6  127.6 ( +0.000%) |  3.392  3.413 ( +0.619%)
reymont     clang-7    3 |  125.3  126.8 ( +1.197%) |  3.392  3.413 ( +0.619%)
reymont     clang-8    3 |  127.1  126.7 ( -0.315%) |  3.392  3.413 ( +0.619%)
reymont     clang-9    3 |  126.1  124.5 ( -1.269%) |  3.392  3.413 ( +0.619%)
reymont     clang-11   3 |  124.5  125.5 ( +0.803%) |  3.392  3.413 ( +0.619%)
reymont     clang-12   3 |  122.8  125.9 ( +2.524%) |  3.392  3.413 ( +0.619%)
reymont     gcc-4.8    4 |  127.7  119.0 ( -6.813%) |  3.429  3.453 ( +0.700%)
reymont     gcc-5      4 |  123.6  125.6 ( +1.618%) |  3.429  3.453 ( +0.700%)
reymont     gcc-6      4 |  128.9  135.8 ( +5.353%) |  3.429  3.453 ( +0.700%)
reymont     gcc-7      4 |  128.7  130.0 ( +1.010%) |  3.429  3.453 ( +0.700%)
reymont     gcc-8      4 |  133.3  119.9 (-10.053%) |  3.429  3.453 ( +0.700%)
reymont     gcc-10     4 |  124.7  124.4 ( -0.241%) |  3.429  3.453 ( +0.700%)
reymont     clang-6.0  4 |  130.1  129.6 ( -0.384%) |  3.429  3.453 ( +0.700%)
reymont     clang-7    4 |  128.6  126.2 ( -1.866%) |  3.429  3.453 ( +0.700%)
reymont     clang-8    4 |  129.0  127.8 ( -0.930%) |  3.429  3.453 ( +0.700%)
reymont     clang-9    4 |  129.6  122.3 ( -5.633%) |  3.429  3.453 ( +0.700%)
reymont     clang-11   4 |  127.9  127.1 ( -0.625%) |  3.429  3.453 ( +0.700%)
reymont     clang-12   4 |  125.7  126.8 ( +0.875%) |  3.429  3.453 ( +0.700%)
samba       gcc-4.8    3 |  201.9  206.8 ( +2.427%) |  4.320  4.342 ( +0.509%)
samba       gcc-5      3 |  201.4  211.6 ( +5.065%) |  4.320  4.342 ( +0.509%)
samba       gcc-6      3 |  205.6  208.4 ( +1.362%) |  4.320  4.342 ( +0.509%)
samba       gcc-7      3 |  205.3  205.6 ( +0.146%) |  4.320  4.342 ( +0.509%)
samba       gcc-8      3 |  205.7  210.4 ( +2.285%) |  4.320  4.342 ( +0.509%)
samba       gcc-10     3 |  204.8  202.8 ( -0.977%) |  4.320  4.342 ( +0.509%)
samba       clang-6.0  3 |  209.9  201.9 ( -3.811%) |  4.320  4.342 ( +0.509%)
samba       clang-7    3 |  201.3  207.8 ( +3.229%) |  4.320  4.342 ( +0.509%)
samba       clang-8    3 |  196.2  200.8 ( +2.345%) |  4.320  4.342 ( +0.509%)
samba       clang-9    3 |  200.8  204.5 ( +1.843%) |  4.320  4.342 ( +0.509%)
samba       clang-11   3 |  202.7  207.8 ( +2.516%) |  4.320  4.342 ( +0.509%)
samba       clang-12   3 |  202.0  200.6 ( -0.693%) |  4.320  4.342 ( +0.509%)
samba       gcc-4.8    4 |  194.5  200.7 ( +3.188%) |  4.349  4.373 ( +0.552%)
samba       gcc-5      4 |  190.0  206.1 ( +8.474%) |  4.349  4.373 ( +0.552%)
samba       gcc-6      4 |  198.1  193.5 ( -2.322%) |  4.349  4.373 ( +0.552%)
samba       gcc-7      4 |  199.3  187.9 ( -5.720%) |  4.349  4.373 ( +0.552%)
samba       gcc-8      4 |  190.6  192.5 ( +0.997%) |  4.349  4.373 ( +0.552%)
samba       gcc-10     4 |  194.9  193.9 ( -0.513%) |  4.349  4.373 ( +0.552%)
samba       clang-6.0  4 |  196.0  188.5 ( -3.827%) |  4.349  4.373 ( +0.552%)
samba       clang-7    4 |  188.4  196.8 ( +4.459%) |  4.349  4.373 ( +0.552%)
samba       clang-8    4 |  180.4  192.2 ( +6.541%) |  4.349  4.373 ( +0.552%)
samba       clang-9    4 |  195.8  194.1 ( -0.868%) |  4.349  4.373 ( +0.552%)
samba       clang-11   4 |  196.2  195.2 ( -0.510%) |  4.349  4.373 ( +0.552%)
samba       clang-12   4 |  192.1  195.2 ( +1.614%) |  4.349  4.373 ( +0.552%)
sao         gcc-4.8    3 |   75.3   84.7 (+12.483%) |  1.306  1.306 ( +0.000%)
sao         gcc-5      3 |   76.7   86.6 (+12.907%) |  1.306  1.306 ( +0.000%)
sao         gcc-6      3 |   76.4   81.4 ( +6.545%) |  1.306  1.306 ( +0.000%)
sao         gcc-7      3 |   73.7   85.8 (+16.418%) |  1.306  1.306 ( +0.000%)
sao         gcc-8      3 |   74.8   81.2 ( +8.556%) |  1.306  1.306 ( +0.000%)
sao         gcc-10     3 |   73.8   78.6 ( +6.504%) |  1.306  1.306 ( +0.000%)
sao         clang-6.0  3 |   81.4   84.4 ( +3.686%) |  1.306  1.306 ( +0.000%)
sao         clang-7    3 |   71.7   84.7 (+18.131%) |  1.306  1.306 ( +0.000%)
sao         clang-8    3 |   71.3   83.1 (+16.550%) |  1.306  1.306 ( +0.000%)
sao         clang-9    3 |   72.5   84.0 (+15.862%) |  1.306  1.306 ( +0.000%)
sao         clang-11   3 |   73.4   86.9 (+18.392%) |  1.306  1.306 ( +0.000%)
sao         clang-12   3 |   72.9   85.9 (+17.833%) |  1.306  1.306 ( +0.000%)
sao         gcc-4.8    4 |   69.7   77.9 (+11.765%) |  1.337  1.337 ( +0.000%)
sao         gcc-5      4 |   69.6   77.4 (+11.207%) |  1.337  1.337 ( +0.000%)
sao         gcc-6      4 |   70.5   74.8 ( +6.099%) |  1.337  1.337 ( +0.000%)
sao         gcc-7      4 |   68.7   75.1 ( +9.316%) |  1.337  1.337 ( +0.000%)
sao         gcc-8      4 |   69.0   74.3 ( +7.681%) |  1.337  1.337 ( +0.000%)
sao         gcc-10     4 |   66.8   71.8 ( +7.485%) |  1.337  1.337 ( +0.000%)
sao         clang-6.0  4 |   73.8   74.7 ( +1.220%) |  1.337  1.337 ( +0.000%)
sao         clang-7    4 |   65.4   74.8 (+14.373%) |  1.337  1.337 ( +0.000%)
sao         clang-8    4 |   64.8   73.0 (+12.654%) |  1.337  1.337 ( +0.000%)
sao         clang-9    4 |   62.0   77.4 (+24.839%) |  1.337  1.337 ( +0.000%)
sao         clang-11   4 |   67.7   77.3 (+14.180%) |  1.337  1.337 ( +0.000%)
sao         clang-12   4 |   68.2   76.0 (+11.437%) |  1.337  1.337 ( +0.000%)
webster     gcc-4.8    3 |  126.6  126.8 ( +0.158%) |  3.403  3.420 ( +0.500%)
webster     gcc-5      3 |  117.1  124.2 ( +6.063%) |  3.403  3.420 ( +0.500%)
webster     gcc-6      3 |  122.9  124.0 ( +0.895%) |  3.403  3.420 ( +0.500%)
webster     gcc-7      3 |  126.3  125.5 ( -0.633%) |  3.403  3.420 ( +0.500%)
webster     gcc-8      3 |  127.6  124.4 ( -2.508%) |  3.403  3.420 ( +0.500%)
webster     gcc-10     3 |  123.5  124.7 ( +0.972%) |  3.403  3.420 ( +0.500%)
webster     clang-6.0  3 |  128.3  122.5 ( -4.521%) |  3.403  3.420 ( +0.500%)
webster     clang-7    3 |  124.3  125.2 ( +0.724%) |  3.403  3.420 ( +0.500%)
webster     clang-8    3 |  121.0  120.2 ( -0.661%) |  3.403  3.420 ( +0.500%)
webster     clang-9    3 |  123.1  122.3 ( -0.650%) |  3.403  3.420 ( +0.500%)
webster     clang-11   3 |  120.7  118.8 ( -1.574%) |  3.403  3.420 ( +0.500%)
webster     clang-12   3 |  117.9  122.1 ( +3.562%) |  3.403  3.420 ( +0.500%)
webster     gcc-4.8    4 |  124.4  122.4 ( -1.608%) |  3.455  3.475 ( +0.579%)
webster     gcc-5      4 |  114.5  121.4 ( +6.026%) |  3.455  3.475 ( +0.579%)
webster     gcc-6      4 |  118.2  118.1 ( -0.085%) |  3.455  3.475 ( +0.579%)
webster     gcc-7      4 |  120.9  119.8 ( -0.910%) |  3.455  3.475 ( +0.579%)
webster     gcc-8      4 |  121.0  122.4 ( +1.157%) |  3.455  3.475 ( +0.579%)
webster     gcc-10     4 |  124.3  119.6 ( -3.781%) |  3.455  3.475 ( +0.579%)
webster     clang-6.0  4 |  124.6  120.1 ( -3.612%) |  3.455  3.475 ( +0.579%)
webster     clang-7    4 |  122.6  119.1 ( -2.855%) |  3.455  3.475 ( +0.579%)
webster     clang-8    4 |  120.2  118.0 ( -1.830%) |  3.455  3.475 ( +0.579%)
webster     clang-9    4 |  118.9  121.1 ( +1.850%) |  3.455  3.475 ( +0.579%)
webster     clang-11   4 |  116.2  120.7 ( +3.873%) |  3.455  3.475 ( +0.579%)
webster     clang-12   4 |  121.4  121.2 ( -0.165%) |  3.455  3.475 ( +0.579%)
xml         gcc-4.8    3 |  313.2  308.7 ( -1.437%) |  8.357  8.363 ( +0.072%)
xml         gcc-5      3 |  306.9  313.1 ( +2.020%) |  8.357  8.363 ( +0.072%)
xml         gcc-6      3 |  300.0  304.7 ( +1.567%) |  8.357  8.363 ( +0.072%)
xml         gcc-7      3 |  313.5  312.9 ( -0.191%) |  8.357  8.363 ( +0.072%)
xml         gcc-8      3 |  315.3  315.4 ( +0.032%) |  8.357  8.363 ( +0.072%)
xml         gcc-10     3 |  308.6  310.1 ( +0.486%) |  8.357  8.363 ( +0.072%)
xml         clang-6.0  3 |  318.1  311.3 ( -2.138%) |  8.357  8.363 ( +0.072%)
xml         clang-7    3 |  309.8  308.6 ( -0.387%) |  8.357  8.363 ( +0.072%)
xml         clang-8    3 |  311.6  310.6 ( -0.321%) |  8.357  8.363 ( +0.072%)
xml         clang-9    3 |  313.1  312.9 ( -0.064%) |  8.357  8.363 ( +0.072%)
xml         clang-11   3 |  321.2  318.7 ( -0.778%) |  8.357  8.363 ( +0.072%)
xml         clang-12   3 |  315.9  315.1 ( -0.253%) |  8.357  8.363 ( +0.072%)
xml         gcc-4.8    4 |  313.2  317.0 ( +1.213%) |  8.384  8.390 ( +0.072%)
xml         gcc-5      4 |  305.4  313.7 ( +2.718%) |  8.384  8.390 ( +0.072%)
xml         gcc-6      4 |  292.7  311.8 ( +6.525%) |  8.384  8.390 ( +0.072%)
xml         gcc-7      4 |  310.0  312.3 ( +0.742%) |  8.384  8.390 ( +0.072%)
xml         gcc-8      4 |  316.9  313.2 ( -1.168%) |  8.384  8.390 ( +0.072%)
xml         gcc-10     4 |  310.3  308.9 ( -0.451%) |  8.384  8.390 ( +0.072%)
xml         clang-6.0  4 |  319.9  315.7 ( -1.313%) |  8.384  8.390 ( +0.072%)
xml         clang-7    4 |  310.3  309.6 ( -0.226%) |  8.384  8.390 ( +0.072%)
xml         clang-8    4 |  316.3  292.8 ( -7.430%) |  8.384  8.390 ( +0.072%)
xml         clang-9    4 |  319.4  317.2 ( -0.689%) |  8.384  8.390 ( +0.072%)
xml         clang-11   4 |  331.8  317.2 ( -4.400%) |  8.384  8.390 ( +0.072%)
xml         clang-12   4 |  315.3  319.7 ( +1.395%) |  8.384  8.390 ( +0.072%)
x-ray       gcc-4.8    3 |   71.3   77.0 ( +7.994%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-5      3 |   69.8   77.5 (+11.032%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-6      3 |   70.6   77.1 ( +9.207%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-7      3 |   68.6   75.6 (+10.204%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-8      3 |   67.4   74.4 (+10.386%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-10     3 |   68.6   72.3 ( +5.394%) |  1.393  1.393 ( +0.000%)
x-ray       clang-6.0  3 |   75.7   76.3 ( +0.793%) |  1.393  1.393 ( +0.000%)
x-ray       clang-7    3 |   64.9   79.3 (+22.188%) |  1.393  1.393 ( +0.000%)
x-ray       clang-8    3 |   70.9   77.4 ( +9.168%) |  1.393  1.393 ( +0.000%)
x-ray       clang-9    3 |   66.1   74.8 (+13.162%) |  1.393  1.393 ( +0.000%)
x-ray       clang-11   3 |   68.2   75.3 (+10.411%) |  1.393  1.393 ( +0.000%)
x-ray       clang-12   3 |   66.3   76.8 (+15.837%) |  1.393  1.393 ( +0.000%)
x-ray       gcc-4.8    4 |   65.8   68.4 ( +3.951%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-5      4 |   65.2   68.0 ( +4.294%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-6      4 |   64.9   67.0 ( +3.236%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-7      4 |   62.0   64.9 ( +4.677%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-8      4 |   62.4   66.3 ( +6.250%) |  1.484  1.484 ( +0.000%)
x-ray       gcc-10     4 |   62.7   67.0 ( +6.858%) |  1.484  1.484 ( +0.000%)
x-ray       clang-6.0  4 |   67.9   65.3 ( -3.829%) |  1.484  1.484 ( +0.000%)
x-ray       clang-7    4 |   64.5   69.0 ( +6.977%) |  1.484  1.484 ( +0.000%)
x-ray       clang-8    4 |   66.1   70.8 ( +7.110%) |  1.484  1.484 ( +0.000%)
x-ray       clang-9    4 |   61.9   67.7 ( +9.370%) |  1.484  1.484 ( +0.000%)
x-ray       clang-11   4 |   64.7   67.4 ( +4.173%) |  1.484  1.484 ( +0.000%)
x-ray       clang-12   4 |   62.4   67.0 ( +7.372%) |  1.484  1.484 ( +0.000%)
silesia.tar gcc-4.8    3 |  146.2  149.0 ( +1.915%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-5      3 |  142.0  139.1 ( -2.042%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-6      3 |  146.6  150.0 ( +2.319%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-7      3 |  143.5  147.6 ( +2.857%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-8      3 |  144.8  145.5 ( +0.483%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-10     3 |  143.1  146.2 ( +2.166%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-6.0  3 |  147.7  147.3 ( -0.271%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-7    3 |  142.6  148.0 ( +3.787%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-8    3 |  141.3  150.4 ( +6.440%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-9    3 |  143.6  150.2 ( +4.596%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-11   3 |  143.7  149.5 ( +4.036%) |  3.179  3.187 ( +0.252%)
silesia.tar clang-12   3 |  142.8  149.4 ( +4.622%) |  3.179  3.187 ( +0.252%)
silesia.tar gcc-4.8    4 |  135.9  139.8 ( +2.870%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-5      4 |  134.4  138.1 ( +2.753%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-6      4 |  134.2  139.7 ( +4.098%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-7      4 |  135.5  140.1 ( +3.395%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-8      4 |  137.0  140.0 ( +2.190%) |  3.237  3.246 ( +0.278%)
silesia.tar gcc-10     4 |  138.3  136.0 ( -1.663%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-6.0  4 |  141.4  138.2 ( -2.263%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-7    4 |  137.9  142.2 ( +3.118%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-8    4 |  134.8  140.7 ( +4.377%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-9    4 |  140.0  140.7 ( +0.500%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-11   4 |  135.9  138.3 ( +1.766%) |  3.237  3.246 ( +0.278%)
silesia.tar clang-12   4 |  138.2  132.2 ( -4.342%) |  3.237  3.246 ( +0.278%)

Benchmarked on, as usual, an Intel Xeon E5-2680 v4 @ 2.40GHz.

</details>

On the whole, we see improvements in ratio, and improvements in speed on less-compressible inputs. Very compressible inputs seem to be neutral on speed, or maybe even slightly slower.
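For readers checking the table, the parenthesized percentages look like the relative change between the two values to their left. The helper below is a minimal sketch of that arithmetic; the name `pct_delta` is hypothetical, and treating the left value as the baseline and the right value as the result with this PR is an assumption.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Hypothetical helper, not part of zstd or its benchmark scripts. */
static double pct_delta(double before, double after)
{
    assert(before != 0.0);  /* a zero baseline has no relative change */
    return (after - before) / before * 100.0;
}
```

With the speed values 114.0 and 117.1 this gives roughly +2.719, and with the ratio values 2.811 and 2.810 roughly -0.036, which agrees with the figures printed next to those pairs.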

Status

This PR is believed to be speed-positive, ratio-positive, and correct.

To-Do:

  • [x] Correctness.
  • [x] Match or improve ratio.
  • [x] Match or improve speed.
  • [x] Simplify DFast DMS implementation.
  • [x] Benchmark.
+486 -331

3 comments

3 changed files

felixhandte

pr closed time in 2 months

push eventfelixhandte/zstd

W. Felix Handte

commit sha 0bfc935add6f47630c28b4e9215026e5860cb85e

Convert Outer Control Structure to Loop

view details

push time in 2 months

push eventfelixhandte/zstd

Clément Chigot

commit sha 399849e236bdaa1215cf90e41606e222be23735c

Makefile: add AIX support

For lib, AIX linker doesn't allow --soname.

view details

Clément Chigot

commit sha 6ef6cd79995cd0f4e50a989210729b59d017b735

test: avoid /dev/full on AIX

view details

W. Felix Handte

commit sha ab8aa49b8d2383adfa9ef86ab4bd4a1a0d43c7fa

Fix Benchmark Corruption Display

view details

W. Felix Handte

commit sha 80bc12b33a039d5bef59d7ca4ce8809feabbfbaa

Initial Pipelined Implementation for ZSTD_fast

view details

W. Felix Handte

commit sha bc768bccc036fe0d60ff63fe240dfca7474636d9

Track Step Size Statefully, Rather than Recalculating Every Time
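The idea in this commit title can be sketched as follows. This is only an illustration under stated assumptions: the constant `DEMO_SEARCH_STRENGTH`, the formula in `step_recalculated`, and all names are hypothetical and are not taken from the commit itself. The stateful variant keeps the current step and the position at which it next grows, so the hot loop avoids a subtraction and shift at every position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the step grows by one for every 2^8 bytes without a match. */
#define DEMO_SEARCH_STRENGTH 8

/* Recalculating variant: derive the step from the distance to the last match. */
static size_t step_recalculated(size_t pos, size_t anchor, size_t base_step)
{
    assert(pos >= anchor);
    return ((pos - anchor) >> DEMO_SEARCH_STRENGTH) + base_step;
}

/* Stateful variant: remember the step and where it should next increase. */
typedef struct {
    size_t step;       /* current step size */
    size_t next_bump;  /* first position at which the step grows again */
} step_state;

/* Call whenever a match is found and the anchor moves. */
static void step_reset(step_state *s, size_t anchor, size_t base_step)
{
    s->step = base_step;
    s->next_bump = anchor + ((size_t)1 << DEMO_SEARCH_STRENGTH);
}

/* Call with monotonically increasing positions; returns the step to use. */
static size_t step_advance(step_state *s, size_t pos)
{
    while (pos >= s->next_bump) {
        s->step++;
        s->next_bump += (size_t)1 << DEMO_SEARCH_STRENGTH;
    }
    return s->step;
}
```

Both variants return the same step for any position at or after the anchor; the stateful one just replaces per-position arithmetic with a comparison that is rarely taken.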

view details

W. Felix Handte

commit sha 387840af79a86660c8f83bcd6cc4de584de9f3a6

Re-Order Operations for Slightly Better Performance

view details

W. Felix Handte

commit sha b092dd75b7e0ed48dc94fe58391aac7b805cb178

Shrink Pipeline from 4 Positions to 3

view details

W. Felix Handte

commit sha 35932ab2f1e129dce0d19bfa82787dc4dc262eed

Prefetch Input in Incompressible Sections (+0.25% Speed)
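As a rough illustration of software prefetching ahead of a sequential scan, the sketch below requests a cache line some distance ahead of the read position. The macro `DEMO_PREFETCH`, the 64-byte distance, and the function `demo_scan` are assumptions made for this example; where and how the commit actually issues its prefetches is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use the GCC/Clang builtin where available; otherwise compile to a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_PREFETCH(ptr) __builtin_prefetch((ptr), 0 /* read */, 3 /* high locality */)
#else
#  define DEMO_PREFETCH(ptr) ((void)(ptr))
#endif

#define DEMO_PREFETCH_DISTANCE 64  /* hypothetical distance, in bytes */

/* Sums all bytes of `src`, hinting the cache about data we will reach soon. */
static uint32_t demo_scan(const uint8_t *src, size_t size)
{
    uint32_t sum = 0;
    size_t i;
    assert(src != NULL || size == 0);
    for (i = 0; i < size; i++) {
        /* Once per 64 bytes, prefetch the next chunk if it is still in bounds. */
        if ((i % DEMO_PREFETCH_DISTANCE) == 0 && size - i > DEMO_PREFETCH_DISTANCE) {
            DEMO_PREFETCH(src + i + DEMO_PREFETCH_DISTANCE);
        }
        sum += src[i];
    }
    return sum;
}
```

The prefetch is only a hint and never changes the result, so the gain (such as the +0.25% quoted in the commit title) has to be confirmed by benchmarking.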

view details

W. Felix Handte

commit sha 7c24c3e6ce8faa5a6d23ef9d11c71a0a106b3bb3

Give Up on Searching End of Block

Amusingly, it seems to be a non-trivial performance hit to add in final searches or even hash table insertions during cleanup. So let's not. It seems to not make any meaningful difference in compression ratio.

view details

W. Felix Handte

commit sha 8706bc115a28757a7a684c4a6bcf435ad9e5eb03

Nit: Dedup idx0 and idx1

view details

W. Felix Handte

commit sha 991d660ea9fc0e0453a6fa580831352478b56104

Nit: Only Store 2 Hash Variables

view details

W. Felix Handte

commit sha 57a100f6dcb46fff20eacdfc9fc000b0f226b76f

Add `ip1 + 128` Prefetch; Tiny Cleanup

view details

W. Felix Handte

commit sha 24fcccd05c6a3609715b9d9d1020129105c55116

Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed)

Unrolling the loop to handle 2 positions in each iteration allows us to reduce the frequency of some operations that don't need to happen at every position. One such operation is the step calculation, which is a very rough heuristic anyways. It's fine if we do this a position later. The other operation is the repcode check. But since the repcode check already tries expanding back one position, we're really not missing much of importance by only trying it every other position.

This commit also slightly reorders some operations.

view details

W. Felix Handte

commit sha 64054dec442a99e4c065be1319202e18bd4b8d8a

Tweak Step

view details

W. Felix Handte

commit sha 15e67bfa7e7ec1384e42001ef1eeb5af9a896f02

Deduplicate Implementations

This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new `ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is functionally a no-op.

view details

W. Felix Handte

commit sha 98d3df326b8dfddd11786e45e7ba8406ffc08942

Change Target Size in Fuzzer

It's a bit strange, because this is hitting the dictionary special case where the dictionary is contiguous with the input and still runs in the single-segment path. We should probably change that to hit the `extDict` path instead?

view details

W. Felix Handte

commit sha d6fd7761c963db8b88c14210f9ca1f972fe7fd71

Fix VS Build: Explicitly Cast to Narrow Ints

view details

W. Felix Handte

commit sha b0977e4ed2e58a2db2eaf6be6721393fe964daa9

Update results.csv

view details

senhuang42

commit sha 414e24becfa2f5127bd6583216755d8d9149a6ed

Add 8 bytes to FSE workspace

view details

Yann Collet

commit sha 70d89e5a128c73f81d362332f4b6b1b2b8764f15

minor rebalancing of level 13

This new setup is slightly better on `silesia.tar`:
Ratio: 3.649 -> 3.655
Speed: 11.9 MB/s -> 12.2 MB/s
At the cost of more memory: 24 MB -> 32 MB

The new memory budget is a reasonable interpolation between neighboring levels 12 and 14:
level 12: 24 MB
level 13: 32 MB (increased from 24 MB)
level 14: 48 MB

Window size remains unaffected (4 MB)

view details

push time in 2 months

push eventfelixhandte/zstd

Clément Chigot

commit sha 399849e236bdaa1215cf90e41606e222be23735c

Makefile: add AIX support

For lib, AIX linker doesn't allow --soname.

view details

Clément Chigot

commit sha 6ef6cd79995cd0f4e50a989210729b59d017b735

test: avoid /dev/full on AIX

view details

Yann Collet

commit sha eab692211eca2943439b7894b210a85e6931bdd9

removed pretty-print of sizes in benchmark

This is less appropriate for this mode: benchmark is about accuracy, it's important to read the exact values.

view details

Yann Collet

commit sha f0fc8cb3e19e4685bec012bd0bc881b2d5664899

Disable console notification by default within the library

As a library, the default shouldn't be to write anything on console. `cover` and `fastcover` have a `g_displayLevel` variable to control this behavior. It's now set to 0 (no display) by default. Setting notification to a higher level should be an explicit operation by a console application.

view details

Yann Collet

commit sha 27a8bbe26539cca301da3e013955b57e7275d82c

new initializer for ll price

view details

Yann Collet

commit sha 23a9368c45a0c87ec1a34f8a7fbeff3807dfe967

new starting offcode table for zstd_opt

view details

Yann Collet

commit sha 08ceda3dfc9c1e4ae7e35d210e0318a696f7f394

new statistics update policy

small general compression ratio improvement for btopt+ strategies

view details

Yann Collet

commit sha b096a5c62632b39ec6ad6eb68bc3bafc50be8df4

updated regression tests

view details

Yann Collet

commit sha 42a3ed752a473d813e84ac9cf5d86589060a54ce

removed frequency booster for stat initialization of btultra2

used to be necessary to counter-balance the fixed-weight frequency update which has been recently changed for an adaptive rate (targeting stable starting frequency stats).

view details

Yann Collet

commit sha ef78611c269bc96cf88748cdd0e6d9e5c3ad74b7

change update rate to 11/10/10/10

better for larger blocks, very small inefficiency on small block.

view details

Yann Collet

commit sha 7fce9a41b599eab4c5e586a984e436a38eee7121

change update rate to 12/11/11/11

better for large files, and sources with relatively "stable" entropy, like silesia.tar. slightly worse for files with rapidly changing entropy, like Calgary.tar.

Updated small files tests in fuzzer

view details

Yann Collet

commit sha 4f0b1b9ee5b6e1000c36e4696758ecd57cdb39b7

update regression tests

view details

Yann Collet

commit sha b7f46ebc234ba4ee691a2f19233179b3c90676f6

use ZSTD_memcpy() for better portability

notably within kernel space

view details

Yann Collet

commit sha 5449ede2e6e4f15d10b30f0908c8bd7b4182ced6

make automated-benchmarking faster

by employing parallel compilation of object files.

view details

Eli Schwartz

commit sha 193aa49673f98c3ad5791dae7024a1c3ee791f38

meson: fix type error for integer option

meson forgave using the wrong type, but this isn't guaranteed. muon simply failed.

view details

Yann Collet

commit sha c10067c44e1349d699641d8ffb836e113952e8fe

Merge pull request #2775 from eli-schwartz/meson

meson: fix type error for integer option

view details

Yann Collet

commit sha 640c5b1f7740193733bb6322a783903f18150a4b

fix automated_benchmarking

make it able to process text output sent into either stdout or stderr

view details

Yann Collet

commit sha f58e63bee735ca73f90fb08426e84085a28d3659

Merge branch 'dev' into opt_investigation

view details

Yann Collet

commit sha b6b2855b8060d20053dc81d0aa5dd1a0b822a8de

updated regression tests

view details

Sen Huang

commit sha 1daf3c8dbc71027b2d57ac6a01f161247fdb70aa

Use 32 buckets for log2 bucketing in huffman sort
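To illustrate what log2 bucketing in a sort can look like, here is a minimal sketch that distributes counts into 32 buckets by the position of their highest set bit and then finishes with an insertion sort. The bucket function, the names, and the finishing pass are all assumptions made for this example; they are not the Huffman sort code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_BUCKETS 32

/* floor(log2(v)) for v >= 1; zero shares bucket 0 with one. Range: 0..31. */
static unsigned log2_bucket(uint32_t v)
{
    unsigned b = 0;
    while (v >>= 1) b++;
    return b;
}

/* Writes `counts` into `out` in descending order. */
static void sort_desc_by_log2_buckets(const uint32_t *counts, size_t n, uint32_t *out)
{
    size_t size[DEMO_NUM_BUCKETS] = {0};
    size_t pos[DEMO_NUM_BUCKETS];
    size_t i;
    unsigned b;
    assert(counts != NULL && out != NULL);

    /* 1. Histogram: how many values fall into each bucket. */
    for (i = 0; i < n; i++) size[log2_bucket(counts[i])]++;

    /* 2. Start offsets, with the largest bucket placed first. */
    pos[DEMO_NUM_BUCKETS - 1] = 0;
    for (b = DEMO_NUM_BUCKETS - 1; b > 0; b--) pos[b - 1] = pos[b] + size[b];

    /* 3. Distribute: values are now ordered by bucket, unordered within one. */
    for (i = 0; i < n; i++) out[pos[log2_bucket(counts[i])]++] = counts[i];

    /* 4. Insertion sort; elements only ever move within their own bucket. */
    for (i = 1; i < n; i++) {
        uint32_t const v = out[i];
        size_t j = i;
        while (j > 0 && out[j - 1] < v) { out[j] = out[j - 1]; j--; }
        out[j] = v;
    }
}
```

Because the distribution step already places every value within a factor of two of its neighbours, the final pass does little work when counts are spread across many magnitudes.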

view details

push time in 2 months

issue commentfacebook/zstd

Ways to allow projects to use a minimal decompression-nodict implementation

I think so. :)

So it sounds like there are a set of code changes we could merge, and the rest would be a mechanical conversion that we could automate (and again, run in continuous testing). That works for me!

I guess keep us updated about what the busybox folks say?

nolange

comment created time in 2 months

push eventfacebook/zstd

Norbert Lange

commit sha 02296cac8293bdc68cd39f998dbba05a301461f3

decompress: conditionally remove legacy members from context

Remove the then unneeded variables from the struct, and all accesses to them.

view details

Norbert Lange

commit sha 0d455406950cbd5c2ab737e0a9d03a5d86c8e603

decompress: conditionally remove bmi2 from context

Use a helper function, which will just return 0 in case the feature is disabled. Allows constant propagation and removal of dead code.
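The pattern described in this commit message can be sketched as below. Every name here (`DEMO_DYNAMIC_BMI2`, `demo_dctx`, `demo_dctx_get_bmi2`, `demo_decode`) is hypothetical and chosen for illustration; the real field and macro names in zstd may differ. When the feature macro is 0, the field disappears from the struct and the accessor returns a compile-time constant, so the compiler can drop the branch that depends on it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature switch; a build would normally set it with -D. */
#ifndef DEMO_DYNAMIC_BMI2
#  define DEMO_DYNAMIC_BMI2 0
#endif

typedef struct {
    int other_state;
#if DEMO_DYNAMIC_BMI2
    int bmi2;  /* only present when runtime BMI2 dispatch is compiled in */
#endif
} demo_dctx;

/* Single access point: callers never touch the field directly. */
static int demo_dctx_get_bmi2(const demo_dctx *dctx)
{
    assert(dctx != NULL);
#if DEMO_DYNAMIC_BMI2
    return dctx->bmi2;
#else
    return 0;  /* constant, so dependent branches become dead code */
#endif
}

/* Returns 2 when the BMI2 path is chosen, 1 for the portable path. */
static int demo_decode(const demo_dctx *dctx)
{
    if (demo_dctx_get_bmi2(dctx)) {
        return 2;  /* removable by the compiler when the feature is disabled */
    }
    return 1;
}
```

Routing every read through the helper keeps the `#if` in one place instead of scattering it across each call site.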

view details

Norbert Lange

commit sha 6763f403318034372b8d52de73a23066d04ec750

zstd_decompress: use a helper function for context create

Multiple ZSTD_createDCtx* functions call other (public) ZSTD_createDCtx* functions; this makes it harder for humans and compilers to throw out code that is not used. This farms out the logic into a static function. If a program only uses a single ZSTD_createDCtx variant, all others can be easily dropped and the remaining implementation can be specialized.
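A minimal sketch of the refactoring described in this commit message follows, using invented names (`demo_ctx`, `demo_create`, `demo_create_advanced`, `demo_create_internal`) rather than the real ZSTD_createDCtx* functions. Each public variant calls one static helper instead of calling another public variant, so a linker or compiler can discard any public entry point the program does not use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator pair, loosely modelled on a custom-memory struct. */
typedef struct {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
} demo_allocator;

typedef struct {
    demo_allocator mem;
    int initialized;
} demo_ctx;

/* All creation logic lives here; `static` keeps it private to this file. */
static demo_ctx *demo_create_internal(demo_allocator mem)
{
    demo_ctx *ctx;
    if (mem.alloc == NULL || mem.release == NULL) return NULL;
    ctx = (demo_ctx *)mem.alloc(sizeof(*ctx));
    if (ctx == NULL) return NULL;
    ctx->mem = mem;
    ctx->initialized = 1;
    return ctx;
}

/* Public variant 1: default allocator. Does not call variant 2. */
demo_ctx *demo_create(void)
{
    demo_allocator const defaults = { malloc, free };
    return demo_create_internal(defaults);
}

/* Public variant 2: caller-supplied allocator. */
demo_ctx *demo_create_advanced(demo_allocator mem)
{
    return demo_create_internal(mem);
}

void demo_free(demo_ctx *ctx)
{
    void (*release)(void *);
    if (ctx == NULL) return;
    assert(ctx->mem.release != NULL);  /* guaranteed by demo_create_internal */
    release = ctx->mem.release;
    release(ctx);
}
```

If a program only calls `demo_create()`, the helper can be inlined and specialized for `malloc` and `free`, and `demo_create_advanced()` has no remaining callers.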

view details

Felix Handte

commit sha 8b7a19fcd467e84c1c5269335d68b0ece45054a7

Merge pull request #2805 from nolange/smaller_code_with_disabled_features

Smaller code with disabled features

view details

push time in 2 months

PR merged facebook/zstd

Smaller code with disabled features CLA Signed

Hello,

Since zstd is everywhere nowadays, it is time to get it into projects that pragmatically aim at small code size. The aim of this series is to give the developer a relatively easy way of dropping unused code, and to let the compiler propagate constants, partially specialize functions, and remove a bigger chunk of then-dead code.

Since I expect some discussions covering all of those changes, I put them into one pull request.

+38 -16

3 comments

4 changed files

nolange

pr closed time in 2 months


issue commentfacebook/zstd

Ways to allow projects to use a minimal decompression-nodict implementation

Hi @nolange,

I'm really excited to see your interest in bringing zstd to busybox! We'd be very happy to be included in the toolkit.

It sounds like the overriding need for busybox is to minimize code size. As you may have noticed, this is not the situation that our implementation primarily targets. :smile: The overwhelming priority for most of our users is getting the most performance possible.

But we certainly recognize that different use cases have different needs, and to the extent that we can support less common use cases in the reference zstd implementation, we'd like to. So if practical, we'd be happy to host the necessary changes in the mainline. That way we can get a test set up for your use case in our continuous-testing infrastructure, so we make sure future changes work for you.

And it would be nice to do this in part because busybox is also not the only use case where binary size is really important! We've had a number of folks wanting to do zstd decompression in the browser via WASM, and lib size is an important optimization axis there as well. So this work would benefit them as well.

The question is just how to serve this different profile in a way that co-exists with the main performance-optimized zstd.

So far it looks like everything you're doing is introducing preprocessor conditionals. This is great in terms of knowing that when not activated, it won't have any impact at all on the default configuration. The cost is just codebase complexity. But hopefully we can keep that not too crazy. And good contbuild coverage will help. #2805 seems fine to me. Even b40046fb707b96b4a08c79321586612ee74a37f2 isn't toooo terrible to my eyes, although I have some questions.

What do you imagine the total scope of changes would be that you would want to make?

nolange

comment created time in 2 months
