Felix Handte (felixhandte), @facebook, New York, NY. felixhandte.com. Software Engineer on @facebook's Data Compression Team.

felixhandte/draft-handte-hybi-zstd-pmce 5

A Zstandard Per-Message Compression Extension for WebSocket

felixhandte/LaTeX-Grapher 3

Turn CSVs into pretty LaTeX/TiKZ graphs.

felixhandte/lz4 2

Extremely Fast Compression algorithm

felixhandte/Enigma 1

A Real-World Compatible Enigma Machine Simulator

felixhandte/homedir 1

Some dotfiles and such.

felixhandte/zstd 1

Zstandard - Fast real-time compression algorithm

felixhandte/chromium 0

The official GitHub mirror of the Chromium source

PR opened facebook/zstd

Fix DDSS Load

This PR fixes an incorrect comparison in figuring out minChain in ZSTD_dedicatedDictSearch_lazy_loadDictionary(). This incorrect comparison had been masked by the fact that idx was always 1, until @terrelln changed that in #2726.

Credit-to: OSS-Fuzz

+1 -1

0 comments

1 changed file

pr created time in 3 days

create branch felixhandte/zstd

branch : fix-ddss-load

created branch time in 3 days

PR opened Cyan4973/xxHash

Use `alignas()` in C++11

The previous macro test only detected C11 and failed in modern C++, which actually goes one step further and makes alignas a keyword. It's not clear that this actually improves the situation with respect to #543, but it should be slightly more correct in some sense.

+4 -1

0 comments

1 changed file

pr created time in 3 days

create branch felixhandte/xxHash

branch : c++-alignas

created branch time in 3 days

issue comment Cyan4973/xxHash

Easier implementation for software running on "normal" architectures

C11 says:

If an object's alignment is made stricter (larger) than max_align_t using _Alignas, it has extended alignment requirement. A struct or union type whose member has extended alignment is an over-aligned type. It is implementation-defined if over-aligned types are supported, and their support may be different in each kind of storage duration.

Which, if I'm reading correctly, is not particularly encouraging. It looks like implementations are free to ignore alignas(64)...

I'm not sure if there's a good solution available, but one small possible improvement would be: is it possible the XXH_ALIGN() macro does not interact well with C++? It only uses alignas() when it detects C11, but doesn't check for C++11. We should consider adding || (defined(__cplusplus) && (__cplusplus >= 201103L)).
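
Roughly, the adjusted selection logic could look like the following. This is just a sketch of the idea, not the actual xxHash header; the fallback branches are assumed to mirror the usual compiler-specific attributes.

/* Sketch only, not the real xxHash definition: prefer the language-level
 * alignment keyword/specifier when it is guaranteed to exist, otherwise
 * fall back to compiler extensions. */
#if defined(__cplusplus) && (__cplusplus >= 201103L)              /* C++11: alignas is a keyword */
#  define XXH_ALIGN(n)  alignas(n)
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)  /* C11: _Alignas specifier */
#  define XXH_ALIGN(n)  _Alignas(n)
#elif defined(__GNUC__)
#  define XXH_ALIGN(n)  __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#  define XXH_ALIGN(n)  __declspec(align(n))
#else
#  define XXH_ALIGN(n)  /* disabled */
#endif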

fcorbelli

comment created time in 3 days

PullRequestReviewEvent

issue closed facebook/zstd

ZSTD_decodeSequence: Coverity recommends to initialize structure to 0

In commit 2777cf4466 (zstd: Initialize seq_t structure fully), GRUB, which uses Zstandard 1.3.8 from 2018, changed the current Zstandard code below

https://github.com/facebook/zstd/blob/eace4abc2559eabb9300b9ca7e6769f620f0446e/lib/decompress/zstd_decompress_block.c#L941-L954

because Coverity recommended it.

-    seq_t seq;
+    seq_t seq = {0};

closed time in 8 days

paulmenzel

issue comment facebook/zstd

ZSTD_decodeSequence: Coverity recommends to initialize structure to 0

I believe this is resolved.

paulmenzel

comment created time in 8 days

issue closed facebook/zstd

--patch-from last in the race

Windows 7, zstd 1.4.9, hdiffz 3.1.1, jdiff 0.8.5, xdelta 3.1.0

Source: binaries.7z (SHA256: 2c1c8b1c4093e071de64b2dd9aebf7a1003712e39c0e5473c04a17ae781f81f5)

$ zstd -19 --patch-from binary.old binary.new -o patch.zdiff
$ hdiffz -c-zlib-9  binary.old binary.new patch.hdiff
$ jdiff -jb binary.old binary.new patch.jdiff
$ xdelta -e -9 -s binary.old binary.new patch.xdelta

Computation speed is about the same, but patch size…

$ stat -c "%s %n" patch.* | sort -n | column -t
72    patch.jdiff    
100   patch.hdiff    
247   patch.xdelta   
1952  patch.zdiff     

closed time in 8 days

sergeevabc

issue closed facebook/zstd

Now RFC 8478 obsoleted by RFC 8878

Since February 2021 according to datatracker.ietf.org.

closed time in 15 days

data-man

issue comment facebook/zstd

HTTP "Content-Encoding" window size wrt. RFC 8878

Specifying a hard limit on the window size for HTTP transfers is not a good solution, IMHO. A global limit that is conservative enough to be of use to resource-constrained endpoints will be harmfully restrictive for use cases which (by prior arrangement or expectation) transport large zstd-compressed objects.

I would much rather have standardized an option on the Accept-Encoding item ("Accept-Encoding: zstd;w=20"). This would provide a standard mechanism for recipients to make their limitations known to senders. Unfortunately we did not do so, and are not planning to revise the RFC again anytime soon.

Frankly, we might encourage folks to use such a mechanism anyways. If we were to specify a w option in the future, I can only imagine that w=N would mean that the maximum window size the recipient is prepared to handle is 2**N bytes. So ad hoc use should be forwards-compatible...
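
To sketch what that could look like in practice (the w= syntax is hypothetical and the HTTP plumbing around it is assumed, but ZSTD_c_windowLog and ZSTD_d_windowLogMax are the real libzstd parameters involved; error handling omitted):

#include <zstd.h>

/* Sender: a recipient advertised a hypothetical "zstd;w=20", so cap the
 * compression window at 2**20 bytes. */
static void sender_cap_window(ZSTD_CCtx* cctx, int recipientWindowLog)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, recipientWindowLog);  /* e.g. 20 */
}

/* Recipient: enforce the limit it advertised, so frames demanding a larger
 * window fail cleanly instead of blowing up memory use. */
static void recipient_enforce_window(ZSTD_DCtx* dctx, int advertisedWindowLog)
{
    ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, advertisedWindowLog);  /* e.g. 20 */
}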

Otherwise, in the absence of a coordination mechanism, I think implementations should just follow Postel's Principle: "be conservative in what you send, be liberal in what you accept".

I'm curious to hear more about the experiences you've seen where this is a problem.

klauspost

comment created time in 15 days

issue comment facebook/zstd

Streaming decompress into fragmented (but stable) output without history buffer?

This isn't ideal because we need to reserve based on a worst-case scenario (up to some limit), and more importantly, this memory is pinned forever, even though clients may never send us compressed data.

If you just malloc() it but never use it, actual memory will never get allocated to back it. So you'll just be using the address space. Although I guess you're saying you're running out of that. Which really surprises me! Are you using the stock libc allocator? You might want to try https://github.com/jemalloc/jemalloc. It's pretty great.

dotnwat

comment created time in 15 days

issue comment facebook/zstd

Streaming decompress into fragmented (but stable) output without history buffer?

@dotnwat,

Thanks for providing so much context!

I'm sorry to say that in general, the answer to your question is no. And unfortunately, it's not just an API limitation: the decoder implementation is written in such a way that it cannot decompress with a fragmented history buffer.

Fundamentally the zstd decoder is performing two operations in a loop, decoding sequences and executing them.

  1. Decoding a sequence involves recovering the LZ77 literal_length, match_length, and match_offset values from the entropy-encoded representation they're stored in (ANS Encoding via FSE).
  2. Executing a sequence is straightforward LZ77 decoding, and involves copying the next literal_length bytes from the decoded literals buffer (previously recovered from their Huffman-encoded representation) onto the tail of the output and then copying match_length bytes from match_offset bytes back in the history of the stream onto the tail of the output.

The relevant part here is the implementation of the match copy operation. Since this is part of the hot loop / core of the most performance-sensitive part of Zstd, we want the lookup of the match position from the decoded offset to be as fast as possible, basically just current_position - match_offset. The cost of this fast mapping is that it requires that the whole window is contiguous...

Except technically, we do actually implement an exception to this. The history buffer is allowed to have a single discontinuity. In order to efficiently maintain a window-sized view of an arbitrarily large stream, the internal history buffer is a circular buffer (sized to the window size), which as it wraps around will map the window into two chunks. So the decoder is implemented to handle that. That's probably not sufficient for your use case, though, even if that support were plumbed through to external history buffers.
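
To make the shape of this concrete, here is a heavily simplified sketch of executing one sequence against a single contiguous output/history buffer. It is illustrative only (no bounds checks, no wildcopy, no handling of the circular-buffer split), and the struct name is made up; this is not zstd's internal seq_t or its actual decoder.

#include <string.h>
#include <stddef.h>

typedef struct { size_t litLength; size_t matchLength; size_t offset; } sequence_example_t;

/* Append litLength literals, then copy matchLength bytes starting `offset`
 * bytes back in the already-produced output. The match lookup is just
 * current_position - match_offset, which is what requires the window to be
 * contiguous. Byte-by-byte copy so overlapping matches (offset < matchLength)
 * expand like run-length encoding. */
static size_t execute_sequence_sketch(unsigned char* out, size_t outPos,
                                      const unsigned char* literals,
                                      sequence_example_t seq)
{
    memcpy(out + outPos, literals, seq.litLength);
    outPos += seq.litLength;
    {
        const unsigned char* match = out + outPos - seq.offset;
        size_t i;
        for (i = 0; i < seq.matchLength; i++)
            out[outPos + i] = match[i];
    }
    return outPos + seq.matchLength;
}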

Someone (you? us?) could potentially write a decoder implementation that supported arbitrary fragmentation at the cost of slower execution, but from an external view, the approach you're taking now is probably your most realistic option.

I hope that helps!

dotnwat

comment created time in 15 days

PR closed facebook/zstd

[WIP] Allocate Huge Pages for the Compression Workspace (labels: CLA Signed, optimization)

This PR is experimental and is not suitable for merge. This is a demonstrator / testbench for the effects of backing the cwksp with huge pages.

A quick experiment showed the following performance improvement (huge page speeds on the left):

$ ./zstd -b1e19
 1: 9.54 MiB -> 3.01 MiB (3.170),  336.6 MB/s vs 336.6 MB/s
 2: 9.54 MiB -> 2.99 MiB (3.194),  242.4 MB/s vs 242.1 MB/s
 3: 9.54 MiB -> 3.08 MiB (3.095),  128.7 MB/s vs 115.4 MB/s
 4: 9.54 MiB -> 3.19 MiB (2.988),  117.6 MB/s vs 110.0 MB/s
 5: 9.54 MiB -> 3.13 MiB (3.046),   82.5 MB/s vs  75.5 MB/s
 6: 9.54 MiB -> 3.13 MiB (3.047),   80.8 MB/s vs  76.3 MB/s
 7: 9.54 MiB -> 3.12 MiB (3.058),   69.6 MB/s vs  65.2 MB/s
 8: 9.54 MiB -> 3.11 MiB (3.064),   62.5 MB/s vs  61.9 MB/s
 9: 9.54 MiB -> 3.16 MiB (3.022),   56.1 MB/s vs  53.6 MB/s
10: 9.54 MiB -> 3.19 MiB (2.987),   52.2 MB/s vs  49.0 MB/s
11: 9.54 MiB -> 3.20 MiB (2.979),   48.0 MB/s vs  41.7 MB/s
12: 9.54 MiB -> 3.20 MiB (2.980),   46.7 MB/s vs  37.7 MB/s
13: 9.54 MiB -> 3.20 MiB (2.980),   16.1 MB/s vs  13.0 MB/s
14: 9.54 MiB -> 3.20 MiB (2.981),   11.5 MB/s vs  9.80 MB/s
15: 9.54 MiB -> 3.20 MiB (2.982),   9.17 MB/s vs  8.00 MB/s
16: 9.54 MiB -> 2.94 MiB (3.243),   8.26 MB/s vs  8.00 MB/s
17: 9.54 MiB -> 3.00 MiB (3.183),   4.44 MB/s vs  3.89 MB/s
18: 9.54 MiB -> 3.00 MiB (3.177),   3.72 MB/s vs  2.35 MB/s
19: 9.54 MiB -> 3.00 MiB (3.179),   2.82 MB/s vs  2.03 MB/s

Which is pretty attractive!

Running This

You may want to first run e.g.,

echo 128 | sudo tee /proc/sys/vm/nr_hugepages

In order to attempt to get the kernel to reserve a bunch of huge pages in advance. Otherwise, if your system has fairly fragmented memory, the mmap() call will simply fail.

The recommended make arguments to play around with this are DEBUGLEVEL=2 MOREFLAGS="-DZSTD_CWKSP_ALLOC_HUGEPAGE=1". That way you'll get a message printed if the mmap() fails.

While running ./zstd -b3 you should then observe something like:

$ cat /proc/meminfo
[...]
HugePages_Total:     128
HugePages_Free:      127
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

This indicates you're successfully using a hugepage!
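
For context, the allocation this exercises is roughly of the following shape; this is a simplified sketch of an anonymous huge-page mmap() with a fallback, not the actual cwksp changes in this PR. A real allocator would also need to round the size up to the huge-page size and remember which path was taken so it can munmap() or free() accordingly.

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>

/* Try to back a workspace of `size` bytes with huge pages; fall back to an
 * ordinary allocation if none are available (e.g. nr_hugepages is 0 or
 * physical memory is too fragmented). */
static void* workspace_alloc_sketch(size_t size)
{
    void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (ptr != MAP_FAILED)
        return ptr;
    fprintf(stderr, "huge page mmap() failed; falling back to malloc()\n");
    return malloc(size);
}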

Future Directions

To be discussed.

+83 -17

2 comments

1 changed file

felixhandte

pr closed time in 18 days

pull request comment facebook/zstd

[WIP] Allocate Huge Pages for the Compression Workspace

Pursuing a different approach (doing huge-page customization in caller framework rather than in zstd itself).

felixhandte

comment created time in 18 days

issue comment facebook/zstd

Efficient read of input sparse files

Hi @anon00000000,

Something like tar is probably the appropriate place to address this need. While the zstd command line understands the idea of files and so on, the compressed format that it writes out does not. A Zstandard compressed blob (a "frame") simply encodes a stream of bytes. So there isn't really a good place in a Zstandard frame to transport the metadata around where the holes are etc., certainly not in a way that is compatible with the standard and the existing decoder. So even if the CLI were able to recognize and skip the holes at compression-time, they would not be preserved and would not be regenerated at decompression-time, which would presumably corrupt your data.

Does that make sense?

anon00000000

comment created time in a month

push event facebook/zstd

Usuario

commit sha 8bdce1ff97fddd82e27389eb2a55fbd6a75c1a3f

lib/Makefile: Fix small typo in ZSTD_FORCE_DECOMPRESS_* build macros

view details

Felix Handte

commit sha 628f65b79c15051f5bddc6a374e6a70187151264

Merge pull request #2714 from luisdallos/build-macros-typos
lib/Makefile: Fix small typo in ZSTD_FORCE_DECOMPRESS_* build macros

view details

push time in a month

PR merged facebook/zstd

lib/Makefile: Fix small typo in ZSTD_FORCE_DECOMPRESS_* build macros (label: CLA Signed)

According to lib/README.md and lib/decompress/zstd_decompress_block.c, the ZSTD_FORCE_DECOMPRESS_* macros should be named ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT and ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG.

+7 -7

1 comment

1 changed file

luisdallos

pr closed time in a month

PullRequestReviewEvent

pull request comment facebook/zstd

lib/Makefile: Fix small typo in ZSTD_FORCE_DECOMPRESS_* build macros

Good catch, @luisdallos!

luisdallos

comment created time in a month

PR opened facebook/zstd

[WIP] Allocate Huge Pages for the Compression Workspace (label: optimization)

This PR is experimental and is not suitable for merge. This is a demonstrator / testbench for the effects of backing the cwksp with huge pages.

A quick experiment showed the following performance improvement (huge page speeds on the left):

$ ./zstd -b1e19
 1: 9.54 MiB -> 3.01 MiB (3.170),  336.6 MB/s vs 336.6 MB/s
 2: 9.54 MiB -> 2.99 MiB (3.194),  242.4 MB/s vs 242.1 MB/s
 3: 9.54 MiB -> 3.08 MiB (3.095),  128.7 MB/s vs 115.4 MB/s
 4: 9.54 MiB -> 3.19 MiB (2.988),  117.6 MB/s vs 110.0 MB/s
 5: 9.54 MiB -> 3.13 MiB (3.046),   82.5 MB/s vs  75.5 MB/s
 6: 9.54 MiB -> 3.13 MiB (3.047),   80.8 MB/s vs  76.3 MB/s
 7: 9.54 MiB -> 3.12 MiB (3.058),   69.6 MB/s vs  65.2 MB/s
 8: 9.54 MiB -> 3.11 MiB (3.064),   62.5 MB/s vs  61.9 MB/s
 9: 9.54 MiB -> 3.16 MiB (3.022),   56.1 MB/s vs  53.6 MB/s
10: 9.54 MiB -> 3.19 MiB (2.987),   52.2 MB/s vs  49.0 MB/s
11: 9.54 MiB -> 3.20 MiB (2.979),   48.0 MB/s vs  41.7 MB/s
12: 9.54 MiB -> 3.20 MiB (2.980),   46.7 MB/s vs  37.7 MB/s
13: 9.54 MiB -> 3.20 MiB (2.980),   16.1 MB/s vs  13.0 MB/s
14: 9.54 MiB -> 3.20 MiB (2.981),   11.5 MB/s vs  9.80 MB/s
15: 9.54 MiB -> 3.20 MiB (2.982),   9.17 MB/s vs  8.00 MB/s
16: 9.54 MiB -> 2.94 MiB (3.243),   8.26 MB/s vs  8.00 MB/s
17: 9.54 MiB -> 3.00 MiB (3.183),   4.44 MB/s vs  3.89 MB/s
18: 9.54 MiB -> 3.00 MiB (3.177),   3.72 MB/s vs  2.35 MB/s
19: 9.54 MiB -> 3.00 MiB (3.179),   2.82 MB/s vs  2.03 MB/s

Which is pretty attractive!

Running This

You may want to first run e.g.,

echo 128 | sudo tee /proc/sys/vm/nr_hugepages

In order to attempt to get the kernel to reserve a bunch of huge pages in advance. Otherwise, if your system has fairly fragmented memory, the mmap() call will simply fail.

The recommended make arguments to play around with this are DEBUGLEVEL=2 MOREFLAGS="-DZSTD_CWKSP_ALLOC_HUGEPAGE=1". That way you'll get a message printed if the mmap() fails.

While running ./zstd -b3 you should then observe something like:

$ cat /proc/meminfo
[...]
HugePages_Total:     128
HugePages_Free:      127
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

This indicates you're successfully using a hugepage!

Future Directions

To be discussed.

+83 -17

0 comments

1 changed file

pr created time in a month

create branch felixhandte/zstd

branch : huge-page-cwksp

created branch time in a month

issue closed facebook/zstd

Compression results are hard to read if numbers are very large

Is your feature request related to a problem? Please describe.

After compressing a file in the CLI, some summary text is output listing the size before and after compression. If the file is sufficiently large, it's very hard to determine the file size visually. This is because the size is reported in bytes. A file that is 200 megabytes has nine digits, and a file that is 20 megabytes has eight digits. Files with large sizes like this are very hard to compare.

Describe the solution you'd like

Instead of reporting the file size in bytes, report it in a more human-readable string format.

Current:
:zstd input.bin
input.bin            : 45.57%   (200836721 => 91519541 bytes, input.bin.zst)
Proposed:
:zstd input.bin
input.bin            : 45.57%   (199.1M => 89.4M bytes, input.bin.zst)

Describe alternatives you've considered

Other than training myself to visually compare 10 digit numbers (unlikely) I don't see an alternative.

closed time in a month

scottchiefbaker

push event felixhandte/zstd

Sen Huang

commit sha 923e5ad3f5573cd68f792ebad49c24ecaa0c3ad0

Fix entropy repeat mode bug

view details

Sen Huang

commit sha 2ff5c7b59ffe75f948ffd399a17d9c1959f124a1

Add no intrinsics fuzztest, rowhash compression size test, and S390X to travis

view details

Binh Vo

commit sha d2f31b662779f3c13871d54868e9d5839343856d

Fix --progress flag to properly control progress display and default progress display on when using -v

view details

aqrit

commit sha dd4f6aa9e6db2964cc5ff8641be21334cab97a98

Flatten ZSTD_row_getMatchMask (#2681)

* Flatten ZSTD_row_getMatchMask
* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary.
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)
* replace memcpy with MEM_readST
* silence alignment warnings
* fix neon casts
* Update zstd_lazy.c
* unify simd preprocessor detection (#3)
* remove duplicate asserts
* tweak rotates
* improve endian detection
* add cast
  there is a fun little catch-22 with gcc: result from pmovmskb has to be cast to uint32_t to avoid a zero-extension but must be uint16_t to get gcc to generate a rotate instruction..
* more casts
* fix casts
  better work-around for the (bogus) warning: unary minus on unsigned

view details

Goutham Krishna

commit sha 912bb9fbf3375bb0ebf5ec9445eff43854eef20c

Update README for Travis CI Badge

### Updating Badge link to the new Travis CI link.

- Update badge root to `api.travis-ci.com` (new) from `travis-ci.org` (old), which was migrated.

view details

sen

commit sha dd33ec9db084a0343db8e49eb0223decd8a611dd

Merge pull request #2700 from gauthamkrishna9991/update_readme_travis_link
Update README for Travis CI Badge

view details

sen

commit sha a21b9036fec3ff2e7fd93fb534cd4e8244c385db

Merge pull request #2678 from senhuang42/big_endian_no_intrinsics_fuzztest
Fuzzer test with no intrinsics on S390x (big endian)

view details

Yann Collet

commit sha 05d70903a6f3472642f18636a47a1cb44171bc7d

Merge pull request #2698 from binhdvo/bootcamp
Fix --progress flag to properly control progress display and default …

view details

Binh Vo

commit sha 6583fa3f0ab8b7ab7c34888a07b20af852a2ef3c

Add support for --long-param flag

view details

binhdvo

commit sha 78e16b15f14d1ae3aac4241c1e15cb2537922ed6

Merge pull request #2703 from binhdvo/bootcamp
Add support for --long-param flag, fix #2104

view details

binhdvo

commit sha 325952f878b692b6fb37dcba270ae1ee0ceede53

Revert "Add support for --long-param flag, fix #2104"

view details

Yann Collet

commit sha 2962583492a39607a1b54dd9deb237087f368c09

Merge pull request #2704 from facebook/revert-2703-bootcamp
Revert "Add support for --long-param flag, fix #2104"

view details

sen

commit sha d5f3568c4bb0d4d956625fd14180ec12b7f834df

Merge pull request #2697 from senhuang42/entropy_repeat_fix
[bug] Fix entropy repeat mode bug

view details

Scott Baker

commit sha 26fab1d963b70b5673e7a7405735818f7b6e30bd

Make the CLI output the file sizes in human readable format

view details

Scott Baker

commit sha b70175e5ec51294115f311db3d77c104b557a14e

Put the human_size() function in util.c

view details

Scott Baker

commit sha b6b23dfe64e2ff52e52d2eeed7e68ebc70249178

Convert names to CamelCase

view details

Scott Baker

commit sha eefdbcd93aa296df9a72ba75a53201bb10a3a2ae

Make the variable types match

view details

Scott Baker

commit sha 4e0d9f1cc83a0795a2eb8401dc92369a899614c1

Move the variable declarations to the top

view details

Scott Baker

commit sha 894698d3b61583fb539f355584db576b7e0635c5

Use human_size() in the benchmark output also

view details

Scott Baker

commit sha 77001f00fb30ee63f10c96b89b4d6ccfb00078b4

Use human_size() on the "multiple files compressed" output also

view details

push time in a month

delete branch felixhandte/zstd

delete branch : human_size_output

delete time in a month

delete branch facebook/zstd

delete branch : human_size_output

delete time in a month

push event facebook/zstd

Scott Baker

commit sha 26fab1d963b70b5673e7a7405735818f7b6e30bd

Make the CLI output the file sizes in human readable format

view details

Scott Baker

commit sha b70175e5ec51294115f311db3d77c104b557a14e

Put the human_size() function in util.c

view details

Scott Baker

commit sha b6b23dfe64e2ff52e52d2eeed7e68ebc70249178

Convert names to CamelCase

view details

Scott Baker

commit sha eefdbcd93aa296df9a72ba75a53201bb10a3a2ae

Make the variable types match

view details

Scott Baker

commit sha 4e0d9f1cc83a0795a2eb8401dc92369a899614c1

Move the variable declarations to the top

view details

Scott Baker

commit sha 894698d3b61583fb539f355584db576b7e0635c5

Use human_size() in the benchmark output also

view details

Scott Baker

commit sha 77001f00fb30ee63f10c96b89b4d6ccfb00078b4

Use human_size() on the "multiple files compressed" output also

view details

Scott Baker

commit sha 35576e63ce5f770b99306c5730244e310c0f0aed

Convert tabs to spaces

view details

Scott Baker

commit sha e5fc830795f2966d0198bf3b6523e86984fbf4fa

human_size() should use size_t

view details

Scott Baker

commit sha 1ef6f3d079b2415c6bdea61c944a1a0cea3a9a28

Use unsigned long instead to help with some tests

view details

Scott Baker

commit sha 64385ef7cbb3388ceacb7611f29b2f0a03bdf477

Update humanSize() to skip the big numbers (it requires 64 bit)

view details

Scott Baker

commit sha 20b9b00b413c799bef54bfeb2a36256f11d65b06

Try unsigned long long

view details

Scott Baker

commit sha 376a2730a8fbcefce94b33ed485a8b5ea606eb5e

Try enabling the BIG strings now the unsigned long long is in effect

view details

Scott Baker

commit sha 1eb852854b48f7258b74d4e781e036db1b260607

Some fixes to address things @felixhandte found

view details

Scott Baker

commit sha 8e0a9695d7a28087c14ca62ed15570021311a263

Attempt to fix a failing test with help from @aqrit

view details

W. Felix Handte

commit sha bbb81c8801006d8a4bcc10d2605da2cb1eda1662

Avoid `snprintf()` in Preparing Human-Readable Sizes; Improve Formatting

This produces the following formatting:

Size       | `zstd` | `ls -lh`
---------- | ------ | --------
1          | 1      | 1
12         | 12     | 12
123        | 123    | 123
1234       | 1.21K  | 1.3K
12345      | 12.1K  | 13K
123456     | 121K   | 121K
1234567    | 1.18M  | 1.2M
12345678   | 11.8M  | 12M
123456789  | 118M   | 118M
1234567890 | 1.15G  | 1.2G
999        | 999    | 999
1000       | 1000   | 1000
1001       | 1001   | 1001
1023       | 1023   | 1023
1024       | 1.000K | 1.0K
1025       | 1.00K  | 1.1K
999999     | 977K   | 977K
1000000    | 977K   | 977K
1000001    | 977K   | 977K
1023999    | 1000K  | 1000K
1024000    | 1000K  | 1000K
1024001    | 1000K  | 1001K
1048575    | 1024K  | 1.0M
1048576    | 1.000M | 1.0M
1048577    | 1.00M  | 1.1M

This was produced with the following invocation:

```
for N in 1 12 123 1234 12345 123456 1234567 12345678 123456789 1234567890 999 1000 1001 1023 1024 1025 999999 1000000 1000001 1023999 1024000 1024001 1048575 1048576 1048577; do
  head -c $N /dev/urandom > r$N
done
./zstd -i1 -b1 -S r1 r12 r123 r1234 r12345 r123456 r1234567 r12345678 r123456789 r1234567890 r999 r1000 r1001 r1023 r1024 r1025 r999999 r1000000 r1000001 r1023999 r1024000 r1024001 r1048575 r1048576 r1048577
```

view details

W. Felix Handte

commit sha 9b67219b1e961574724f2c4fb9f2796430e7fccf

Fix Integer Constants; Fix Comparison

view details

W. Felix Handte

commit sha 464bfb022ef2d1f44778717b71979b86a3ffbea5

In Verbose Mode, Preserve Full Precision Where Possible

view details

W. Felix Handte

commit sha 93bb368d744ac56e4536b739e188aeadb2aaee8e

Change Suffix (e.g., "G" -> " GB")

view details

W. Felix Handte

commit sha 7e0058848ca45d1ff8c7e846955b0f15a4635073

Fix Whitespace

view details

push time in a month

PR merged facebook/zstd

Format File Sizes Human-Readable in the CLI (label: CLA Signed)

This PR extends @scottchiefbaker's #2696. It switches zstd's CLI output to printing human-readable representations of file sizes, rather than full-precision integers.

This table shows how this PR formats various sizes in comparison to ls -lh. There are some differences, but in general I prefer this formatting over ls's, since this provides more consistent 3-4 digits of precision and rounds-to-nearest rather than always rounding-up.

Size zstd ls -lh
1 1 B 1
12 12 B 12
123 123 B 123
1234 1.21 KiB 1.3K
12345 12.1 KiB 13K
123456 121 KiB 121K
1234567 1.18 MiB 1.2M
12345678 11.8 MiB 12M
123456789 118 MiB 118M
1234567890 1.15 GiB 1.2G
999 999 B 999
1000 1000 B 1000
1001 1001 B 1001
1023 1023 B 1023
1024 1.000 KiB 1.0K
1025 1.00 KiB 1.1K
999999 977 KiB 977K
1000000 977 KiB 977K
1000001 977 KiB 977K
1023999 1000 KiB 1000K
1024000 1000 KiB 1000K
1024001 1000 KiB 1001K
1048575 1024 KiB 1.0M
1048576 1.000 MiB 1.0M
1048577 1.00 MiB 1.1M

Repro Instructions:

for N in 1 12 123 1234 12345 123456 1234567 12345678 123456789 1234567890 999 1000 1001 1023 1024 1025 999999 1000000 1000001 1023999 1024000 1024001 1048575 1048576 1048577; do
  head -c $N /dev/urandom > r$N
done
./zstd -i1 -b1 -S r1 r12 r123 r1234 r12345 r123456 r1234567 r12345678 r123456789 r1234567890 r999 r1000 r1001 r1023 r1024 r1025 r999999 r1000000 r1000001 r1023999 r1024000 r1024001 r1048575 r1048576 r1048577

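For anyone curious, the formatting rule described above (roughly 3-4 significant digits, binary units, round to nearest) can be sketched as below. This is illustrative only and not the util.c code merged in this PR, which differs in details such as preserving full precision in verbose mode.

#include <stdio.h>

/* Print `size` using binary units (B, KiB, MiB, ...), keeping roughly 3-4
 * significant digits and letting printf round to nearest. Sketch only. */
static void print_human_size_sketch(unsigned long long size)
{
    static const char* const units[] = { " B", " KiB", " MiB", " GiB", " TiB" };
    double value = (double)size;
    int unit = 0;
    while (value >= 1024.0 && unit < 4) {
        value /= 1024.0;
        unit++;
    }
    if (unit == 0)
        printf("%llu%s", size, units[0]);         /* exact byte counts, e.g. "999 B" */
    else if (value < 10.0)
        printf("%.2f%s", value, units[unit]);     /* e.g. "1.21 KiB" */
    else if (value < 100.0)
        printf("%.1f%s", value, units[unit]);     /* e.g. "12.1 KiB" */
    else
        printf("%.0f%s", value, units[unit]);     /* e.g. "121 KiB" */
}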

+145 -52

10 comments

6 changed files

felixhandte

pr closed time in a month

pull request comment facebook/zstd

Format File Sizes Human-Readable in the CLI

The new changes include the --list command:

$ ./zstd -l *.zst
Frames  Skips  Compressed  Uncompressed  Ratio  Check  Filename
     1      0     977 KiB       977 KiB  1.000  XXH64  r1000000.zst
     1      0     977 KiB       977 KiB  1.000  XXH64  r1000001.zst
     1      0    1014   B      1000   B  0.986  XXH64  r1000.zst
     1      0    1015   B      1001   B  0.986  XXH64  r1001.zst
     1      0    1000 KiB      1000 KiB  1.000  XXH64  r1023999.zst
     1      0    1.01 KiB      1023   B  0.986  XXH64  r1023.zst
     1      0    1000 KiB      1000 KiB  1.000  XXH64  r1024000.zst
     1      0    1000 KiB      1000 KiB  1.000  XXH64  r1024001.zst
     1      0    1.01 KiB     1.000 KiB  0.987  XXH64  r1024.zst
     1      0    1.01 KiB      1.00 KiB  0.987  XXH64  r1025.zst
     1      0    1.00 MiB      1024 KiB  1.000  XXH64  r1048575.zst
     1      0    1.00 MiB     1.000 MiB  1.000  XXH64  r1048576.zst
     1      0    1.00 MiB      1.00 MiB  1.000  XXH64  r1048577.zst
     1      0    1.15 GiB      1.15 GiB  1.000  XXH64  r1234567890.zst
     1      0     118 MiB       118 MiB  1.000  XXH64  r123456789.zst
     1      0    11.8 MiB      11.8 MiB  1.000  XXH64  r12345678.zst
     1      0    1.18 MiB      1.18 MiB  1.000  XXH64  r1234567.zst
     1      0     121 KiB       121 KiB  1.000  XXH64  r123456.zst
     1      0    12.1 KiB      12.1 KiB  0.999  XXH64  r12345.zst
     1      0    1.22 KiB      1.21 KiB  0.989  XXH64  r1234.zst
     1      0     136   B       123   B  0.904  XXH64  r123.zst
     1      0      25   B        12   B  0.480  XXH64  r12.zst
     1      0      14   B         1   B  0.071  XXH64  r1.zst
     1      0     977 KiB       977 KiB  1.000  XXH64  r999999.zst
     1      0    1013   B       999   B  0.986  XXH64  r999.zst
----------------------------------------------------------------- 
    28      0    2.56 GiB      2.56 GiB  1.000  XXH64  28 files

As well as in-progress compression in various ways:

$ ./zstd -f -9 r*
Compress: 35/50 files. Current: r123456789 Read:   108 MiB /   118 MiB ==> 100%
$ ./zstd -f -v -9 r*
*** zstd command line interface 64-bits v1.5.0, by Yann Collet ***
r1                   :1400.00%   (     1   B =>     14   B, r1.zst)            .00% 
r1000                :101.40%   (  1000   B =>   1014   B, r1000.zst)          
r1000000             :100.00%   (   977 KiB =>    977 KiB, r1000000.zst)       
r1000001             :100.00%   (   977 KiB =>    977 KiB, r1000001.zst)       
...
r12345678            :100.00%   (  11.8 MiB =>   11.8 MiB, r12345678.zst)      0.00% 
(L9) Buffered :  10.5 MiB - Consumed :  21.5 MiB - Compressed :  21.5 MiB => 100.00%
$ ./zstd -f -vv -9 r*
*** zstd command line interface 64-bits v1.5.0, by Yann Collet ***
r1                   :1400.00%   (     1   B =>     14   B, r1.zst)            00.00% 
r1                   : Completed in 0.00 sec  (cpu load : 96%)
r1000                :101.40%   (  1000   B =>   1014   B, r1000.zst)          
r1000                : Completed in 0.00 sec  (cpu load : 98%)
r1000000             :100.00%   (1000000   B => 1000037   B, r1000000.zst)     
r1000000             : Completed in 0.02 sec  (cpu load : 103%)
...
r123456789           :100.00%   (123456789   B => 123459629   B, r123456789.zst)  => 100.00% 
r123456789           : Completed in 1.21 sec  (cpu load : 118%)
(L9) Buffered :10616832   B - Consumed :1134559232   B - Compressed :1134585210   B => 100.00%
felixhandte

comment created time in a month

push event felixhandte/zstd

W. Felix Handte

commit sha 8c00807bbcee5fdeb1db126cec4529f09375dc2f

Whitespace Fixes to Improve Cross-Line Alignment

view details

push time in a month