Implement snappy compression in userland code #11
|
Output streaming is a bit challenging, as the compressed data starts with the uncompressed length.

One solution could be to do the following:

```php
use io\streams\FileOutputStream;
use io\streams\compress\Snappy;

$snappy= new Snappy();
$out= new FileOutputStream('compressed.sn');
$out->write($snappy->length(strlen($data)));
$stream= $snappy->create($out);
$stream->write($data);
$stream->close();
```

...but that feels hacky. We could overload the second parameter to open(), as snappy does not use a compression level, which would give us:

```php
use io\streams\FileOutputStream;
use io\streams\compress\Snappy;

$snappy= new Snappy();
$stream= $snappy->create(new FileOutputStream('compressed.sn'), strlen($data));
$stream->write($data);
$stream->close();
```

...but that would be inconsistent with other implementations. The classical options approach would give us something like this:

```php
$stream= $snappy->create(new FileOutputStream('compressed.sn'), ['length' => strlen($data)]);
```

...but that's error-prone due to its string-key nature. We could solve this with an Options object:

```php
$stream= $snappy->create(new FileOutputStream('compressed.sn'), new Options(length: strlen($data)));
```
|
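For reference, the raw Snappy format stores the uncompressed length (at most 2^32 - 1) as a little-endian base-128 varint at the very start of the data, which is why the length has to be known before the first byte is written. A minimal sketch of that encoding follows; the function name `snappyLengthPreamble` is hypothetical and not part of this library's API:

```php
<?php
// Sketch only: encodes the length preamble of a raw Snappy block.
// `snappyLengthPreamble` is a hypothetical helper, not the library's method.
function snappyLengthPreamble(int $length): string {
  if ($length < 0 || $length > 0xFFFFFFFF) {
    throw new InvalidArgumentException('Length must fit into 32 bits');
  }

  $bytes= '';
  while ($length >= 0x80) {
    $bytes.= chr(($length & 0x7F) | 0x80);  // low 7 bits, continuation bit set
    $length>>= 7;
  }
  return $bytes.chr($length);               // last byte, continuation bit clear
}

// The 2207250-byte test file needs a 4-byte preamble
echo bin2hex(snappyLengthPreamble(2207250)), "\n";  // prints "92dc8601"
```

Whichever API shape is chosen, the stream would need to emit these bytes before any compressed data.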
|
Integration testing buffered vs. unbuffered snappy compression shows the implementation has bugs:

```sh
# Calls compress()
$ xp snappy.script.php -c pdf.streaming > pdf.sn
pdf.streaming (2207250 -> 876717) 0.064 seconds & 2044.38 kB used / 6550.12 kB peak

# Calls open($out)
$ xp snappy.script.php -buf pdf.streaming pdf.sn
[.]
pdf.streaming (2207250 -> 876717) 0.074 seconds & 1426.95 kB used / 6832.72 kB peak

# Calls open($out, new Options(length: $size))
$ xp snappy.script.php -out pdf.streaming pdf.sn
[.]
pdf.streaming (2207250 -> 89427) 0.190 seconds & 1445.20 kB used / 1786.59 kB peak
```

All of these yield the following decompression error:

```sh
$ snappy -d pdf.sn > pdf.return
snappy: pdf.sn: compressed block of length 876717: expecting 2207250 bytes, got 909072
```

For comparison, this is what is expected:

```sh
$ snappy pdf.streaming > pdf.sn
pdf.streaming: 2207250 -> 2199987 (99.67%)
```
|
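The decoder expects 2207250 bytes, which matches the input size, so the length preamble at the start of the file appears to be intact and the mismatch would lie in the compressed body. One way to double-check what a file declares is to decode the leading little-endian varint by hand. This is a sketch under that assumption; `readSnappyLength` is a hypothetical helper name:

```php
<?php
// Sketch only: decodes the varint length preamble at the start of raw Snappy data.
// `readSnappyLength` is a hypothetical helper, not part of the library.
// Returns [declared uncompressed length, number of preamble bytes consumed].
function readSnappyLength(string $bytes): array {
  $length= 0;
  for ($i= 0, $shift= 0; $i < strlen($bytes) && $shift <= 28; $i++, $shift+= 7) {
    $byte= ord($bytes[$i]);
    $length|= ($byte & 0x7F) << $shift;                  // 7 payload bits per byte
    if (0 === ($byte & 0x80)) return [$length, $i + 1];  // continuation bit clear: done
  }
  throw new UnexpectedValueException('Truncated or oversized length preamble');
}

// For a real file, pass its first five bytes instead, e.g.
// file_get_contents('pdf.sn', false, null, 0, 5)
[$declared, $consumed]= readSnappyLength("\x92\xdc\x86\x01\x00");
printf("declared=%d, preamble=%d bytes\n", $declared, $consumed);
// prints "declared=2207250, preamble=4 bytes"
```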
Discovered when integration-testing with the official test data from https://github.com/google/snappy/tree/main/testdata
|
Using https://github.com/google/snappy/tree/main/testdata files copied to ./fixtures:

Integration testing for compress():

```sh
for file in $(ls -1 fixtures/* | grep -v baddata); do
  echo "== $file =="
  xp snappy.script.php -c $file > sn
  snappy -d sn > test
  diff -u test $file && echo "OK"
  rm sn test
done
```

✅ Works |
|
Streaming, while being a bit slower for small files, really shines with large files: The 584 MB video file compresses in 9 seconds instead of 34, and has a peak memory usage of just 1.8 Megabytes vs. 1.1 Gigabytes! |
|
Added to MongoDB in https://github.com/xp-forge/mongodb/releases/tag/v3.6.0 |
See https://en.wikipedia.org/wiki/Snappy_(compression), https://google.github.io/snappy/ and xp-forge/mongodb#62 (comment)