FreeBSD

geom_uzip(4), mkuzip(8): Add Zstd image mode
Abandoned (Public)

Authored by cem on Sat, Aug 10, 1:00 AM.

Details

Summary

The Zstd format bumps the CLOOP major version number to 4 to avoid
incompatibility with older systems. Support in geom_uzip(4) is conditional
on the ZSTDIO kernel option, which is enabled in amd64 GENERIC but not in
all in-tree configurations.
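A minimal sketch of why bumping the major number protects older systems. It assumes the classic header layout in which an image starts with `#!/bin/sh\n#`, followed by a compression letter at byte 11 and the major version digit at byte 12 (as in `#V2.0`). The names `cloop_probe` and `HDR_*`, and the `'Z'`/`'4'` pairing for Zstd, are hypothetical. The point is only that a reader built to understand versions 2 and 3 refuses a version 4 image outright instead of handing Zstd frames to the wrong decompressor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, assuming a header such as "#!/bin/sh\n#V2.0 Format\n...". */
#define HDR_PREFIX    "#!/bin/sh\n#"
#define HDR_OFS_COMPR 11  /* compression letter: 'V' (zlib), 'L' (lzma), 'Z' (zstd, assumed) */
#define HDR_OFS_VERSN 12  /* major version digit: '2', '3', '4' */

enum cloop_algo { ALGO_UNSUPPORTED = -1, ALGO_ZLIB, ALGO_LZMA, ALGO_ZSTD };

/*
 * Identify the compression format of an image header. max_major models the
 * newest major version the reader was built to understand.
 */
enum cloop_algo
cloop_probe(const char *hdr, size_t len, int max_major)
{
	if (len <= HDR_OFS_VERSN ||
	    memcmp(hdr, HDR_PREFIX, strlen(HDR_PREFIX)) != 0)
		return ALGO_UNSUPPORTED;

	int ver = hdr[HDR_OFS_VERSN] - '0';
	if (ver < 2 || ver > max_major)
		return ALGO_UNSUPPORTED;  /* an older reader stops here on a v4 image */

	switch (hdr[HDR_OFS_COMPR]) {
	case 'V':
		return ALGO_ZLIB;
	case 'L':
		return ver >= 3 ? ALGO_LZMA : ALGO_UNSUPPORTED;
	case 'Z':
		return ver >= 4 ? ALGO_ZSTD : ALGO_UNSUPPORTED;
	default:
		return ALGO_UNSUPPORTED;
	}
}
```

With `max_major = 3` (modeling an older kernel), a `#Z4.0` header is rejected; with `max_major = 4`, it is recognized as Zstd.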

mkuzip(8) was modified slightly to always initialize the (nblocks + 1)th
offset in the CLOOP file format. Previously, it was only initialized when
the final compressed block happened to be unaligned with respect to
DEV_BSIZE. The "fake" last+1 block change in r298619 meant that the final
compressed block's 'blen' was never correct unless the compressed uzip image
happened to be DEV_BSIZE-aligned, which occurs in about 1 out of every 512
cases. The zlib and lzma decompressors are probably tolerant of extra trash
following the frame they were told to decode, but Zstd complains that the
input size is incorrect. The problem was noticed and a fix attempted in
r302284, but instead of correcting the final block length, that change made
a decompressor error simply trash the final cluster (treat it as all zeroes).
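The arithmetic behind the bug can be sketched as follows. Each cluster's compressed length is the difference between consecutive entries of an offset table with nblocks + 1 entries, so the final cluster's length is only correct if the extra entry holds the true end of the compressed data. Assuming the substituted value is the image size after padding to DEV_BSIZE, the final length overshoots by the padding, and that padding is zero only when the data already ends on a 512-byte boundary. The function names and sample offsets are illustrative, not taken from mkuzip(8) or geom_uzip(4).

```c
#include <stddef.h>
#include <stdint.h>

#define DEV_BSIZE 512  /* disk block size the image is padded to */

/* Compressed length of cluster i; offsets[] has nblocks + 1 entries. */
uint64_t
cluster_clen(const uint64_t *offsets, size_t i)
{
	return offsets[i + 1] - offsets[i];
}

/* Size of the image once its compressed data is padded to DEV_BSIZE. */
uint64_t
padded_size(uint64_t data_end)
{
	return (data_end + DEV_BSIZE - 1) / DEV_BSIZE * DEV_BSIZE;
}
```

For example, with offsets {1024, 1500, 2100} the last cluster is 600 bytes. If the final entry is replaced by the padded size (2560), the computed length becomes 1060 bytes, of which 460 are padding. A decoder that insists on an exact input size, as Zstd apparently does, rejects that.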

Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the
(nblocks + 1)th offset when it is known to be initialized to a good value.
This makes the calculated compressed length of the final real cluster match
the value printed by mkuzip(8).
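On the reader side, the change amounts to preferring the stored end offset over a synthesized one. This is a minimal sketch that assumes a hypothetical `offset_trusted` flag (for example, derived from the image version, since mkuzip(8) now always writes the entry) plus a basic sanity check. The source does not spell out how geom_uzip(4) actually decides that the value is good.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the end offset of the final cluster. offsets[] has nblocks + 1
 * entries and nblocks >= 1. Falls back to the media size (the old
 * "fake" last+1 block) when the stored entry cannot be trusted.
 */
uint64_t
final_end_offset(const uint64_t *offsets, size_t nblocks, uint64_t mediasize,
    bool offset_trusted)
{
	uint64_t stored = offsets[nblocks];

	if (offset_trusted && stored > offsets[nblocks - 1] &&
	    stored <= mediasize)
		return stored;
	return mediasize;
}
```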

mkuzip(8) was refactored somewhat to reduce code duplication and make it
easier to add other compression formats.

  • Input block size validation was pulled out of the individual compression init routines into main().
  • Init routines now validate a user-provided compression level or, if none was provided, select an algorithm-specific default.
  • A new interface for calculating the maximum compressed size of an incompressible input block was added to each driver. The generic code uses it both to validate against MAXPHYS and to allocate compression result buffers.
  • Algorithm selection is now driven by a table lookup, to make adding other formats easier in the future.
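A rough sketch of the table-driven layout described in the list above. The struct and function names are hypothetical, and MAXPHYS is assumed here to be 128 KiB. The two bound functions reproduce the worst-case expressions from zlib's compressBound() and zstd's ZSTD_COMPRESSBOUND() macro. An lzma entry would presumably use liblzma's lzma_stream_buffer_bound(); it is omitted to keep the sketch free of library dependencies.

```c
#include <stddef.h>
#include <string.h>

#define ASSUMED_MAXPHYS (128 * 1024)  /* assumption; MAXPHYS varies by platform and release */

/* Hypothetical per-algorithm driver entry. */
struct compr_driver {
	const char *name;
	size_t (*max_clen)(size_t blksz);  /* worst case for incompressible input */
};

/* Worst-case expression from zlib's compressBound(). */
static size_t
zlib_max_clen(size_t n)
{
	return n + (n >> 12) + (n >> 14) + (n >> 25) + 13;
}

/* Worst-case expression from zstd's ZSTD_COMPRESSBOUND() macro. */
static size_t
zstd_max_clen(size_t n)
{
	size_t margin = n < (128 << 10) ? ((128 << 10) - n) >> 11 : 0;

	return n + (n >> 8) + margin;
}

static const struct compr_driver drivers[] = {
	{ "zlib", zlib_max_clen },
	{ "zstd", zstd_max_clen },
	/* an "lzma" entry would go here */
};

/* Table lookup by name, e.g. for the argument of -A. */
const struct compr_driver *
driver_lookup(const char *name)
{
	for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++)
		if (strcmp(drivers[i].name, name) == 0)
			return &drivers[i];
	return NULL;
}

/* Generic check: a worst-case compressed block must fit in one MAXPHYS I/O. */
int
blksz_fits(const struct compr_driver *drv, size_t blksz)
{
	return drv->max_clen(blksz) <= ASSUMED_MAXPHYS;
}
```

Under these assumptions, a 64 KiB block passes the check, while a 128 KiB block does not, because even zlib's worst case adds a few dozen bytes of overhead. The same max_clen() value is the natural size for each per-block output buffer.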

mkuzip(8) gained the ability to explicitly specify a compression level with
'-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained.
The new zstd default is 9, to match zlib.

Rather than selecting lzma or zlib with '-L' or its absence, respectively, a
new argument '-A <algorithm>' selects 'zlib', 'lzma', or 'zstd'. '-L' is
considered deprecated but will probably never be removed.
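The '-A', '-L', and '-C' handling described above can be sketched as a small resolver. Only the defaults (9 for zlib and zstd, 6 for lzma) come from the text. The valid level ranges and the choice to reject '-L' combined with a different '-A' algorithm are assumptions, and the function and sentinel names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LEVEL_UNSET (-1)  /* hypothetical sentinel: no -C given */

struct algo_info {
	const char *name;
	int def_level;             /* defaults stated in the review */
	int min_level, max_level;  /* assumed valid ranges */
};

static const struct algo_info algos[] = {
	{ "zlib", 9, 1, 9 },
	{ "lzma", 6, 0, 9 },
	{ "zstd", 9, 1, 19 },
};

/*
 * Resolve -A (aflag, may be NULL), -L (lflag) and -C (level, or LEVEL_UNSET)
 * into an algorithm and level. Returns 0 on success, -1 on a usage error.
 */
int
resolve_options(const char *aflag, bool lflag, int level,
    const char **algo_out, int *level_out)
{
	const char *name;

	/* -L means lzma; a conflicting -A is rejected (assumption). */
	if (aflag != NULL && lflag && strcmp(aflag, "lzma") != 0)
		return -1;
	name = aflag != NULL ? aflag : (lflag ? "lzma" : "zlib");

	for (size_t i = 0; i < sizeof(algos) / sizeof(algos[0]); i++) {
		if (strcmp(algos[i].name, name) != 0)
			continue;
		if (level == LEVEL_UNSET)
			level = algos[i].def_level;
		else if (level < algos[i].min_level || level > algos[i].max_level)
			return -1;
		*algo_out = algos[i].name;
		*level_out = level;
		return 0;
	}
	return -1;  /* unknown algorithm name */
}
```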

All of the new features were documented in mkuzip.8; the page was also
cleaned up slightly.

Test Plan

mkuzip (defaults, UFS, part of the Silesia corpus in a 100MB md):

$ time mkuzip -S -Z -A lzma -o mkuzip_test.ulzma /dev/md0
compressed data to 28150272 bytes, saved 76707328 bytes, 73.15% decrease, 25009222.59 bytes/sec.
91.70s user 3.58s system 2248% cpu 4.238 total
^^^^^

$ time mkuzip -S -Z -A zlib -o mkuzip_test.uzip /dev/md0
compressed data to 32128000 bytes, saved 72729600 bytes, 69.36% decrease, 329501518.41 bytes/sec.
6.33s user 0.12s system 2002% cpu 0.322 total
^^^^

$ time mkuzip -S -Z -A zstd -o mkuzip_test.uzst /dev/md0
compressed data to 31258112 bytes, saved 73599488 bytes, 70.19% decrease, 557281704.67 bytes/sec.
4.21s user 0.11s system 2256% cpu 0.192 total
^^^^
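The summary lines are internally consistent. In each run, compressed plus saved bytes equals 104857600 bytes (the 100 MiB md), and the "% decrease" equals saved bytes divided by that total. A sketch of that arithmetic, assuming (not confirmed by the source) that mkuzip(8) computes it the same way:

```c
#include <stdint.h>

/* Percentage size reduction: bytes saved relative to the original size. */
double
pct_decrease(uint64_t original, uint64_t compressed)
{
	return 100.0 * (double)(original - compressed) / (double)original;
}
```

Applied to the three runs, this reproduces the reported 73.15% (lzma), 69.36% (zlib), and 70.19% (zstd).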

geom_uzip (images from above):

# lzma
$ MD=$(mdconfig -a -o readonly -f ./mkuzip_test.ulzma) ; mkdir -p /mnt/$MD ; mount -o ro /dev/${MD}.uzip /mnt/$MD ; pv < /mnt/${MD}/sil1.dat > /dev/null ; umount /mnt/$MD
96.0MiB 0:00:02 [38.4MiB/s] [===>] 100%

# zlib
$ MD=$(mdconfig -a -o readonly -f ./mkuzip_test.uzip) ; mkdir -p /mnt/$MD ; mount -o ro /dev/${MD}.uzip /mnt/$MD ; pv < /mnt/${MD}/sil1.dat > /dev/null ; umount /mnt/$MD
96.0MiB 0:00:00 [ 181MiB/s] [===>] 100%

# zstd
$ MD=$(mdconfig -a -o readonly -f ./mkuzip_test.uzst) ; mkdir -p /mnt/$MD ; mount -o ro /dev/${MD}.uzip /mnt/$MD ; pv < /mnt/${MD}/sil1.dat > /dev/null ; umount /mnt/$MD
96.0MiB 0:00:00 [ 326MiB/s] [===>] 100%

Diff Detail

Lint: Lint OK
Unit: No Unit Test Coverage
Build Status: Buildable 25771, Build 24344: arc lint + arc unit

Event Timeline

cem created this revision. Sat, Aug 10, 1:00 AM
cem edited the test plan for this revision. Sat, Aug 10, 1:12 AM