ZSTD compression for ZFS
Needs Review · Public

Authored by allanjude on Jun 10 2017, 2:06 AM.

Details

Summary

This patch adds ZSTD compression to ZFS

It currently supports the 19 standard compression levels. Support for a subset of the negative (faster) levels is planned for a future commit.

It includes the required changes to boot from zstd compressed datasets for BIOS/LEGACY and UEFI.

Diff Detail

Repository
rS FreeBSD src repository
allanjude created this revision.Jun 10 2017, 2:06 AM
allanjude updated this revision to Diff 30301.Jul 1 2017, 8:35 PM

Lots of cleanup

Works with Zstandard 1.2.0 now

Reduced diff against Zstandard

allanjude updated this revision to Diff 33855.Oct 10 2017, 2:53 AM

Update to support ZSTD 1.3.x

This is a much cleaner version of the patch

Depends on D10407 for libzstd changes

Some of the libstand Makefile goo will go away when Warner finishes his cleanup of sys/boot

This version works, including booting BIOS and EFI with zstd compression on the boot environment.

Still considering switching away from the 3 compression levels (min, default, max) to just 'zstd', with a separate property that controls the level (1-19).

imp added a comment.Oct 10 2017, 3:16 AM

comments on the sys/boot / libstand parts of this.

lib/libstand/Makefile
115–116 ↗(On Diff #33855)

${SRCTOP}/ instead of ${LIBSTAND_SRC}/../../ works better and fits with changes I've made to this file.

118 ↗(On Diff #33855)

ditto

sys/boot/efi/boot1/Makefile
94–96 ↗(On Diff #33855)

Just nuke these. The first line isn't needed anymore, and the last line I've already done.

sys/boot/efi/loader/Makefile
152 ↗(On Diff #33855)

kill this, it's OBE

sys/boot/i386/gptzfsboot/Makefile
91–92 ↗(On Diff #33855)

Although this is a wrap fix, it's gratuitous. I'd avoid it. It will also conflict.

allanjude added inline comments.Oct 10 2017, 4:56 AM
sys/boot/i386/gptzfsboot/Makefile
91–92 ↗(On Diff #33855)

Yes, this was unintentional; it is left over from when there was a libzstd32 for a short while, before I just added zstd to libstand instead.

Just some optional performance suggestions

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zstd.c
195

Performance suggestion:

re-using an _existing_ compression context saves a lot of allocation and initialisation work.
The smaller the block size, the more measurable the savings.

217

Same logic here as on the compression side:
re-using a context saves allocation and initialisation costs.

The impact is less pronounced on the decompression side, because initialisation work is much lighter.

@yann.collet.73_gmail.com: Is there any tuning we can do to get the size of the compression context down, since we know the maximum size of the input will be 16 MB?

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zstd.c
195

I have not worked out how to handle that safely yet. There are 2 issues:

  • Multiple kernel threads will be calling this function concurrently, so a shared context would need locking or a per-thread scheme.
  • The threads may be using different compression levels, and I don't want to have N unused level-19 compression contexts outstanding, as they are up to 50MB each.

So preallocating a number of contexts probably won't work, because of the diversity of input block sizes and compression levels, and the multi-threaded nature of the surrounding code. ZSTD won't be invoked in its multi-threaded mode, but we may be compressing many independent blocks at once.

Is there any tuning we can do to get the size of the compression context down, since we know the maximum size of the input will be 16 MB?

A large proportion of the compression context size comes from the hash and binary-tree table sizes.
The associated parameters are cparams.hashLog and cparams.chainLog.
They are automatically set when invoking ZSTD_getCParams(compressionLevel, inputSize, 0).

inputSize is very important to automatically tune down compression parameters in presence of short input.
You have already described in your presentation that selecting a small inputSize has a big impact on compression context size.

For 16 MB though, no tune down will happen, since it's considered a "large" input.

In order to reduce context size, you have a few choices :

  • reduce compression level
  • reduce input size
  • manually change compression parameters

If you are interested in option 3, cparams.hashLog and cparams.chainLog will be your prime candidates.
Obviously, changing these values will have additional side effects, on top of reducing memory requirements.
Reducing chainlog will reduce compression ratio, and slightly increase speed.
Reducing hashlog will reduce speed, and slightly reduce compression ratio.
Some other parameters can be modified to compensate, such as increasing searchLog, which will reduce speed, and slightly increase compression ratio.
Well, many variations are possible.
When in doubt whether manually selected parameters are valid, use ZSTD_adjustCParams(cparams, inputSize, 0) to have them fixed automatically.

For a selected set of custom compression parameters, ZSTD_estimateCCtxSize_advanced(cparams) will tell how much memory the compression context needs.

allanjude updated this revision to Diff 36221.Dec 5 2017, 4:41 AM
allanjude edited the summary of this revision. (Show Details)

Newer version of ZSTD (1.3.2)

Incorporate feedback from ZFS developer summit

  • Store ZSTD level in a new hidden property. This allows all 19 levels without filling up the zio_compress enum.

Note: stand/ and sys/contrib/zstd changes will be committed separately

So you only have one new compression value in the BP? Have you looked at how this interacts with L2ARC when compressed ARC is disabled? See arc_cksum_is_equal(), I don't think this will work right, because you won't be able to reproduce the on-disk compressed data without knowing the compression level.

allanjude updated this revision to Diff 38602.Jan 29 2018, 5:33 AM

Solve issues with L2ARC if compressed_arc_enabled=0

Catch up the latest -current

allanjude updated this revision to Diff 39783.Feb 27 2018, 3:46 AM

Update to r329600

Need to retest boot support

allanjude updated this revision to Diff 40780.Mar 27 2018, 12:25 AM

Rebase to r331496

This solves merge conflicts with ZFS device removal and related feature flags that have come in recently

allanjude updated this revision to Diff 42467.May 13 2018, 4:51 AM

Add support for zstd compression levels to ZFS channel programs

Also update to r333575

bcr added a subscriber: bcr.May 13 2018, 10:51 AM

Can you bump the .Dd on zpool-features.7 (content change)?

allanjude retitled this revision from WIP: ZSTD in ZFS to ZSTD compression for ZFS.Sat, Jun 2, 10:24 PM
allanjude updated this revision to Diff 43279.Sat, Jun 2, 10:25 PM

Update to r334532

No changes to ZSTD/ZFS

allanjude edited the summary of this revision. (Show Details)Thu, Jun 14, 1:11 AM
allanjude updated this revision to Diff 43730.Thu, Jun 14, 1:13 AM

Add a 'zfs send' feature flag, to indicate that zstd is required to receive this stream

Booting from zstd compressed pools has been confirmed working.

Known issues:

  • the zstd_compress feature flag is not being 'activated' properly.
  • the zstd_compress_level 'property' is not hidden as intended.
allanjude updated this revision to Diff 43797.Fri, Jun 15, 3:56 AM

Create the ZSTD replication feature flag

  • Set for both -c (compressed send) and -e (embedded blocks)
  • Fix the activation of the ZSTD per-dataset feature flag. The first time a ZSTD compressed block is written to a dataset, it is activated
  • Correctly hide the zstd_compress_level property