Add support for compressed kernel dumps.
ClosedPublic
Actions

Authored by markj on Jul 25 2017, 12:45 AM.

Details

Reviewers

asomers
def
cem
rgrimes

Group Reviewers

manpages

Commits

rS324965: Add support for compressed kernel dumps.

Summary

Compression can be requested by passing -c to dumpon(8) or by setting
dumpcompress=YES in rc.conf. savecore(8) will automatically save
compressed vmcores with a .gz extension.

The kernel dump header struct is modified and its version is bumped. Two
new fields were added:

dumpextent, which gives the distance between the leading and trailing headers. With uncompressed dumps, this is equal to dumplength.

Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.
compression, the compression algorithm used. In particular, I made an affordance for alternate compression algorithms, though we currently only support DEFLATE.

I took bytes out of the panicstr buffer, since most panic strings are
much shorter than 192 bytes.

We currently don't support simultaneous encryption and compression. The
reason is that the current implementation of encrypted dumps does not
add any padding bytes, as uncompressed dumps are always a multiple of
the device sector size in length. To support compressed dumps, though,
we need padding bytes, which complicates the implementation somewhat.
Therefore, that support is deferred to a different revision.

Test Plan

I'm using a script to test kernel dumps in bhyve. It exercises combinations of
minidumps vs. full dumps, compression, encryption, and dump device sector
size.

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 10742
Build 11138: arc lint + arc unit

Event Timeline

markj created this revision.Jul 25 2017, 12:45 AM

markj edited the summary of this revision. (Show Details)Jul 25 2017, 12:49 AM

markj edited the test plan for this revision. (Show Details)

markj added reviewers: asomers, def, cem, rgrimes.

Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.

I think I'd prefer we just always start compressed dumps at 0. Simple and consistent. What's the benefit of starting partway through the partition some of the time?

sbin/savecore/savecore.c
706	Isn't the space needed only `dumpextent` if compressed cores are kept compressed?
735	This comment doesn't seem to make sense right now.
836	Not sure the isencrypted part of this check makes sense. It's tautalogical now and I don't see a problem with it in the future either. I think it should just be: `if (compress \|\| iscompressed) {`
sys/kern/kern_shutdown.c
176	Perhaps `RWTUN`?
1274	Where did the di->blocksize come from?
1395	This line is redundant.
sys/sys/kerneldump.h
89–90	comment seems misleading or open to misinterpretation. dumplength represents the uncompressed size of the dump

In D11723#243490, @cem wrote:

Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.

I think I'd prefer we just always start compressed dumps at 0. Simple and consistent. What's the benefit of starting partway through the partition some of the time?

The reason I've been given in the past is that there's some chance we might swap to the dump device during boot, before the core is recovered... it doesn't seem very plausible to me, but it's not impossible. savecore is one of the last rc scripts that runs.

sbin/savecore/savecore.c
706	See my other comment in kerneldump.h - dumplength is the compressed length, and dumplength <= dumpextent.
735	Yeah, this was from when I had implemented support for encryption+compression. Will fix.
836	When savecore saves an encrypted dump, it saves it to vmcore_encrypted.0 rather than vmcore.0. Since we decrypt before decompressing, I wanted to ensure that the vmcore was still called vmcore_encrypted.0 rather than vmcore.0.gz. But for now it is indeed tautological.
sys/kern/kern_shutdown.c
176	Will fix.
1274	It's for the trailing copy of the header. The comment is kind of confusing when paired with this line though. I think I'll try making an ASCII diagram of the layout on disk; that'll make the calculations more clear.
1395	Actually it isn't - kerneldump_parity() XORs the 32-bit words in the header, and the parity field comes last. So if kdh->parity = 0, then kerneldump_parity() returns the bitwise NOT of the XOR of all but the last word in the header. So after we set kdh->parity = 0; kdh->parity = kerneldump_parity(kdh), the XOR of all the words in kdh must be 0, and that's how we verify that the header is valid. This should probably be lifted into a kerneldump_parity_update() subroutine.
sys/sys/kerneldump.h
89–90	Note that we set dumplength to the compressed length in dump_final(). In particular, we always have dumplength <= dumpextent. The comment is not very useful though, I'll improve it.

markj mentioned this in D11722: Move dump offset tracking into MI code..Jul 28 2017, 12:20 AM

Address review feedback from cem and rgrimes.

Harbormaster completed remote builds in B10742: Diff 31338.Jul 29 2017, 9:52 PM

In D11723#243777, @markj wrote:

The reason I've been given in the past is that there's some chance we might swap to the dump device during boot, before the core is recovered... it doesn't seem very plausible to me, but it's not impossible. savecore is one of the last rc scripts that runs.

It seems to me like savecore should have an rc requirement of "BEFORE:" the script that enables swap. But maybe some tiny machines need swap to boot, I don't know.

sbin/savecore/savecore.c
706	Huh, I must have got it confused. What's the function of dumpextent then? If it's just a rounded up dumplength, isn't it redundant?
836	The snprintf in this clause already prints `vmcore_encrypted.last.gz` if the dump is encrypted, no? Or do you think an encrypted gzip shouldn't end in .gz? Maybe it would be more consistent to add ".encrypted" as a suffix after the .gz?
sys/kern/kern_shutdown.c
1274	Does the `2 * di->blocksize` in `dumpextent` above (dumpgzs clause) already account for that? I might just be confused. ASCII diagram would help.
1395	I see.

markj added inline comments.Jul 30 2017, 7:01 PM

sbin/savecore/savecore.c
706	The algorithm used by savecore is basically: read the last sector of the device to find the trailing header use the recorded dumpextent to find the offset of the copy of the leading header read dumplength bytes off the device, starting following the leading header and optional crypto key When the dump isn't compressed, dumpextent and dumplength are basically the same thing. But dumpextent isn't just a rounded up dumplength.
836	Yeah, it seems odd to use a .gz suffix for a file that can't be gunzip'ed. I like the idea of having vmcore.0.encrypted rather than vmcore_encrypted.0, i.e., what we have today. But changing that would require modifications to decryptcore as well, so I'd rather defer that to a different change.
sys/kern/kern_shutdown.c
1274	I added a diagram above dump_start().

cem accepted this revision.Jul 31 2017, 11:35 PM

cem added inline comments.

sbin/savecore/savecore.c
706	Ah, right. So we wouldn't need it if all dumps started at the beginning of the drive and we could just read the leading header.

This revision is now accepted and ready to land.Jul 31 2017, 11:35 PM

(I'm holding off on this for a bit longer as I re-port netdump to HEAD. netdump is a bit different from a regular dump device since it doesn't have a fixed size, and I'm trying to avoid painting myself into a corner by assuming too much about how the dump device works.)

In D11723#254802, @markj wrote:

(I'm holding off on this for a bit longer as I re-port netdump to HEAD. netdump is a bit different from a regular dump device since it doesn't have a fixed size, and I'm trying to avoid painting myself into a corner by assuming too much about how the dump device works.)

What I'd like to do is add dumper methods to write the header and encryption key. If no methods are specified, i.e., the case where the dump device is a GEOM device, the code in kern_shutdown.c will have control of the dump layout. netdump will define methods to transmit the kerneldump header and key to the server.

I don't plan to make any further changes to this revision unless there's any more feedback (or I've missed something in the existing feedback).

Closed by commit rS324965: Add support for compressed kernel dumps. (authored by markj). · Explain WhyOct 25 2017, 12:51 AM

This revision was automatically updated to reflect the committed changes.

cem added inline comments.Jun 15 2018, 6:38 PM

head/sys/sys/kerneldump.h
93–101 ↗	(On Diff #34309)	I know this is a bit late, but given this feature hasn't made it to a stable/ branch yet, would it be possible to move `dumpextent` and `compression` to after `panicstring`, to avoid changing offsets unnecessarily? Yes, it'd involve revving the KERNELDUMPVERSION again. I guess I'll post a review for the proposal and see what you think.