Page MenuHomeFreeBSD

Add support for compressed kernel dumps.
ClosedPublic

Authored by markj on Jul 25 2017, 12:45 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Apr 12, 2:02 PM
Unknown Object (File)
Mon, Apr 8, 11:55 PM
Unknown Object (File)
Mon, Apr 8, 11:04 AM
Unknown Object (File)
Mar 11 2024, 6:23 PM
Unknown Object (File)
Mar 11 2024, 6:21 PM
Unknown Object (File)
Mar 11 2024, 6:21 PM
Unknown Object (File)
Mar 11 2024, 6:21 PM
Unknown Object (File)
Mar 8 2024, 3:09 AM
Subscribers

Details

Summary

Compression can be requested by passing -c to dumpon(8) or by setting
dumpcompress=YES in rc.conf. savecore(8) will automatically save
compressed vmcores with a .gz extension.

The kernel dump header struct is modified and its version is bumped. Two
new fields were added:

  • dumpextent, which gives the distance between the leading and trailing headers. With uncompressed dumps, this is equal to dumplength.

    Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.
  • compression, the compression algorithm used. In particular, I made an affordance for alternate compression algorithms, though we currently only support DEFLATE.

I took bytes out of the panicstr buffer, since most panic strings are
much shorter than 192 bytes.

We currently don't support simultaneous encryption and compression. The
reason is that the current implementation of encrypted dumps does not
add any padding bytes, as uncompressed dumps are always a multiple of
the device sector size in length. To support compressed dumps, though,
we need padding bytes, which complicates the implementation somewhat.
Therefore, that support is deferred to a different revision.

Test Plan

I'm using a script to test kernel dumps in bhyve. It exercises combinations of
minidumps vs. full dumps, compression, encryption, and dump device sector
size.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj edited the test plan for this revision. (Show Details)
markj added reviewers: asomers, def, cem, rgrimes.

Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.

I think I'd prefer we just always start compressed dumps at 0. Simple and consistent. What's the benefit of starting partway through the partition some of the time?

sbin/savecore/savecore.c
706 ↗(On Diff #31151)

Isn't the space needed only dumpextent if compressed cores are kept compressed?

735 ↗(On Diff #31151)

This comment doesn't seem to make sense right now.

837 ↗(On Diff #31151)

Not sure the isencrypted part of this check makes sense. It's tautalogical now and I don't see a problem with it in the future either. I think it should just be: if (compress || iscompressed) {

sys/kern/kern_shutdown.c
176 ↗(On Diff #31151)

Perhaps RWTUN?

1283 ↗(On Diff #31151)

Where did the di->blocksize come from?

1386 ↗(On Diff #31151)

This line is redundant.

sys/sys/kerneldump.h
89 ↗(On Diff #31151)

comment seems misleading or open to misinterpretation. dumplength represents the uncompressed size of the dump

In D11723#243490, @cem wrote:

Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.

I think I'd prefer we just always start compressed dumps at 0. Simple and consistent. What's the benefit of starting partway through the partition some of the time?

The reason I've been given in the past is that there's some chance we might swap to the dump device during boot, before the core is recovered... it doesn't seem very plausible to me, but it's not impossible. savecore is one of the last rc scripts that runs.

sbin/savecore/savecore.c
706 ↗(On Diff #31151)

See my other comment in kerneldump.h - dumplength is the compressed length, and dumplength <= dumpextent.

735 ↗(On Diff #31151)

Yeah, this was from when I had implemented support for encryption+compression. Will fix.

837 ↗(On Diff #31151)

When savecore saves an encrypted dump, it saves it to vmcore_encrypted.0 rather than vmcore.0. Since we decrypt before decompressing, I wanted to ensure that the vmcore was still called vmcore_encrypted.0 rather than vmcore.0.gz. But for now it is indeed tautological.

sys/kern/kern_shutdown.c
176 ↗(On Diff #31151)

Will fix.

1283 ↗(On Diff #31151)

It's for the trailing copy of the header.

The comment is kind of confusing when paired with this line though. I think I'll try making an ASCII diagram of the layout on disk; that'll make the calculations more clear.

1386 ↗(On Diff #31151)

Actually it isn't - kerneldump_parity() XORs the 32-bit words in the header, and the parity field comes last. So if kdh->parity = 0, then kerneldump_parity() returns the bitwise NOT of the XOR of all but the last word in the header. So after we set kdh->parity = 0; kdh->parity = kerneldump_parity(kdh), the XOR of all the words in kdh must be 0, and that's how we verify that the header is valid.

This should probably be lifted into a kerneldump_parity_update() subroutine.

sys/sys/kerneldump.h
89 ↗(On Diff #31151)

Note that we set dumplength to the compressed length in dump_final(). In particular, we always have dumplength <= dumpextent. The comment is not very useful though, I'll improve it.

Address review feedback from cem and rgrimes.

The reason I've been given in the past is that there's some chance we might swap to the dump device during boot, before the core is recovered... it doesn't seem very plausible to me, but it's not impossible. savecore is one of the last rc scripts that runs.

It seems to me like savecore should have an rc requirement of "BEFORE:" the script that enables swap. But maybe some tiny machines need swap to boot, I don't know.

sbin/savecore/savecore.c
706 ↗(On Diff #31151)

Huh, I must have got it confused. What's the function of dumpextent then? If it's just a rounded up dumplength, isn't it redundant?

837 ↗(On Diff #31151)

The snprintf in this clause already prints vmcore_encrypted.last.gz if the dump is encrypted, no?

Or do you think an encrypted gzip shouldn't end in .gz? Maybe it would be more consistent to add ".encrypted" as a suffix after the .gz?

sys/kern/kern_shutdown.c
1283 ↗(On Diff #31151)

Does the 2 * di->blocksize in dumpextent above (dumpgzs clause) already account for that? I might just be confused. ASCII diagram would help.

1386 ↗(On Diff #31151)

I see.

sbin/savecore/savecore.c
706 ↗(On Diff #31151)

The algorithm used by savecore is basically:

  1. read the last sector of the device to find the trailing header
  2. use the recorded dumpextent to find the offset of the copy of the leading header
  3. read dumplength bytes off the device, starting following the leading header and optional crypto key

When the dump isn't compressed, dumpextent and dumplength are basically the same thing. But dumpextent isn't just a rounded up dumplength.

837 ↗(On Diff #31151)

Yeah, it seems odd to use a .gz suffix for a file that can't be gunzip'ed. I like the idea of having vmcore.0.encrypted rather than vmcore_encrypted.0, i.e., what we have today. But changing that would require modifications to decryptcore as well, so I'd rather defer that to a different change.

sys/kern/kern_shutdown.c
1283 ↗(On Diff #31151)

I added a diagram above dump_start().

cem added inline comments.
sbin/savecore/savecore.c
706 ↗(On Diff #31151)

Ah, right. So we wouldn't need it if all dumps started at the beginning of the drive and we could just read the leading header.

This revision is now accepted and ready to land.Jul 31 2017, 11:35 PM

(I'm holding off on this for a bit longer as I re-port netdump to HEAD. netdump is a bit different from a regular dump device since it doesn't have a fixed size, and I'm trying to avoid painting myself into a corner by assuming too much about how the dump device works.)

(I'm holding off on this for a bit longer as I re-port netdump to HEAD. netdump is a bit different from a regular dump device since it doesn't have a fixed size, and I'm trying to avoid painting myself into a corner by assuming too much about how the dump device works.)

What I'd like to do is add dumper methods to write the header and encryption key. If no methods are specified, i.e., the case where the dump device is a GEOM device, the code in kern_shutdown.c will have control of the dump layout. netdump will define methods to transmit the kerneldump header and key to the server.

I don't plan to make any further changes to this revision unless there's any more feedback (or I've missed something in the existing feedback).

This revision was automatically updated to reflect the committed changes.
head/sys/sys/kerneldump.h
93–101

I know this is a bit late, but given this feature hasn't made it to a stable/ branch yet, would it be possible to move dumpextent and compression to after panicstring, to avoid changing offsets unnecessarily? Yes, it'd involve revving the KERNELDUMPVERSION again. I guess I'll post a review for the proposal and see what you think.