Page MenuHomeFreeBSD

Add support for compressed kernel dumps.
ClosedPublic

Authored by markj on Jul 25 2017, 12:45 AM.
Tags
None
Referenced Files
F103158142: D11723.diff
Thu, Nov 21, 5:30 PM
F103156806: D11723.id.diff
Thu, Nov 21, 5:10 PM
Unknown Object (File)
Tue, Nov 19, 4:22 PM
Unknown Object (File)
Sat, Nov 2, 12:52 AM
Unknown Object (File)
Oct 19 2024, 8:45 PM
Unknown Object (File)
Oct 18 2024, 9:00 PM
Unknown Object (File)
Sep 29 2024, 5:03 AM
Unknown Object (File)
Sep 18 2024, 10:36 AM
Subscribers

Details

Summary

Compression can be requested by passing -c to dumpon(8) or by setting
dumpcompress=YES in rc.conf. savecore(8) will automatically save
compressed vmcores with a .gz extension.

The kernel dump header struct is modified and its version is bumped. Two
new fields were added:

  • dumpextent, which gives the distance between the leading and trailing headers. With uncompressed dumps, this is equal to dumplength.

    Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.
  • compression, the compression algorithm used. In particular, I made an affordance for alternate compression algorithms, though we currently only support DEFLATE.

I took bytes out of the panicstr buffer, since most panic strings are
much shorter than 192 bytes.

We currently don't support simultaneous encryption and compression. The
reason is that the current implementation of encrypted dumps does not
add any padding bytes, as uncompressed dumps are always a multiple of
the device sector size in length. To support compressed dumps, though,
we need padding bytes, which complicates the implementation somewhat.
Therefore, that support is deferred to a different revision.

Test Plan

I'm using a script to test kernel dumps in bhyve. It exercises combinations of
minidumps vs. full dumps, compression, encryption, and dump device sector
size.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 10654
Build 11058: arc lint + arc unit

Event Timeline

markj edited the test plan for this revision. (Show Details)
markj added reviewers: asomers, def, cem, rgrimes.

Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.

I think I'd prefer we just always start compressed dumps at 0. Simple and consistent. What's the benefit of starting partway through the partition some of the time?

sbin/savecore/savecore.c
706

Isn't the space needed only dumpextent if compressed cores are kept compressed?

735

This comment doesn't seem to make sense right now.

837

Not sure the isencrypted part of this check makes sense. It's tautalogical now and I don't see a problem with it in the future either. I think it should just be: if (compress || iscompressed) {

sys/kern/kern_shutdown.c
176

Perhaps RWTUN?

1283

Where did the di->blocksize come from?

1386

This line is redundant.

sys/sys/kerneldump.h
89

comment seems misleading or open to misinterpretation. dumplength represents the uncompressed size of the dump

In D11723#243490, @cem wrote:

Kernel dumps are written sequentially, but when compression is enabled we don't know how long the dump will be. To handle this, we first check to see if the uncompressed dump would fit in the dump device. If so, we start writing at offset [end of device - uncompressed length]. If the uncompressed kernel dump would not fit, we start writing the compressed dump at the beginning of the device and hope that it ends up fitting; if not, we'll get an error when dump_raw_write() attempts to write past the device boundary.

I think I'd prefer we just always start compressed dumps at 0. Simple and consistent. What's the benefit of starting partway through the partition some of the time?

The reason I've been given in the past is that there's some chance we might swap to the dump device during boot, before the core is recovered... it doesn't seem very plausible to me, but it's not impossible. savecore is one of the last rc scripts that runs.

sbin/savecore/savecore.c
706

See my other comment in kerneldump.h - dumplength is the compressed length, and dumplength <= dumpextent.

735

Yeah, this was from when I had implemented support for encryption+compression. Will fix.

837

When savecore saves an encrypted dump, it saves it to vmcore_encrypted.0 rather than vmcore.0. Since we decrypt before decompressing, I wanted to ensure that the vmcore was still called vmcore_encrypted.0 rather than vmcore.0.gz. But for now it is indeed tautological.

sys/kern/kern_shutdown.c
176

Will fix.

1283

It's for the trailing copy of the header.

The comment is kind of confusing when paired with this line though. I think I'll try making an ASCII diagram of the layout on disk; that'll make the calculations more clear.

1386

Actually it isn't - kerneldump_parity() XORs the 32-bit words in the header, and the parity field comes last. So if kdh->parity = 0, then kerneldump_parity() returns the bitwise NOT of the XOR of all but the last word in the header. So after we set kdh->parity = 0; kdh->parity = kerneldump_parity(kdh), the XOR of all the words in kdh must be 0, and that's how we verify that the header is valid.

This should probably be lifted into a kerneldump_parity_update() subroutine.

sys/sys/kerneldump.h
89

Note that we set dumplength to the compressed length in dump_final(). In particular, we always have dumplength <= dumpextent. The comment is not very useful though, I'll improve it.

Address review feedback from cem and rgrimes.

The reason I've been given in the past is that there's some chance we might swap to the dump device during boot, before the core is recovered... it doesn't seem very plausible to me, but it's not impossible. savecore is one of the last rc scripts that runs.

It seems to me like savecore should have an rc requirement of "BEFORE:" the script that enables swap. But maybe some tiny machines need swap to boot, I don't know.

sbin/savecore/savecore.c
706

Huh, I must have got it confused. What's the function of dumpextent then? If it's just a rounded up dumplength, isn't it redundant?

837

The snprintf in this clause already prints vmcore_encrypted.last.gz if the dump is encrypted, no?

Or do you think an encrypted gzip shouldn't end in .gz? Maybe it would be more consistent to add ".encrypted" as a suffix after the .gz?

sys/kern/kern_shutdown.c
1283

Does the 2 * di->blocksize in dumpextent above (dumpgzs clause) already account for that? I might just be confused. ASCII diagram would help.

1386

I see.

sbin/savecore/savecore.c
706

The algorithm used by savecore is basically:

  1. read the last sector of the device to find the trailing header
  2. use the recorded dumpextent to find the offset of the copy of the leading header
  3. read dumplength bytes off the device, starting following the leading header and optional crypto key

When the dump isn't compressed, dumpextent and dumplength are basically the same thing. But dumpextent isn't just a rounded up dumplength.

837

Yeah, it seems odd to use a .gz suffix for a file that can't be gunzip'ed. I like the idea of having vmcore.0.encrypted rather than vmcore_encrypted.0, i.e., what we have today. But changing that would require modifications to decryptcore as well, so I'd rather defer that to a different change.

sys/kern/kern_shutdown.c
1283

I added a diagram above dump_start().

cem added inline comments.
sbin/savecore/savecore.c
706

Ah, right. So we wouldn't need it if all dumps started at the beginning of the drive and we could just read the leading header.

This revision is now accepted and ready to land.Jul 31 2017, 11:35 PM

(I'm holding off on this for a bit longer as I re-port netdump to HEAD. netdump is a bit different from a regular dump device since it doesn't have a fixed size, and I'm trying to avoid painting myself into a corner by assuming too much about how the dump device works.)

(I'm holding off on this for a bit longer as I re-port netdump to HEAD. netdump is a bit different from a regular dump device since it doesn't have a fixed size, and I'm trying to avoid painting myself into a corner by assuming too much about how the dump device works.)

What I'd like to do is add dumper methods to write the header and encryption key. If no methods are specified, i.e., the case where the dump device is a GEOM device, the code in kern_shutdown.c will have control of the dump layout. netdump will define methods to transmit the kerneldump header and key to the server.

I don't plan to make any further changes to this revision unless there's any more feedback (or I've missed something in the existing feedback).

This revision was automatically updated to reflect the committed changes.
head/sys/sys/kerneldump.h
93–101 ↗(On Diff #34309)

I know this is a bit late, but given this feature hasn't made it to a stable/ branch yet, would it be possible to move dumpextent and compression to after panicstring, to avoid changing offsets unnecessarily? Yes, it'd involve revving the KERNELDUMPVERSION again. I guess I'll post a review for the proposal and see what you think.