Page MenuHomeFreeBSD

Move dump offset tracking into MI code.
ClosedPublic

Authored by markj on Jul 25 2017, 12:45 AM.

Details

Summary

All of the kernel dump implementations keep track of the current offset
within the dump device. However, except for textdumps, they all write
the dump sequentially, so we can simplify the implementations by having
the MI kernel dump code keep track of the current offset. Thus,
dump_write() is modified to exclude the offset. dump_raw_write() is now
exported, so there still exists an API for writing at a given offset. It
is used by the textdump code, and the graid dump code.

This is necessary for a reasonable implementation of compressed kernel
dumps, since with compression enabled a call to dump_write() does not
necessarily result in a write to disk.

Also simplify dump_encrypted_write() somewhat: use dump_raw_write()
instead of calling the dumper directly, which lets us remove some
duplicated checks; don't bother keeping track of the current offset,
since that's maintained in kern_shutdown.c now too.

Test Plan

I'm using a script to test kernel dumps in bhyve. It exercises combinations of
minidumps vs. full dumps, compression, encryption, and dump device sector
size.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

markj created this revision.Jul 25 2017, 12:45 AM
markj edited the test plan for this revision. (Show Details)Jul 25 2017, 12:49 AM
markj added reviewers: asomers, def, cem, rgrimes.
cem accepted this revision.Jul 27 2017, 3:49 AM

LGTM.

sys/amd64/amd64/minidump_machdep.c
95 ↗(On Diff #31150)

Stupid bikeshedding: Maybe rename dump_write to dump_append, now that the offset is tracked implicitly?

sys/ddb/db_textdump.c
248 ↗(On Diff #31150)

It might be desirable to compress or encrypt textdumps at some point. Do they really need to write at a specific offset? (Just asking — don't need or want to see that change in this review.)

This revision is now accepted and ready to land.Jul 27 2017, 3:49 AM
markj added inline comments.Jul 27 2017, 11:31 PM
sys/amd64/amd64/minidump_machdep.c
95 ↗(On Diff #31150)

Yeah, I think that'd be an improvement. And then rename dump_raw_write() to dump_write().

Alternately, follow the convention of write(2) and pwrite(2) and call them dump_write() and dump_pwrite(). Do you have any preference?

sys/ddb/db_textdump.c
248 ↗(On Diff #31150)

The comment at the beginning of db_textdump.c says that the reason is that we don't know the size of the textdump in advance. So the code writes it out in reverse. (The contents of each block are written normally, but you have to construct the final textdump by concatenating blocks starting at the highest LBA.)

With my other changes we could address that by using the dumpextent field and writing the textdump near or at the beginning of the dump device.

cem added inline comments.Jul 27 2017, 11:43 PM
sys/amd64/amd64/minidump_machdep.c
95 ↗(On Diff #31150)

Yeah, userspace write() does the implicit offset thing. My thought is that kernel interfaces like VOP_WRITE take an explicit offset rather than relying on the state. So I'd say my preference would be append/write. But it's only a weak preference.

sys/ddb/db_textdump.c
248 ↗(On Diff #31150)

Yeah, I don't understand why starting at the beginning of the partition and writing down isn't the way to go. Perhaps the gap for metadata at start of partition wasn't present in the early version of the code and they did this to avoid trampling that, if dumps were small enough? Or if you must write near to the end of the partition, have the dump_write interface automatically write back-to-front.

markj added inline comments.Jul 28 2017, 12:20 AM
sys/amd64/amd64/minidump_machdep.c
95 ↗(On Diff #31150)

Yeah, fair enough.

sys/ddb/db_textdump.c
248 ↗(On Diff #31150)

See D11723 for the comment about swapping during boot.

We could probably write back to front all the time, but I suspect that would make savecore (and the dump process itself) much slower on spinning disks since we'd no longer be doing sequential I/O? That doesn't matter for textdumps since they're usually quite small.

cem added inline comments.Jul 28 2017, 12:37 AM
sys/ddb/db_textdump.c
248 ↗(On Diff #31150)

We could probably write back to front all the time, but I suspect that would make savecore (and the dump process itself) much slower on spinning disks since we'd no longer be doing sequential I/O? That doesn't matter for textdumps since they're usually quite small.

Yeah, sounds bad with write cache disabled. WCE drives might coalesce enough to mitigate the damage.

markj updated this revision to Diff 31336.Jul 29 2017, 7:24 PM
markj edited edge metadata.
  • Address Conrad's feedback.
  • Move the comment for dump_finish() to the right place.
This revision now requires review to proceed.Jul 29 2017, 7:24 PM
def edited edge metadata.Aug 3 2017, 11:26 AM

I'm using a script to test kernel dumps in bhyve. It exercises combinations of minidumps vs. full dumps, compression, encryption, and dump device sector size.

Could you actually share this script with us? It would be useful when we'll need to test further changes in crash dump routines.

sys/kern/kern_shutdown.c
174 ↗(On Diff #31336)

dumpoff should a part of the dumperinfo structure as we operate on a di pointer rather than the global dumper variable in dump API.

1064 ↗(On Diff #31336)

As we keep track of dumplo in MI code and don't check if data are written continuously in dump_encrypted_write() anymore because we only append them I would rename dump_encrypted_write() to dump_encrypted_append() and not pass offset to it.

sys/sys/conf.h
342–349 ↗(On Diff #31336)

I would keep this list in the same order as in a source file.

markj updated this revision to Diff 31604.Aug 5 2017, 12:03 AM
  • Address review feedback from D11722.
markj marked 3 inline comments as done.Aug 5 2017, 12:04 AM
In D11722#245512, @def wrote:

I'm using a script to test kernel dumps in bhyve. It exercises combinations of minidumps vs. full dumps, compression, encryption, and dump device sector size.

Could you actually share this script with us? It would be useful when we'll need to test further changes in crash dump routines.

Yes, I plan to commit it under tools/. At the moment it's incredibly hacky and makes some assumptions about its environment, but it's a quick way to validate the large set of scenarios that come out of combinations of our kernel dump features.

markj updated this revision to Diff 33576.Sep 29 2017, 9:16 PM

Fix bugs in the last update.
dump_encrypted_write() is called by dump_append(), so it cannot use
dump_append() to write to the dump device.

cem accepted this revision.Sep 30 2017, 3:43 PM
This revision is now accepted and ready to land.Sep 30 2017, 3:43 PM
This revision was automatically updated to reflect the committed changes.