Page MenuHomeFreeBSD

List-ify kernel dump device configuration
ClosedPublic

Authored by cem on Apr 21 2019, 5:21 PM.

Details

Summary

Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails. E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.

This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now.

savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.

This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device. Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.

Test Plan
testvm# dumpon -l -v
kernel dumps on index: device
0: vtnet0
1: md0.nop
server address: 192.168.0.1
client address: 192.168.0.37
gateway address: 192.168.0.1

# New dump devices can be inserted as top priority:
testvm# dumpon -i 0 md1.nop
testvm# dumpon -l -v
kernel dumps on index: device
0: md1.nop
1: vtnet0
2: md0.nop
server address: 192.168.0.1
client address: 192.168.0.37
gateway address: 192.168.0.1

# Specific devices and configurations can be removed with -r
testvm# dumpon -r -s 192.168.0.1 -c 192.168.0.37 vtnet0
testvm# dumpon -l -v
kernel dumps on index: device
0: md1.nop
1: md0.nop

# Default (no -i) is append:
testvm# dumpon -s 192.168.0.1 -c 192.168.0.37 vtnet0
testvm# dumpon -l -v
kernel dumps on index: device
0: md1.nop
1: md0.nop
2: vtnet0
server address: 192.168.0.1
client address: 192.168.0.37
gateway address: 192.168.0.1

...
# Panic test; note that the netdump configuration is intentionally bogus (no
# such netdump server)
testvm# dumpon -l -v
kernel dumps on index: device
0: vtnet0
1: md1.nop
2: md0.nop
server address: 192.168.0.1
client address: 192.168.0.37
gateway address: 192.168.0.1

testvm# sync
testvm# mount -u -f -o current,ro /
testvm# sysctl debug.kdb.panic=1
...
netdump: overwriting mbuf zone pointers
netdump in progress. searching for server...
. . . failed to contact netdump server

** DUMP FAILED (ERROR 22) **

Dump failed. Partition too small.

Dump failed. Partition too small.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
cem added inline comments.Apr 25 2019, 4:54 AM
sys/geom/geom_dev.c
151 ↗(On Diff #56625)

It could. I mostly (out of habit) wanted to normalize the index to 0-based as soon as possible and pass around a 0-based index, even if there is a little duplication. Mixing up 0- and 1-based indices happens and the easiest way to avoid that is to normalize as soon as possible.

In this case, I think it would be ok to push the 1-based index down into insert_dumper.

(The motivation for 1-based indexing at all is to retain (some) ioctl ABI compatibility with 12.0, in that kda_enable/kda_index of 0 still means "disable," and kda_enable/kda_index of 1 still means "configure this dump device on.")

sys/sys/conf.h
368 ↗(On Diff #56625)

Sure, that seems like a good idea to me. Will fix.

cem updated this revision to Diff 56680.Apr 26 2019, 12:29 AM
cem marked 3 inline comments as done.

Feedback from Mark:

  • Put dumper functions in a dumper_ namespace while here
  • Push 1-indexing down into dumper_insert
  • Xr netdump 4 in manual page
markj added a comment.Apr 29 2019, 2:17 PM

Sorry for the delay, I had been traveling.

sys/kern/kern_shutdown.c
996 ↗(On Diff #56680)

You can replace the flag variable with if (di == TAILQ_FIRST(&dumper_configs)).

1278 ↗(On Diff #56680)

I know encryption and compression can't be configured simultaneously right now, but that could easily change in the future. Is there any other reason to use an else if here rather than plain if?

984 ↗(On Diff #56453)

There's a lot of other sysctl handlers which happily wire user memory without such a check, and in general unprivileged users are permitted to wire a small amount (more than a page) of physical memory anyway. Rather than having ad-hoc policies implemented in sysctl handlers, I'd prefer to handle the issue globally in vslock() (which currently gives you a very loose seatbelt).

sys/sys/conf.h
366 ↗(On Diff #56680)

Most of these parameters are extracted from a diocskerneldump_arg, why not just pass that structure directly like with dumper_remove()?

cem added inline comments.Apr 29 2019, 5:20 PM
sys/kern/kern_shutdown.c
996 ↗(On Diff #56680)

Ok

1278 ↗(On Diff #56680)

This branch isn't related to our current compression+encryption limitation. It's just asserting that a non-NONE kda_compression mode doesn't match kdcomp == NULL. Plain if would require restating kdcomp == NULL && which we inherit via else if.

984 ↗(On Diff #56453)

On the one hand, sure. On the other hand, to exceed a 4k page, even with ~255 devices (theoretical max based on uint8_t kda_index), would take just over 16 bytes per device name on average (which is much longer than typical for network devices or even disk partitions). It's just a relatively huge amount of memory relative to what is needed and also the minimum unit we can wire, so I figured it was pretty reasonable.

I'm happy to remove it, though. Will plan to do that.

sys/sys/conf.h
366 ↗(On Diff #56680)

I didn't set out to rototill the interface that much. I don't have any good argument against doing so, though. I agree that passing a struct with arguments would be more sensible than 10s of parameters which can be easily misordered, and diocskerneldump_arg is probably the logical choice. I'll make that change.

cem added a comment.Apr 29 2019, 5:26 PM

Sorry for the delay, I had been traveling.

Oh, and no problem! I appreciate your feedback and there isn't any rush on this. :-) Thanks!

cem updated this revision to Diff 56826.Apr 29 2019, 7:20 PM
cem marked 4 inline comments as done.
  • Drop arbitrary wired memory limit
  • Drop unnecessary flag in loop iteration
  • Use the more concise diocskerneldump_arg structure for dumper_insert args
emaste added a subscriber: emaste.Apr 29 2019, 8:52 PM
scottl accepted this revision.Apr 30 2019, 7:01 AM

I'm a bit lost on where the thread on sysctl_wire_old_buffer() is going. I'm fine with the code as-is, recommend not removing the call unless Mark advises otherwise.

This revision is now accepted and ready to land.Apr 30 2019, 7:01 AM
markj accepted this revision.Apr 30 2019, 2:35 PM

Looks ok to me, just a few unimportant nits.

I'm a bit lost on where the thread on sysctl_wire_old_buffer() is going. I'm fine with the code as-is, recommend not removing the call unless Mark advises otherwise.

An earlier version of the diff caused the sysctl handler to return an error if the old buffer size was larger than a page. I suggested removing that check, not the sysctl_wire_old_buffer() call.

sys/kern/kern_shutdown.c
217 ↗(On Diff #56826)

I'm not sure what "unprotected" means here.

222 ↗(On Diff #56826)

Convert to a full sentence?

1191 ↗(On Diff #56826)

I know the old code did this, but an explict memcpy() might be nicer. I've gotten confused here before because I'm not used to seeing structures being copied this way.

1315 ↗(On Diff #56826)

Is it possible to match multiple devices?

cem added a comment.Apr 30 2019, 3:31 PM

Thanks Scott and Mark!

sys/kern/kern_shutdown.c
217 ↗(On Diff #56826)

More correct would be “unprivileged.” We shouldn’t crash if root manipulates a kernel data structure concurrently with access either, but it’s slightly better. I will probably remove the adjective and just mention concurrent manipulation and the sysctl.

222 ↗(On Diff #56826)

Does “These are...” make the comment easier to understand? I doubt it, though it improves style(9) conformance. I inherited this one, just added “(s)”. Will fix for style sake.

1191 ↗(On Diff #56826)

I don’t have a strong preference for either. I’d guess there are more people confused about this than harmed by memcpy, so I’m happy to change it.

1315 ↗(On Diff #56826)

Yes - this allows things like “dumpon off” to continue to disable everything without iterating all dump configurations explicitly. Are you thinking the sys/disk.h comment could specify that more clearly? For that matter, it’s probably more appropriate in the .c implementation anyway.

markj added inline comments.Apr 30 2019, 7:44 PM
sys/kern/kern_shutdown.c
222 ↗(On Diff #56826)

I just meant to capitalize and add a period since the comment is on its own line now.

1315 ↗(On Diff #56826)

It might be worth noting that wildcard matches are permitted.

cem updated this revision to Diff 56897.May 1 2019, 4:15 AM
cem marked 5 inline comments as done.
  • Update some comments and minor style changes suggested in Mark's last review
  • Break 12 ABI and provide backwards-compatibility ioctls; happily, we can make the simplifying assumption that structure sizes line up and we can memcpy most fields.
  • As long as we're breaking ABI, reunite netdump_conf and diocskerneldump_arg.
  • I still need to update dumpon(8); TODO.
  • Tinderbox has been broken by linuxkpi changes so it's not clean, but most 64-bit archs built kernel ok.
This revision now requires review to proceed.May 1 2019, 4:15 AM
cem updated this revision to Diff 56898.May 1 2019, 4:24 AM

Trivial fix for dumpon(8)

cem planned changes to this revision.May 1 2019, 5:03 PM

Some small changes are still needed to convert some callers to the dumper_remove API; additionally, dumpon netdump support needs to be rototilled a bit.

sys/geom/raid/g_raid.h
166 ↗(On Diff #56898)

This eliminated a spurious warning/error about having a variable length array in the middle of the structure (struct dumperinfo, via g_kerneldump::di). In practice, I don't think anyone outside of kern_shutdown.c is accessing or manipulating di_devname, and kern_shutdown mallocs its own large-enough structures for the name.

It would be equally acceptable IMO to drop this g_raid_disk change and simply change dumperinfo::di_devname[] to dumperinfo::di_devname[0] (or [1]). If you feel strongly one way or the other, let me know.

sys/kern/kern_shutdown.c
977 ↗(On Diff #56898)

Tinderbox — not all kernel configurations specify PRINTF_BUFR_SIZE :-(.

1317–1319 ↗(On Diff #56898)

At least REMOVE_ALL shouldn't error; it breaks the 12 ABI to do so (unless we explicitly squash it in 12-compat callers) because dumpon(8) in 12 always does a clear first, even if no dump device is set, before setting a dump device.

sys/netinet/netdump/netdump.h
74–75 ↗(On Diff #56898)

This is a shortcut for more verbose size/offset checks of all members. Offset of ndc12_iface vs kda_iface is verified in another file.

I figured as long as I was rototilling the ABI, I might as well unify the configuration structure for netdump and disk dump.

sys/netinet/netdump/netdump_client.c
1290–1292 ↗(On Diff #56898)

I believe this explicit bzeroing came in to geom_dev via the EKCD commit(s); I included it here for parity with geom_dev configuration.

scottl accepted this revision.May 1 2019, 7:48 PM
cem updated this revision to Diff 56932.May 1 2019, 9:39 PM
  • Rototill dumpon(8) for unified diosckerneldump_arg
  • Add missing ABI compat for DIOCSKERNELDUMP_FREEBSD12 in netdump -- needed by initial 'clear' that 12.x dumpon performs
  • Two minor error number changes in netdump's ioctl interface (use ENOTTY for bad command and ENODEV for !netdump_supported)
  • Fix device-remove in netdump unload and geom orphan to use KDA_REMOVE_DEV
markj accepted this revision.May 2 2019, 6:39 PM

I can't see any problems. I find it odd that we support multiple dump configurations for the same device, especially since that introduces some complexity (KDA_REMOVE vs. KDA_REMOVE_DEV). I'm having trouble imagining a scenario where that would be useful.

sys/geom/geom_dev.c
510 ↗(On Diff #56932)

Might as well move this definition into the case statement below?

sys/sys/disk.h
192 ↗(On Diff #56932)

We will probably want to permit these to be IPv6 addresses. That can be done as a separate change before 13.0.

This revision is now accepted and ready to land.May 2 2019, 6:39 PM
cem added a comment.May 2 2019, 8:28 PM

I can't see any problems. I find it odd that we support multiple dump configurations for the same device, especially since that introduces some complexity (KDA_REMOVE vs. KDA_REMOVE_DEV). I'm having trouble imagining a scenario where that would be useful.

The case I have in mind is falling back to higher (and slower) levels of compression if the dump wouldn't fit at the faster/less compressed level. Or you could imagine falling back to a 2nd netdump server (same /dev/netdump device, same iface) if the first server can't be reached. Maybe these are a reach, but that was the idea.

sys/geom/geom_dev.c
510 ↗(On Diff #56932)

Can't — it must be in-scope after leaving that case statement's embedded block due to the fallthrough.

sys/sys/disk.h
192 ↗(On Diff #56932)

Hm, it'll be another ABI break too. That's a bit unfortunate; I'd like to avoid a second round of compatibility ABI for older versions of head. Maybe we should just make that change now (without adding ipv6 support to dumpon/netdump, just adding union types to the ABI interface)?

It'd require minor fiddling in NETDUMPGCONF_FREEBSD12, but otherwise should be straightforward, I think.

(I don't see any good named union type for in_addr+in6_addr — outside of ofed's cma_ip_addr — but it's a widely used construct in the tree — nfs, tcp_lro, tcp_fastopen, ip_fw, pf, if_llatbl all seem to have some kind of definition.) There's in_dependaddr but I'm not sure in_addr_4in6 is the most appropriate spelling of in_addr given the way we would use it. I'm leaning towards defining a new type.

markj added a comment.May 2 2019, 8:37 PM
In D19996#433713, @cem wrote:

I can't see any problems. I find it odd that we support multiple dump configurations for the same device, especially since that introduces some complexity (KDA_REMOVE vs. KDA_REMOVE_DEV). I'm having trouble imagining a scenario where that would be useful.

The case I have in mind is falling back to higher (and slower) levels of compression if the dump wouldn't fit at the faster/less compressed level. Or you could imagine falling back to a 2nd netdump server (same /dev/netdump device, same iface) if the first server can't be reached. Maybe these are a reach, but that was the idea.

Ok, makes sense.

sys/geom/geom_dev.c
510 ↗(On Diff #56932)

Ah, I thought it was permitted to refer to variables outside of their lexical scope so long as you're still in the same function.

sys/sys/disk.h
192 ↗(On Diff #56932)

We don't provide compatibility guarantees in head, so we can just break the interface. (Hence "before 13.0.")

I think we can (ab)use struct sockaddr?

cem planned changes to this revision.May 3 2019, 2:16 AM
cem marked an inline comment as done.
cem added inline comments.
sys/geom/geom_dev.c
510 ↗(On Diff #56932)

You might be right, but I wasn't confident of that and don't want to dig up the C standard to figure it out. :-)

sys/sys/disk.h
192 ↗(On Diff #56932)

Sure, but I'm only adding "12" abi compatibility for slightly older head, not because I actually care about running 12 world (dumpon in particular) on a 13 kernel. The idea is to make the transition less painful for head consumers / crashboxes that regularly up/downgrade kernels, so the single step seems better to me.

Sockaddr isn't big enough and/or doesn't represent the right thing. It's the same size as in6_addr but doesn't logically represent an in6_addr. sockaddr_storage is closer, but huge overkill. sockaddr_in / sockaddr_in6 have a lot of redundant stuff we don't care about for this use, I think. I'll put something together.

cem updated this revision to Diff 57026.May 3 2019, 11:26 PM
cem marked an inline comment as done.

Leave room in ioctl ABI for ipv6 as long as we're rototilling the interface.
Nothing actually supports non-INET netdump yet.

markj added inline comments.May 4 2019, 9:20 PM
sys/sys/disk.h
192 ↗(On Diff #56932)

I don't see what's wrong with sockaddr_storage. It's larger than is needed but that doesn't matter for a rarely executed ioctl, and it's standard to use that in code that is address family-agnostic. And sys/_sockaddr_storage.h will introduce less pollution in sys/disk.h.

cem added inline comments.May 5 2019, 12:21 AM
sys/sys/disk.h
192 ↗(On Diff #56932)

Nothing wrong with sockaddr_storage, really. Mostly it just makes the interface more general in a way that isn't helpful (requires more boilerplate to check for EINVAL input). We can use it instead, if you would prefer.

markj added inline comments.May 5 2019, 1:30 AM
sys/sys/disk.h
192 ↗(On Diff #56932)

I think I'd still prefer it to a hand-rolled structure. Is the extra boilerplate just that we have to check the AF of each field instead of one?

cem added inline comments.May 5 2019, 1:38 AM
sys/sys/disk.h
192 ↗(On Diff #56932)

Yeah. And whatever ambiguity surrounds other fields that matter for sockets but not hosts, eg flowid, port, etc.

cem added inline comments.May 5 2019, 2:46 AM
sys/sys/disk.h
192 ↗(On Diff #56932)

Also that sockaddr_storage is an array type rather than a union and all real access requires ugly casting; and we don't save anything in header includes because we need net/if.h which means we pull in the in/in6 addr types already.

cem added inline comments.May 5 2019, 2:49 AM
sys/sys/disk.h
192 ↗(On Diff #56932)

I've rototilled sockaddr_storage into just dumpon alone (to give it a shot) and it is really really ugly. And there would be a large number of additional churn involving sockaddr_in casts in netdump_client.c to finish the job. I don't think sockaddr_storage is a suitable type for this use case (and it's only used with BSD sockets for legacy API reasons).

So I don't plan to continue converting to sockaddr_storage — I'll just leave it with the address union instead.

markj accepted this revision.May 6 2019, 1:56 PM
This revision is now accepted and ready to land.May 6 2019, 1:56 PM
This revision was automatically updated to reflect the committed changes.