Page MenuHomeFreeBSD

List-ify kernel dump device configuration
ClosedPublic

Authored by cem on Apr 21 2019, 5:21 PM.
Tags
None
Referenced Files
F81530532: D19996.diff
Wed, Apr 17, 3:24 PM
Unknown Object (File)
Sun, Apr 14, 9:59 PM
Unknown Object (File)
Sat, Apr 13, 11:03 PM
Unknown Object (File)
Sat, Apr 13, 10:09 PM
Unknown Object (File)
Sat, Apr 6, 8:54 PM
Unknown Object (File)
Sat, Apr 6, 8:51 PM
Unknown Object (File)
Mar 8 2024, 2:56 AM
Unknown Object (File)
Feb 24 2024, 9:54 PM
Subscribers

Details

Summary

Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails. E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.

This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now.

savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.

This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device. Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.

Test Plan
testvm# dumpon -l -v
kernel dumps on index: device
0: vtnet0
1: md0.nop
server address: 192.168.0.1
client address: 192.168.0.37
gateway address: 192.168.0.1

# New dump devices can be inserted as top priority:
testvm# dumpon -i 0 md1.nop
testvm# dumpon -l -v
kernel dumps on index: device
0: md1.nop
1: vtnet0
2: md0.nop
server address: 192.168.0.1
client address: 192.168.0.37
gateway address: 192.168.0.1

# Specific devices and configurations can be removed with -r
testvm# dumpon -r -s 192.168.0.1 -c 192.168.0.37 vtnet0
testvm# dumpon -l -v
kernel dumps on index: device
0: md1.nop
1: md0.nop

# Default (no -i) is append:
testvm# dumpon -s 192.168.0.1 -c 192.168.0.37 vtnet0
testvm# dumpon -l -v
kernel dumps on index: device
0: md1.nop
1: md0.nop
2: vtnet0
server address: 192.168.0.1
client address: 192.168.0.37
gateway address: 192.168.0.1

...
# Panic test; note that the netdump configuration is intentionally bogus (no
# such netdump server)
testvm# dumpon -l -v
kernel dumps on index: device
0: vtnet0
1: md1.nop
2: md0.nop
server address: 192.168.0.1
client address: 192.168.0.37
gateway address: 192.168.0.1

testvm# sync
testvm# mount -u -f -o current,ro /
testvm# sysctl debug.kdb.panic=1
...
netdump: overwriting mbuf zone pointers
netdump in progress. searching for server...
. . . failed to contact netdump server

** DUMP FAILED (ERROR 22) **

Dump failed. Partition too small.

Dump failed. Partition too small.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 24048
Build 22933: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/geom/geom_dev.c
150

It could. I mostly (out of habit) wanted to normalize the index to 0-based as soon as possible and pass around a 0-based index, even if there is a little duplication. Mixing up 0- and 1-based indices happens and the easiest way to avoid that is to normalize as soon as possible.

In this case, I think it would be ok to push the 1-based index down into insert_dumper.

(The motivation for 1-based indexing at all is to retain (some) ioctl ABI compatibility with 12.0, in that kda_enable/kda_index of 0 still means "disable," and kda_enable/kda_index of 1 still means "configure this dump device on.")

sys/sys/conf.h
365

Sure, that seems like a good idea to me. Will fix.

cem marked 3 inline comments as done.

Feedback from Mark:

  • Put dumper functions in a dumper_ namespace while here
  • Push 1-indexing down into dumper_insert
  • Xr netdump 4 in manual page

Sorry for the delay, I had been traveling.

sys/kern/kern_shutdown.c
984

There's a lot of other sysctl handlers which happily wire user memory without such a check, and in general unprivileged users are permitted to wire a small amount (more than a page) of physical memory anyway. Rather than having ad-hoc policies implemented in sysctl handlers, I'd prefer to handle the issue globally in vslock() (which currently gives you a very loose seatbelt).

996

You can replace the flag variable with if (di == TAILQ_FIRST(&dumper_configs)).

1270

I know encryption and compression can't be configured simultaneously right now, but that could easily change in the future. Is there any other reason to use an else if here rather than plain if?

sys/sys/conf.h
363–364

Most of these parameters are extracted from a diocskerneldump_arg, why not just pass that structure directly like with dumper_remove()?

sys/kern/kern_shutdown.c
984

On the one hand, sure. On the other hand, to exceed a 4k page, even with ~255 devices (theoretical max based on uint8_t kda_index), would take just over 16 bytes per device name on average (which is much longer than typical for network devices or even disk partitions). It's just a relatively huge amount of memory relative to what is needed and also the minimum unit we can wire, so I figured it was pretty reasonable.

I'm happy to remove it, though. Will plan to do that.

996

Ok

1270

This branch isn't related to our current compression+encryption limitation. It's just asserting that a non-NONE kda_compression mode doesn't match kdcomp == NULL. Plain if would require restating kdcomp == NULL && which we inherit via else if.

sys/sys/conf.h
363–364

I didn't set out to rototill the interface that much. I don't have any good argument against doing so, though. I agree that passing a struct with arguments would be more sensible than 10s of parameters which can be easily misordered, and diocskerneldump_arg is probably the logical choice. I'll make that change.

Sorry for the delay, I had been traveling.

Oh, and no problem! I appreciate your feedback and there isn't any rush on this. :-) Thanks!

cem marked 4 inline comments as done.
  • Drop arbitrary wired memory limit
  • Drop unnecessary flag in loop iteration
  • Use the more concise diocskerneldump_arg structure for dumper_insert args

I'm a bit lost on where the thread on sysctl_wire_old_buffer() is going. I'm fine with the code as-is, recommend not removing the call unless Mark advises otherwise.

This revision is now accepted and ready to land.Apr 30 2019, 7:01 AM

Looks ok to me, just a few unimportant nits.

I'm a bit lost on where the thread on sysctl_wire_old_buffer() is going. I'm fine with the code as-is, recommend not removing the call unless Mark advises otherwise.

An earlier version of the diff caused the sysctl handler to return an error if the old buffer size was larger than a page. I suggested removing that check, not the sysctl_wire_old_buffer() call.

sys/kern/kern_shutdown.c
217

I'm not sure what "unprotected" means here.

222

Convert to a full sentence?

1189

I know the old code did this, but an explict memcpy() might be nicer. I've gotten confused here before because I'm not used to seeing structures being copied this way.

1312

Is it possible to match multiple devices?

Thanks Scott and Mark!

sys/kern/kern_shutdown.c
217

More correct would be “unprivileged.” We shouldn’t crash if root manipulates a kernel data structure concurrently with access either, but it’s slightly better. I will probably remove the adjective and just mention concurrent manipulation and the sysctl.

222

Does “These are...” make the comment easier to understand? I doubt it, though it improves style(9) conformance. I inherited this one, just added “(s)”. Will fix for style sake.

1189

I don’t have a strong preference for either. I’d guess there are more people confused about this than harmed by memcpy, so I’m happy to change it.

1312

Yes - this allows things like “dumpon off” to continue to disable everything without iterating all dump configurations explicitly. Are you thinking the sys/disk.h comment could specify that more clearly? For that matter, it’s probably more appropriate in the .c implementation anyway.

sys/kern/kern_shutdown.c
222

I just meant to capitalize and add a period since the comment is on its own line now.

1312

It might be worth noting that wildcard matches are permitted.

cem marked 5 inline comments as done.
  • Update some comments and minor style changes suggested in Mark's last review
  • Break 12 ABI and provide backwards-compatibility ioctls; happily, we can make the simplifying assumption that structure sizes line up and we can memcpy most fields.
  • As long as we're breaking ABI, reunite netdump_conf and diocskerneldump_arg.
  • I still need to update dumpon(8); TODO.
  • Tinderbox has been broken by linuxkpi changes so it's not clean, but most 64-bit archs built kernel ok.
This revision now requires review to proceed.May 1 2019, 4:15 AM

Trivial fix for dumpon(8)

cem planned changes to this revision.May 1 2019, 5:03 PM

Some small changes are still needed to convert some callers to the dumper_remove API; additionally, dumpon netdump support needs to be rototilled a bit.

sys/geom/raid/g_raid.h
166

This eliminated a spurious warning/error about having a variable length array in the middle of the structure (struct dumperinfo, via g_kerneldump::di). In practice, I don't think anyone outside of kern_shutdown.c is accessing or manipulating di_devname, and kern_shutdown mallocs its own large-enough structures for the name.

It would be equally acceptable IMO to drop this g_raid_disk change and simply change dumperinfo::di_devname[] to dumperinfo::di_devname[0] (or [1]). If you feel strongly one way or the other, let me know.

sys/kern/kern_shutdown.c
977

Tinderbox — not all kernel configurations specify PRINTF_BUFR_SIZE :-(.

1317–1319

At least REMOVE_ALL shouldn't error; it breaks the 12 ABI to do so (unless we explicitly squash it in 12-compat callers) because dumpon(8) in 12 always does a clear first, even if no dump device is set, before setting a dump device.

sys/netinet/netdump/netdump.h
74–75

This is a shortcut for more verbose size/offset checks of all members. Offset of ndc12_iface vs kda_iface is verified in another file.

I figured as long as I was rototilling the ABI, I might as well unify the configuration structure for netdump and disk dump.

sys/netinet/netdump/netdump_client.c
1310–1312

I believe this explicit bzeroing came in to geom_dev via the EKCD commit(s); I included it here for parity with geom_dev configuration.

  • Rototill dumpon(8) for unified diosckerneldump_arg
  • Add missing ABI compat for DIOCSKERNELDUMP_FREEBSD12 in netdump -- needed by initial 'clear' that 12.x dumpon performs
  • Two minor error number changes in netdump's ioctl interface (use ENOTTY for bad command and ENODEV for !netdump_supported)
  • Fix device-remove in netdump unload and geom orphan to use KDA_REMOVE_DEV

I can't see any problems. I find it odd that we support multiple dump configurations for the same device, especially since that introduces some complexity (KDA_REMOVE vs. KDA_REMOVE_DEV). I'm having trouble imagining a scenario where that would be useful.

sys/geom/geom_dev.c
510

Might as well move this definition into the case statement below?

sys/sys/disk.h
192

We will probably want to permit these to be IPv6 addresses. That can be done as a separate change before 13.0.

This revision is now accepted and ready to land.May 2 2019, 6:39 PM

I can't see any problems. I find it odd that we support multiple dump configurations for the same device, especially since that introduces some complexity (KDA_REMOVE vs. KDA_REMOVE_DEV). I'm having trouble imagining a scenario where that would be useful.

The case I have in mind is falling back to higher (and slower) levels of compression if the dump wouldn't fit at the faster/less compressed level. Or you could imagine falling back to a 2nd netdump server (same /dev/netdump device, same iface) if the first server can't be reached. Maybe these are a reach, but that was the idea.

sys/geom/geom_dev.c
510

Can't — it must be in-scope after leaving that case statement's embedded block due to the fallthrough.

sys/sys/disk.h
192

Hm, it'll be another ABI break too. That's a bit unfortunate; I'd like to avoid a second round of compatibility ABI for older versions of head. Maybe we should just make that change now (without adding ipv6 support to dumpon/netdump, just adding union types to the ABI interface)?

It'd require minor fiddling in NETDUMPGCONF_FREEBSD12, but otherwise should be straightforward, I think.

(I don't see any good named union type for in_addr+in6_addr — outside of ofed's cma_ip_addr — but it's a widely used construct in the tree — nfs, tcp_lro, tcp_fastopen, ip_fw, pf, if_llatbl all seem to have some kind of definition.) There's in_dependaddr but I'm not sure in_addr_4in6 is the most appropriate spelling of in_addr given the way we would use it. I'm leaning towards defining a new type.

In D19996#433713, @cem wrote:

I can't see any problems. I find it odd that we support multiple dump configurations for the same device, especially since that introduces some complexity (KDA_REMOVE vs. KDA_REMOVE_DEV). I'm having trouble imagining a scenario where that would be useful.

The case I have in mind is falling back to higher (and slower) levels of compression if the dump wouldn't fit at the faster/less compressed level. Or you could imagine falling back to a 2nd netdump server (same /dev/netdump device, same iface) if the first server can't be reached. Maybe these are a reach, but that was the idea.

Ok, makes sense.

sys/geom/geom_dev.c
510

Ah, I thought it was permitted to refer to variables outside of their lexical scope so long as you're still in the same function.

sys/sys/disk.h
192

We don't provide compatibility guarantees in head, so we can just break the interface. (Hence "before 13.0.")

I think we can (ab)use struct sockaddr?

cem planned changes to this revision.May 3 2019, 2:16 AM
cem marked an inline comment as done.
cem added inline comments.
sys/geom/geom_dev.c
510

You might be right, but I wasn't confident of that and don't want to dig up the C standard to figure it out. :-)

sys/sys/disk.h
192

Sure, but I'm only adding "12" abi compatibility for slightly older head, not because I actually care about running 12 world (dumpon in particular) on a 13 kernel. The idea is to make the transition less painful for head consumers / crashboxes that regularly up/downgrade kernels, so the single step seems better to me.

Sockaddr isn't big enough and/or doesn't represent the right thing. It's the same size as in6_addr but doesn't logically represent an in6_addr. sockaddr_storage is closer, but huge overkill. sockaddr_in / sockaddr_in6 have a lot of redundant stuff we don't care about for this use, I think. I'll put something together.

cem marked an inline comment as done.

Leave room in ioctl ABI for ipv6 as long as we're rototilling the interface.
Nothing actually supports non-INET netdump yet.

sys/sys/disk.h
192

I don't see what's wrong with sockaddr_storage. It's larger than is needed but that doesn't matter for a rarely executed ioctl, and it's standard to use that in code that is address family-agnostic. And sys/_sockaddr_storage.h will introduce less pollution in sys/disk.h.

sys/sys/disk.h
192

Nothing wrong with sockaddr_storage, really. Mostly it just makes the interface more general in a way that isn't helpful (requires more boilerplate to check for EINVAL input). We can use it instead, if you would prefer.

sys/sys/disk.h
192

I think I'd still prefer it to a hand-rolled structure. Is the extra boilerplate just that we have to check the AF of each field instead of one?

sys/sys/disk.h
192

Yeah. And whatever ambiguity surrounds other fields that matter for sockets but not hosts, eg flowid, port, etc.

sys/sys/disk.h
192

Also that sockaddr_storage is an array type rather than a union and all real access requires ugly casting; and we don't save anything in header includes because we need net/if.h which means we pull in the in/in6 addr types already.

sys/sys/disk.h
192

I've rototilled sockaddr_storage into just dumpon alone (to give it a shot) and it is really really ugly. And there would be a large number of additional churn involving sockaddr_in casts in netdump_client.c to finish the job. I don't think sockaddr_storage is a suitable type for this use case (and it's only used with BSD sockets for legacy API reasons).

So I don't plan to continue converting to sockaddr_storage — I'll just leave it with the address union instead.

This revision is now accepted and ready to land.May 6 2019, 1:56 PM
This revision was automatically updated to reflect the committed changes.