kern: abstract away the vnode coredumper to allow pluggable dumpers
Closed, Public

Authored by kevans on Jul 16 2025, 4:50 AM.

Details

Summary

The default and only stock coredumper will continue to be the traditional
vnode dumper, which will dump to a vnode and issue a devctl notification.
With this change, one can write a kmod that injects custom handling of user
coredumps that offers richer behavior, particularly in case one wants to
add more metadata than we can tap out via devd.

The main motivation here is to pave the way for my usercore daemon to be
able to reroute coredumps before they ever touch the disk. In some cases
they may be discarded, and we avoid the overhead of writing anything; in
others, rerouting lets us capture coredumps that would be written into an
area that's transient in nature (e.g., kyua test work directories) without
having to do more tricks to keep those alive. My WIP kmod writes the coredump into
a shmfd instead of a vnode, then installs that into ucored(8) with every
read(2) of /dev/ucore. This also allows me to capture more metadata
reliably before the process and jail disappear.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

The practical application that uses this stack is here: https://git.kevans.dev/kevans/ucored/src/branch/wip/kmod/kmod/coredump_ucored.c (quite WIP)

  • kern_shm_open2 is used with a shmfd that I prepopulated with a coredump to pass the shmfd back to my ucored(8) read-only
  • coredump_ucored is registered as a new coredumper @ COREDUMPER_SPECIAL; when loaded, it'll pass cores back out to /dev/ucore for ucored(8) to snatch up and work with
sys/sys/exec.h
122 ↗(On Diff #158587)

This is probably worth a COREDUMP_HANDLER(9) if the idea isn't rejected

I would move the vnode dumper into its own source file. (I have wanted to do that for a long time.)

sys/kern/kern_sig.c
4173 ↗(On Diff #158587)
4290 ↗(On Diff #158587)

Do you need a way to drain the activities of the coredumpers?

4324 ↗(On Diff #158587)

Use the opportunity to add EXTERRORs? This EFAULT is not quite EFAULT.

So kern.corefile gets ignored if we use a non-default dumper? I guess that's not too surprising--core.5 would be a good place to document quirks like that.

sys/kern/kern_sig.c
106 ↗(On Diff #158587)
4324 ↗(On Diff #158587)

How would an exterror be consumed at this point?

4371 ↗(On Diff #158587)

Do you need to acquire some reference on chosen to ensure it doesn't get freed while a process is dumping?

sys/kern/kern_sig.c
4371 ↗(On Diff #158587)

This kind of goes with kib's comment above about draining, and something I forgot to think about when trying to un-PoC this.

I guess struct coredumper should probably grow a cd_flags / CDF_QUIESCE / coredumper_quiesce() so that we stop handing it requests prior to deregistration (presumably in a MOD_UNLOAD handler), and perhaps a refcount(9) to track and wait on in-flight coredumps

kevans retitled this revision from "kern: further abstract away the vnode coredumper to allow pluggable dumpers" to "kern: abstract away the vnode coredumper to allow pluggable dumpers". Jul 16 2025, 3:55 PM
sys/kern/kern_sig.c
4371 ↗(On Diff #158587)

In lieu of an explicit quiesce state, I think you can just unlink the dumper from the list and use a refcount to wait for dumping threads to drain. There is a small undocumented interface in blockcount.h that maybe makes this a bit neater.

sys/kern/kern_sig.c
4371 ↗(On Diff #158587)

Good point, I've sprinkled some blockcount on it locally and that feels better, though it does have me thinking more about how I currently have this set up. coredumper_unregister() is currently done via SYSUNINIT, so I'd need one of:

  • Another SYSUNINIT to further free resources,
  • A callback for the dumper to free up, or
  • Drop the SYSINIT/SYSUNINIT entirely and make the module handle registration in MOD_LOAD/MOD_UNLOAD

I don't really like option #1. I'm not sure I see a clear winner between the other two, though clearly just pushing that off to the module would be simpler.

kevans marked 5 inline comments as done.

Address some review feedback

Refcount dumps being processed by a handler and block at unregister time until
the dumper has finished. Add a manpage documenting all of this, and also point
out how one would start to write a custom coredump writer.
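
A rough userland model of the drain described above may help readers who have not seen the pattern. It is only a sketch: the model_* names are hypothetical, and a pthread mutex and condition variable stand in for whatever the kernel change actually uses (the thread mentions blockcount). It shows a dumper being unlinked so that no new dumps start, followed by a sleep until the in-flight count reaches zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for a registered coredumper. */
struct model_dumper {
	pthread_mutex_t	lock;
	pthread_cond_t	drained;	/* signalled when inflight hits 0 */
	unsigned int	inflight;	/* dumps currently in progress */
	bool		registered;	/* false once unlinked */
};

static void
model_dumper_init(struct model_dumper *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->drained, NULL);
	d->inflight = 0;
	d->registered = true;
}

/* Take a reference before dumping; fails once unregistration has begun. */
static bool
model_dump_begin(struct model_dumper *d)
{
	bool ok;

	pthread_mutex_lock(&d->lock);
	ok = d->registered;
	if (ok)
		d->inflight++;
	pthread_mutex_unlock(&d->lock);
	return (ok);
}

/* Drop the reference and wake a waiting unregister on the last release. */
static void
model_dump_end(struct model_dumper *d)
{
	pthread_mutex_lock(&d->lock);
	assert(d->inflight > 0);
	if (--d->inflight == 0)
		pthread_cond_broadcast(&d->drained);
	pthread_mutex_unlock(&d->lock);
}

/* Unlink first so no new dumps start, then sleep until the rest finish. */
static void
model_dumper_unregister(struct model_dumper *d)
{
	pthread_mutex_lock(&d->lock);
	d->registered = false;
	while (d->inflight > 0)
		pthread_cond_wait(&d->drained, &d->lock);
	pthread_mutex_unlock(&d->lock);
}
```

In this model a dumping thread brackets its work with model_dump_begin() and model_dump_end(), and a module's unload path would call model_dumper_unregister() before freeing anything the dump path touches.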

sys/kern/kern_ucoredump.c
259

In general, what is the cd_probe method supposed to do?

sys/kern/kern_ucoredump.c
259

It offers some mechanism to filter out the kind of coredumps the dumper will handle, so that one can decline to deal with one and we don't lose the dump entirely as it falls back to some other dumper (or the standard vnode dumper). I'm not strongly tied to the notion; at one point I thought I'd want my custom dumper to decline coredumps if nothing had opened its /dev/ucore or if we enqueued too many dumps without a reader to drain them.
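
To make the fallback behaviour concrete, here is a small userland sketch of probe-based selection. Everything in it is an assumption for illustration: the demo_* names, the priority constants, the idea that a probe returns a priority with a negative value meaning "decline", and the queue limit. None of it is taken from the actual diff. The second probe declines when no reader is attached or too many dumps are queued, which are the two cases mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priorities; the real constants live in the kernel headers. */
#define	DEMO_PRIO_DECLINE	(-1)
#define	DEMO_PRIO_DEFAULT	0
#define	DEMO_PRIO_SPECIAL	10
#define	DEMO_QUEUE_MAX		4	/* arbitrary limit for the sketch */

/* State a probe might look at; both fields are invented for the example. */
struct demo_request {
	int	reader_attached;	/* has anything opened the device? */
	int	queued;			/* dumps waiting for a reader */
};

struct demo_dumper {
	const char *name;
	int	(*probe)(const struct demo_request *);
};

/* The stock dumper never declines, so there is always a fallback. */
static int
demo_vnode_probe(const struct demo_request *req)
{
	(void)req;
	return (DEMO_PRIO_DEFAULT);
}

/* Decline when nobody is listening or the queue is already full. */
static int
demo_ucore_probe(const struct demo_request *req)
{
	if (!req->reader_attached || req->queued >= DEMO_QUEUE_MAX)
		return (DEMO_PRIO_DECLINE);
	return (DEMO_PRIO_SPECIAL);
}

static const struct demo_dumper demo_dumpers[] = {
	{ "vnode", demo_vnode_probe },
	{ "ucore", demo_ucore_probe },
};

/* Ask every dumper and keep the highest non-declining priority. */
static const struct demo_dumper *
demo_select(const struct demo_dumper *tab, size_t n,
    const struct demo_request *req)
{
	const struct demo_dumper *best = NULL;
	int best_prio = DEMO_PRIO_DECLINE;
	size_t i;

	for (i = 0; i < n; i++) {
		int prio;

		assert(tab[i].probe != NULL);
		prio = tab[i].probe(req);
		if (prio > best_prio) {
			best_prio = prio;
			best = &tab[i];
		}
	}
	return (best);
}
```

With a reader attached and an empty queue, demo_select() picks the "ucore" entry; in either declining case it falls back to "vnode", so the dump is never lost.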

kib added inline comments.
sys/kern/kern_ucoredump.c
259

Ok.

I asked because the probe is not sleepable, and I wonder what kind of choice it can make.

Eventually, I suspect we might want a handler that allows for pseudo-live debugging, where e.g. it would spawn a debugging server that attaches to the process in a coma. Not right now.

This revision is now accepted and ready to land. Jul 18 2025, 5:17 AM
sys/kern/kern_ucoredump.c
259

My initial thinking of thread/proc-based filtering (not state of the module as above) was that one could write interesting policies around any of p_sig/p_comm/jail/uid
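
As an illustration of what such a policy could look like, the sketch below filters on the attributes mentioned: signal, command name, jail and uid. It is a userland model under stated assumptions: struct demo_procinfo, the watch list and the rules themselves are all invented for the example and do not reflect the real struct proc layout or any policy in the diff.

```c
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical snapshot of the attributes a policy might consult. */
struct demo_procinfo {
	const char *comm;	/* command name, in the spirit of p_comm */
	uid_t	uid;		/* owner of the process */
	int	jid;		/* jail id; 0 means the host in this model */
	int	sig;		/* signal that triggered the dump */
};

/* Arbitrary example watch list for host processes. */
static const char *const demo_watched[] = { "kyua", "sshd" };

static bool
demo_policy_wants(const struct demo_procinfo *pi)
{
	size_t i;

	/* A SIGQUIT dump was asked for by a user; leave it to the default. */
	if (pi->sig == SIGQUIT)
		return (false);
	/* Always take dumps from jailed processes. */
	if (pi->jid != 0)
		return (true);
	/* On the host, only take unprivileged processes on the watch list. */
	if (pi->uid == 0)
		return (false);
	for (i = 0; i < sizeof(demo_watched) / sizeof(demo_watched[0]); i++) {
		if (strcmp(pi->comm, demo_watched[i]) == 0)
			return (true);
	}
	return (false);
}
```

A predicate like this only compares fields, so it never needs to sleep, which fits the constraint kib raised about the probe.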

share/man/man5/core.5
46
47
51

If I'm reading this correctly, it's not "dumped" until after the write is done.

98

Not asking you to change anything, but: this is less true after commit f31695cc64e2328028b7432a2a6bdcd088909b2a. A further optimization would be to avoid visiting each page in a segment if the VM object for the mapping has no backing object. Then we can skip over non-resident ranges of VA in O(log n) time rather than calling vm_fault() on each page in core_output().

This would be useful for programs which map very large VA ranges. I think chromium does this, and snmalloc also maps 512GB by default. Dumping core from such a program is slow, even with the commit above.
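
The gain from skipping non-resident ranges can be sketched in userland. This is not how the kernel's VM structures work; a sorted array of resident page indices stands in for whatever index the VM object would provide, and all demo_* names are hypothetical. The walker finds the first resident page at or after the start of a range by binary search, then visits only resident pages, so the cost depends on the number of resident pages rather than on the size of the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void demo_visit_t(uint64_t pindex, void *arg);

/*
 * Return the position of the first entry >= pindex in a sorted array of
 * resident page indices, or n if there is none.  O(log n).
 */
static size_t
demo_lower_bound(const uint64_t *resident, size_t n, uint64_t pindex)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (resident[mid] < pindex)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

/*
 * Visit only the resident pages in [start, end) and return how many were
 * visited.  Holes of any size are skipped without being touched.
 */
static size_t
demo_walk_resident(const uint64_t *resident, size_t n, uint64_t start,
    uint64_t end, demo_visit_t *visit, void *arg)
{
	size_t i, visited = 0;

	assert(start <= end);
	for (i = demo_lower_bound(resident, n, start);
	    i < n && resident[i] < end; i++) {
		if (visit != NULL)
			visit(resident[i], arg);
		visited++;
	}
	return (visited);
}
```

For a 512GB mapping with 4KB pages (2^27 page slots) and four resident pages, this walker touches four entries, whereas a page-by-page loop would step through all 134217728 slots.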

share/man/man9/coredumper_register.9
13

Maybe drop "kernel"? It's a bit confusing otherwise.

69
75

This sentence doesn't really make sense to me.

76
90
91
kevans marked 9 inline comments as done.

Address review feedback -- manpage fixes

This revision now requires review to proceed. Jul 18 2025, 2:57 PM

The code change looks ok to me, my comments are about the docs.

share/man/man5/core.5
51

Thinking about this a bit more, this kind of detail is probably not helpful to the "average" reader. More specifically, we shouldn't assume that a user knows what a vnode is. The doc mentions that multiple coredumpers (what's a coredumper?) can be loaded (how?) but doesn't say why they're useful.

I understand that you have a feature planned which will make this make more sense, but for now I'd just avoid mentioning coredumpers at all. This can be fleshed out once there's more user-visible functionality available.

I think moving the existing text around is ok.

share/man/man9/coredumper_register.9
67

I'm not sure what "halt processing" means. Is this referring to the fact that unregistration can sleep?

This revision is now accepted and ready to land. Jul 18 2025, 4:34 PM

AIUC? We should xref core.5 <> savecore.8.

kevans marked an inline comment as not done. Jul 18 2025, 4:47 PM
kevans added inline comments.
share/man/man5/core.5
51

Right, good point.

51

Hmm, yeah, that makes sense; I'll revert this back to just regrouping the core naming stuff together.

98

Noted, I might do a follow-up editorial pass on core(5) later. I think I'd also like to rewrite the bits describing compression somehow to better tie the values of kern.compress_user_cores back to the kernel options needed to enable them, but I don't really have a vision for how to better describe that at the moment.

share/man/man9/coredumper_register.9
67

Basically, yes. What I wanted to capture here is that the coredumper should be aware that it's about to unregister and, depending on the implementation, may want to be sure that any in-progress dumps happening concurrently will finish up, to avoid blocking module unload for too long (or perhaps have its own method to quiesce dumps and abdicate from handling more via its cd_probe).

75

Whoops; I missed a word, and I'm adding "level greater than 0" to the end.

> AIUC? We should xref core.5 <> savecore.8.

No, savecore is for kernel dumps, core.5 describes user process core dumps. Yes, it is confusing.

Revert most of the changes to core(5), except reorganizing to keep the filename
parts together; this is useful for future possible changes to pin these
paragraphs to a specific dumper.

Remove the clause about halting processing altogether in coredumper_register(9).
We already highlight that unregistration will block, that seems sufficient.

This revision now requires review to proceed. Jul 18 2025, 7:02 PM

I'm tentatively planning to commit this around the middle of next week, barring further objection, since the last rounds of commentary were on manpages. I'm just beating on my /dev/ucore implementation a bit more to be sure this is exactly the shape I need.

This revision is now accepted and ready to land. Jul 21 2025, 1:50 PM