kern/kern_exit.c: some style kern/kern_fork/exit.c: organize includes Remove sys/cdefs.h. Remove sys/param.h. Order the sys/*.h includes alphabetically. kern/kern_fork.c: define the exterror category for fork Convert EINVALs in kern_fork.c into EXTERRORs. sys: add AUE_PDRFORK contrib/openbsm: add AUE_PDRFORK sys: add pdrfork(2) lib/libsys, lib/libc: export pdrfork(2) kern/kern_exit.c: define the exterror category for exit/wait Convert EINVALs in kern_exit.c into EXTERRORs. kern/kern_exit.c: extract some helpers from proc_to_reap() kern/kern_exit.c: extract wait6_check_alive() helper Add pdwait(2) lib/libsys, lib/libc: export pdwait Regen for the fork and exit/wait exterror category addition Regen syscall tables after pdfork(2) and pdwait(2) additions
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Add cap_pdwait_rights.
Make WNOWAIT functional for pdwait(), or rather make defaults to reap the process on pdwait().
Suggested by: asomers
I had to build an aarch64 VM just to test the RFSPAWN change, but it works now. It probably works on amd64 too, but I can't test it without writing assembly code.
As for the pdwait() reaping behavior, I don't have a strong opinion either way. I suggested that pdwait(WEXITED) should reap the process just like wait4() for consistency's sake.
As for the RFPROCDESC behavior, I think that your proposal of using pdrfork(RFPROCESC) to modify the current process is very unlikely to be useful. I would prefer that to return EINVAL. OTOH, anybody who misuses pdrfork(RFPROCDESC) will currently get EBADF, which isn't so wrong. So you're way should be ok.
And I've found another problem. Currently, combining pdrfork() with waitpid(WEXITED) will reap the process, and make the pid available for reuse. I thought otherwise last Thursday only because my testing was incomplete. This limitation makes the Rust feature impossible to implement bug-free, at least until pdwait() is available. Would it be possible to reserve the pid even after the process has been reaped, until the process descriptor is closed? That needn't be part of this review, of course, but I mention it in case that affects the requirements of pdwait().
| sys/kern/kern_fork.c | ||
|---|---|---|
| 222 | How does one even get a process descriptor for the current process? The only way I know of is to use pdfork(), then have the parent send the process descriptor to the child via sendmsg. All that just so the child can modify its own process, using pdrfork instead of rfork? That doesn't seem useful to me. | |
| sys/sys/procdesc.h | ||
| 132 | ||
I planned to work on posix_spawn() after this series is landed. Thank you for the testing.
As for the pdwait() reaping behavior, I don't have a strong opinion either way. I suggested that pdwait(WEXITED) should reap the process just like wait4() for consistency's sake.
I already changed the behavior as you suggested.
As for the RFPROCDESC behavior, I think that your proposal of using pdrfork(RFPROCESC) to modify the current process is very unlikely to be useful. I would prefer that to return EINVAL. OTOH, anybody who misuses pdrfork(RFPROCDESC) will currently get EBADF, which isn't so wrong. So you're way should be ok.
I agree, I dropped this chunk.
And I've found another problem. Currently, combining pdrfork() with waitpid(WEXITED) will reap the process, and make the pid available for reuse. I thought otherwise last Thursday only because my testing was incomplete. This limitation makes the Rust feature impossible to implement bug-free, at least until pdwait() is available. Would it be possible to reserve the pid even after the process has been reaped, until the process descriptor is closed? That needn't be part of this review, of course, but I mention it in case that affects the requirements of pdwait().
So right now pdwait() reaps and this frees the pid. I wrote (untested) change to only free pid on free of procdesc (i.e. the last close, after reap).
| sys/kern/kern_fork.c | ||
|---|---|---|
| 222 | Ok, I dropped this stuff. | |
Fix pdwait() userspace prototype.
Drop not useful operations on curproc procdesc.
Postpone freeing the zombie' pid until procdesc is freed.
I'm afraid that with this latest patch, I get an instapanic on boot in pid 1.
panic: bit 1442 not set in 0 cpuid = 0 time = 1768845658 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d8008920 vpanic() at vpanic+0x136/frame 0xfffffe00d8008a50 panic() at panic+0x43/frame 0xfffffe00d8008ab0 proc_id_clear() at proc_id_clear+0xa6/frame 0xfffffe00d8008ae0 proc_reap() at proc_reap+0x445/frame 0xfffffe00d8008b20 proc_to_reap() at proc_to_reap+0x363/frame 0xfffffe00d8008b70 kern_wait6() at kern_wait6+0x1df/frame 0xfffffe00d8008c10 sys_wait4() at sys_wait4+0x6b/frame 0xfffffe00d8008e00 amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe00d8008f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d8008f30 --- syscall (7, FreeBSD ELF64, wait4), rip = 0x2d804a, rsp = 0x820548518, rbp = 0x820548580 --- KDB: enter: panic
| sys/bsm/audit_kevents.h | ||
|---|---|---|
| 667 | This conflicts with AUE_SPECIALFD . | |
Renumber AUE_PDRFORK.
Drop contrib/openbsm patch, similar to how other new AUE_ constants are not imported there.
Do not release procdesc->pd_pid if the descriptor was already detached from the struct proc. In this case, the pid is freed by reaping.
This looks great, @kib! If you commit it, I'll follow up with the test suite. A few questions:
- Are you planning to write the man page yourself, or would you like help with that?
- Even if you don't MFC the new syscalls, would it be possible to MFC the pid recycling change?
I prototyped the initial version of the docs. Feel free to suggest improvements.
- Even if you don't MFC the new syscalls, would it be possible to MFC the pid recycling change?
I plan to MFC everything. It depends on how the live testing in HEAD goes, of course.
Thanks for writing the man page too. I've made a few comments, mostly to make the English more grammatical and idiomatic.
| lib/libsys/pdfork.2 | ||
|---|---|---|
| 33 | Don't forget to adjust .Dd too | |
| 104 | More idiomatic English. | |
| 112–114 | number agreement | |
| 117 | ||
| 118 | ||
| 136–137 | ||
| 140 | ||
| 179–180 | ||
| 239 | Or perhaps sooner, if MFCd? I'll set a reminder for myself to update the man page before 15.1, if you've MFCd by then. | |
| lib/libsys/pdfork.2 | ||
|---|---|---|
| 80 | ||
| lib/libsys/pdfork.2 | ||
|---|---|---|
| 102 | Sure, except you're missing a space in "betweenthe". | |
| lib/libsys/pdfork.2 | ||
|---|---|---|
| 239 | Yes, it is only a safest estimation. | |
| lib/libsys/pdfork.2 | ||
|---|---|---|
| 182–183 | Missing macro dots | |
| sys/kern/kern_exit.c | ||
|---|---|---|
| 1322 | Why "%jx"? options is an int. | |
| 1452 | Is it formally correct to dereference p here? AFAICS there is nothing preventing it from being recycled after the proc lock is dropped. | |
| 1559 | Why do we check p_state == PRS_ZOMBIE twice? I do not see how it can change: if wait6_check_alive() returns false, the proc lock will not have been dropped. | |
| 1573 | I think this path is missing fdrop()? | |
| sys/kern/kern_exit.c | ||
|---|---|---|
| 1322 | The format is utilized by usermode, which always casts the arguments to uintmax_t: we do not know the real types of the values passed by the kernel, we only know that they are integer. I noted it in the exterror(9) man page that the 'j' modifier must always be applied. | |
| 1452 | Formally yes because struct proc are type-stable. Of course there is a possible race with the 'p' reuse, whatever unlikely it is. I changed this to use the local variable pid that already caches the pid, same as in line 1432. | |
| 1559 | Yes, I think this if(){} block can be removed. This is a remnant of earlier, different, structure of the pdwait() loop. | |
Avoid racy access to the p_pid member of possibly reclaimed struct proc.
Other review' comments are handled.
| lib/libsys/pdfork.2 | ||
|---|---|---|
| 115 | Either the "the" should be removed, or you should write "The .Fn pdrfork system call". | |
| 136 | ||
| sys/compat/freebsd32/freebsd32_misc.c | ||
| 303 | BTW, I suspect freebsd32_rusage_out() should do some zeroing. struct timeval32 will have a hole on non-amd64 platforms, so we are copying out uninitialized memory there. | |
| sys/kern/kern_exit.c | ||
| 1452 | Right, I was referring to the race. | |
| sys/kern/subr_capability.c | ||
| 93 | rights.4 should be updated to mention the new CAP_PDWAIT. | |
| sys/kern/sys_procdesc.c | ||
| 317 | Do we need to wake up the kern_pdwait() loop here too? I guess not, since the pdwait() caller must be holding a reference on the procdesc, so PDF_CLOSED cannot be set? | |
| lib/libsys/pdfork.2 | ||
|---|---|---|
| 136 | It is function as well, and might become a wrapper some day. | |
| sys/kern/sys_procdesc.c | ||
| 317 | If PDF_CLOSED set, then indeed no process might have the file installed into the descriptors table. | |
Assert that PDF_CLOSED is not set for fd operated by pdwait().
Make pdwait() cancellable.