This is inspired by Linux' pidfd_getfd(), and should put our procdesc facility mostly on par with Linux.
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
The test program
/* $Id: pddupfd.c,v 1.2 2026/05/22 03:59:31 kostik Exp kostik $ */
#include <sys/types.h>
#include <sys/fcntl.h>
#include <sys/mman.h>
#include <sys/procdesc.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define MY_SYS_pdopenpid 603
#define MY_SYS_pddupfd 604
static int
my_pdopenpid(pid_t pid, int flags)
{
return (syscall(MY_SYS_pdopenpid, pid, flags));
}
static int
my_pddupfd(pid_t pid, int remote_fd, int flags)
{
return (syscall(MY_SYS_pddupfd, pid, remote_fd, flags));
}
static void
check(int fd, int child, int *comm)
{
struct __wrusage wu;
struct __siginfo si;
int error, f, pp, status, workfd;
char cmd[128];
error = pdgetpid(fd, &pp);
if (error == -1)
err(1, "pgdetpid");
printf("child pid %d %d\n", child, pp);
while (*comm == -1)
;
workfd = my_pddupfd(fd, *comm, 0);
if (workfd == -1)
err(1, "pddupfd");
f = fcntl(workfd, F_GETFD);
if (f == -1)
err(1, "F_GETFD %d", workfd);
f &= ~FD_CLOEXEC;
error = fcntl(workfd, F_SETFD, f);
if (error == -1)
err(1, "F_SETFD %d", workfd);
snprintf(cmd, sizeof(cmd), "cat 0<&%d", workfd);
system(cmd);
close(workfd);
*comm = 0;
error = pdwait(fd, &status, WEXITED, &wu, &si);
if (error == -1)
err(1, "pdwait");
printf("exited pid %d status %#x si.si_status %#x si.si_code %#x\n",
pp, status, si.si_status, si.si_code);
}
int
main(void)
{
int *comm;
pid_t child;
int fd, workfd;
comm = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED |
MAP_ANONYMOUS, -1, 0);
if (comm == MAP_FAILED)
err(1, "mmap");
*comm = -1;
child = fork();
if (child == -1)
err(1, "fork");
if (child == 0) {
workfd = open("/etc/passwd", O_RDONLY);
if (workfd == -1)
err(1, "open");
*comm = workfd;
while (*comm != 0)
sleep(1);
_exit(0x123);
}
fd = my_pdopenpid(child, 0);
if (fd == -1)
err(1, "pdopenpid");
*comm = 1;
check(fd, child, comm);
close(fd);
}I cannot apply the patch, it fails in pdfork.2. (I already applied the pdopenpid() patch.) Could you please rebase? Or, if your branch is already public, could you please include a pointer to it in the review description so that it's easier for me to fetch? That makes it easier for me to e.g., write syzkaller descriptions for new syscalls.
I had to apply ACL to my gitweb server due to 'AI'. If you provide me with the list of IP addresses to enable, I will open for them.
I pushed the code to github.com/kostikbel/freebsd-src branch pdrfork.
Re-check with p_candebug() that credentials of the target are still fine.
Block execve() around the check and fget_remote() call with execve_block(), which prevents suid exec under us.
| sys/kern/kern_descrip.c | ||
|---|---|---|
| 3186 | fd_flags and fcaps may be NULL here, with p != td->td_proc, e.g., via kern_proc_kqueues_out1(). | |
| sys/kern/kern_exec.c | ||
| 470 | ||
| 471 | execve_unblock() only wakes us up if P_INEXEC_WAIT is set, so won't we sleep here forever? | |
| sys/kern/sys_procdesc.c | ||
| 723 | Some comment should explain why we block execve, it is not very obvious. | |
| 743 | We need to do a deep copy of the ioctl list in the fcaps structure. It's private to the FDE. | |
| 750 | If pd->pd_proc == NULL, we still return 0. | |
Fix execve blocking.
Add comment around blocking call.
Copy ioctl list when getting caps/free it on error.
Check fd_flags and fcaps for NULL before assigning.
Return ESRCH if procdesc already disassociated from the process.
| sys/kern/kern_exec.c | ||
|---|---|---|
| 389 | Rather than adding this mechanism, can we re-use the existing p_lock? That is, make execve() block if the process is held. I think there are other places where it is important to avoid racing with execve of a setuid image. | |
| sys/kern/kern_exec.c | ||
|---|---|---|
| 389 | I considered it, but PHOLD() cannot be used this way. I need to block the thread entering execve() when execblock is taken. This means that the thread must sleep, and it is sometimes unsafe to drop and reacquire the process mutex in places where p_lock is incremented. | |
| sys/kern/kern_exec.c | ||
|---|---|---|
| 389 | I think it could be done like this:
IMO it is cleanest if we can guarantee that pget(pid, PGET_HOLD | PGET_CANDEBUG) will block execve until the hold is released. Note that p_candebug() fails if P_INEXEC set. I have a WIP patch along these lines that I'm working on for other reasons, I will share soon. | |