Page MenuHomeFreeBSD

pddupfd(2)
Needs ReviewPublic

Authored by kib on Fri, May 22, 4:07 AM.
Tags
None
Referenced Files
F159511036: D57163.id179225.diff
Mon, Jun 15, 3:03 AM
F159498827: D57163.diff
Sun, Jun 14, 11:05 PM
Unknown Object (File)
Sun, Jun 14, 12:16 PM
Unknown Object (File)
Sun, Jun 14, 6:42 AM
Unknown Object (File)
Sun, Jun 14, 6:25 AM
Unknown Object (File)
Sun, Jun 14, 12:23 AM
Unknown Object (File)
Sat, Jun 13, 7:32 PM
Unknown Object (File)
Sat, Jun 13, 4:50 AM

Details

Reviewers
markj
brooks
Summary

This is inspired by Linux' pidfd_getfd(), and should put our procdesc facility mostly on par with Linux.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Fri, May 22, 4:07 AM
kib added a parent revision: D57124: sys: add pdopenpid(2).

The test program

/* $Id: pddupfd.c,v 1.2 2026/05/22 03:59:31 kostik Exp kostik $ */

#include <sys/types.h>
#include <sys/fcntl.h>
#include <sys/mman.h>
#include <sys/procdesc.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MY_SYS_pdopenpid	603
#define MY_SYS_pddupfd		604

static int
my_pdopenpid(pid_t pid, int flags)
{
	return (syscall(MY_SYS_pdopenpid, pid, flags));
}

static int
my_pddupfd(pid_t pid, int remote_fd, int flags)
{
	return (syscall(MY_SYS_pddupfd, pid, remote_fd, flags));
}

static void
check(int fd, int child, int *comm)
{
	struct __wrusage wu;
	struct __siginfo si;
	int error, f, pp, status, workfd;
	char cmd[128];

	error = pdgetpid(fd, &pp);
	if (error == -1)
		err(1, "pgdetpid");
	printf("child pid %d %d\n", child, pp);

	while (*comm == -1)
		;
	workfd = my_pddupfd(fd, *comm, 0);
	if (workfd == -1)
		err(1, "pddupfd");

	f = fcntl(workfd, F_GETFD);
	if (f == -1)
		err(1, "F_GETFD %d", workfd);
	f &= ~FD_CLOEXEC;
	error = fcntl(workfd, F_SETFD, f);
	if (error == -1)
		err(1, "F_SETFD %d", workfd);

	snprintf(cmd, sizeof(cmd), "cat 0<&%d", workfd);
	system(cmd);
	close(workfd);

	*comm = 0;
	
	error = pdwait(fd, &status, WEXITED, &wu, &si);
	if (error == -1)
		err(1, "pdwait");
	printf("exited pid %d status %#x si.si_status %#x si.si_code %#x\n",
	    pp, status, si.si_status, si.si_code);
}

int
main(void)
{
	int *comm;
	pid_t child;
	int fd, workfd;

	comm = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED |
	    MAP_ANONYMOUS, -1, 0);
	if (comm == MAP_FAILED)
		err(1, "mmap");
	*comm = -1;

	child = fork();
	if (child == -1)
		err(1, "fork");
	if (child == 0) {
		workfd = open("/etc/passwd", O_RDONLY);
		if (workfd == -1)
			err(1, "open");
		*comm = workfd;
		while (*comm != 0)
			sleep(1);
		_exit(0x123);
	}

	fd = my_pdopenpid(child, 0);
	if (fd == -1)
		err(1, "pdopenpid");

	*comm = 1;
	check(fd, child, comm);
	close(fd);
}

Preseve capabilities and RESOLVE_BENEATH of the source fd.

I cannot apply the patch, it fails in pdfork.2. (I already applied the pdopenpid() patch.) Could you please rebase? Or, if your branch is already public, could you please include a pointer to it in the review description so that it's easier for me to fetch? That makes it easier for me to e.g., write syzkaller descriptions for new syscalls.

I cannot apply the patch, it fails in pdfork.2. (I already applied the pdopenpid() patch.) Could you please rebase? Or, if your branch is already public, could you please include a pointer to it in the review description so that it's easier for me to fetch? That makes it easier for me to e.g., write syzkaller descriptions for new syscalls.

I had to apply ACL to my gitweb server due to 'AI'. If you provide me with the list of IP addresses to enable, I will open for them.

I pushed the code to github.com/kostikbel/freebsd-src branch pdrfork.

kib edited subscribers, added: arrowd; removed: gleb.Tue, May 26, 4:41 PM

Re-check with p_candebug() that credentials of the target are still fine.
Block execve() around the check and fget_remote() call with execve_block(), which prevents suid exec under us.

sys/kern/kern_descrip.c
3186

fd_flags and fcaps may be NULL here, with p != td->td_proc, e.g., via kern_proc_kqueues_out1().

sys/kern/kern_exec.c
470
471

execve_unblock() only wakes us up if P_INEXEC_WAIT is set, so won't we sleep here forever?

sys/kern/sys_procdesc.c
723

Some comment should explain why we block execve, it is not very obvious.

743

We need to do a deep copy of the ioctl list in the fcaps structure. It's private to the FDE.

750

If pd->pd_proc == NULL, we still return 0.

kib marked 6 inline comments as done.

Fix execve blocking.
Add comment around blocking call.
Copy ioctl list when getting caps/free it on error.
Check fd_flags and fcaps for NULL before assigning.
Return ESRCH if procdesc already disassociated from the process.

sys/kern/kern_exec.c
389

Rather than adding this mechanism, can we re-use the existing p_lock? That is, make execve() block if the process is held.

I think there are other places where it is important to avoid racing with execve of a setuid image.

kib marked an inline comment as done.Thu, Jun 4, 6:10 PM
kib added inline comments.
sys/kern/kern_exec.c
389

I considered it, but PHOLD() cannot be used this way. I need to block the thread entering execve() when execblock is taken. This means that the thread must sleep, and it is sometimes unsafe to drop and reacquire the process mutex in places where p_lock is incremented.

sys/kern/kern_exec.c
389

I think it could be done like this:

  • do_execve() blocks if p_lock > 0
  • PHOLD works as it does before, no need to block
  • code which cares about stability of P_INEXEC (e.g., most if not all p_candebug() callers) can check it and decide what to do
  • provide a execve_block() function for those consumers who wish to wait

IMO it is cleanest if we can guarantee that pget(pid, PGET_HOLD | PGET_CANDEBUG) will block execve until the hold is released. Note that p_candebug() fails if P_INEXEC set.

I have a WIP patch along these lines that I'm working on for other reasons, I will share soon.