Page MenuHomeFreeBSD

fork: Disallow sharing of fd tables with non-passable descriptors
Needs ReviewPublic

Authored by markj on Fri, Jul 29, 4:03 PM.

Details

Reviewers
mjg
kib
Summary

We have a file type flag, DFLAG_PASSABLE, which when clear indicates
that the file data is somehow tied to the owning process. For example,
knotes for a EVFILT_TIMER event have a notion of the owning process.

Descriptors for such files cannot be passed over a unix socket and are
not inherited by the child of a fork(). However, rfork() lets the child
share the fd table with the parent, and we do not check whether this is
safe.

Modify rfork() to fail if RFFDG is clear and the parent's table contains
a non-passable fd.

Reported by: syzbot+81270844d2029cf6ff5e@syzkaller.appspotmail.com

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 46679
Build 43568: arc lint + arc unit

Event Timeline

markj requested review of this revision.Fri, Jul 29, 4:03 PM

Arguably, this breaks Tor Egge' linuxthreads library. IMO it should not disallow sharing if address space is shared.

Also, if the requested operation 'create new process but share file descriptor table' cannot be performed, I would expect the syscall to return error, instead of changing the semantic.

BTW, I do not quite understand the argument that kqueue filedescriptor cannot be shared. It cannot be shared iff filedescriptor table is not shared, because kqueue uses fd instead of file. So shared filedesc table means that using kqueues from such children is fine.

In D35987#817110, @kib wrote:

Arguably, this breaks Tor Egge' linuxthreads library.

If the consumer has kqueues open and creates threads without specifying CLONE_FILES, yes.

IMO it should not disallow sharing if address space is shared.

That would not fix the problem. Some knote types expect to be owned by a unique process.

Also, if the requested operation 'create new process but share file descriptor table' cannot be performed, I would expect the syscall to return error, instead of changing the semantic.

I don't follow, the change does exactly as you say: if the rfork() tries to share the fd table, and the table contains a non-passable descriptor, rfork() returns EOPNOTSUPP.

In D35987#817113, @kib wrote:

BTW, I do not quite understand the argument that kqueue filedescriptor cannot be shared. It cannot be shared iff filedescriptor table is not shared, because kqueue uses fd instead of file. So shared filedesc table means that using kqueues from such children is fine.

A shared fd table isn't sufficient, at least not today. For example, EVFILT_PROC and EVFILT_SIGNAL place knotes on a per-process list (p_klist). How can it work if multiple procs are able to refer to the knote?

A less obvious example is the disarm-on-SIGSTOP behaviour, where EVFILT_TIMER knotes are placed on a per-process list when the owning process receives SIGSTOP.

sys/kern/kern_descrip.c
2185

does this really need to be here? i don't know what's not passable modulo kqueue, but the few cases like that could have a dedicated install routine which checks for the above. then this one could only assert that the flag is set.

In D35987#817110, @kib wrote:

Arguably, this breaks Tor Egge' linuxthreads library.

If the consumer has kqueues open and creates threads without specifying CLONE_FILES, yes.

I think then we could declare that linuxthreads is broken.

IMO it should not disallow sharing if address space is shared.

That would not fix the problem. Some knote types expect to be owned by a unique process.

Also, if the requested operation 'create new process but share file descriptor table' cannot be performed, I would expect the syscall to return error, instead of changing the semantic.

I don't follow, the change does exactly as you say: if the rfork() tries to share the fd table, and the table contains a non-passable descriptor, rfork() returns EOPNOTSUPP.

Then I misread the patch, sorry.

In D35987#817113, @kib wrote:

BTW, I do not quite understand the argument that kqueue filedescriptor cannot be shared. It cannot be shared iff filedescriptor table is not shared, because kqueue uses fd instead of file. So shared filedesc table means that using kqueues from such children is fine.

A shared fd table isn't sufficient, at least not today. For example, EVFILT_PROC and EVFILT_SIGNAL place knotes on a per-process list (p_klist). How can it work if multiple procs are able to refer to the knote?

A less obvious example is the disarm-on-SIGSTOP behaviour, where EVFILT_TIMER knotes are placed on a per-process list when the owning process receives SIGSTOP.

So can we do more nuanced fix? For instance, adding unsafe knote type to a kqueue would increment some new 'unsafe counter' in the kqueue, and we can compare the counter against 0 for vetting. I believe PROC/TIMER/SIGNAL were rarely used in time of FreeBSD 5.x-7.x, and not quite often used now.

BTW, I do not think we have any other non-passable file descriptor types except kqueue?

It's really hard to grep for. I think it would be prudent to add DFLAG_NOTPASSABLE and require fileops to have one of the 2 set. I can write it later. Judging by git log it is only kqueue and badfileops.

In D35987#817172, @kib wrote:
In D35987#817110, @kib wrote:

Arguably, this breaks Tor Egge' linuxthreads library.

If the consumer has kqueues open and creates threads without specifying CLONE_FILES, yes.

I think then we could declare that linuxthreads is broken.

It was removed from the ports tree 9 years ago (marked broken for a reason I can't easily see) and apparently only ever supported i386. So this does not seem like a new development. Is it certain that an application using linuxthreads will run unmodified today anyway? Maybe a compatibility knob is sufficient.

BTW, do you have a copy of the linuxthread sources? I cannot easily find it, I only see the FreeBSD patch from the ports tree history.

In D35987#817113, @kib wrote:

BTW, I do not quite understand the argument that kqueue filedescriptor cannot be shared. It cannot be shared iff filedescriptor table is not shared, because kqueue uses fd instead of file. So shared filedesc table means that using kqueues from such children is fine.

A shared fd table isn't sufficient, at least not today. For example, EVFILT_PROC and EVFILT_SIGNAL place knotes on a per-process list (p_klist). How can it work if multiple procs are able to refer to the knote?

A less obvious example is the disarm-on-SIGSTOP behaviour, where EVFILT_TIMER knotes are placed on a per-process list when the owning process receives SIGSTOP.

So can we do more nuanced fix? For instance, adding unsafe knote type to a kqueue would increment some new 'unsafe counter' in the kqueue, and we can compare the counter against 0 for vetting. I believe PROC/TIMER/SIGNAL were rarely used in time of FreeBSD 5.x-7.x, and not quite often used now.

Well, kqueue would also need to check fd_refcnt == 1 before adding a new PROC/TIMER/SIGNAL knote to the queue. It seems like non-trivial complexity.

BTW, I do not think we have any other non-passable file descriptor types except kqueue?

Indeed, I do not think so either. I looked through all the fileops definitions in the tree.

I believe linuxthreads was compiled straight from the glibc nptl sources, with Tor' patches applied. I did not rechecked this.

Could you, please, show the problems that appear from kqueue having the knowledge of the owning process? For instance, for EVFILT_TIMER, callback knows what is the owning process, so I do not quite understand what is breaking exactly.

(Sorry for the late reply)