Page MenuHomeFreeBSD

close_range: add CLOSE_RANGE_CLOEXEC
AbandonedPublic

Authored by christian.brauner_ubuntu.com on Dec 17 2020, 3:24 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Apr 27, 6:35 AM
Unknown Object (File)
Sat, Apr 27, 6:34 AM
Unknown Object (File)
Sat, Apr 27, 4:20 AM
Unknown Object (File)
Dec 20 2023, 3:44 AM
Unknown Object (File)
Sep 30 2023, 7:09 PM
Unknown Object (File)
Sep 21 2023, 10:13 PM
Unknown Object (File)
Aug 17 2023, 2:47 PM
Unknown Object (File)
Aug 16 2023, 4:46 AM
Subscribers

Details

Reviewers
kevans
kib
mjg
Summary

Starting with Linux kernel v5.10 we have added support to mark a range of file descriptors as close-on-exec without actually closing them. This is useful in general and has been on the wishlist of some people for quite some time.
I'd like to keep the close_range() implementation on FreeBSD and Linux as close together as possible so the name of the new flag CLOSE_RANGE_CLOEXEC and it's value (1U << 2) is identical to that used on Linux.

Test Plan

I probably need some initial guidance on how to add tests where.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

christian.brauner_ubuntu.com added inline comments.
sys/sys/unistd.h
164

I'm not sure where this flag should best be defined. :)
On Linux I've added a separate close_range.h header.
Please tell me what you would prefer!

sys/sys/unistd.h
164

Somehow me adding this comment has caused the revision to be abandoned which is a bit surprising. Just note that I didn't intend this. :)

The test situation here is kinda ~meh; we have a blend of ATF/TAP tests. This in particular should be added to the closefrom TAP test @ ^/tests/sys/file/closefrom_test.c.

sys/kern/kern_descrip.c
1361

It's tempting to say we should go ahead and just inline the above kern_cloexec() into kern_close_range; this SLOCK and the paired SUNLOCK upon exit can be x{,un}lock instead if (flags & CLOSE_RANGE_CLOEXEC) != 0, then the loop just needs minor restructuring: fdp->fd_ofiles[fd] is already the filedescent you need to operate on and you already have the xlock for it to boot.

1410

Style nit: this and the above conditional in kern_close_range() should be explicit comparisons against 0

sys/sys/unistd.h
164

Odd!

This seems to be the ideal spot for the definition to go.

sys/kern/kern_descrip.c
1361

Sure can do.

1410

Ah yes, I need to remember that going forward. :)

sys/kern/kern_descrip.c
1342

kern_* namespace is used for syscall helpers, namely to the functions that implement syscalls after ABI details are handled. But I am not sure why do you need this function at all, I agree with kevans.

sys/sys/unistd.h
164

It is weird to place CLOSE_RANGE_CLOEXEC define in the middle of _PC_ series.

I would place it right before #endif /* __BSD_VISIBLE */ line at the end of the file.

sys/kern/kern_descrip.c
1342

Will do!

sys/sys/unistd.h
164

Noted.

The current close range code needs some cleanup regardless of this change. I think it would be most time efficient if exact semantics got documented and then someone from FreeBSD side implemented it (I'm happy to do it). Unless you really want to spend time on this, in which case I'm happy to review.

In D27654#618060, @mjg wrote:

The current close range code needs some cleanup regardless of this change. I think it would be most time efficient if exact semantics got documented and then someone from FreeBSD side implemented it (I'm happy to do it). Unless you really want to spend time on this, in which case I'm happy to review.

I'm happy to do it. I'm likely to contribute patches in the future so it'd be pretty helpful to get some experience writing patches for FreeBSD.

In D27654#618060, @mjg wrote:

The current close range code needs some cleanup regardless of this change. I think it would be most time efficient if exact semantics got documented and then someone from FreeBSD side implemented it (I'm happy to do it). Unless you really want to spend time on this, in which case I'm happy to review.

I'm happy to do it. I'm likely to contribute patches in the future so it'd be pretty helpful to get some experience writing patches for FreeBSD.

Glad to hear it. In that case I'll comment on the patch.

The current close_range suffers from a lot of avoidable filedesc relocking, I have a patch to fix it. Even then actual close drops the filedesc lock, so some degree of relock dance will still be there.

In contrast the feature you are implementing here does not need any of it. It can just take the filedesc lock upfront and have a for loop over the content. You can branch on it after the range got determined. iow in my opinion this just have a for loop doing fdeget_locked. I don't know the Linux code, it may be divorcing the 2 loops is very different from what was done there, but will very likely be the easiest way forward including any other future ops.

The cleanup I mentioned is in as of r368732.

In D27654#618173, @mjg wrote:

The cleanup I mentioned is in as of r368732.

Nice. I'll take a look early next week.

In D27654#618081, @mjg wrote:
In D27654#618060, @mjg wrote:

The current close range code needs some cleanup regardless of this change. I think it would be most time efficient if exact semantics got documented and then someone from FreeBSD side implemented it (I'm happy to do it). Unless you really want to spend time on this, in which case I'm happy to review.

I'm happy to do it. I'm likely to contribute patches in the future so it'd be pretty helpful to get some experience writing patches for FreeBSD.

Glad to hear it. In that case I'll comment on the patch.

The current close_range suffers from a lot of avoidable filedesc relocking, I have a patch to fix it. Even then actual close drops the filedesc lock, so some degree of relock dance will still be there.

In contrast the feature you are implementing here does not need any of it. It can just take the filedesc lock upfront and have a for loop over the content. You can branch on it after the range got determined. iow in my opinion this just have a for loop doing fdeget_locked. I don't know the Linux code, it may be divorcing the 2 loops is very different from what was done there, but will very likely be the easiest way forward including any other future ops.

I'll start to work on this early next week. Thanks for the feedback.

FWIW, I think I found the first candidate we should smack with this flag:

49858 tshark   2.915694 CALL  fcntl(0x3526,F_SETFD,FD_CLOEXEC)
49858 tshark   2.915727 RET   fcntl -1 errno 9 Bad file descriptor
49858 tshark   2.915760 CALL  fcntl(0x3527,F_SETFD,FD_CLOEXEC)
49858 tshark   2.915794 RET   fcntl -1 errno 9 Bad file descriptor
49858 tshark   2.915828 CALL  fcntl(0x3528,F_SETFD,FD_CLOEXEC)
49858 tshark   2.915861 RET   fcntl -1 errno 9 Bad file descriptor
49858 tshark   2.915894 CALL  fcntl(0x3529,F_SETFD,FD_CLOEXEC)
49858 tshark   2.915928 RET   fcntl -1 errno 9 Bad file descriptor
49858 tshark   2.915962 CALL  fcntl(0x352a,F_SETFD,FD_CLOEXEC)
49858 tshark   2.915995 RET   fcntl -1 errno 9 Bad file descriptor
49859 tshark   2.914993 CALL  fcntl(0x3476,F_SETFD,FD_CLOEXEC)
49859 tshark   2.916066 RET   fcntl -1 errno 9 Bad file descriptor
49858 tshark   2.916029 CALL  fcntl(0x352b,F_SETFD,FD_CLOEXEC)
49858 tshark   2.916134 RET   fcntl -1 errno 9 Bad file descriptor
49858 tshark   2.916167 CALL  fcntl(0x352c,F_SETFD,FD_CLOEXEC)
49858 tshark   2.916201 RET   fcntl -1 errno 9 Bad file descriptor