Page MenuHomeFreeBSD

Implement a close_range(2) syscall
Needs ReviewPublic

Authored by kevans on Sep 12 2019, 10:40 PM.

Details

Reviewers
kib
emaste
markj
brooks
vangyzen
Group Reviewers
manpages
Summary

[This interface is based on what is described in version 3 of a patchset for close_range() to lkml -- I would wait for it to be accepted and merged before committing...]

close_range(min, max, flags) allows for a range of descriptors to be closed. The Python folk have indicated that they would much prefer this interface to closefrom(2), as the case may be that they/someone have special fds dup'd to higher in the range and they can't necessarily closefrom(min) because they don't want to hit the upper range, but relocating them to lower isn't necessarily feasible.

sys_closefrom has been rewritten to use kern_close_range() using ~0U to indicate closing to the end of the range. This was chosen rather than requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the call to kern_close_range for simplicity[0].

The flags argument of close_range(2) is currently unused, so any flags set is currently EINVAL. It was added to the interface in Linux so that future flags could be added for, e.g., "halt on first error" and things of this nature.

[0] I'm not averse to changing it at all...

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 28134

Event Timeline

kevans created this revision.Sep 12 2019, 10:40 PM
kib accepted this revision.Sep 13 2019, 5:49 AM
kib added inline comments.
include/unistd.h
329

this should go under BSD_VISIBLE. I think closefrom() should as well, but this is the existing bug.

sys/kern/kern_descrip.c
1326

s/available/defined/

This revision is now accepted and ready to land.Sep 13 2019, 5:49 AM
markj added inline comments.Sep 13 2019, 1:43 PM
sys/kern/kern_descrip.c
1294

Is it intended for userland to be able to pass KERN_CLOSE_RANGE_MAXFD? If not we should check for it in the sys_ wrapper and return EINVAL.

kevans added inline comments.Sep 13 2019, 2:15 PM
sys/kern/kern_descrip.c
1294

Hmm... based on the current definition of our close_range, we would have to since Linux is expecting ~0U to close all fd, and I've represented both fd parameters as int. I guess I should likely change this to u_int to match -- I didn't pay enough detail to this initially, and we usually don't accept unsigned fd in syscalls so I didn't think much of it.

kevans added inline comments.Sep 13 2019, 2:23 PM
sys/sys/syscallsubr.h
317

The internal kern_close_range definition is likely going to change to take u_int fds, so this will change to ~0U... is it worth keeping the definition around at that point, just to formalize it?

markj added inline comments.Sep 13 2019, 4:32 PM
sys/kern/kern_descrip.c
1294

lowfd and highfd are not really descriptors, but rather bounds for a range of descriptors. Having fds typed as int is useful for detecting a programming error that results in userland specifying an fd of -1, but here we give a special meaning to a value of -1 (or ~0U, whatever) for highfd. So from that perspective, using u_int makes sense to me.

Since we are allowing ~0U from userspace, should that be documented in the man page?

kevans marked 3 inline comments as done.Sep 13 2019, 5:33 PM
kevans added inline comments.
sys/kern/kern_descrip.c
1294

Fair- I'll go ahead and change that to match.

I think once the implementation takes unsigned values instead, documenting it specifically in the manpage doesn't mean too much. At that point, there is no specific check for ~0U and it's no longer a violation of what would normally return EINVAL, it just gets naturally clamped down to fdp->fd_lastfile and the manpage should make sure to indicate that lowfd, highfd will get clamped to the valid range.

kevans updated this revision to Diff 62044.Sep 13 2019, 5:40 PM
kevans marked an inline comment as done.
kevans edited the summary of this revision. (Show Details)

Whoops, prematurely marked those as done -- upload the new diff now because of it.

  • s/available/defined/
  • changed close_range to take u_int
  • Dropped the macro for KERN_CLOSE_RANGE_MAXFD; ~0U works just as well, or whatever other arbitrarily high value the caller wants to pass in
  • __BSD_VISIBLE for the close_range declaration
  • Some manpage revision: mention the arguments will be clamped to valid range of open descriptors
This revision now requires review to proceed.Sep 13 2019, 5:40 PM
markj added inline comments.Sep 13 2019, 5:51 PM
sys/kern/kern_descrip.c
1296

Shouldn't the check come before the clamp? Suppose a process has descriptors 0, 1 and 2 open and fd_lastfile is 2. closefrom(3) would return EINVAL.

kevans marked an inline comment as done.Sep 13 2019, 5:57 PM
kevans added inline comments.
sys/kern/kern_descrip.c
1296

Hmm... I thought I had wanted different semantics for close_range (i.e. close_range(3, ~0U, 0) in that same scenario to return EINVAL), but I suppose that also makes little sense.

kib accepted this revision.Sep 13 2019, 6:02 PM
kib added inline comments.
include/unistd.h
331

I would instead place the declaration into existing BSD_VISIBLE block at the end of the header.

This revision is now accepted and ready to land.Sep 13 2019, 6:02 PM
kevans updated this revision to Diff 62049.Sep 13 2019, 6:11 PM

Hopefully the last update:

  • Move close_range declaration down to the existing __BSD_VISIBLE block
  • Fix the EINVAL semantics; in both closefrom(2) and close_range(2) cases, returning EINVAL for (2, ~0U) just because we've already closed fd=2 and up would both break well-formed closefrom(2) and make little sense.
This revision now requires review to proceed.Sep 13 2019, 6:11 PM
markj accepted this revision.Sep 13 2019, 7:17 PM
This revision is now accepted and ready to land.Sep 13 2019, 7:17 PM
vangyzen accepted this revision.Nov 5 2019, 3:54 PM
vangyzen added a subscriber: vangyzen.
vangyzen added inline comments.
include/unistd.h
561

This __BSD_VISIBLE is redundant with line 486. (This is an existing bug.)

close_range should go on line 497.

kevans updated this revision to Diff 65618.Dec 13 2019, 8:11 PM
kevans marked an inline comment as done.

Rebase (new syscalls have been introduced since) and address feedback from @vangyzen -- still no indication of the status of the Linux patch; Christian had resent a v5 on 2019/10/25 targetting Linux 5.5, but it looks like that's not going to happen.

This revision now requires review to proceed.Dec 13 2019, 8:11 PM