Page MenuHomeFreeBSD

Provide O_SEARCH
ClosedPublic

Authored by kevans on Jan 18 2020, 4:19 AM.
Tags
None
Referenced Files
F108321449: D23247.id67657.diff
Thu, Jan 23, 8:38 PM
Unknown Object (File)
Fri, Jan 17, 8:59 PM
Unknown Object (File)
Tue, Jan 14, 1:11 AM
Unknown Object (File)
Sat, Jan 11, 11:02 PM
Unknown Object (File)
Thu, Jan 9, 10:05 PM
Unknown Object (File)
Thu, Jan 9, 2:00 PM
Unknown Object (File)
Sat, Dec 28, 10:22 PM
Unknown Object (File)
Dec 11 2024, 2:12 PM

Details

Reviewers
kib
jilles
ngie
Group Reviewers
manpages
Restricted Owners Package(Owns No Changed Paths)
Commits
rS357412: Provide O_SEARCH
rS357410: pseudofs: don't do VEXEC check in VOP_CACHEDLOOKUP
Summary

O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself. This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that one when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

lib/libc/sys/open.2
525 ↗(On Diff #66957)

I wasn't sure how to properly annotate this any other way to effectively communicate that O_RDONLY/O_WRONLY/O_RDWR are mutually exclusive with both O_SEARCH/O_EXEC.

The manpage should explain what "search only" means.

lib/libc/sys/open.2
525 ↗(On Diff #66957)

An attempt was made to open a descriptor with an illegal combination of O_RDONLY, O_WRONLY, or O_RDWR, and O_EXEC or O_SEARCH.

Or

An attempt was made to open a descriptor with an illegal combination of (O_RDONLY || O_WRONLY || O_RDWR) and (O_EXEC || O_SEARCH).

I do not object, but there is some issue that probably makes O_SEARCH addition not very reasonable until it is resolved. From what I read in POSIX description of O_SEARCH, access checks for search must be performed at the open time. Then, the *at(2) family of syscalls should not check the current directory permissions when the file descriptor is specified as the starting point for the lookup. We do not implement this mode, namei(9) always checks the current directory mode for the path starting point.

This was allowed by the POSIX draft when *at implementation was done. See also the note in open(2) man page.

To implement this, I think the most straightforward way would be for struct componentname cn_flags to grow a specific flag to avoid VEXEC check in vfs_cache_lookup(), which should then be cleared upon use. The flag is set in namei when the vnode from fd is set as a starting point and O_SEARCH was specified for open.

Ah, I overlooked that- this plan sounds reasonable for me, I will do this.

Is it harmful to extend our interpretation of UB for O_EXEC in this way? It's not clear to me that O_EXEC on a directory is otherwise sensible/useful, and I would think no one relies on this now, so I lean towards "not harmful."

Is it harmful to extend our interpretation of UB for O_EXEC in this way? It's not clear to me that O_EXEC on a directory is otherwise sensible/useful, and I would think no one relies on this now, so I lean towards "not harmful."

Do we expect that O_EXEC might start meaning anything for directories ? Does e.g. linux define different values for O_EXEC vs. O_SEARCH ? Currently POSIX says that the result is unspecified.

My gut reaction is to save the bit and reuse O_EXEC. This might require more changes, e.g. is O_EXEC | O_RDONLY a valid open mode for directory ?

In D23247#509781, @kib wrote:

Do we expect that O_EXEC might start meaning anything for directories ? Does e.g. linux define different values for O_EXEC vs. O_SEARCH ? Currently POSIX says that the result is unspecified.

I don't think we would expect this at all. Currently, AFAICT, Linux (nor glibc) actually implement O_SEARCH because they have O_PATH. Illumos and NetBSD both define them as distinctly different values, while musl defines them as the same (and equivalent to O_PATH as a close approximation supported by the kernel they primarily operate under). Rich Felker had at some point suggested that Linux should choose O_PATH|03 or something to this effect as O_SEARCH since the semantics are slightly different from what they provide for O_PATH, but I cannot find where I read this now.

My gut reaction is to save the bit and reuse O_EXEC. This might require more changes, e.g. is O_EXEC | O_RDONLY a valid open mode for directory ?

I think the status quo is fine from a validity perspective. O_EXEC | O_RDONLY is what almost everyone that wants to use it gets anyways, because we reject all of the non !RDONLY ACCMODE bits in conjunction with O_EXEC. I think it makes sense to continue rejecting the other bits, because I do not think a writable O_EXEC/O_SEARCH directory has any obvious interpretation.

In D23247#509781, @kib wrote:

Do we expect that O_EXEC might start meaning anything for directories ? Does e.g. linux define different values for O_EXEC vs. O_SEARCH ? Currently POSIX says that the result is unspecified.

I don't think we would expect this at all. Currently, AFAICT, Linux (nor glibc) actually implement O_SEARCH because they have O_PATH. Illumos and NetBSD both define them as distinctly different values, while musl defines them as the same (and equivalent to O_PATH as a close approximation supported by the kernel they primarily operate under). Rich Felker had at some point suggested that Linux should choose O_PATH|03 or something to this effect as O_SEARCH since the semantics are slightly different from what they provide for O_PATH, but I cannot find where I read this now.

Ok.

My gut reaction is to save the bit and reuse O_EXEC. This might require more changes, e.g. is O_EXEC | O_RDONLY a valid open mode for directory ?

I think the status quo is fine from a validity perspective. O_EXEC | O_RDONLY is what almost everyone that wants to use it gets anyways, because we reject all of the non !RDONLY ACCMODE bits in conjunction with O_EXEC. I think it makes sense to continue rejecting the other bits, because I do not think a writable O_EXEC/O_SEARCH directory has any obvious interpretation.

I do not understand this part of your response. O_EXEC | O_RDONLY i distinct from O_EXEC (and from O_RDONLY in fact). O_EXEC allows entry lookup, while O_RDONLY allows entry reading.

My point is that we migh start allowing O_EXEC | O_RDONLY for directories.

In D23247#509797, @kib wrote:

I think the status quo is fine from a validity perspective. O_EXEC | O_RDONLY is what almost everyone that wants to use it gets anyways, because we reject all of the non !RDONLY ACCMODE bits in conjunction with O_EXEC. I think it makes sense to continue rejecting the other bits, because I do not think a writable O_EXEC/O_SEARCH directory has any obvious interpretation.

I do not understand this part of your response. O_EXEC | O_RDONLY i distinct from O_EXEC (and from O_RDONLY in fact). O_EXEC allows entry lookup, while O_RDONLY allows entry reading.

My point is that we migh start allowing O_EXEC | O_RDONLY for directories.

Sorry, my point was that O_RDONLY is defined to be 0 and anyone currently using it is implicitly getting O_EXEC | O_RDONLY anyways if they're using O_EXEC and there's no way to get any other behavior because you can't specify any non-0 ACCMODE bits with O_EXEC without rejection in kern_openat().

In D23247#509797, @kib wrote:

I think the status quo is fine from a validity perspective. O_EXEC | O_RDONLY is what almost everyone that wants to use it gets anyways, because we reject all of the non !RDONLY ACCMODE bits in conjunction with O_EXEC. I think it makes sense to continue rejecting the other bits, because I do not think a writable O_EXEC/O_SEARCH directory has any obvious interpretation.

I do not understand this part of your response. O_EXEC | O_RDONLY i distinct from O_EXEC (and from O_RDONLY in fact). O_EXEC allows entry lookup, while O_RDONLY allows entry reading.

My point is that we migh start allowing O_EXEC | O_RDONLY for directories.

Sorry, my point was that O_RDONLY is defined to be 0 and anyone currently using it is implicitly getting O_EXEC | O_RDONLY anyways if they're using O_EXEC and there's no way to get any other behavior because you can't specify any non-0 ACCMODE bits with O_EXEC without rejection in kern_openat().

Right. So then I am not sure if we should act as if O_RDONLY is specified, rather I think we should not. You should be able to open(O_EXEC) a dir without r permission, I think this is the main point of the flag.

In D23247#509800, @kib wrote:

Right. So then I am not sure if we should act as if O_RDONLY is specified, rather I think we should not. You should be able to open(O_EXEC) a dir without r permission, I think this is the main point of the flag.

Hmm... this gets a little more complicated, but I'm inclined to agree. Further, in my testing, Linux's O_PATH (which has been accepted by some as 'close enough' to O_EXEC/O_SEARCH and it's advertised as being suitable for O_EXEC) will ignore the r bit by virtue of the fact that it's not even opening the underlying file/dir.

In D23247#509800, @kib wrote:

Right. So then I am not sure if we should act as if O_RDONLY is specified, rather I think we should not. You should be able to open(O_EXEC) a dir without r permission, I think this is the main point of the flag.

Hmm... this gets a little more complicated, but I'm inclined to agree. Further, in my testing, Linux's O_PATH (which has been accepted by some as 'close enough' to O_EXEC/O_SEARCH and it's advertised as being suitable for O_EXEC) will ignore the r bit by virtue of the fact that it's not even opening the underlying file/dir.

Well, getdirents() would need to check this. And somewhere we need to call VOP_ACCESS(VEXEC) when doing open(O_EXEC) for directory (might be we already do).

But there is one more complication. The vfs_cache_lookup() change would only work for filesystem that use vfs_cache_lookup(). For filesystems that provide its own VOP_LOOKUP implementation we would need to do the same change as for vfs_cache_lookup().

In D23247#509851, @kib wrote:

Well, getdirents() would need to check this. And somewhere we need to call VOP_ACCESS(VEXEC) when doing open(O_EXEC) for directory (might be we already do).

AFAICT kern_getdirentries() will technically DTRT as-is; the fd has to have FREAD set because O_EXEC|O_WRONLY is explicitly rejected. I think I'll add the check anyways, though, just to make it clear that the expectation is that O_SEARCH will always allow it even if we change what O_EXEC may be combined with.

Edit: I was objectively wrong; typo in my test. Checking for FREAD | FSEARCH proves fruitful. Also, it looks like VEXEC should be checked in vn_open -> vn_open_vnode

But there is one more complication. The vfs_cache_lookup() change would only work for filesystem that use vfs_cache_lookup(). For filesystems that provide its own VOP_LOOKUP implementation we would need to do the same change as for vfs_cache_lookup().

I'm about halfway through these.

kevans retitled this revision from Provide O_SEARCH as an alias for O_EXEC to Provide O_SEARCH.
kevans edited the summary of this revision. (Show Details)

Update based on discussion; start building NetBSD's O_SEARCH tests and add a couple more to test semantics we've discussed.

lib/libc/sys/open.2
169 ↗(On Diff #67583)

May be explicitly state , the alias for O_EXEC.

sys/fs/nfsclient/nfs_clvnops.c
1201 ↗(On Diff #67583)

I would try to create a simple helper vn_dir_check_exec from the above 4 lines.

sys/kern/vfs_lookup.c
443 ↗(On Diff #67583)

Where is this reference dropped ?

sys/kern/vfs_lookup.c
443 ↗(On Diff #67583)

This should match the reference that previously would have been taken in fgetvp_rights, dropped at line 488 and 582.

It does look like namei will effectively leak this ref (along with the equivalent taken in the above ndp->ni_dirfd == AT_FDCWD branch) if we immediately jump at 503 or set dp again at 576.

I suspect we should simplify this by stashing dp as odp and moving the various if(dp != NULL) vrele(dp) to doing the same with odp after out label and in the !SYMLINK branch at 508.

I don't entirely understand some of this, though... If we reset dp at 576, what reference are we dropping if we end up at 582? I suspect I just haven't traced stuff out enough to find us ref'ing ndp->ni_dvp

sys/kern/vfs_lookup.c
443 ↗(On Diff #67583)

I misread the diff, sorry. Lets put the possible leak aside.

Note that O_SEARCH is an alias for O_EXEC, and add a helper for the common case of VEXEC checking. The remainder (zfs and fusefs) use internal VEXEC mechanisms used seemingly instead of VOP_ACCESS indirection.

sys/kern/vfs_lookup.c
443 ↗(On Diff #67583)

I am ok with this- I may revisit it after.

I suggest to add a test where the directory is opened for search, then its permissions are changed to 0 (or at least the owner is denied everything), and then the test tries to use the still opened dirfd for some *at call.

This revision is now accepted and ready to land.Feb 1 2020, 11:53 PM
In D23247#514763, @kib wrote:

I suggest to add a test where the directory is opened for search, then its permissions are changed to 0 (or at least the owner is denied everything), and then the test tries to use the still opened dirfd for some *at call.

Sure- will do pre-commit.

Said test revealed that this doesn't work quite right with ZFS, because it repeats the VEXEC check in VOP_CACHEDLOOKUP and doesn't see NOEXECCHECK. Add an internal 'cached' argument so we can determine if the request is coming by way of VOP_LOOKUP (zfs w/o namecache) or VOP_CACHEDLOOKUP and omit it in the latter case, since it's already been done.

pfs_lookup does also repeats the check; I've included it, but I will likely just remove that one in advance. (since it's generally irrelevant to O_SEARCH)

This revision now requires review to proceed.Feb 2 2020, 5:15 AM
This revision is now accepted and ready to land.Feb 2 2020, 12:44 PM
This revision was automatically updated to reflect the committed changes.
Owners added a reviewer: Restricted Owners Package.Feb 2 2020, 4:35 PM