Page MenuHomeFreeBSD

Add openat2 system call
Needs ReviewPublic

Authored by walker.aj325_gmail.com on Jan 10 2024, 2:25 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Sep 1, 3:57 PM
Unknown Object (File)
Mon, Aug 12, 10:14 AM
Unknown Object (File)
Aug 9 2024, 3:56 PM
Unknown Object (File)
Jul 30 2024, 2:41 AM
Unknown Object (File)
Jul 2 2024, 7:36 AM
Unknown Object (File)
Jul 2 2024, 7:27 AM
Unknown Object (File)
Jun 30 2024, 9:18 AM
Unknown Object (File)
Jun 28 2024, 5:25 PM
Subscribers

Details

Reviewers
kib
brooks
mav
Summary

Add an equivalent of the Linux openat2 system call. This is an extension of openat(2) system call and provides a superset of its functionality. Rather than taking arguments for mode and flags, a pointer to an open_how structure is passed along with a size argument to facilitate future extensions of the system call. At present, the resolve flag is the primary differentiator between openat2(2) and openat(2), which is used to specify how all components of the pathname will be resolved.

The openat2 system call is currently used by Samba after 4.17 to create a fast-path for opening files that helps mitigate a metadata performance regression in versions after Samba 4.13 due to symlink safety checks.

This functionality is implemented via the kern_openat2 kernel syscall with kern_openat being refactored to wrap around kern_openat2. This commit does not implement openat2 resolve flags (will be added in subsequent effort).

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

My current plan is to map the openat2 resolve flags into new ones that can be passed into vn_open_cred() as additional vn_open_flags and then mapped to relevant namei flags in open2nameif(), but am very much open to suggestions about how you want to do this properly.

I'm leery of changing arguments for vn_open_cred() since ZFS consumes it directly.

We can map RESOLVE_NO_XDEV to NOCROSSMOUNT and RESOLVE_BENEATH to RBENEATH pretty trivially.

Once we've settled on design for arguments that kern_openat2() takes, I will update other consumers in kernel to use it instead of kern_openat().

lib/libc/sys/Symbol.map
425

The symbol should go into FBSD_1.8 namespace

lib/libc/sys/openat2.c
32

No need to add these two lines anymore

39

Are you sure about the 'long' return type? IMO it should be int.

I am aware that Linux' syscall() proto has the long return type.

42

Where is sys_openat2() is defined?

I suspect you tried to mimic the handling of the libc intersposing by libthr, but it is very incomplete in the patch.

sys/kern/vfs_syscalls.c
1116

Why we need to check how for NULL? Wasn't it enough that we pass in-kernel pointer?

1116

Again, why the NULL check is needed? Isn't copyin below is enough?

1117

I suggest to just pass both flags arguments to kern_openat2() instead of the pointer to a structure/its size. The 'parsing' should be done by callers.

1119
1122

Why this and the next check are needed?

1180
lib/libc/sys/Symbol.map
821

Current best practice is to not expose these unless needed.

lib/libc/sys/openat2.c
39

I agree. The return is a file descriptor or -1 and that will always be an int. The upstream commit (https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=work.openat2&id=fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179) provides no justification for the use of long (and in fact claims the return type is int).

sys/bsm/audit_kevents.h
665 ↗(On Diff #132542)

This will need to be submitted upstream.

sys/kern/syscalls.master
3336

I'm slightly conflicted here. _In_ is currently ok, but I think this should actually be _In_reads_bytes_(size).

sys/kern/vfs_syscalls.c
1097

Space after declarations. You could alternatively use a compound literal as is done in sys_mmap.

1114

This doesn't match what Linux does. See the "If ksize is smaller than usize" case in the Extensibility section of the manpage. This is probably ok for now in practice, but there should be a comment here and an entry in the BUGS section of the manpage to document it.

1114

There's a lot of redundancy with parse_open_how's checks. Probably most should be here (or likely in a user_openat2 between sys_openat2 and kern_openat2) because all the errors in parse_open_how's should be KASSERTS.

1117–1124

I don't think these blocks are all that useful. If you're going to do it, it should be right above sys_openat2.

1134

Pretty sure this is int even if the syscall returns long.

1309

I'd probably put this adjacent to sys_openat.

sys/sys/openat2.h
58

Declaring these without implementation seems a bit odd.

sys/sys/syscallsubr.h
256 ↗(On Diff #132542)

Declaring here seems premature (though we'll need it in CheriBSD). It could be a static function for now.

This diff should address review received and contain reference implementation of handling for openat2 resolve flags. I'm happy to remove from this diff if you prefer for implementation to be handled in different commit.

walker.aj325_gmail.com added inline comments.
sys/bsm/audit_kevents.h
665 ↗(On Diff #132542)

Should I provisionally set to AUE_OPENAT for the purposes of this diff / review?

sys/kern/vfs_syscalls.c
1097

Space after declarations. You could alternatively use a compound literal as is done in sys_mmap.

Per other feedback, I converted kern_openat2 to just take a flags2 arg.

1114

That's a good point. I'll write up an entry in BUGS once I get to the point of writing the manpage and attaching here.

I think I covered the items you requested. I'm not very experienced with phabricator. Please let me know if I'm doing things wrong. I switched from AUE_OPENAT2 to AUE_OPENAT, but am not sure what is expected here regarding auditing. AUE_NULL is obviously wrong since it would provide an unaudited way to open files. Do I commit with AUE_OPENAT, then apply to get formal AUE_OPENAT2, then make another revision to update the auditing?

I suppose compat32 shims are not needed?

lib/libc/include/libc_private.h
216

Look at lib/libc/sys/interposing_table.c SLOTs.

Also, all this stuff is done to allow libthr to introduce cancellation points, please look at lib/libthr/thread/thr_syscalls.c, __thr_openat() as the model.

lib/libc/sys/openat2.c
31

I do not think that stdarg.h is needed

sys/kern/vfs_syscalls.c
1155

Don't you need to check for unknown flags and return an error?

In D43390#989830, @kib wrote:

I suppose compat32 shims are not needed?

Not until Linux decides to add a pointer to open_how.

sys/kern/vfs_syscalls.c
1092

Might as well remove the no longer required blank link while you're here.

1155

From my understanding of the Linux implementation, you need to check all the open_how members for invalid bit patterns here. It looks like kern_openat2 doesn't check flags sufficiently, but even if it did, you need to verify that the discarded high bits are zero. Likewise with mode can't contain ~07777 and resolve needs to only contain valid values.

jrtc27 added inline comments.
sys/kern/syscalls.master
3336

Typo'ed the fix