linuxulator: Add user xattrs support for i386 and amd64 platforms.
Needs RevisionPublic

Authored by fsu on Nov 23 2017, 8:33 AM.

Details

Summary

...

Test Plan

The functionality was tested using LTP from master.
The helper script is attached.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
fsu created this revision.Nov 23 2017, 8:33 AM
fsu added a comment.Nov 23 2017, 8:34 AM

Test helper script for LTP running.

LGTM, but I'll add some reviewers that may have further insight.

As a side note: I couldn't find any linux syscalls related to ACLs so I suspect they are implemented in userland(glibc?) using EAs. This would mean that ACLs "just work" in the linuxulator but are not translated to FreeBSD ACLs. Just something for future consideration, I won't expect it for this revision.

sys/amd64/linux/linux_sysvec.c
150 ↗(On Diff #35645)

Letś commit this independently first.
It was discussed with netchild@

sys/amd64/linux32/linux32_sysvec.c
149 ↗(On Diff #35645)

Same as 64 bit version.. should be committed first (and MFC'd).

sys/compat/linux/linux_xattr.c
415

Typo: convert

sys/compat/linux/linux_xattr.h
3

Please add SPDX ID tags here.
The committers guide was updated :).

sys/i386/linux/linux_sysvec.c
148 ↗(On Diff #35645)

As in previous comments: should be committed first.
I am wondering why we define this three times, but that isn't necessarily wrong or something we should fix now.

fsu added a comment.Nov 24 2017, 8:30 AM

In case of linux acls worked thru xattr syscalls, but with different namespace:
% strace setfacl -m m::rx ./FILE0
...
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=64*1024}) = 0
lstat("./FILE0", {st_mode=S_IFREG|0654, st_size=5, ...}) = 0
getxattr("./FILE0", "system.posix_acl_access", "\2\0\0\0\1\0\6\0\377\377\377\377\4\0\6\0\377\377\377\377\20\0\5\0\377\377\377\377 \0\4\0\377\377\377\377", 132) = 36
exit_group(0)

The patch above will not support acls. It supports only 'user.' namespace linux prefix.
To provide acl supporting, the linux acls prefixes parsing should be added, depending of it VOP_* acl related functions should be called.
I am not prefer to add it in the same patch, because it will be too complex.

Link to review for bsd -> linux errno mappings update:
https://reviews.freebsd.org/D13221

pfg added a comment.Nov 24 2017, 3:32 PM
In D13209#275338, @thisisadrgreenthumb_gmail.com wrote:

The patch above will not support acls. It supports only 'user.' namespace linux prefix.
To provide acl supporting, the linux acls prefixes parsing should be added, depending of it VOP_* acl related functions should be called.
I am not prefer to add it in the same patch, because it will be too complex.

I agree: I have no hurry for ACLs, although they are the most important attributes. I just mention them because it's likely that Robert & Co. will want support for them at some point, like to run SELINUX.

sys/compat/linux/linux_xattr.c
2
* SPDX-License-Identifier: BSD-2-Clause-FreeBSD
*
sys/compat/linux/linux_xattr.h
2
* SPDX-License-Identifier: BSD-2-Clause-FreeBSD
*
cem added a comment.Nov 27 2017, 12:05 AM

I haven't reviewed the entire patch, just wanted to put out comments for now.

sys/amd64/linux/linux_proto.h
613–614 ↗(On Diff #35645)

Please don't include generated code in code review.

When this is committed, traditionally the regenerated syscalls are done separately as a second commit, as well.

sys/bsm/audit_kevents.h
776

G comes before I alphabetically (this entry is in the wrong place)

sys/compat/linux/linux_xattr.c
102–104

Is this requirement documented somewhere? I skimmed FreeBSD and Linux extattr manual pages and did not see this fs object type or error code documented.

119

I don't understand what vap is for at all. Why not just use vattr directly?

157–158

Can drop else and indentation. Everything after if(...) goto foo; is else.

159

TRYUPGRADE is typically just called UPGRADE.

UPGRADE can fail. This lock call needs error checking.

Finally, why take SHARED initially and UPGRADE? It's not like there is a happy path where we can finish the operation SHARED only. Instead we simply do some error checking with SHARED before upgrading.

Given that, I think it makes the most sense to just take EXCLUSIVE initially and skip this upgrade.

sys/compat/linux/linux_xattr.h
30

This should be defined in terms of the Linux ABI, not a FreeBSD constant.

fsu updated this revision to Diff 35850.Nov 27 2017, 11:56 AM

Fix different marks from pfg and cem + split the patch additionaly (see comment above).

fsu marked 8 inline comments as done.Nov 27 2017, 12:02 PM
cem added a comment.Nov 28 2017, 3:19 AM

Thanks!

If someone wanted to test this, how would they setup a test environment? Where is the mentioned test script? (Maybe I'm just missing it.)

sys/compat/linux/linux_xattr.c
92

could be linux_extattr_set_vp() to match the FreeBSD extattr_set_vp().

123

Why do we check these? extattr_set_vp() doesn't.

We should probably perform the same MAC checking extattr_set_vp() does.

Hm, most of the content of this routine could be replaced by a function call, if we made extattr_set_vp non-static.

164

It seems like this check could be moved up to the beginning of the function

186

Shouldn't these use the Linux constant XATTR_NAME_MAX instead of the FreeBSD one?

189

I'd use sizeof(attrname) in place of EXTATTR_MAXNAMELEN, but that's a weak style preference.

270

This could probably be a thin shim around a non-static extattr_get_vp().

306

style nits: should be combined with above line; missing space between if and (

426

Probably worth linking to fs/fuse's fuse_xattrlist_convert(), which is kind of the inverse function.

(Also, I didn't review this function yet.)

430–433

As used by linux_listxattr -> linux_listxattr_common, linux_buf and thus linux_p point into userspace.

To write to them, you need to use copyout().

It would also be good to more explicitly name the passed pointers so it is clear they are dangerous userspace pointers.

491

It seems odd to need an exclusive lock for listing! The VOP only requires SHARED.

492

We should perform the same MAC check as extattr_list_vp().

498

This implementation will only pass through user.foo extended attributes to Linux programs. Should Linux programs be able to list / inspect system. attributes too?

503–504

This cannot occur. The NULL check should be removed.

531

I'm not sure this is this needed

535

Nor this

pfg added a comment.Nov 28 2017, 4:05 AM
In D13209#276906, @cem wrote:

Thanks!

If someone wanted to test this, how would they setup a test environment? Where is the mentioned test script? (Maybe I'm just missing it.)

In one of the early comments Fedor attached a shell script with the tests extracted from the Linux Test Project.
Alternatively you can check all the test suite here:

https://linux-test-project.github.io/

There are several at testcases/kernel/syscalls

cem added a comment.Nov 28 2017, 4:24 AM
In D13209#276929, @pfg wrote:

In one of the early comments Fedor attached a shell script with the tests extracted from the Linux Test Project.
Alternatively you can check all the test suite here:

https://linux-test-project.github.io/

There are several at testcases/kernel/syscalls

So I need a Linux system to compile them, right? Do they compile statically or do I need some Linux shared libraries as well? Thanks!

fsu added a comment.Nov 28 2017, 7:56 AM
In D13209#276930, @cem wrote:
In D13209#276929, @pfg wrote:

In one of the early comments Fedor attached a shell script with the tests extracted from the Linux Test Project.
Alternatively you can check all the test suite here:

https://linux-test-project.github.io/

There are several at testcases/kernel/syscalls

So I need a Linux system to compile them, right? Do they compile statically or do I need some Linux shared libraries as well? Thanks!

As it could be seen from test script, you need just git clone ltp project from github (https://github.com/linux-test-project/ltp).
Than configure and make it.
Place test script from attachment near the ltp directory.
Run the script. It will create directory with testcase binaries and additional run script.
All this stuff should be done on linux machine with the same architecture as your FreeBSD test machine.
Copy already created directory to freebsd. cd to it. You will see run_ltp_xattrs.sh. Run it.
It is expected, that all linux testcases will be "green".

fsu added a comment.Nov 28 2017, 8:33 AM

As could be seen from https://reviews.freebsd.org/D13267,
all extattr_*_vp() functions were exported, but, the only extattr_delete_vp() is used by current review iteration.
To use all the extattr_*_vp() functions I need to pull up the vnode locking logic out of this functions,
because I need one vnode lock per one system call, as I understand.
So, I should decide, where is the better way:

  1. Use the current way, where from linux_*_common() I will call VOP_*'s.
  2. Try to make extattr_*_vp() from vfs_extattr.c more generic. I mean remove vnode locking logic and add additional checks.
sys/compat/linux/linux_xattr.c
123

Because it is required by LTP test cases.
MAC checks are not required because this logic works only with user. xattrs.

430–433

Yep, will be fixed later.

498

The patch is working only with user. namespace.
The system. namespace mean that these are ACL's on linux side, because it does not use extattr namespace integers, like freebsd, moreover it does not have acl syscalls.
Make system. namespace support by linuxulator it is next, relativly complex task, which could be made in the future.

cem added a comment.Dec 1 2017, 7:25 AM
In D13209#276969, @thisisadrgreenthumb_gmail.com wrote:

So, I should decide, where is the better way:

  1. Use the current way, where from linux_*_common() I will call VOP_*'s.
  2. Try to make extattr_*_vp() from vfs_extattr.c more generic. I mean remove vnode locking logic and add additional checks.

I think (2) matches what we do elsewhere -- the pattern is sys_foo is the FreeBSD syscall that invokes kern_foo; the Linux version of foo can also invoke kern_foo.

fsu added a comment.Dec 1 2017, 9:49 AM

Ok, thanks for suggest.
My choice was closer to (1), because it was not required to modify kernel code, so it was faster in case of debugging, because it is not needed to rebuild kernel every time. But (2) is more clear.
So, I will not commit https://reviews.freebsd.org/D13267 for now, but I will update it little bit later together with current review.

rwatson requested changes to this revision.Dec 1 2017, 1:13 PM

Overall, if the vn_extattr_*(9) KPIs are not sufficient to implement the required semantics, then we should consider ways to improve them to avoid the code duplication and errors found in this patch. I'm a bit puzzled by the need to query attributes to check immutable flags, etc. explicitly -- the filesystem should be performing these checks already. Unless something else is wrong..?

sys/bsm/audit_kevents.h
778

You should submit patches to the OpenBSM project to allocate actual audit event identifiers for these Linux system calls.

sys/compat/linux/linux_xattr.c
123

I'm confused by this statement. MAC checks are always required, regardless of whether we are modifying user extended attributes or otherwise. You should really be using the underlying vnode access methods from VFS: vn_extattr_set(), vn_extattr_set(), etc, rather than rolling your own. They exist to avoid code duplication leading to implementation errors, such as the one you have just described.

291

You appear to have skipped the MAC check here as well. This check is very important and should not be omitted. More ideally, you would use the existing vn_extattr_get(9) call, which will do that check for you.

492

Yes, a MAC check is required here as well. Ideally vn_extattr_list(9) would be used to avoid code duplication.

This revision now requires changes to proceed.Dec 1 2017, 1:13 PM
fsu added a comment.Dec 4 2017, 8:53 AM

Ok, I was wrong with with MAC and user extattrs, the MAC checking functions should be restored.
Also, the general question, just to be sure, does MAC check is responsibility of VFS or the filesystem?

A am asking because:

  1. I can not find vn_extattr_list(9) function in vnode.h header, it would be great if somebody point it to me.
  2. I can not find MAC check in case of vn_extattr_get(9). I can see call to mac_vnode_check_getextattr() only in case of extattr_get_vp() from vfs_extattr.c, or the check is implemented by some other way?
  3. Also, I can see the code duplication in case of vfs_extattr.c and vfs_vnops.c:

The extattr_*_vp() group of functions from the vfs_extattr.c could be basicly removed and the vn_extattr_*() group could be used.

So, I ready to prepare review to adding vn_extattr_list(9) and removing code duplication for vfs_extattr.c and vfs_vnops.c in case of extattrs, will be it rational, or there are some reasons to leave it as is?

In D13209#278822, @thisisadrgreenthumb_gmail.com wrote:

Ok, I was wrong with with MAC and user extattrs, the MAC checking functions should be restored.
Also, the general question, just to be sure, does MAC check is responsibility of VFS or the filesystem?

A am asking because:

  1. I can not find vn_extattr_list(9) function in vnode.h header, it would be great if somebody point it to me.
  2. I can not find MAC check in case of vn_extattr_get(9). I can see call to mac_vnode_check_getextattr() only in case of extattr_get_vp() from vfs_extattr.c, or the check is implemented by some other way?
  3. Also, I can see the code duplication in case of vfs_extattr.c and vfs_vnops.c: The extattr_*_vp() group of functions from the vfs_extattr.c could be basicly removed and the vn_extattr_*() group could be used.

    So, I ready to prepare review to adding vn_extattr_list(9) and removing code duplication for vfs_extattr.c and vfs_vnops.c in case of extattrs, will be it rational, or there are some reasons to leave it as is?

Responsibility for MAC is split between VFS common code and the filesystem. Wherever possible, we seek to perform access control checks in the common VFS code to avoid the need for MAC checks in individual filesystems (which would be verbose, error-prone, and open for door for potential inconsistency).

MAC label storage, on the other hand, is the responsibility of the filesystem, since filesystems may store MAC labels in different ways -- for example, UFS might store them in extended attributes, whereas NFS might use extended RPCs to access labels. Most filesystems, however, are left without any specific knowledge of MAC due to support for single-label filesystems, in which VFS uses the label of the mount point for every vnode in the filesystem, avoiding the need for filesystem-specific storage (but at the cost of not allowing different labels for different tiles).

vfs_extattr.c currently implements both the native extended-attribute system calls in the FreeBSD system-call ABI and also utility functions to support those system calls. In the new world order, one can imagine refactoring this a bit so that the utility functions better serve other system-call ABIs, such as the Linux system-call ABI.

My apologies -- in my comment, I miswrote: you should be using extattr_set_vp(), extattr_get_vp(), etc, and not the vn_extattr_*() KPIs. The former are intended to support system calls using full authorisation. The latter are another set of utility functions intended to service kernel components that are themselves utilising extended attributes -- e.g., within filesystems themselves, or in the MAC Framework. Note the comment above vn_extattr_set() discussing authorisation as the kernel. Possibly the vn_*() functions should be renamed to make that distinction more clear.

I would be supportive of a change to make the extattr_*_vp(9) KPIs "public" (i.e., non-static, documented) and using those directly in the Linux system-call ABI implementation. We might wish to rename these to be kern_extattr_*() to make it more clear that they are the internals of system calls.

fsu added a comment.Dec 5 2017, 12:56 PM

Ok, if I understood correctly, the idea is to use vp_extattr_*() to use in case of calls from filesystem, and extattr_*_vp() - in case of system calls.
Let's get back to:
https://reviews.freebsd.org/D13267
I want to solve the interface questions before getting back to linuxulator xattrs implementation.
Robert, you are already invited to the review above.