Page MenuHomeFreeBSD

fdescfs: add an option to return underlying file vnode on lookup
ClosedPublic

Authored by kib on May 5 2021, 11:11 PM.

Details

Summary

The 'nodup' option forces fdescfs to return real vnode behind file descriptor instead of the fdescfs fd vnode. The end result is that e.g. stat("/dev/fd/3") returns the stat data for the underlying vnode, if any. Similarly, fchdir(2) works in the expected way.

For open(2), if applied over file descriptor opened with O_PATH, it effectively re-open that vnode into normal file descriptor which has the specified access mode, assuming the current vnode permissions allow it.

This is done by a mount option, because permission check on open(2) breaks established fdescfs open semantic of dup(2)-ing the descriptor. So it is not suitable for /dev/fd mount.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib requested review of this revision.May 5 2021, 11:11 PM
kib created this revision.

Excellent. This provides exactly what we needed for samba. In addition to the issues reported about O_DIRECTORY on fdescfs, I can now do the following:

>>>import os
>>>os.open("/testdir", os.O_RDONLY)
3
>>> os.chdir("/dev/fd/3")
>>> os.getcwd()
'/dir2'
>>> os.unlink("/testdir")
>>> os.symlink("/dir1", "/testdir")
>>> os.chdir("/testdir")
>>> os.getcwd()
'/dir1'
>>> os.chdir("/dev/fd/3")
>>> os.getcwd()
'/dir2'

I appreciate the work in getting this done.

After some more considerations, I do not think that this can go as is. Problem is that returning real vnode makes e.g. open("/dev/fd/3") recheck the permissions on the real vnode, and if they changed compared to the time when fd 3 was opened, open("/dev/fd/3") fails, which is not same as dup2().

I will try to provide more nuanced patch. We probably should distinguish lookups from open vs lookups from anything else.

BTW, another approach is to give you something like openat(fd, NULL, O_EMPTY_PATH).
It would e.g. reopen O_PATH-opened fd into normal fd, same as this fdescfd opens. I understand that this would imply some amount of changes to the samba code. But would it be cleaner and less depended on the administrative config of the machine? (Only discussing)

Do not substitute fdescfs vnode by real vnode on open, this causes unneeded rights recheck on open.

NB. This means that D30131 is needed again, in addition to this change.

And more I think about this, more I am on the side that O_EMPTY_PATH is much cleaner solution than trying to hack fdescfs to be more linux-like.

I will defer to your judgment about what makes sense and works best from a kernel perspective. The issue you highlighted with re-evaluation of permissions on open may be a problem. I haven't checked what Linux does in this case.

In general though, the more that we deviate from Linux style behavior, the more overhead will be added for porters in terms of porting and maintaining software. It also means that we'll be less likely to have FreeBSD-specific regressions. It would be better from my perspective if there was some way to do what I did above in the python snippet.

I will defer to your judgment about what makes sense and works best from a kernel perspective. The issue you highlighted with re-evaluation of permissions on open may be a problem. I haven't checked what Linux does in this case.

Can you re-check that D30131 + updated D30140 solve all your cases? With it, if fd 3 was opened with O_PATH, open("/dev/fd/3") still gives O_PATH-opened dup, but things like chdir() should work.

open("/dev/fd/N") not being exactly dup2() probably breaks existing software that relies on the behavior, e.g. bash.

I have two distinct use-cases for the O_PATH descriptors:

  1. "re-open" the O_PATH desc. This wasn't possible because fdescfs gives me the equivalent of dup2(). You propose to use openat() as follows int new_fd = openat(opath_fd, NULL, O_EMPTY_PATH); to do this?
  2. be able to use path-based syscalls with a fdescfs path, e.g. chmod("/dev/fd/FD", 551);

With the above referenced patches D30131 and D30140, use-case (2) works correctly. I am probably mis-understanding the directions for (1). It's failing with EBADF:

openat(AT_FDCWD,"/opath_dir",O_RDONLY|0x400000,00) = 3 (0x3)
openat(3,"(null)",O_RDONLY|0x2000000,00)	 ERR#14 'Bad address'

I also tried with patch in D30148.

I have two distinct use-cases for the O_PATH descriptors:

  1. "re-open" the O_PATH desc. This wasn't possible because fdescfs gives me the equivalent of dup2(). You propose to use openat() as follows int new_fd = openat(opath_fd, NULL, O_EMPTY_PATH); to do this?
  2. be able to use path-based syscalls with a fdescfs path, e.g. chmod("/dev/fd/FD", 551);

With the above referenced patches D30131 and D30140, use-case (2) works correctly. I am probably mis-understanding the directions for (1). It's failing with EBADF:

openat(AT_FDCWD,"/opath_dir",O_RDONLY|0x400000,00) = 3 (0x3)
openat(3,"(null)",O_RDONLY|0x2000000,00)	 ERR#14 'Bad address'

This should be openat(3, "", O_RDONLY|O_EMPTY_PATH) most likely, sorry for confusion.

I also tried with patch in D30148.

Yes O_EMPTY_PATH AKA 0x400000 is added by D30148.

Unfortunately I do not think that we can change fdescfs to return real vnode from lookup, it breaks e.g. bash.

Okay. Still having issues with O_EMPTY_PATH for my minimal test case:

	int op, fd;
	op = open("/", O_DIRECTORY|O_PATH);
	if (op == -1) {
		printf("op failed: %s\n", strerror(errno));
	}

	fd = openat(op, NULL, O_RDONLY|O_EMPTY_PATH);
	if (fd == -1) {
		printf("fd failed: %s\n", strerror(errno));
	}

results in

openat(AT_FDCWD,"/",O_RDONLY|O_DIRECTORY|0x400000,00) = 4 (0x4)
openat(4,"(null)",O_RDONLY|0x2000000,00)	 ERR#14 'Bad address'

I only applied the three patches listed above.

Okay. Still having issues with O_EMPTY_PATH for my minimal test case:

	int op, fd;
	op = open("/", O_DIRECTORY|O_PATH);
	if (op == -1) {
		printf("op failed: %s\n", strerror(errno));
	}

	fd = openat(op, NULL, O_RDONLY|O_EMPTY_PATH);
	if (fd == -1) {
		printf("fd failed: %s\n", strerror(errno));
	}

results in

openat(AT_FDCWD,"/",O_RDONLY|O_DIRECTORY|0x400000,00) = 4 (0x4)
openat(4,"(null)",O_RDONLY|0x2000000,00)	 ERR#14 'Bad address'

I only applied the three patches listed above.

Please look closely at my last comment. Use openat(fd, "", O_RDONLY|O_EMPTY_PATH) <- note that the second argument is "" and not NULL.

Indeed. It works much better when I read and do the right thing :)

Okay. I can confirm that the combination of D30131 D30140 and D30148 work as expected.

For reference purpose, here is a link to the WIP Samba development branch where I'm working this issue: https://gitlab.com/samba-team/devel/samba/-/commits/anodos325-add-fdescfs-proc-fd-path-plumbing
overall strategy is:

  1. in source3/smbd/open.c convert "pathref" struct files_struct to a non-pathref one through openat() with O_EMPTY_PATH.
  2. in source3/lib/system.c we set up a "proc_fd pattern" for FreeBSD using fdescfs. (this is will be used by samba in cases where Linux-side uses /proc/self/fd/FD with path-based syscalls that aren't open().
  3. write verbose enough log messages to guide admin to correct (for samba) fdescfs configuration.

For reference purpose, here is a link to the WIP Samba development branch where I'm working this issue: https://gitlab.com/samba-team/devel/samba/-/commits/anodos325-add-fdescfs-proc-fd-path-plumbing
overall strategy is:

  1. in source3/smbd/open.c convert "pathref" struct files_struct to a non-pathref one through openat() with O_EMPTY_PATH.
  2. in source3/lib/system.c we set up a "proc_fd pattern" for FreeBSD using fdescfs. (this is will be used by samba in cases where Linux-side uses /proc/self/fd/FD with path-based syscalls that aren't open().
  3. write verbose enough log messages to guide admin to correct (for samba) fdescfs configuration.

Did you considered avoiding fdescfs at all? I do not know why Linux people decided to use it there, but also Linux devised the AT_EMPTY_PATH flag. So for instance when you need to do stat on fd 3 (O_PATH), you can just do fstatat(fd, """, &st, AT_EMPTY_PATH), avoiding additional name lookup by stat("/dev/fd/3").

Be stricter, do check for LOOKUP op when returning real vnode.

VFS syscalls and VOPs should already check for cross-volume dvp/vp, but be safe.

kib retitled this revision from fdescfs: return opened file vnode on lookup to fdescfs: add an option to return underlying file vnode on lookup.
kib edited the summary of this revision. (Show Details)
markj added inline comments.
share/man/man5/fdescfs.5
93
142
152
This revision is now accepted and ready to land.May 31 2021, 12:50 PM
kib marked 3 inline comments as done.May 31 2021, 1:07 PM