Page MenuHomeFreeBSD

vfs: add restrictions to read(2) of a directory
ClosedPublic

Authored by kevans on Apr 27 2020, 5:40 PM.

Details

Summary

Historically, we've allowed read() of a directory and some filesystems will accommodate (e.g. ufs/ffs, msdosfs). Some history, courtesy of Warner:

${COPY AND PASTE OF ALL OF IMP'S STATEMENTS}

Disallowing read(2) on a directory has the side-effect of masking application bugs from relying on other implementation's behavior (e.g. Linux) of rejecting these with EISDIR across the board, but allowing it has been a vector for at least one stack disclosure bug in the past[0].

By POSIX, this is implementation-defined whether read() handles directories or not. Popular implementations have chosen to reject them, and this seems sensible: the data you're reading from a directory would seem to not be structured in some unified way across filesystem implementations like with readdir(2), so it seems really hard to rely on this.

With this patch, we will reject most read(2) of a dirfd with EISDIR. Users that know what they're doing can conscientiously set bsd.security.allow_read_dir=1 to allow non-jailed root alone to read(2) directories, as it has proven useful for debugging or recovery.

While we're adding logic pertaining to directory vnodes to vn_io_fault, an additional assertion has also been added to ensure that we're not reaching vn_io_fault with any write request on a directory vnode. Such request would be a logical error in the kernel, and must be debugged rather than allowing it to potentially silently error out.

ZFS will no longer allow read(2) of a directory; what is was previously copying out is decidedly not useful for any debugging or recovery purposes. Allowing read(2) here is also inconsistent with other platforms that support ZFS, at least Solaris and Linux both rejecting such reads.

[0] https://www.freebsd.org/security/advisories/FreeBSD-SA-19:10.ufs.asc

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Could you please incorporate the manual page change from D24814 into this one?

IMHO I think it's not worth the effort to make this knob-controllable and the old behavior should be gone: directories are not stored in an uniform format, it varies from file system to file system and I'd be really surprised if anything would depend on the old behavior: for starters, these application would have already been broken many years ago when we move to a different file system. Neither fsck_ffs nor fsck_msdosfs rely on being able to read() directories directly, and I don't see any legitimate reason that an application should do it either.

The EISDIR behavior is standard compliant and is already adopted by at least macOS, Linux and OpenBSD, by the way.

sys/fs/smbfs/smbfs_vnops.c
505 ↗(On Diff #71066)

Yes, it's still there (we might want to deprecate it or update it to support SMB2 protocol though).

Further discussion leaves me content with this patch, without a a sysctl to revert to the old behaviour.

Could you please incorporate the manual page change from D24814 into this one?

Definitely, will do in the next iteration.

Further discussion leaves me content with this patch, without a a sysctl to revert to the old behaviour.

It's somewhat tempting to leave a kernel config option behind to re-enable it and only axe the functionality everywhere that isn't ufs. I imagine Kirk's probably sometimes examining specifics that aren't necessarily reachable with readdir; this option would require basically no maintenance, as the path from vn_io_fault to ufs doesn't require any change and Kirk's the maintainer of the other side.

read(2) on a directory seems only useful for educational or debugging purposes. These purposes are better served by a different API.

read(2) on a directory may expose information about files that have been deleted, which is a security problem.

Given that some applications (such as Git) may depend on or be statically configured to depend on read(2) on a directory returning an error, a global knob seems unwise. Removing the functionality entirely seems better than adding a complicated enabling mechanism that will not be used much. So I agree with the current patch (plus a man page change).

kevans updated this revision to Diff 71709.May 13 2020, 4:17 AM

Incorporate man page change, complements of @delphij.

I've also added a short comment as to why I've added the check to vn_io_fault, since it may not be obvious in a brief read-through of the file if you're not already somewhat familiar with the fileop <-> vfs bits and the fact that fo_read starts here, though it's stated within the same file.

delphij accepted this revision.May 13 2020, 4:25 AM
This revision is now accepted and ready to land.May 13 2020, 4:25 AM
kib added inline comments.May 13 2020, 12:51 PM
sys/fs/msdosfs/msdosfs_vnops.c
555 ↗(On Diff #71709)

I think it is still worth keep this check somehow, either as KASSERT, or just as return (EISDIR). After all, msdosfs is often read from untrusted sources or even synthesized on fly.

There of course will be a redundancy with the upper-level check that in principle looked at the same data, but for msdosfs it is better to recheck.

kevans added inline comments.May 13 2020, 1:36 PM
sys/fs/msdosfs/msdosfs_vnops.c
555 ↗(On Diff #71709)

Good point- I'm leaning towards replacing it with return (EISDIR). If the situation is volatile enough that we end up here after having checked back up in vn_io_fault, I'm not sure panicking has much utility from a debugging perspective and we're not not necessarily in a state that otherwise warrants it.

kevans updated this revision to Diff 71727.May 13 2020, 5:25 PM

Added sanity check of denode attributes in msdosfs_read, rejecting with the appropriate EISDIR as needed.

This revision now requires review to proceed.May 13 2020, 5:25 PM
kib accepted this revision.May 13 2020, 5:36 PM
This revision is now accepted and ready to land.May 13 2020, 5:36 PM
kib added inline comments.May 13 2020, 5:38 PM
sys/kern/vfs_vnops.c
1171 ↗(On Diff #71727)

There should be a blank line after the last line of code that is described by multi-line comment, style(9).

kevans marked an inline comment as done.May 13 2020, 8:15 PM
kevans added inline comments.
sys/kern/vfs_vnops.c
1171 ↗(On Diff #71727)

Fixed in my local copy, will roll it into next update if others request changes.

jilles accepted this revision.May 13 2020, 9:20 PM
mckusick accepted this revision.May 13 2020, 10:17 PM

This looks good to me. I am fine with not supporting the old behavior.

delphij accepted this revision.May 13 2020, 11:03 PM
bdrewery added inline comments.
sys/kern/vfs_vnops.c
1171 ↗(On Diff #71727)

Where is this in style(9)?

cy accepted this revision.May 14 2020, 7:46 PM
emaste accepted this revision.May 14 2020, 7:46 PM

Probably grep(1) should be updated after this change. It has -d option to specify which action to to take on processing directories. From manpage:

-d ACTION, --directories=ACTION
       If an input file is a directory, use ACTION to process it.  By
       default, ACTION is read, which means that directories are read
       just as if they were ordinary files.  If ACTION is skip,
       directories are silently skipped.  If ACTION is recurse, grep
       reads all files under each directory, recursively; this is
       equivalent to the -r option.
kib added inline comments.May 15 2020, 2:19 PM
sys/kern/vfs_vnops.c
1171 ↗(On Diff #71727)

I cannot find it in style(9), but I also cannot find there a rule that multi-line comment should be preceded by a blank line.

imp added a comment.May 15 2020, 2:56 PM

Probably grep(1) should be updated after this change. It has -d option to specify which action to to take on processing directories. From manpage:

-d ACTION, --directories=ACTION
       If an input file is a directory, use ACTION to process it.  By
       default, ACTION is read, which means that directories are read
       just as if they were ordinary files.  If ACTION is skip,
       directories are silently skipped.  If ACTION is recurse, grep
       reads all files under each directory, recursively; this is
       equivalent to the -r option.

grep doesn't actually read directories for anything useful (and can't for ZFS), so this change is a nop as far as GREP is concerned.

kevans updated this revision to Diff 71826.May 15 2020, 3:08 PM
kevans marked an inline comment as done.
kevans retitled this revision from (RFC) vfs: return EISDIR for all read() of a directory to vfs: add restrictions to read(2) of a directory.
kevans edited the summary of this revision. (Show Details)
kevans added reviewers: rgrimes, phk.

Update to just restricting the ability to read() a directory to non-jailed root if bsd.security.allow_read_dir is set. I'm not dead-set on it living in the bsd.security.* namespace, but it could be a source of security issues if a MAC policy is presented that grants PRIV_VFS_READ_DIR so I've tentatively pushed it here.

Inviting some that had dissenting opinions on -arch@ to the review, because it's important to me that we come to a mutually agreeable solution.

This revision now requires review to proceed.May 15 2020, 3:08 PM
kevans added inline comments.May 15 2020, 3:10 PM
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
649 ↗(On Diff #71826)

I'm going to send a patch for this to OpenZFS. This is arguably internal consistency for OpenZFS, as the Linux implementation will reject directories with EISDIR.

imp accepted this revision.EditedMay 15 2020, 3:10 PM

All this angst over reading a directory which you shouldn't be doing since around 1983 or so.
fsdb is adequate, or should be updated, for debugging these things. It uses an existing security grant. And if you're looking at the low-level stuff, you'll likely need more information that just the dirent that you can get from reading the directory.
But, you can't do any of that with zfs, nfs, or others.
I've almost never wanted to read directories to get the dirents from them, even while debugging. They are at best, mildly interesting.
Adding another security knob to allow this for a niche use-case seems like overkill. Meh. The patch is fine.

pdp-7 unix seemed to allow reading directories, but they were weird, special things there so I'm unsure (my pdp-7 assembler sucks).
1st Edition's sources are lost, mostly. The kernel allows it. The reconstructed sources from 2nd or 3rd edition read it though.
V6 to V7 changed the filesystem format, and should have been a warning, but reading directories weren't materially changed.
4.1b BSD introduced readdir because of UFS. UFS broke all directory reading programs in 1983. ls, du, find, etc all had to be rewritten. readdir() and friends were introduced here.
SysVr3 picked up readdir() in 1987 for the AT&T fork of Unix. SysVr4 updated all the directory reading programs in 1988 because different filesystem types were introduced.
In the 90s, these interfaces became completely ubiquitous as PDP-11s running V7 faded from view and all the folks that initially started on V7 upgraded to SysV.
Linux never supported this (though I've not done the software archeology to check) because it has always had a pathological diversity of filesystems.

No normal person has needed this functionality for longer than most people using BSD have been alive (BSD already had readdir when I started using it in 84). No program has really wanted to use this in a long time and there are more bugs from allowing the read and programs using the dirents as data (Eg .foorc commands) than people have saved systems by doing this...
It's mildly useful, sometimes, for debugging. the dirents are kinda interesting. Not interesting enough, imho, to allow a hole in the security perimeter.
However, if we have a hole, that only root can set, it's no worse than forcing root to go through fsdb or something.

This revision is now accepted and ready to land.May 15 2020, 3:10 PM
bdrewery added inline comments.May 15 2020, 3:18 PM
sys/kern/vfs_vnops.c
1171 ↗(On Diff #71727)

So are we saying the style should be this?

code_thing_A;
code_thing_B;

/*
  * Comment describing code doing thing C
  */
code_thing_C;

code_thing_D;
code_thing_E;

I can add an example of this to style(9) if that's what you're saying. The blank line rules always confused me.

I will likely commit this in three parts if we can all agree on this:

1.) ZFS rejection of read(2) of a dirfd
2.) sysctl without the priv check
3.) add the priv check

I think it would be a good idea to MFC just the sysctl (default flipped to ON to not violate POLA) so that there's the option on stable/12 to turn the behavior off in some form for those that don't want it.

kib added inline comments.May 15 2020, 3:46 PM
sys/kern/vfs_vnops.c
1171 ↗(On Diff #71727)

Yes, I definitely was told to use this style very early. I do not remember, was it bde or alc.

The corner cases are the following:

statement;
something (if/for...) {
      /*
        * Multi-line, no blank line before.
        */
        statement;

and

     statement;

     /*
       * Multi-line.
       */
}
statement;

so no blank line after '{ and multi-line, and no blank line after multi-line if next is '}'.
I think sys/vm is more or less consistent in this rule, at least.

phk accepted this revision.May 15 2020, 5:09 PM
rgrimes requested changes to this revision.May 15 2020, 7:41 PM

I do not like the undocumented (per summary anyway) KASSERT added for writes.

sys/kern/vfs_vnops.c
1179 ↗(On Diff #71826)

This is not described in the summary of what you are changing. I really fail why we need to panic the system if we reach this point, just return an error and prevent the write.

This revision now requires changes to proceed.May 15 2020, 7:41 PM
kevans added inline comments.May 15 2020, 7:43 PM
sys/kern/vfs_vnops.c
1179 ↗(On Diff #71826)

Ah, I'll mention it in the commit message... this isn't a situation that can reasonably happen, as it should be prevented higher up because you cannot open a directory writable. If we've reached this point with a directory/write, then something has gone horribly terribly wrong and it must be debugged.

mckusick accepted this revision.May 15 2020, 10:41 PM

I believe that this change is correct as it stands.

sys/kern/vfs_vnops.c
1179 ↗(On Diff #71826)

I agree with Kyle. Getting here with a write is a kernel logic error that should be debugged.

kevans edited the summary of this revision. (Show Details)May 16 2020, 12:12 AM

Now that I have had time to go play with this on a "Linux" system, I am afraid I object to this change. Your doing it because you don't like the occasional splat of "foo is a binary file" when you run a grep *, you should see the results when EISDIR is returned. These types of greps on a linux system return an error message for every directory in the directory you ran the grep in, far more noise than your small issue, and for many of us this is gona be a PITA/POLA change. I hadnt thought much about it cause I alias 'grep' 'grep -d skip' and do not see your issue or my issue normally on linux.

I think to rectifier your issue you should probably do the same I have done...

Now that I have had time to go play with this on a "Linux" system, I am afraid I object to this change. Your doing it because you don't like the occasional splat of "foo is a binary file" when you run a grep *, you should see the results when EISDIR is returned. These types of greps on a linux system return an error message for every directory in the directory you ran the grep in, far more noise than your small issue, and for many of us this is gona be a PITA/POLA change. I hadnt thought much about it cause I alias 'grep' 'grep -d skip' and do not see your issue or my issue normally on linux.

I think to rectifier your issue you should probably do the same I have done...

I'm afraid I have no idea what you're talking about here...grep isn't my problem, my problems are:

  1. The security implications of being able to read(2) a dirfd,
  2. Consistency with how most modern kernels behave,

2a) the amount of time I waste hunting down application bugs because of that

There was probably a 3, but it's late and a bit worn down from recent discussions.

In D24596#547205, @imp wrote:

Probably grep(1) should be updated after this change. It has -d option to specify which action to to take on processing directories. From manpage:

-d ACTION, --directories=ACTION
       If an input file is a directory, use ACTION to process it.  By
       default, ACTION is read, which means that directories are read
       just as if they were ordinary files.  If ACTION is skip,
       directories are silently skipped.  If ACTION is recurse, grep
       reads all files under each directory, recursively; this is
       equivalent to the -r option.

grep doesn't actually read directories for anything useful (and can't for ZFS), so this change is a nop as far as GREP is concerned.

Thats actually wrong, grep shall now spew forth tons of EISDIR errors when run after this change and a grep * is done.

Now that I have had time to go play with this on a "Linux" system, I am afraid I object to this change. Your doing it because you don't like the occasional splat of "foo is a binary file" when you run a grep *, you should see the results when EISDIR is returned. These types of greps on a linux system return an error message for every directory in the directory you ran the grep in, far more noise than your small issue, and for many of us this is gona be a PITA/POLA change. I hadnt thought much about it cause I alias 'grep' 'grep -d skip' and do not see your issue or my issue normally on linux.

I think to rectifier your issue you should probably do the same I have done...

I'm afraid I have no idea what you're talking about here...grep isn't my problem, my problems are:

  1. The security implications of being able to read(2) a dirfd,
  2. Consistency with how most modern kernels behave,

2a) the amount of time I waste hunting down application bugs because of that

There was probably a 3, but it's late and a bit worn down from recent discussions.

I can't find it in the bugzilla, so it must of been on the mail list, and I attributed it to you, which I guess it was not. But this change has a very visible change for users of grep *.

I can't find it in the bugzilla, so it must of been on the mail list, and I attributed it to you, which I guess it was not. But this change has a very visible change for users of grep *.

I now vaguely recall responding to someone on the list about this- yes, I've found it annoying in the past on UFS when grep's called a directory a file, but this wasn't and hasn't been a consideration in this policy change.

pi added a subscriber: pi.May 18 2020, 5:34 AM

These types of greps on a linux system return an error message for every directory in the directory you ran the grep in

So if we modify grep to not try to read directories and not spew errors, would that be a solution ?

Or is there some POSIX or similar rule that requires those errors 8-} ?

As for the impact on grep itself- yes, it does, but it's not clear to me that it's a negative impact.

  • When you're grepping *, it seems unlikely that you're trying to find names that exist within directories in cwd.
  • If you are trying to find names that exist, I would argue that grep * is the wrong tool for this anyways, as it won't work on many filesystems

In either case, you're getting mixed results right now. You're either wanting matches within files and spending time grepping directory entries, or you're wanting directory entries and wasting time matching files.

In D24596#547800, @pi wrote:

These types of greps on a linux system return an error message for every directory in the directory you ran the grep in

So if we modify grep to not try to read directories and not spew errors, would that be a solution ?

It wouldn't even take much modification, just changing the default to be -d skip.

Or is there some POSIX or similar rule that requires those errors 8-} ?

POSIX allows implementations to do what they want with read() on a directory and grep's behaviour follows from that, so I think it's within the spirit for us to show errors or not, as we see fit.

In D24596#547800, @pi wrote:

These types of greps on a linux system return an error message for every directory in the directory you ran the grep in

So if we modify grep to not try to read directories and not spew errors, would that be a solution ?

It wouldn't even take much modification, just changing the default to be -d skip.

Or is there some POSIX or similar rule that requires those errors 8-} ?

POSIX allows implementations to do what they want with read() on a directory and grep's behaviour follows from that, so I think it's within the spirit for us to show errors or not, as we see fit.

Please do not change the default, that would just make our greps defaults different than linux and everyone else. Just "recommend" the use of grep -d skip in the UPDATE entry, the RELEASE NOTES, and perhaps by adding t as an example, aka commented out, in /root/csh.cshrc and /root/.profile, and perhaps in the usr/share/skel files.

In D24596#547800, @pi wrote:

These types of greps on a linux system return an error message for every directory in the directory you ran the grep in

So if we modify grep to not try to read directories and not spew errors, would that be a solution ?

It wouldn't even take much modification, just changing the default to be -d skip.

Or is there some POSIX or similar rule that requires those errors 8-} ?

POSIX allows implementations to do what they want with read() on a directory and grep's behaviour follows from that, so I think it's within the spirit for us to show errors or not, as we see fit.

Please do not change the default, that would just make our greps defaults different than linux and everyone else. Just "recommend" the use of grep -d skip in the UPDATE entry, the RELEASE NOTES, and perhaps by adding t as an example, aka commented out, in /root/csh.cshrc and /root/.profile, and perhaps in the usr/share/skel files.

That seems reasonable, though I do wonder if -d read as a default really makes sense anywhere else. The only linux filesystem I'm aware of that allows you to read() directories, I suspect a grep user wouldn't want to include that (directory statistics) by default and otherwise they're just getting lots of errors. OTOH, the error's a nice indicator that maybe you meant to recurse instead.

kevans updated this revision to Diff 72032.May 20 2020, 4:55 PM

Hopefully a nearly-final draft, include example aliases for grep to default to -d skip in .cshrc and .shrc as well as the UPDATING/RELNOTES notices and revisions that will happen.

This also now includes the small amount of priv(9) bits needed to allow a MAC module to grant PRIV_VFS_READ_DIR to jailed root.

delphij requested changes to this revision.May 20 2020, 5:45 PM

POSIX allows implementations to do what they want with read() on a directory and grep's behaviour follows from that, so I think it's within the spirit for us to show errors or not, as we see fit.

Please do not change the default, that would just make our greps defaults different than linux and everyone else. Just "recommend" the use of grep -d skip in the UPDATE entry, the RELEASE NOTES, and perhaps by adding t as an example, aka commented out, in /root/csh.cshrc and /root/.profile, and perhaps in the usr/share/skel files.

Can we please, please, please refrain from turning /usr/share/skel files into an encyclopedia describing every corner cases of command line utilities when they are already properly documented? The user do not need this information, even if they do, they are not going to read the skeleton but to read manual page.

By the way, bsdgrep doesn't really give any error when grep'ing directory. tmpfs already returns EISDIR when reading, and here is my test:

$ /usr/bin/grep . /tmp
$ /usr/local/bin/grep . /tmp
grep: /tmp: Is a directory

(the latter is GNU grep).

This revision now requires changes to proceed.May 20 2020, 5:45 PM
rgrimes accepted this revision.May 20 2020, 5:49 PM

Works for me

POSIX allows implementations to do what they want with read() on a directory and grep's behaviour follows from that, so I think it's within the spirit for us to show errors or not, as we see fit.

Please do not change the default, that would just make our greps defaults different than linux and everyone else. Just "recommend" the use of grep -d skip in the UPDATE entry, the RELEASE NOTES, and perhaps by adding t as an example, aka commented out, in /root/csh.cshrc and /root/.profile, and perhaps in the usr/share/skel files.

Can we please, please, please refrain from turning /usr/share/skel files into an encyclopedia describing every corner cases of command line utilities when they are already properly documented? The user do not need this information, even if they do, they are not going to read the skeleton but to read manual page.

I can remove this pre-commit if consensus moves in that direction.

By the way, bsdgrep doesn't really give any error when grep'ing directory. tmpfs already returns EISDIR when reading, and here is my test:

$ /usr/bin/grep . /tmp
$ /usr/local/bin/grep . /tmp
grep: /tmp: Is a directory

(the latter is GNU grep).

Something odd must be going on here, because it definitely does warn:

root@viper:/usr/src/bin/sh# grep -V
grep (BSD grep) 2.6.0-FreeBSD
root@viper:/usr/src/bin/sh# grep 'test' .
grep: .: Is a directory

cem added a comment.May 20 2020, 6:00 PM

By the way, bsdgrep doesn't really give any error when grep'ing directory. tmpfs already returns EISDIR when reading, and here is my test:

$ /usr/bin/grep . /tmp
$ /usr/local/bin/grep . /tmp
grep: /tmp: Is a directory

(the latter is GNU grep).

Something odd must be going on here, because it definitely does warn:

root@viper:/usr/src/bin/sh# grep -V
grep (BSD grep) 2.6.0-FreeBSD
root@viper:/usr/src/bin/sh# grep 'test' .
grep: .: Is a directory

Ambient GREP_OPTIONS environment variable?

I can remove this pre-commit if consensus moves in that direction.

Even if it's getting included I would commit it separately.

Are there any other outstanding concerns that haven't been addressed? I think consensus is that the skel/ changes will simply never get committed.

This revision was not accepted when it landed; it landed in state Needs Revision.Jun 4 2020, 6:10 PM
This revision was automatically updated to reflect the committed changes.