vfs: add restrictions to read(2) of a directory [1/2]
Historically, we've allowed read() of a directory and some filesystems will
accommodate (e.g. ufs/ffs, msdosfs). From the history department staffed by
pdp-7 unix seemed to allow reading directories, but they were weird, special
things there so I'm unsure (my pdp-7 assembler sucks).
1st Edition's sources are lost, mostly. The kernel allows it. The
reconstructed sources from 2nd or 3rd edition read it though.
V6 to V7 changed the filesystem format, and should have been a warning, but
reading directories weren't materially changed.
4.1b BSD introduced readdir because of UFS. UFS broke all directory reading
programs in 1983. ls, du, find, etc all had to be rewritten. readdir() and
friends were introduced here.
SysVr3 picked up readdir() in 1987 for the AT&T fork of Unix. SysVr4 updated
all the directory reading programs in 1988 because different filesystem
types were introduced.
In the 90s, these interfaces became completely ubiquitous as PDP-11s running
V7 faded from view and all the folks that initially started on V7 upgraded
to SysV. Linux never supported this (though I've not done the software
archeology to check) because it has always had a pathological diversity of
Disallowing read(2) on a directory has the side-effect of masking
application bugs from relying on other implementation's behavior
(e.g. Linux) of rejecting these with EISDIR across the board, but allowing
it has been a vector for at least one stack disclosure bug in the past.
By POSIX, this is implementation-defined whether read() handles directories
or not. Popular implementations have chosen to reject them, and this seems
sensible: the data you're reading from a directory is not structured in some
unified way across filesystem implementations like with readdir(2), so it is
impossible for applications to portably rely on this.
With this patch, we will reject most read(2) of a dirfd with EISDIR. Users
that know what they're doing can conscientiously set
bsd.security.allow_read_dir=1 to allow read(2) of directories, as it has
proven useful for debugging or recovery. A future commit will further limit
the sysctl to allow only the system root to read(2) directories, to make it
at least relatively safe to leave on for longer periods of time.
While we're adding logic pertaining to directory vnodes to vn_io_fault, an
additional assertion has also been added to ensure that we're not reaching
vn_io_fault with any write request on a directory vnode. Such request would
be a logical error in the kernel, and must be debugged rather than allowing
it to potentially silently error out.
Commented out shell aliases have been placed in root's chsrc/shrc to promote
awareness that grep may become noisy after this change, depending on your
A tentative MFC plan has been put together to try and make it as trivial as
possible to identify issues and collect reports; note that this will be
strongly re-evaluated. Tentatively, I will MFC this knob with the default as
it is in HEAD to improve our odds of actually getting reports. The future
priv(9) to further restrict the sysctl WILL NOT BE MERGED BACK, so the knob
will be a faithful reversion on stable/12. We will go into the merge
acknowledging that the sysctl default may be flipped back to restore
historical behavior at *any* point if it's warranted.
Reviewed by: mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by: rgrimes (latest version)
MFC after: 1 month (note the MFC plan mentioned above)
Relnotes: absolutely, but will amend previous RELNOTES entry
Differential Revision: https://reviews.freebsd.org/D24596