Page MenuHomeFreeBSD

[1/3] Add mostly Linux-compatible file sealing support
ClosedPublic

Authored by kevans on Aug 24 2019, 4:28 AM.

Details

Summary

File sealing applies protections against certain actions (currently: write, growth, shrink) at the inode level. New fileops are added to accommodate seals - EINVAL is returned by fcntl(2) if they are not implemented.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kevans created this revision.Aug 24 2019, 4:28 AM
bcr accepted this revision as: manpages.Aug 24 2019, 1:26 PM
bcr added a subscriber: bcr.

OK from manpages.

0mp added a subscriber: 0mp.Aug 24 2019, 1:32 PM

OK form manpages (2)

kevans updated this revision to Diff 61231.Aug 24 2019, 3:51 PM

Fix the design, following discussion with kib... key points:

  • Adding to struct file is a lot of overhead, and the wrong place
  • Sealing should be per-inode; requires filesystem support
  • Push sealing flags down into the shmfd, provide fileops for interacting with the flags

I've also amended ftruncate(2) manpage to indicate that it can return ENOTCAPABLE.

kevans updated this revision to Diff 61234.Aug 24 2019, 4:48 PM

Address issue that came up while writing tests (will ride in with memfd_create): F_SEAL_SHRINK | F_SEAL_GROW allows ftruncate(fd, n) where n == current size. The last iteration broke the CAP_FTRUNCATE revocation anyways when the seals moved out of struct file. Formalize this.

kib added inline comments.Aug 25 2019, 3:19 PM
sys/kern/sys_generic.c
614 ↗(On Diff #61234)

Why do we need to expose seals to the file level ? I would prefer, and actually think that it is required, that seals handling should be done by the file type itself. In other words, it should be shmfd business,

kevans added inline comments.Aug 25 2019, 7:02 PM
sys/kern/sys_generic.c
614 ↗(On Diff #61234)

Whoops. I would think you're correct... if we do something like:

int fd, fdx;
char buf[8];

fd = memfd_create("...", MFD_ALLOW_SEALING);
fdx = dup(fd);

fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE | F_SEAL_SEAL);
write(fdx, buf, sizeof(buf));

I would expect the write to fdx to fail because we're supposed to apply seals at the inode level, so revoking CAP_WRITE from fd isn't sufficient and likewise for ftruncate bits... I'll push it down a bit further into shm_dotruncate and shm_write and make it all return EPERM like Linux implementation.

emaste added a subscriber: emaste.Aug 26 2019, 12:52 AM
kevans updated this revision to Diff 61278.Aug 26 2019, 1:53 AM

Push sealing further into filesystem

markj added inline comments.Aug 26 2019, 3:35 PM
lib/libc/sys/fcntl.2
187 ↗(On Diff #61278)

It looks like they are now associated with the backing file description, rather than the descriptor.

sys/kern/kern_descrip.c
49 ↗(On Diff #61278)

Do we still need this include?

sys/sys/fcntl.h
259 ↗(On Diff #61278)

Missing close paren and period.

kib added inline comments.Aug 27 2019, 1:46 PM
lib/libc/sys/truncate.2
148 ↗(On Diff #61278)

Is this addition stil relevant ?

sys/kern/kern_descrip.c
768 ↗(On Diff #61278)

This is weird. IMO it should be checked (atomically) in fo_add_seals().

Wouldn't it allow a race where one thread already set F_SEAL_SEAL while other thread did not see it after fo_get_seals() but fo_add_seals() cannot proceed ?

In general, what are the desired atomicity requirements for interaction between seals and in-flight ops ? E.g. I believe that the current arrangement allows for write(2) to verifiable process while other thread sets the seal on shmfd. Do we declare that as a user race, or do we want to ensure that no write can be in progress after set seal for write fcntl returned ?

sys/sys/file.h
448 ↗(On Diff #61278)

Why this is needed ?

kevans added inline comments.Aug 27 2019, 2:10 PM
sys/kern/kern_descrip.c
768 ↗(On Diff #61278)

Hmm... yeah, true- I'll whittle this down to fo_add_seals() alone and deal with F_SEAL_SEAL in fo_add_seals.

Linux manpage only seems to guarantee that the seals are enforced once fcntl(..., F_ADD_SEALS) has successfully returned: "Once this call succeeds, the seals are enforced by the kernel immediately." -> I interpret this as deeming it a user race. I think most users of seals would tend towards creating the file, operating on it, then sealing it before passing it around.

kib added inline comments.Aug 27 2019, 2:14 PM
sys/kern/kern_descrip.c
768 ↗(On Diff #61278)

I would be very skeptical to infer anything from linux manpage, esp. such fine details. If somebody could read the implementation, or ask authors of the API about the intent ...

I understand that this might be too much to ask.

jilles added a subscriber: jilles.Aug 27 2019, 7:09 PM
jilles added inline comments.
sys/kern/kern_descrip.c
768 ↗(On Diff #61278)

As far as I understand, the intent is to avoid the need for userland to implement such complicated things as the Xorg server's busfault (see https://lists.x.org/archives/xorg-devel/2013-November/038598.html ). This is further described in the NOTES section of http://man7.org/linux/man-pages/man2/memfd_create.2.html . Also note the extra properties of F_SEAL_WRITE in http://man7.org/linux/man-pages/man2/fcntl.2.html : the sealing call fails if a writable MAP_SHARED mapping exists, and aborts any asynchronous I/O.

As such, any conflicting modification visible after a successful F_ADD_SEALS or F_GET_SEALS might lead to a security hole.

However, when adding seals, this still leaves the kernel the choice of waiting for the conflicting modification to end, synchronously cancelling it or denying the seal change.

Atomicity on the F_SEAL_SEAL bit seems less important.

kevans updated this revision to Diff 61636.Sep 4 2019, 4:32 AM

Take another stab at it:

  • Slap some sx(9) around; slock while we're doing ftruncate/mmap/write/F_GET_SEALS, xlock while we're setting seals
  • Take advantage of posixshm now tracking writeable mappings to reject with EBUSY.
  • Remove some now-unrelated cruft.
kevans edited the summary of this revision. (Show Details)Sep 4 2019, 4:37 AM
kib added a comment.Sep 5 2019, 8:35 PM

Is it possible to replace shm_seal_sx with rangelocking ? E.g. instead of xlock you would take write rangelock for (0, max). I suspect that you can avoid adding sx this way, but did not thought details.

lib/libc/sys/fcntl.2
574 ↗(On Diff #61636)

mappings *of* the file ?

sys/kern/uipc_shm.c
573 ↗(On Diff #61636)

Traditionally rv denotes Mach error. Please use error there.

575 ↗(On Diff #61636)

Can you move object locking there as well ?

1231 ↗(On Diff #61636)

This lock is only useful because reading of writemappings could be non-atomic on 32bit arches, right ?

1251 ↗(On Diff #61636)

I think there is no point is taking the lock there.

In D21391#469159, @kib wrote:

Is it possible to replace shm_seal_sx with rangelocking ? E.g. instead of xlock you would take write rangelock for (0, max). I suspect that you can avoid adding sx this way, but did not thought details.

I think my only concern here is in shm_write; I think blocking F_ADD_SEALS until completion is a good idea, but that would require either:

  • rangelock_rlock wrapping the rangelock_wlock, which sounds bad (though I know very little about rangelocks), or
  • always wlock from 0 to OFF_MAX, but that seems less than ideal

Unless you mean to add a separate rangelock for it -- which seems to defeat the purpose.

sys/kern/uipc_shm.c
1231 ↗(On Diff #61636)

This was my theory at least, yeah.

1251 ↗(On Diff #61636)

This one I was attempting to guarantee that any pending F_ADD_SEALS had finished completion prior to fetching the current seal. This may have been a foolish prospect, though, as there could be at least a couple of problems with that.

kib added a comment.Sep 6 2019, 7:59 AM
In D21391#469159, @kib wrote:

Is it possible to replace shm_seal_sx with rangelocking ? E.g. instead of xlock you would take write rangelock for (0, max). I suspect that you can avoid adding sx this way, but did not thought details.

I think my only concern here is in shm_write; I think blocking F_ADD_SEALS until completion is a good idea, but that would require either:

  • rangelock_rlock wrapping the rangelock_wlock, which sounds bad (though I know very little about rangelocks), or
  • always wlock from 0 to OFF_MAX, but that seems less than ideal

Unless you mean to add a separate rangelock for it -- which seems to defeat the purpose.

Why do you need aditional rlock ? R and W rangelocks conflict if the ranges overlap, similarly W and W conflict on overlapping ranges. If you W-lock [0, MAX) around manipulations of shm_seals, then you should get the same exclusion modes as for shm_seal_sx. In particular, modifications to shm_seals would wait until all current writers go out of the rangelock.

The only bit that is currently missed is the range-locked truncate. But in fact this is the bug without seals..

In D21391#469284, @kib wrote:
In D21391#469159, @kib wrote:

Is it possible to replace shm_seal_sx with rangelocking ? E.g. instead of xlock you would take write rangelock for (0, max). I suspect that you can avoid adding sx this way, but did not thought details.

I think my only concern here is in shm_write; I think blocking F_ADD_SEALS until completion is a good idea, but that would require either:

  • rangelock_rlock wrapping the rangelock_wlock, which sounds bad (though I know very little about rangelocks), or
  • always wlock from 0 to OFF_MAX, but that seems less than ideal

Unless you mean to add a separate rangelock for it -- which seems to defeat the purpose.

Why do you need aditional rlock ? R and W rangelocks conflict if the ranges overlap, similarly W and W conflict on overlapping ranges. If you W-lock [0, MAX) around manipulations of shm_seals, then you should get the same exclusion modes as for shm_seal_sx. In particular, modifications to shm_seals would wait until all current writers go out of the rangelock.

Ahhh, sorry, I missed that. So I'd just move the seal checking into wlocked section of write and this would be sufficient for my needs.

The only bit that is currently missed is the range-locked truncate. But in fact this is the bug without seals..

Hmm... It's at least relatively safe, since it wlocks the vm obj and other reader/writers do as well. Is ftruncate safe if we have writemappings?

kib added a comment.Sep 6 2019, 12:40 PM
In D21391#469284, @kib wrote:

The only bit that is currently missed is the range-locked truncate. But in fact this is the bug without seals..

Hmm... It's at least relatively safe, since it wlocks the vm obj and other reader/writers do as well. Is ftruncate safe if we have writemappings?

It is safe in the sense that kernel should not crash when parallel write(2) and ftruncate(2) are performed on shmfd, this is the guarantees that are provided by owning the vm object lock.

OTOH, user might see inconsistent reads or writes content if truncate is performed simultaneously.

kevans updated this revision to Diff 62013.Sep 12 2019, 11:11 PM
kevans marked 10 inline comments as done.
  • Rename rv to error
  • Move object locking out of shm_dotruncate_locked
  • Remove needless lock from shm_get_seals; there will be a race in userland anyways, this probably doesn't need to happen.
  • Strip the shm seal sx, use rangelocking instead.
kib accepted this revision.Sep 13 2019, 5:44 AM
kib added inline comments.
sys/kern/uipc_shm.c
1194 ↗(On Diff #62013)

I think you should note that we unmap i.e. decrement writemappings without taking a range lock, so the object lock there is to avoid torn reads on ILP32 arches.

This revision is now accepted and ready to land.Sep 13 2019, 5:44 AM
kevans updated this revision to Diff 62070.Sep 13 2019, 9:47 PM

Revise comment to indicate why we care about the VM_OBJECT_RLOCK

This revision now requires review to proceed.Sep 13 2019, 9:47 PM
kib accepted this revision.Sep 14 2019, 9:03 AM
kib added inline comments.
sys/kern/uipc_shm.c
433 ↗(On Diff #62070)

I do not think this blank line is useful.

This revision is now accepted and ready to land.Sep 14 2019, 9:03 AM
markj added inline comments.Mon, Sep 23, 8:49 PM
sys/kern/uipc_shm.c
1202 ↗(On Diff #62070)

Did you mean to also check shm_kmappings here? They are not counted in the object.

kevans marked 2 inline comments as done.Mon, Sep 23, 8:59 PM
kevans added inline comments.
sys/kern/uipc_shm.c
1202 ↗(On Diff #62070)

kib suggested elsewhere that we should ignore kernel mappings for sealing purposes, so I backed down on checking it.

markj accepted this revision.Mon, Sep 23, 9:01 PM
This revision was automatically updated to reflect the committed changes.