[1/3] Add mostly Linux-compatible file sealing support
ClosedPublic

Authored by kevans on Aug 24 2019, 4:28 AM.

Details

Summary

File sealing applies protections against certain actions (currently: write, growth, shrink) at the inode level. New fileops are added to accommodate seals - EINVAL is returned by fcntl(2) if they are not implemented.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

bcr added a subscriber: bcr.

OK from manpages.

Fix the design, following discussion with kib... key points:

  • Adding to struct file is a lot of overhead, and the wrong place
  • Sealing should be per-inode; requires filesystem support
  • Push sealing flags down into the shmfd, provide fileops for interacting with the flags

I've also amended ftruncate(2) manpage to indicate that it can return ENOTCAPABLE.

Address issue that came up while writing tests (will ride in with memfd_create): F_SEAL_SHRINK | F_SEAL_GROW allows ftruncate(fd, n) where n == current size. The last iteration broke the CAP_FTRUNCATE revocation anyways when the seals moved out of struct file. Formalize this.
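
The rule described here could be sketched as a small predicate. This is an illustrative userland model, not code from the diff: `model_truncate_check` and the `MODEL_SEAL_*` constants are hypothetical names with made-up values, and returning EPERM is an assumption borrowed from the Linux behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the F_SEAL_* flags; values are illustrative. */
#define MODEL_SEAL_SHRINK	0x0002
#define MODEL_SEAL_GROW		0x0004

/*
 * Return 0 if truncating from cursize to newsize is permitted under the
 * given seals, or EPERM if a seal forbids the size change.
 */
int
model_truncate_check(int seals, off_t cursize, off_t newsize)
{
	assert(cursize >= 0 && newsize >= 0);

	if (newsize < cursize && (seals & MODEL_SEAL_SHRINK) != 0)
		return (EPERM);
	if (newsize > cursize && (seals & MODEL_SEAL_GROW) != 0)
		return (EPERM);
	/* newsize == cursize trips neither seal, even with both set. */
	return (0);
}
```

With both seals set, only a truncate to the current size passes; any other size is rejected.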

sys/kern/sys_generic.c
614 ↗(On Diff #61234)

Why do we need to expose seals at the file level? I would prefer, and actually think it is required, that seal handling be done by the file type itself. In other words, it should be shmfd business.

sys/kern/sys_generic.c
614 ↗(On Diff #61234)

Whoops. I would think you're correct... if we do something like:

int fd, fdx;
char buf[8];

fd = memfd_create("...", MFD_ALLOW_SEALING);
fdx = dup(fd);

fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE | F_SEAL_SEAL);
write(fdx, buf, sizeof(buf));

I would expect the write to fdx to fail because we're supposed to apply seals at the inode level, so revoking CAP_WRITE from fd isn't sufficient, and likewise for the ftruncate bits... I'll push it down a bit further into shm_dotruncate and shm_write and make it all return EPERM like the Linux implementation.

Push sealing further into filesystem

lib/libc/sys/fcntl.2
187 ↗(On Diff #61278)

It looks like they are now associated with the backing file description, rather than the descriptor.

sys/kern/kern_descrip.c
49 ↗(On Diff #61278)

Do we still need this include?

sys/sys/fcntl.h
259 ↗(On Diff #61278)

Missing close paren and period.

lib/libc/sys/truncate.2
148 ↗(On Diff #61278)

Is this addition still relevant?

sys/kern/kern_descrip.c
768 ↗(On Diff #61278)

This is weird. IMO it should be checked (atomically) in fo_add_seals().

Wouldn't it allow a race where one thread has already set F_SEAL_SEAL, while another thread did not see it in the result of fo_get_seals(), even though its fo_add_seals() must not proceed?

In general, what are the desired atomicity requirements for interaction between seals and in-flight ops? E.g. I believe that the current arrangement allows a write(2) to proceed while another thread sets the seal on the shmfd. Do we declare that a user race, or do we want to ensure that no write can be in progress after the fcntl that sets the write seal has returned?

sys/sys/file.h
448 ↗(On Diff #61278)

Why is this needed?

sys/kern/kern_descrip.c
768 ↗(On Diff #61278)

Hmm... yeah, true- I'll whittle this down to fo_add_seals() alone and deal with F_SEAL_SEAL in fo_add_seals.

Linux manpage only seems to guarantee that the seals are enforced once fcntl(..., F_ADD_SEALS) has successfully returned: "Once this call succeeds, the seals are enforced by the kernel immediately." -> I interpret this as deeming it a user race. I think most users of seals would tend towards creating the file, operating on it, then sealing it before passing it around.
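
One way to picture doing the F_SEAL_SEAL check inside fo_add_seals() itself is a compare-and-swap loop, so the check and the update cannot be separated by another thread. This is a userland sketch only: `model_obj`, `model_add_seals` and the `MODEL_SEAL_*` values are hypothetical, C11 atomics stand in for whatever locking the kernel code uses, and EPERM for an already-set F_SEAL_SEAL is an assumption following the Linux behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the F_SEAL_* flags; values are illustrative. */
#define MODEL_SEAL_SEAL		0x0001
#define MODEL_SEAL_WRITE	0x0008

struct model_obj {
	_Atomic int seals;	/* current seal set for the object */
};

/*
 * Add newseals to the object. The F_SEAL_SEAL check and the update are
 * performed as one atomic step: if another thread changes the seal set
 * in between, the compare-exchange fails and the check is repeated
 * against the fresh value.
 */
int
model_add_seals(struct model_obj *obj, int newseals)
{
	int cur;

	assert(obj != NULL);

	cur = atomic_load(&obj->seals);
	do {
		if ((cur & MODEL_SEAL_SEAL) != 0)
			return (EPERM);
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&obj->seals, &cur,
	    cur | newseals));
	return (0);
}
```

A caller that loses the race to a thread setting F_SEAL_SEAL gets EPERM instead of silently adding seals.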

sys/kern/kern_descrip.c
768 ↗(On Diff #61278)

I would be very skeptical of inferring anything from the Linux manpage, esp. such fine details. If somebody could read the implementation, or ask the authors of the API about the intent ...

I understand that this might be too much to ask.

jilles added inline comments.
sys/kern/kern_descrip.c
768 ↗(On Diff #61278)

As far as I understand, the intent is to avoid the need for userland to implement such complicated things as the Xorg server's busfault (see https://lists.x.org/archives/xorg-devel/2013-November/038598.html ). This is further described in the NOTES section of http://man7.org/linux/man-pages/man2/memfd_create.2.html . Also note the extra properties of F_SEAL_WRITE in http://man7.org/linux/man-pages/man2/fcntl.2.html : the sealing call fails if a writable MAP_SHARED mapping exists, and aborts any asynchronous I/O.

As such, any conflicting modification visible after a successful F_ADD_SEALS or F_GET_SEALS might lead to a security hole.

However, when adding seals, this still leaves the kernel the choice of waiting for the conflicting modification to end, synchronously cancelling it or denying the seal change.

Atomicity on the F_SEAL_SEAL bit seems less important.

Take another stab at it:

  • Slap some sx(9) around; slock while we're doing ftruncate/mmap/write/F_GET_SEALS, xlock while we're setting seals
  • Take advantage of posixshm now tracking writeable mappings to reject with EBUSY.
  • Remove some now-unrelated cruft.
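
The locking scheme in the bullets above could be modelled in userland roughly as follows. Assumptions: a single pthread mutex stands in for the sx(9) lock (so the shared/exclusive distinction is lost), and `model_shm`, its fields, the function names and the `MODEL_SEAL_WRITE` value are hypothetical rather than taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for F_SEAL_WRITE; the value is illustrative. */
#define MODEL_SEAL_WRITE	0x0008

struct model_shm {
	pthread_mutex_t lock;	/* stand-in for the seal sx lock */
	int seals;		/* seals currently applied */
	int writemappings;	/* count of writable shared mappings */
};

/* Setting seals: the exclusive (xlock) side of the scheme. */
int
model_shm_add_seals(struct model_shm *shm, int newseals)
{
	int error = 0;

	assert(shm != NULL);
	pthread_mutex_lock(&shm->lock);
	if ((newseals & MODEL_SEAL_WRITE) != 0 && shm->writemappings > 0)
		error = EBUSY;	/* a writable mapping still exists */
	else
		shm->seals |= newseals;
	pthread_mutex_unlock(&shm->lock);
	return (error);
}

/* Write path: the shared (slock) side; checks the seal under the lock. */
int
model_shm_write_check(struct model_shm *shm)
{
	int error;

	assert(shm != NULL);
	pthread_mutex_lock(&shm->lock);
	error = (shm->seals & MODEL_SEAL_WRITE) != 0 ? EPERM : 0;
	pthread_mutex_unlock(&shm->lock);
	return (error);
}
```

Because both paths take the same lock, a write either completes before the seal is set or observes the seal and fails.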

Is it possible to replace shm_seal_sx with rangelocking? E.g. instead of xlock you would take a write rangelock for (0, max). I suspect that you can avoid adding sx this way, but I have not thought through the details.

lib/libc/sys/fcntl.2
574 ↗(On Diff #61636)

mappings *of* the file ?

sys/kern/uipc_shm.c
573 ↗(On Diff #61636)

Traditionally rv denotes Mach error. Please use error there.

575 ↗(On Diff #61636)

Can you move object locking there as well ?

1231 ↗(On Diff #61636)

This lock is only useful because reading of writemappings could be non-atomic on 32bit arches, right ?

1251 ↗(On Diff #61636)

I think there is no point in taking the lock there.

In D21391#469159, @kib wrote:

Is it possible to replace shm_seal_sx with rangelocking? E.g. instead of xlock you would take a write rangelock for (0, max). I suspect that you can avoid adding sx this way, but I have not thought through the details.

I think my only concern here is in shm_write; I think blocking F_ADD_SEALS until completion is a good idea, but that would require either:

  • rangelock_rlock wrapping the rangelock_wlock, which sounds bad (though I know very little about rangelocks), or
  • always wlock from 0 to OFF_MAX, but that seems less than ideal

Unless you mean to add a separate rangelock for it -- which seems to defeat the purpose.

sys/kern/uipc_shm.c
1231 ↗(On Diff #61636)

This was my theory at least, yeah.

1251 ↗(On Diff #61636)

With this one I was attempting to guarantee that any pending F_ADD_SEALS had completed prior to fetching the current seals. This may have been a foolish prospect, though, as there could be at least a couple of problems with that.

In D21391#469159, @kib wrote:

Is it possible to replace shm_seal_sx with rangelocking? E.g. instead of xlock you would take a write rangelock for (0, max). I suspect that you can avoid adding sx this way, but I have not thought through the details.

I think my only concern here is in shm_write; I think blocking F_ADD_SEALS until completion is a good idea, but that would require either:

  • rangelock_rlock wrapping the rangelock_wlock, which sounds bad (though I know very little about rangelocks), or
  • always wlock from 0 to OFF_MAX, but that seems less than ideal

Unless you mean to add a separate rangelock for it -- which seems to defeat the purpose.

Why do you need an additional rlock? R and W rangelocks conflict if the ranges overlap; similarly, W and W conflict on overlapping ranges. If you W-lock [0, MAX) around manipulations of shm_seals, then you should get the same exclusion modes as for shm_seal_sx. In particular, modifications to shm_seals would wait until all current writers go out of the rangelock.

The only bit that is currently missing is the range-locked truncate. But in fact this is a bug even without seals.
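
The conflict rule described here can be written down as a small predicate. This is a sketch for illustration, not the rangelock(9) implementation: `model_rl_req` and `model_rl_conflict` are hypothetical names, ranges are treated as half-open [start, end), and it assumes two read locks never conflict with each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_rl_mode { MODEL_RL_READ, MODEL_RL_WRITE };

struct model_rl_req {
	enum model_rl_mode mode;
	int64_t start;	/* inclusive */
	int64_t end;	/* exclusive */
};

/*
 * Return true if the two requests cannot be held at the same time:
 * at least one is a write lock and the ranges overlap.
 */
bool
model_rl_conflict(const struct model_rl_req *a, const struct model_rl_req *b)
{
	assert(a->start <= a->end && b->start <= b->end);

	/* Two readers share the range. */
	if (a->mode == MODEL_RL_READ && b->mode == MODEL_RL_READ)
		return (false);
	/* R/W and W/W conflict exactly when the ranges overlap. */
	return (a->start < b->end && b->start < a->end);
}
```

A write lock over the whole range conflicts with every reader and writer, which is why taking it around seal updates excludes all in-flight I/O on the object.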

In D21391#469284, @kib wrote:
In D21391#469159, @kib wrote:

Is it possible to replace shm_seal_sx with rangelocking? E.g. instead of xlock you would take a write rangelock for (0, max). I suspect that you can avoid adding sx this way, but I have not thought through the details.

I think my only concern here is in shm_write; I think blocking F_ADD_SEALS until completion is a good idea, but that would require either:

  • rangelock_rlock wrapping the rangelock_wlock, which sounds bad (though I know very little about rangelocks), or
  • always wlock from 0 to OFF_MAX, but that seems less than ideal

Unless you mean to add a separate rangelock for it -- which seems to defeat the purpose.

Why do you need an additional rlock? R and W rangelocks conflict if the ranges overlap; similarly, W and W conflict on overlapping ranges. If you W-lock [0, MAX) around manipulations of shm_seals, then you should get the same exclusion modes as for shm_seal_sx. In particular, modifications to shm_seals would wait until all current writers go out of the rangelock.

Ahhh, sorry, I missed that. So I'd just move the seal checking into wlocked section of write and this would be sufficient for my needs.

The only bit that is currently missing is the range-locked truncate. But in fact this is a bug even without seals.

Hmm... It's at least relatively safe, since it wlocks the vm obj and other reader/writers do as well. Is ftruncate safe if we have writemappings?

In D21391#469284, @kib wrote:

The only bit that is currently missing is the range-locked truncate. But in fact this is a bug even without seals.

Hmm... It's at least relatively safe, since it wlocks the vm obj and other reader/writers do as well. Is ftruncate safe if we have writemappings?

It is safe in the sense that the kernel should not crash when parallel write(2) and ftruncate(2) are performed on a shmfd; this is the guarantee provided by owning the vm object lock.

OTOH, the user might see inconsistent read or written content if a truncate is performed simultaneously.

kevans marked 10 inline comments as done.
  • Rename rv to error
  • Move object locking out of shm_dotruncate_locked
  • Remove needless lock from shm_get_seals; there will be a race in userland anyways, this probably doesn't need to happen.
  • Strip the shm seal sx, use rangelocking instead.
kib added inline comments.
sys/kern/uipc_shm.c
1194 ↗(On Diff #62013)

I think you should note that we unmap i.e. decrement writemappings without taking a range lock, so the object lock there is to avoid torn reads on ILP32 arches.

This revision is now accepted and ready to land. Sep 13 2019, 5:44 AM

Revise comment to indicate why we care about the VM_OBJECT_RLOCK

This revision now requires review to proceed. Sep 13 2019, 9:47 PM
kib added inline comments.
sys/kern/uipc_shm.c
433 ↗(On Diff #62070)

I do not think this blank line is useful.

This revision is now accepted and ready to land. Sep 14 2019, 9:03 AM
sys/kern/uipc_shm.c
1202 ↗(On Diff #62070)

Did you mean to also check shm_kmappings here? They are not counted in the object.

kevans added inline comments.
sys/kern/uipc_shm.c
1202 ↗(On Diff #62070)

kib suggested elsewhere that we should ignore kernel mappings for sealing purposes, so I backed down on checking it.