Page MenuHomeFreeBSD

linuxolator: implement memfd_create syscall

Authored by kevans on Sep 30 2019, 2:07 AM.



This effectively mirrors our libc implementation, but with minor fudging -- name needs to be copied in from userspace, so we just copy it straight into stack-allocated memfd_name into the correct position rather than allocating memory that needs to be cleaned up.

The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have also been implemented now that we support them.

Suggestions on reducing the duplication would be nice... all of these constants match, and I would expect that Linux will never break that.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

I dislike the reliance on equal FreeBSD and Linux flags values. Please use LINUX_MFD_XXX constants in linuxolator. You may add _Static_asserts that they are equal and avoid translating the current flags, but I do not think that it is sustainable in long run.

I waffled on things like this, and I think I ended up with a solution like this...

Generalize translation of 32-bit bit fields for Linux <-> BSD. Push these into a new structure that can describe a wide array of situations (ranging from these are mutually exclusive O_ACCMODE fields to more simplistic single-bit values). This approach was inspired by similar stuff that we (theoretically) do in qemu, but that we don't do in practice.

This gives us a different way to express these mappings that's perhaps a little less error-prone, as the same data is used in both directions of the translation. It also lets us de-clutter some sections of translations that, to me, distract from the implementation.

Some of the follow-up to move other stuff to this can be found in this diff: -- there are likely other spots that could see some benefit.

(Edit to add)
Note that this patch is a WIP... It doesn't yet observe the proper semantics for, e.g., enumerated values rather than actual bit masks.

I think you need to provide some explanation how your l/b bits functions are supposed to work.

138 ↗(On Diff #62772)

Why do you need U suffix ?

156 ↗(On Diff #62772)

I think '34' above actually needs the U suffix since the shift moves into the sign bit, you get undef behavior otherwise.

138 ↗(On Diff #62772)

Admittedly, I took the definitions as-is from the Linux headers for this where they are equally as pointless.

156 ↗(On Diff #62772)

Whoops, will fix.

kevans marked 2 inline comments as done.
kevans added a reviewer: trasz.

Address review comments:

  • Fix unsigned placement
  • Add some commentary on the BSD/Linux conversion stuff
  • Adding trasz@ while I'm here, since he's been working on Linux compat bits lately

Did you have a chance to test it with LTP (the linux-c7-ltp package)?

Did you have a chance to test it with LTP (the linux-c7-ltp package)?

I tried it after this comment, and I've committed some fixes for problems unearthed.

The current one that I'm struggling with is that Linux apparently expects to be able to dup an fd via fdescfs and truncate it if O_TRUNC was specified. It seems like this would be easily accomplished in dupfdopen, but I don't know if that behavior's sensible for native applications, too.

Just go ahead, I do not think we can make linuxolator worse by this.

This revision is now accepted and ready to land.Jun 9 2020, 10:43 PM
In D21845#555628, @kib wrote:

Just go ahead, I do not think we can make linuxolator worse by this.

Any chance you have a list of the crappery you've experienced/noticed?

I do wonder whether some of the corner cases that LTP has pointed out here are actually used aspects in practice.

LGTM. Even if something turned out to be suboptimal, it's easier to test it once it's in the tree.

This revision was automatically updated to reflect the committed changes.