Page MenuHomeFreeBSD

[3/3] Add linux-compatible memfd_create
ClosedPublic

Authored by kevans on Aug 24 2019, 4:28 AM.

Details

Summary

memfd_create is effectively a SHM_ANON shm_open(2) mapping with optional CLOEXEC and file sealing support. This is used by some mesa parts, some linux libs, and qemu can also take advantage of it and uses the sealing to prevent resizing the region.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kevans created this revision.Aug 24 2019, 4:28 AM
kevans added inline comments.Aug 24 2019, 1:06 PM
sys/kern/sys_memfd.c
59 ↗(On Diff #61204)

Should return early on error so as to not fdrop stack garbage

kevans updated this revision to Diff 61235.Aug 24 2019, 4:51 PM
  • Fix attempt to fdrop stack garbage if kern_shm_open() failed
  • Add tests for memfd_create; we cover MFD_CLOEXEC, MFD_ALLOW_SEALING, as well as the various F_SEAL_* bits.
kevans updated this revision to Diff 61279.Aug 26 2019, 1:54 AM

Add a test for dup'ing a memfd and sealing the original, to make sure that the seal applies to the pre-seal dup'd fd as well.

brooks added inline comments.Aug 27 2019, 12:06 PM
lib/libc/sys/Symbol.map
743 ↗(On Diff #61279)

Please don't expose these unless there's an actual need.

sys/kern/uipc_shm.c
122 ↗(On Diff #61279)

This would usually go in sys/sys/syscallsubr.h

kevans added inline comments.Aug 27 2019, 1:41 PM
lib/libc/sys/Symbol.map
743 ↗(On Diff #61279)

Ah, sorry- I didn't read https://wiki.freebsd.org/AddingSyscalls closely enough!

sys/kern/uipc_shm.c
122 ↗(On Diff #61279)

I had actually intended to make this static because I'm not sure it would be generally useful elsewhere, given that it's just a wrapper around shm_open. Since you're here, though- should it be made generally available in sys/syscallsubr.h anyways, or do we consider it ok to make kern_* implementations static?

kib added a comment.Aug 27 2019, 1:50 PM

Do we need a new syscall ? Can all the specifics of memfd_create() handled by shmfd_open by some secret flag passed from libc ?

sys/kern/uipc_shm.c
1246 ↗(On Diff #61279)

Why not claim that transparent superpages are even better than HUGETLB ? And then ignore the flag, perhaps defining it to zero.

1259 ↗(On Diff #61279)

This is surprising. Why not allow sealing always ?

kevans added inline comments.Aug 27 2019, 2:01 PM
sys/kern/uipc_shm.c
1246 ↗(On Diff #61279)

It wasn't clear to me from previous discussions around HUGETLB-style flags that transparent superpages are better in all context, and I didn't want to remove the possibility of supporting it later. Most users of this don't seem to request HUGETLB anyways, and I seem to recall seeing one that tested memfd_create with HUGETLB flag and fell back to different implementation if it threw an error -- I'd have to go back and find it again.

1259 ↗(On Diff #61279)

This part wasn't entirely clear, and looking at the manpage again I should redo this and not add FSEALABLE or the struct file ** to kern_shm_open... memfd_create(2) has a note that lack of MFD_ALLOW_SEALING is populating the initial set of seals with F_SEAL_SEAL.

With that detail in mind, I guess !MFD_ALLOW_SEALING is useful if you want to dup it around and ensure that you maintain write access.

brooks added inline comments.Aug 27 2019, 4:22 PM
sys/kern/uipc_shm.c
122 ↗(On Diff #61279)

Just making it static would be fine.

I'll end up making it non-static in my tree, but that's an easy diff to carry.

kevans added inline comments.Aug 27 2019, 6:22 PM
sys/kern/uipc_shm.c
122 ↗(On Diff #61279)

I have no strong feeling either way, so I'll make it non-static and move declaration to syscallsubr.h in the next iteration.

kevans added a comment.Sep 4 2019, 3:15 AM
In D21393#466452, @kib wrote:

Do we need a new syscall ? Can all the specifics of memfd_create() handled by shmfd_open by some secret flag passed from libc ?

Could be- some kind of O_MEMFD that makes shmfd_open honor O_CLOEXEC and another flag equivalent to MFD_ALLOW_SEALING. The only downside is we're almost limited to creating the syscall anyways if they decide to add more flags to memfd_create in a future version, or if we decide to start honoring MFD_HUGETLB to some extent.

kevans updated this revision to Diff 62169.Sep 16 2019, 4:09 PM

Add a shm_open2 that takes an extra flags argument; the only flag currently defiend for it is SHM_ALLOW_SEALING to drop the F_SEAL_SEAL initial seal. The new syscall does not force O_CLOEXEC on the returned fd, care must be taken in callers to do this as needed. New interface should be flexible enough for other shm-like interfaces to use it as well.

shm_open is moved under COMPAT12 and shm_open has been rewritten in libc to proxy through to shm_open2 or legacy shm_open to maintain some form of compatibility with older kernel for a short time. shm_open2 remains undocumented for the time being.

I am completely unsure if I did any of the libc bits correctly. We seem to list _shm_open and __sys_shm_open in the symbol map, so I added weak references for these. I tested:

  • New libc, new kernel with lower than SHM_OPEN2_OSREL
  • New libc, new kernel with SHM_OPEN2_OSREL
  • New libc, old kernel with no shm_open2
kib added inline comments.Tue, Sep 17, 8:44 AM
lib/libc/sys/Symbol.map
920 ↗(On Diff #62169)

There is no point in exporting this symbol. It might be somewhat reasonable to export raw syscall __sys_shm_open2.

lib/libc/sys/shm_open.c
51 ↗(On Diff #62169)

extern keyword is redundant. You can move the prototype to libc_private.h

71 ↗(On Diff #62169)

So may be add an argument to the shm_open2() syscall to pass the name, and do something with it in the future ?

kevans updated this revision to Diff 62227.Tue, Sep 17, 7:16 PM
kevans marked 5 inline comments as done.
  • Tweaked wording -- memfd_create objects *must* (not can) be adjusted via ftruncate, otherwise they're unusable.
  • Described requirements of name a little better -- NULL is not acceptable for memfd_create, but an empty string is ok. "memfd:" will be prepended, so there's an effective limit of 249 characters.
  • A name argument has been added to shm_open2(), and memfd_create will now validate its name, prepend "memfd:", and pass it into the kernel. The Linux manpage doesn't mention EBADF, but experiment shows that name == NULL results in EBADF, so we do the same.
  • Dropped redundant keyword, moved prototype, dropped export.
lib/libc/sys/shm_open.c
71 ↗(On Diff #62169)

Sure, that seems like a better idea than dropping it and later having to hack together something to compensate if it's deemed useful. The latest revision also validates the name and transforms it as described by the Linux manpage for this, prepending "memfd:" to it.

I had considered not bothering dealing with prefixing the name for memfd_create since the kernel's just going to ignore it, but I do not have a good argument against it.

kib accepted this revision.Tue, Sep 17, 8:15 PM
kib added inline comments.
lib/libc/sys/Symbol.map
920 ↗(On Diff #62169)

Still, I would add __sys_shm_open2()

This revision is now accepted and ready to land.Tue, Sep 17, 8:15 PM
jilles added a subscriber: jilles.Tue, Sep 17, 9:00 PM

Very useful, and good generalization of shm_open(SHM_ANON) and memfd_create().

lib/libc/sys/Symbol.map
920 ↗(On Diff #62169)

If it is added at FBSDprivate_1.0 version, it's only for FreeBSD use (not for applications), so it can be added when necessary, and I think it should not be added now.

Exporting shm_open2() at FBSD_1.6 might make sense, but may not be sufficiently useful.

lib/libc/sys/memfd_create.2
46 ↗(On Diff #62227)

Do we care that this is no longer a direct system call but a function?

101–105 ↗(On Diff #62227)

Hmm, do we want to document this copied Linux bug?

kevans added inline comments.Wed, Sep 18, 1:18 AM
lib/libc/sys/memfd_create.2
46 ↗(On Diff #62227)

memfd_create(3) would be reasonable - at one point I was tempted to instead merge the content into shm_open(2) and add shm_open.2 -> memfd_create.3 MLINK. 100 lines of additional content is almost nothing, and these are clearly heavily linked. I do not have strong feelings on move to section 3 vs. merge.

101–105 ↗(On Diff #62227)

I think we have to, unfortunately, or we'll be just as confusing. I contemplated switching it to EINVAL, but EINVAL is already so incredibly overloaded that it's actually somewhat annoying to debug usage.

markj accepted this revision.Mon, Sep 23, 9:21 PM
markj added inline comments.
lib/libc/sys/memfd_create.2
70 ↗(On Diff #62227)

I find the "while currently unused" bit confusing since the rest of the sentence goes on to explain that the kernel will never use it.

84 ↗(On Diff #62227)

Missing "command" at the end of the sentence?

lib/libc/sys/shm_open.c
91 ↗(On Diff #62227)

Style: comments should end with a period.

kevans added inline comments.Tue, Sep 24, 12:16 AM
lib/libc/sys/memfd_create.2
70 ↗(On Diff #62227)

What I was trying to capture is that:

  • It's not currently used for anything
  • The kernel itself can never use the name as an identity for this shmfd, so, for example: conflicts mean nothing, as it's just a label
  • The kernel could eventually expose it to userland (but again, can never use it itself) via procfs/fdescfs or some other mechanism, so a sensible name is a good idea anyways

I'm unsure how to convey this in a concise manner suitable for a manpage, though.

markj added inline comments.Tue, Sep 24, 3:20 PM
lib/libc/sys/memfd_create.2
70 ↗(On Diff #62227)

I think you could just drop "while currently unused." You could perhaps also add, "Names are therefore not required to be unique."

kevans updated this revision to Diff 62537.Wed, Sep 25, 3:15 AM

This version merges the memfd_create content into shm_open(2). A link is created for memfd_create.3 (appropriate now that it's no longer a system call). Rewording suggested by @markj also included, along with style fixes.

This revision now requires review to proceed.Wed, Sep 25, 3:15 AM
kib accepted this revision.Wed, Sep 25, 12:20 PM
kib added inline comments.
lib/libc/sys/shm_open.c
67 ↗(On Diff #62537)

You might still want to add the page for shm_open2(2). For instance, to have complete kernel API documented.

It does not need to be too user-oriented like shm_open(2), more to mention it existence. Then the comment above is naturally placed there.

I do not suggest to block this commit until shm_open2(2) is written, you can do it later.

This revision is now accepted and ready to land.Wed, Sep 25, 12:20 PM
markj accepted this revision.Wed, Sep 25, 3:20 PM
kevans added inline comments.Wed, Sep 25, 5:35 PM
lib/libc/sys/shm_open.c
67 ↗(On Diff #62537)

Sure- will do.

This revision was automatically updated to reflect the committed changes.
kevans marked 4 inline comments as done.