Page MenuHomeFreeBSD

[3/3] Add linux-compatible memfd_create
ClosedPublic

Authored by kevans on Aug 24 2019, 4:28 AM.
Tags
None
Referenced Files
F80187609: D21393.diff
Thu, Mar 28, 11:18 PM
Unknown Object (File)
Fri, Mar 22, 9:06 PM
Unknown Object (File)
Mon, Mar 4, 3:10 PM
Unknown Object (File)
Mon, Mar 4, 3:10 PM
Unknown Object (File)
Mon, Mar 4, 3:10 PM
Unknown Object (File)
Mon, Mar 4, 3:10 PM
Unknown Object (File)
Mon, Mar 4, 3:10 PM
Unknown Object (File)
Mon, Mar 4, 3:10 PM

Details

Summary

memfd_create is effectively a SHM_ANON shm_open(2) mapping with optional CLOEXEC and file sealing support. This is used by some mesa parts, some linux libs, and qemu can also take advantage of it and uses the sealing to prevent resizing the region.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 26685

Event Timeline

sys/kern/sys_memfd.c
59 ↗(On Diff #61204)

Should return early on error so as to not fdrop stack garbage

  • Fix attempt to fdrop stack garbage if kern_shm_open() failed
  • Add tests for memfd_create; we cover MFD_CLOEXEC, MFD_ALLOW_SEALING, as well as the various F_SEAL_* bits.

Add a test for dup'ing a memfd and sealing the original, to make sure that the seal applies to the pre-seal dup'd fd as well.

lib/libc/sys/Symbol.map
744

Please don't expose these unless there's an actual need.

sys/kern/uipc_shm.c
122

This would usually go in sys/sys/syscallsubr.h

lib/libc/sys/Symbol.map
744

Ah, sorry- I didn't read https://wiki.freebsd.org/AddingSyscalls closely enough!

sys/kern/uipc_shm.c
122

I had actually intended to make this static because I'm not sure it would be generally useful elsewhere, given that it's just a wrapper around shm_open. Since you're here, though- should it be made generally available in sys/syscallsubr.h anyways, or do we consider it ok to make kern_* implementations static?

Do we need a new syscall ? Can all the specifics of memfd_create() handled by shmfd_open by some secret flag passed from libc ?

sys/kern/uipc_shm.c
1334

Why not claim that transparent superpages are even better than HUGETLB ? And then ignore the flag, perhaps defining it to zero.

1347

This is surprising. Why not allow sealing always ?

sys/kern/uipc_shm.c
1334

It wasn't clear to me from previous discussions around HUGETLB-style flags that transparent superpages are better in all context, and I didn't want to remove the possibility of supporting it later. Most users of this don't seem to request HUGETLB anyways, and I seem to recall seeing one that tested memfd_create with HUGETLB flag and fell back to different implementation if it threw an error -- I'd have to go back and find it again.

1347

This part wasn't entirely clear, and looking at the manpage again I should redo this and not add FSEALABLE or the struct file ** to kern_shm_open... memfd_create(2) has a note that lack of MFD_ALLOW_SEALING is populating the initial set of seals with F_SEAL_SEAL.

With that detail in mind, I guess !MFD_ALLOW_SEALING is useful if you want to dup it around and ensure that you maintain write access.

sys/kern/uipc_shm.c
122

Just making it static would be fine.

I'll end up making it non-static in my tree, but that's an easy diff to carry.

sys/kern/uipc_shm.c
122

I have no strong feeling either way, so I'll make it non-static and move declaration to syscallsubr.h in the next iteration.

In D21393#466452, @kib wrote:

Do we need a new syscall ? Can all the specifics of memfd_create() handled by shmfd_open by some secret flag passed from libc ?

Could be- some kind of O_MEMFD that makes shmfd_open honor O_CLOEXEC and another flag equivalent to MFD_ALLOW_SEALING. The only downside is we're almost limited to creating the syscall anyways if they decide to add more flags to memfd_create in a future version, or if we decide to start honoring MFD_HUGETLB to some extent.

Add a shm_open2 that takes an extra flags argument; the only flag currently defiend for it is SHM_ALLOW_SEALING to drop the F_SEAL_SEAL initial seal. The new syscall does not force O_CLOEXEC on the returned fd, care must be taken in callers to do this as needed. New interface should be flexible enough for other shm-like interfaces to use it as well.

shm_open is moved under COMPAT12 and shm_open has been rewritten in libc to proxy through to shm_open2 or legacy shm_open to maintain some form of compatibility with older kernel for a short time. shm_open2 remains undocumented for the time being.

I am completely unsure if I did any of the libc bits correctly. We seem to list _shm_open and __sys_shm_open in the symbol map, so I added weak references for these. I tested:

  • New libc, new kernel with lower than SHM_OPEN2_OSREL
  • New libc, new kernel with SHM_OPEN2_OSREL
  • New libc, old kernel with no shm_open2
lib/libc/sys/Symbol.map
920

There is no point in exporting this symbol. It might be somewhat reasonable to export raw syscall __sys_shm_open2.

lib/libc/sys/shm_open.c
52

extern keyword is redundant. You can move the prototype to libc_private.h

72

So may be add an argument to the shm_open2() syscall to pass the name, and do something with it in the future ?

kevans marked 5 inline comments as done.
  • Tweaked wording -- memfd_create objects *must* (not can) be adjusted via ftruncate, otherwise they're unusable.
  • Described requirements of name a little better -- NULL is not acceptable for memfd_create, but an empty string is ok. "memfd:" will be prepended, so there's an effective limit of 249 characters.
  • A name argument has been added to shm_open2(), and memfd_create will now validate its name, prepend "memfd:", and pass it into the kernel. The Linux manpage doesn't mention EBADF, but experiment shows that name == NULL results in EBADF, so we do the same.
  • Dropped redundant keyword, moved prototype, dropped export.
lib/libc/sys/shm_open.c
72

Sure, that seems like a better idea than dropping it and later having to hack together something to compensate if it's deemed useful. The latest revision also validates the name and transforms it as described by the Linux manpage for this, prepending "memfd:" to it.

I had considered not bothering dealing with prefixing the name for memfd_create since the kernel's just going to ignore it, but I do not have a good argument against it.

kib added inline comments.
lib/libc/sys/Symbol.map
920

Still, I would add __sys_shm_open2()

This revision is now accepted and ready to land.Sep 17 2019, 8:15 PM

Very useful, and good generalization of shm_open(SHM_ANON) and memfd_create().

lib/libc/sys/Symbol.map
920

If it is added at FBSDprivate_1.0 version, it's only for FreeBSD use (not for applications), so it can be added when necessary, and I think it should not be added now.

Exporting shm_open2() at FBSD_1.6 might make sense, but may not be sufficiently useful.

lib/libc/sys/memfd_create.2
46 ↗(On Diff #62227)

Do we care that this is no longer a direct system call but a function?

101–105 ↗(On Diff #62227)

Hmm, do we want to document this copied Linux bug?

lib/libc/sys/memfd_create.2
46 ↗(On Diff #62227)

memfd_create(3) would be reasonable - at one point I was tempted to instead merge the content into shm_open(2) and add shm_open.2 -> memfd_create.3 MLINK. 100 lines of additional content is almost nothing, and these are clearly heavily linked. I do not have strong feelings on move to section 3 vs. merge.

101–105 ↗(On Diff #62227)

I think we have to, unfortunately, or we'll be just as confusing. I contemplated switching it to EINVAL, but EINVAL is already so incredibly overloaded that it's actually somewhat annoying to debug usage.

markj added inline comments.
lib/libc/sys/memfd_create.2
70 ↗(On Diff #62227)

I find the "while currently unused" bit confusing since the rest of the sentence goes on to explain that the kernel will never use it.

84 ↗(On Diff #62227)

Missing "command" at the end of the sentence?

lib/libc/sys/shm_open.c
92

Style: comments should end with a period.

lib/libc/sys/memfd_create.2
70 ↗(On Diff #62227)

What I was trying to capture is that:

  • It's not currently used for anything
  • The kernel itself can never use the name as an identity for this shmfd, so, for example: conflicts mean nothing, as it's just a label
  • The kernel could eventually expose it to userland (but again, can never use it itself) via procfs/fdescfs or some other mechanism, so a sensible name is a good idea anyways

I'm unsure how to convey this in a concise manner suitable for a manpage, though.

lib/libc/sys/memfd_create.2
70 ↗(On Diff #62227)

I think you could just drop "while currently unused." You could perhaps also add, "Names are therefore not required to be unique."

This version merges the memfd_create content into shm_open(2). A link is created for memfd_create.3 (appropriate now that it's no longer a system call). Rewording suggested by @markj also included, along with style fixes.

This revision now requires review to proceed.Sep 25 2019, 3:15 AM
kib added inline comments.
lib/libc/sys/shm_open.c
67

You might still want to add the page for shm_open2(2). For instance, to have complete kernel API documented.

It does not need to be too user-oriented like shm_open(2), more to mention it existence. Then the comment above is naturally placed there.

I do not suggest to block this commit until shm_open2(2) is written, you can do it later.

This revision is now accepted and ready to land.Sep 25 2019, 12:20 PM
lib/libc/sys/shm_open.c
67

Sure- will do.

This revision was automatically updated to reflect the committed changes.
kevans marked 4 inline comments as done.