Page MenuHomeFreeBSD

LinuxKPI: Allow cdev_pager prefault handler to steal pages from other vm_objects
ClosedPublic

Authored by wulf on Sep 24 2021, 1:09 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Mar 30, 7:48 PM
Unknown Object (File)
Mar 9 2024, 1:44 PM
Unknown Object (File)
Jan 19 2024, 10:25 PM
Unknown Object (File)
Jan 18 2024, 4:00 AM
Unknown Object (File)
Dec 29 2023, 2:14 PM
Unknown Object (File)
Dec 27 2023, 7:09 AM
Unknown Object (File)
Dec 20 2023, 3:57 AM
Unknown Object (File)
Nov 21 2023, 8:14 PM

Details

Summary

This workarounds "Page already inserted" panic in vm_page_insert
routine triggered on attempt to mmap file created with shmem_file_setup
call. After introduction of "GTT mmap interface v4" a.k.a. MMAP_OFFSET,
vm_objects allocated by these calls may try to own intersected sets of
pages that leads to the assertion. To ensure that acceptor vm_object
borrows a page from the right donor one, the later must be explicitly
locked by a caller of vmf_insert_pfn_prot.

P.S. Proposed change looks like naive and dirty hack but works for me and is very simple.
This is follow up to https://reviews.freebsd.org/D31672

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

wulf requested review of this revision.Sep 24 2021, 1:09 AM
sys/compat/linuxkpi/common/src/linux_page.c
331

Does this mean that the same page can be mapped multiple times in a VMA, at different offsets in the object?

sys/compat/linuxkpi/common/src/linux_page.c
331

The page is mapped in to different objects.
First time it is mapped in to vm_object created by shmem_file_setup(). That vm_object is associated with some GEM object in i915kms driver.
On the next step, the GEM object is mmap()-ed. It forces creation of another one vm_object with calling of cdev_pager_allocate(OBJT_MGTDEVICE).
And than execution of populate handler inserts into vm_object belonging mmap a page which is already inserted into vm_object created by shmem_file_setup()

sys/compat/linuxkpi/common/src/linux_page.c
331

And these are two different objects? So you must own two object' locks there?

BTW are the types of the objects different as well? I would expect that shmem_file_setup() to create something like a swap object.

sys/compat/linuxkpi/common/src/linux_page.c
331

So you must own two object' locks there?

Yes. That is why VM_OBJECT_ASSERT_WLOCKED() is placed twice in to this routine.

BTW are the types of the objects different as well? I would expect that shmem_file_setup() to create something like a swap object.

Yes. shmem_file_setup calls vm_pager_allocate(OBJT_DEFAULT

sys/compat/linuxkpi/common/src/linux_page.c
331

I.e. you establishing a lock order between two vm_objects. We do have order between a shadow object and its backing object, but there I do not think you have a shadowing. This is strange at least, and definitely should not be introduced on ad hoc manner.

When removing the page from the swap object, you at least need to clean the swap allocation, otherwise you get the swap space leak.

sys/compat/linuxkpi/common/src/linux_page.c
331

definitely should not be introduced on ad hoc manner.

Ok, I will rewrite that part with using vm_page_remove.

When removing the page from the swap object, you at least need to clean the swap allocation, otherwise you get the swap space leak.

Following code exists in vm_page_rename() execution path. Does it suits our needs?

/* Deferred free of swap space. */
if ((m->a.flags & PGA_SWAP_FREE) != 0)
        vm_pager_page_unswapped(m);
sys/compat/linuxkpi/common/src/linux_page.c
331

No, it should be something like vm_pager_page_unswapped(m) while m still belong to the shmem object.

I suggest to move the page renaming code into some helper that you would add into vm/.

sys/compat/linuxkpi/common/src/linux_page.c
331

A which point page leaves object?
As I see following steps in code starting from line 1620 in sys/vm/vm_page.c:

1. vm_pager_page_unswapped(m);
2. m->object = NULL;
3. vm_radix_remove(&object->rtree, m->pindex);
4. TAILQ_REMOVE(&object->memq, m, listq);

Does it not belong to shmem at step 1?

Eliminate lock ordering through use of vm_page_remove()

sys/compat/linuxkpi/common/src/linux_page.c
331

it should be something like vm_pager_page_unswapped(m) while m still belong to the shmem object.

The first thing vm_page_remove() does is calling of vm_pager_page_unswapped(m). Is last version satisfies above reqirement?

sys/compat/linuxkpi/common/src/linux_page.c
331

No, the call has to be unconditional. PGA_SWAP_FREE exists to support lazy deallocation of swap blocks, which helps reduce object lock contention. When removing a page from its object, we must complete the deallocation since the object contains swap block metadata. But if PGA_SWAP_FREE is not set, then vm_page_remove() will not deallocate the swap block.

So vm_page_page_unswapped() must be called directly.

Add unconditional execution of vm_pager_page_unswapped()

sys/compat/linuxkpi/common/src/linux_page.c
331

Thank you for explanation.

Is it correct now?

sys/compat/linuxkpi/common/src/linux_page.c
309

Why page still belongs to shm_obj at this point?

Move "page still belongs to shm_obj" check to right place

wulf added inline comments.
sys/compat/linuxkpi/common/src/linux_page.c
309

Heh. That is result of "online coding".
MPASS moved to right place. Thanks!

sys/compat/linuxkpi/common/src/linux_page.c
309

I mean that you unbusied the page, and do not have shm_obj lock owned. What prevents other thread from e.g. handling the same access fault and removing page from shm_obj while all locks are dropped and page not busied?

wulf added inline comments.
sys/compat/linuxkpi/common/src/linux_page.c
309

From my point of view, execution of fault handlers is serialized at least on process level with mmap semaphore.
Following is what we have in sys/compat/linuxkpi/common/src/linux_compat.c (about line 550):

        down_write(&vmap->vm_mm->mmap_sem);
...
                err = vmap->vm_ops->fault(vmap, &vmf);
...
        up_write(&vmap->vm_mm->mmap_sem);
sys/compat/linuxkpi/common/src/linux_page.c
309

From what I see, mmap_sem only protects fault (or rather, the page instantiation during the fault), and mmap_single/close. Since shmem segment is just anon vm object, what prevents say pagedaemon from reclaiming the page while the object is unlocked/page is unbusied?

I am not even sure that taking semaphore in linux_cdev_pager_dtor() is enough to guarantee that other thread cannot unmap or remap the address meantime, if there more references on the cdev vm_object.

Allow to steal pages grom any object

wulf marked an inline comment as done.Sep 25 2021, 1:29 AM
wulf added inline comments.
sys/compat/linuxkpi/common/src/linux_page.c
309

Do you mean something like this revision? Caller allowed to rename page from any object, not only from specified in parameter and callee checks that page->object persists across vm_object relocking?

sys/compat/linuxkpi/common/src/linux_page.c
309

Note that vm_page_busy_acquire() is not safe as you use it, for the same reason. The tmp_obj lock might be dropped, which would allow the page to be reclaimed. I thihk you need VM_ALLOC_WAITFAIL and retry on fail.

wulf marked an inline comment as done.

Pass VM_ALLOC_WAITFAIL flag to vm_page_busy_acquire()

wulf marked an inline comment as done.Sep 25 2021, 2:09 AM
wulf added inline comments.
sys/compat/linuxkpi/common/src/linux_page.c
309

done

sys/compat/linuxkpi/common/src/linux_page.c
306

No, you need to retry same as for case page->object != tmp_obj simply because you might no longer own page->object lock.

wulf marked an inline comment as done.

Do full cycle retry if vm_page_busy_acquire() has failed

wulf marked an inline comment as done.Sep 25 2021, 2:39 AM
wulf added inline comments.
sys/compat/linuxkpi/common/src/linux_page.c
306

done

sys/compat/linuxkpi/common/src/linux_page.c
309

Before the call to vm_page_busy_acquire(), you need to check that you own the lock for page->object, because tmp_obj might be already staled. [This does not mean that check after busy is not needed]

wulf marked an inline comment as done.

Add check for owning of the lock for page->object

wulf marked an inline comment as done.Sep 25 2021, 3:26 AM
wulf added inline comments.
sys/compat/linuxkpi/common/src/linux_page.c
309

done

sys/compat/linuxkpi/common/src/linux_page.c
310
....
VM_OBJECT_WLOCK(tmp_obj);
if (page->object == tmp_obj &&
       vm_page_busy_acquire(page, VM_ALLOC_WAITFAIL)) {
           KASSERT(page->object == tmp_obj, ("XXX something about page changing identity"));
           KASSERT((page->oflags & VPO_UNMANAGED) == 0, ("XXX something about shmem"));
           vm_pager_page_unswapped(page);
           ....

BTW, is it possible to have other mappings for the page through the page->object mappings? If yes, this stuff would destroy almost all VM invariants. You should either add KASSERT(!pmap_page_is_mapped(page), ()); before vm_page_remove(page), or consider doing pmap_remove_all(page); but then existing mappings through shmem would loose content, getting zero pages into holes.

wulf marked an inline comment as done.Sep 25 2021, 4:15 AM
wulf added inline comments.
sys/compat/linuxkpi/common/src/linux_page.c
310

but then existing mappings through shmem would loose content, getting zero pages into holes.

At least in my case there are no mappings through shmem when populate handler is invoked. I was running i915kms with vm_object_deallocate(shm_obj) call inserted right before lkpi_vmf_insert_pfn_prot_locked() for some time and did not find any negative effects.

Revision updated according to last @kib comment

sys/compat/linuxkpi/common/src/linux_page.c
310

Ran it for some time. No "page is mapped" asserts were triggered

This version is sort of fine.
More I think about it, more my belief is that you do not hit pmap_page_is_mapped() assertion just because the userspace did not anything hostile, in particular did not mapped this shmem segment more than once.

May be, instead of asserting, do something along the lines (with proper cleanup)

if (pmap_page_is_mapped(page)) {
    vm_page_xunbusy(page);
    VM_OBJECT_WUNLOCK(tmp_obj);
    printf("XXX describe something about this situation");
    VM_OBJECT_WLOCK(vm_obj);
    return (VM_FAULT_NOPAGE);
}

IMO, Although printf is better in general case, KASSERT is still good choice for i915kms.
It has some cleanup code which removes old mappings[1] and it works
So when i915kms mmaps same pages again (it happens sometimes), nothing bad happens.
I think we should not try to hide problem if the cleanup stopped to work as it happened e.g. on 5.5->5.6 update

[1] https://github.com/freebsd/drm-kmod/blob/5.5-stable/linuxkpi/gplv2/src/linux_page.c#L221

Or you are talking to be more error-prone to bugs in userspace drivers?

Replace KASSERT with printf.

From the other side, a lot of unconditional printfs reveals problem too

It is not only bugs but also a hostile userspace. Kernel must not allow corrupting its memory.

This revision is now accepted and ready to land.Sep 25 2021, 3:30 PM

Thank you for the time spent on it

sys/compat/linuxkpi/common/src/linux_page.c
315

Should this printf() go away?

Is there a chance it may end up spamming the console?

sys/compat/linuxkpi/common/src/linux_page.c
315

Should this printf() go away?

Is there a chance it may end up spamming the console?

No, it should stay, and the spam is intended, if ever occurring.