Page MenuHomeFreeBSD

vm_map: Add fences around pmap rundown
AbandonedPublic

Authored by markj on Oct 23 2020, 9:39 PM.

Details

Reviewers
alc
kib
mmel
Summary

pmap_remove_pages() is called without the pmap lock, so nothing
guarantees that all updates to the pmap are visible when removing all of
its mappings. Use a pair of fences to ensure that all updates are
visible to the thread calling pmap_remove_pages().

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 34346
Build 31469: arc lint + arc unit

Event Timeline

markj requested review of this revision.Oct 23 2020, 9:39 PM
markj created this revision.
markj added inline comments.
sys/vm/vm_map.c
422

Is this comment about vnodes still true somehow?

sys/vm/vm_map.c
395

Don't you need a release there too ?

422

I believe comment tries to say that dropping references to vnodes could cause inactivation, that indeed results in non-trivial fs activities if nlink == 0.

markj added inline comments.
sys/vm/vm_map.c
422

But does anything in this function drop a vnode reference? I believe it would happen earlier, when the vm_map is destroyed.

Add a missing _rel fence.

I'm afraid that I can't make sense out of the summary. Is this a problem with the KASSERT that precedes the pmap lock acquire in arm64's pmap_remove_pages()? Omce pmap_remove_pages() has performed the lock acquire, prior changes to the pmap should be visible to it.

kib added inline comments.
sys/vm/vm_map.c
422

It occurs in vmspace_dofree()->vm_map_remove()->vm_map_entry_deallocate(), and then vm_map_unlock() for deferred processing.

When vm_map destroyed, we really do not have anything to process already.

This revision is now accepted and ready to land.Oct 23 2020, 10:48 PM
mmel added inline comments.
sys/vm/vm_map.c
435

Should not be the same loop in vmspace_acquire_ref () also converted to using of atomic_load_int (& vm-> vm_refcnt)?
IMHO, this code

refcnt = vm->vm_refcnt;
do {
...
}
} while (!atomic_fcmpset_int (&vm->vm_refcnt, &refcnt, refcnt + 1));

represents undefined behavior. The compiler is free to optimize refcnt back to vm-> vm_refcnt.

In D26923#600563, @alc wrote:

I'm afraid that I can't make sense out of the summary. Is this a problem with the KASSERT that precedes the pmap lock acquire in arm64's pmap_remove_pages()? Omce pmap_remove_pages() has performed the lock acquire, prior changes to the pmap should be visible to it.

I somehow managed to miss the fact that pmap_remove_pages() acquires the pmap lock, so you're right, and this change doesn't really make sense. Similarly, vmspace_dofree() acquires the map lock when destroying map entries, so it seems the existing synchronization is sufficient.

A month or so ago I looked at a strange non-INVARIANTS panic that's happened a couple of times on a ThunderX used as a package builder. The panic was an alignment exception that occurred while handling a different panic that I wasn't able to identify. kgdb wasn't able to unwind the stack but when dumping the thread stack I saw return addresses indicating that the initial panic happened in pmap_remove_pages(), so I had a preconceived notion that there's a subtle bug here and jumped the gun.

As @mmel pointed out on the mailing lists, there is in fact a problem with the KASSERT because PCPU_GET() isn't atomic on arm64, but that's obviously unrelated to this diff.

vmspace != vm_map, and vm_map lock is only supposed to protect vm_map.

The change puts the refcounting for vmspace in line with use of fences for our refcount(9) KPI, so I believe it is useful.

In D26923#600887, @kib wrote:

vmspace != vm_map, and vm_map lock is only supposed to protect vm_map.

True, but there is no vmspace lock, so vmspace fields must either be const or be protected by some other lock.

The change puts the refcounting for vmspace in line with use of fences for our refcount(9) KPI, so I believe it is useful.

I think we could use refcount(9) directly. Currently that's not possible but a refcount_release_if_last() would let us write:

bool released;

if (!(released = refcount_release_if_last(&vm->vm_refcount))) {
    if (p->p_vmspace != &vmspace0) {
        p->p_vmspace = &vmspace0;
        pmap_activate(td);
    }
    released = refcount_release(&vm->vm_refcnt);
}
if (released) {
    if (p->p_vmspace != vm) {
        p->p_vmspace = vm;
        pmap_activate(td);
    }
    /* Free vmspace. */
}

In particular, once the thread has switched to vmspace0, I believe we can do an unconditional decrement of the refcount. Does this seem right?

refcount_release_if_last sounds almost like refcount_release, and I suspect that refcount_release() might be enough. As I understand, the problem is that pmap_remove_pages() has to operate on current pmap, which causes all that troubles. What if we just use refcount_release() and keep p_vmspace with refcount zero until pmap_remove_pages() finish ?

In D26923#601558, @kib wrote:

refcount_release_if_last sounds almost like refcount_release, and I suspect that refcount_release() might be enough. As I understand, the problem is that pmap_remove_pages() has to operate on current pmap, which causes all that troubles. What if we just use refcount_release() and keep p_vmspace with refcount zero until pmap_remove_pages() finish ?

The problem is that curpmap is an implicit reference. If a pmap is active on a CPU and the thread calls vmspace_exit(), the thread must deactivate that pmap before releasing the reference, otherwise it may trigger a use-after-free when it goes off-CPU (e.g., when it updates the pm_active bitset). Only the thread performing a 1->0 transition can avoid deactivating curpmap.

To use refcount_release(), vmspace_exit() would have to deactivate the curpmap unconditionally, i.e.,:

if (p->p_vmspace != &vmspace0) {
    p->p_vmspace = &vmspace0;
    pmap_activate(td);
}
if (refcount_release(&vm->vm_refcnt)) {
    if (p->p_vmspace != vm) {
        p->p_vmspace = vm;
        pmap_activate(td);
    }
    pmap_remove_pages(vmspace_pmap(vm));
    <...>
}

but this would be a pessimization in the common case where the caller holds the sole reference.

sys/vm/vm_map.c
435

We can put whatever pmap * into curpmap in critical section, if we do full userspace VA TLB invalidation inside the section.

Anyway, whatever the method you prefer, I think it is worth doing correct fences in the refcount use.