pmap_remove_pages() is called without the pmap lock, so nothing
guarantees that all updates to the pmap are visible when removing all of
its mappings. Use a pair of fences to ensure that all updates are
visible to the thread calling pmap_remove_pages().
sys/vm/vm_map.c:422: Is this comment about vnodes still true somehow?
sys/vm/vm_map.c:422: But does anything in this function drop a vnode reference? I believe it would happen earlier, when the vm_map is destroyed.
I'm afraid that I can't make sense of the summary. Is this a problem with the KASSERT that precedes the pmap lock acquisition in arm64's pmap_remove_pages()? Once pmap_remove_pages() has acquired the lock, prior changes to the pmap should be visible to it.
sys/vm/vm_map.c:422: It occurs in vmspace_dofree() -> vm_map_remove() -> vm_map_entry_deallocate(), and then vm_map_unlock() for deferred processing. By the time the vm_map is destroyed, there is really nothing left to process.
sys/vm/vm_map.c:435: Shouldn't the same loop in vmspace_acquire_ref() also be converted to use atomic_load_int(&vm->vm_refcnt)?

```c
refcnt = vm->vm_refcnt;
do {
	...
} while (!atomic_fcmpset_int(&vm->vm_refcnt, &refcnt, refcnt + 1));
```

represents undefined behavior. The compiler is free to optimize refcnt back into a re-read of vm->vm_refcnt.
I somehow managed to miss the fact that pmap_remove_pages() acquires the pmap lock, so you're right, and this change doesn't really make sense. Similarly, vmspace_dofree() acquires the map lock when destroying map entries, so it seems the existing synchronization is sufficient.
A month or so ago I looked at a strange non-INVARIANTS panic that's happened a couple of times on a ThunderX used as a package builder. The panic was an alignment exception that occurred while handling a different panic that I wasn't able to identify. kgdb wasn't able to unwind the stack but when dumping the thread stack I saw return addresses indicating that the initial panic happened in pmap_remove_pages(), so I had a preconceived notion that there's a subtle bug here and jumped the gun.
As @mmel pointed out on the mailing lists, there is in fact a problem with the KASSERT because PCPU_GET() isn't atomic on arm64, but that's obviously unrelated to this diff.
vmspace != vm_map, and vm_map lock is only supposed to protect vm_map.
The change puts the refcounting for vmspace in line with use of fences for our refcount(9) KPI, so I believe it is useful.
True, but there is no vmspace lock, so vmspace fields must either be const or be protected by some other lock.
> The change puts the refcounting for vmspace in line with use of fences for our refcount(9) KPI, so I believe it is useful.
I think we could use refcount(9) directly. Currently that's not possible, but a refcount_release_if_last() would let us write:

```c
bool released;

if (!(released = refcount_release_if_last(&vm->vm_refcnt))) {
	if (p->p_vmspace != &vmspace0) {
		p->p_vmspace = &vmspace0;
		pmap_activate(td);
	}
	released = refcount_release(&vm->vm_refcnt);
}
if (released) {
	if (p->p_vmspace != vm) {
		p->p_vmspace = vm;
		pmap_activate(td);
	}
	/* Free vmspace. */
}
```
In particular, once the thread has switched to vmspace0, I believe we can do an unconditional decrement of the refcount. Does this seem right?
refcount_release_if_last() sounds almost like refcount_release(), and I suspect that refcount_release() might be enough. As I understand it, the problem is that pmap_remove_pages() has to operate on the current pmap, which causes all these troubles. What if we just use refcount_release() and keep p_vmspace with the refcount at zero until pmap_remove_pages() finishes?
The problem is that curpmap is an implicit reference. If a pmap is active on a CPU and the thread calls vmspace_exit(), the thread must deactivate that pmap before releasing the reference, otherwise it may trigger a use-after-free when it goes off-CPU (e.g., when it updates the pm_active bitset). Only the thread performing a 1->0 transition can avoid deactivating curpmap.
To use refcount_release(), vmspace_exit() would have to deactivate the curpmap unconditionally, i.e.:

```c
if (p->p_vmspace != &vmspace0) {
	p->p_vmspace = &vmspace0;
	pmap_activate(td);
}
if (refcount_release(&vm->vm_refcnt)) {
	if (p->p_vmspace != vm) {
		p->p_vmspace = vm;
		pmap_activate(td);
	}
	pmap_remove_pages(vmspace_pmap(vm));
	<...>
}
```
but this would be a pessimization in the common case where the caller holds the sole reference.
sys/vm/vm_map.c:435: We can put whatever pmap * into curpmap within a critical section, if we do a full userspace VA TLB invalidation inside the section. Anyway, whichever method you prefer, I think it is worth doing correct fences in the refcount use.