
vm_object: restore handling of shadow_count for all types of objects
ClosedPublic

Authored by kib on Dec 17 2021, 9:00 PM.

Details

Summary

Do this for all objects instead of only backing OBJ_ANON objects, as is the
case now. This is required, e.g., for the vm_meter is_object_active()
detection, and should be useful in more cases.

Use the refcount KPI for all objects, regardless of whether the object lock
is owned, and despite the fact that OBJ_ANON currently cannot change for a
live object.
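As a rough illustration (not the committed diff: the helper names and their
placement are invented here, and shadow_count is assumed to have the u_int
type that refcount(9) requires), routing shadow_count through the refcount
KPI for every backing object could look like:

/*
 * Illustrative sketch only; the real updates are made inline where
 * vm_object.c links and unlinks shadow objects.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/refcount.h>
#include <vm/vm.h>
#include <vm/vm_object.h>

static void
backing_shadow_inc(vm_object_t backing)
{
        /* refcount(9) does not require the object lock. */
        refcount_acquire(&backing->shadow_count);
}

static void
backing_shadow_dec(vm_object_t backing)
{
        /* The refcount KPI checks for underflow under INVARIANTS. */
        (void)refcount_release(&backing->shadow_count);
}

(As the timeline below shows, review feedback later replaced the refcount
KPI with plain atomics.)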

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

kib requested review of this revision. Dec 17 2021, 9:00 PM
kib added inline comments.
sys/vm/vm_object.c:665

This check is still fine for !OBJ_ANON because we recheck the object type below (does not matter really).

Can it be a regular atomic_add/sub_int? I do not see why the slightly more expensive refcount semantics are needed. Also, the shadow_count field should be defined as volatile, or its loads should be annotated.

I am somewhat concerned that this will introduce significant contention for frequently loaded executable and DSO files.
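For reference, the cheaper plain-counter form suggested above might look
roughly like this (a sketch, reusing the invented helper names from the
earlier sketch; no wrap checking, no ordering, and lockless readers
annotate their loads):

static void
backing_shadow_inc(vm_object_t backing)
{
        /* Just a counter bump; no refcount(9) checks or barriers. */
        atomic_add_int(&backing->shadow_count, 1);
}

static void
backing_shadow_dec(vm_object_t backing)
{
        atomic_subtract_int(&backing->shadow_count, 1);
}

static int
backing_shadow_count(vm_object_t backing)
{
        /* Annotated load for readers without the object lock. */
        return (atomic_load_int(&backing->shadow_count));
}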

Use atomics directly instead of refcount.
Use atomic_load().

I am somewhat concerned that this will introduce significant contention for frequently loaded executable and DSO files.

BTW, in all those cases we already own the (shared) vnode lock for the mapped object. So it should be mostly the same, I believe.

My kernel is not completely up to date, but with this patch,
I get a panic() during booting:
panic: vm_object_anon_deallocate: ref_count: 1, shadow_count: 3
vm_object_deallocate
vm_map_process_deferred
vm_map_remove
exec_new_vmspace
exec_elf64_imgact
kern_execve
sys_execve
amd64_syscall
fast_syscall_common

I think this is the "hand crafted" exec of init, since the panic happens
right after "mountroot".

Just tried the patch on a kernel from main as of today
and got the same crash during boot.

Fix silliness: atomic_subtract_int(&x, -1)
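The fix above is easy to misread; a standalone userland stand-in (C11
stdatomic here instead of the kernel atomic(9) KPI) shows why subtracting
-1 was wrong:

#include <stdatomic.h>
#include <stdio.h>

int
main(void)
{
        atomic_int shadow_count = 3;

        /* The buggy call: "subtracting" -1 actually increments. */
        atomic_fetch_sub(&shadow_count, -1);
        printf("after buggy decrement:   %d\n",
            atomic_load(&shadow_count));        /* prints 4 */

        /* The intended decrement. */
        atomic_fetch_sub(&shadow_count, 1);
        printf("after correct decrement: %d\n",
            atomic_load(&shadow_count));        /* prints 3 */
        return (0);
}

Every buggy "decrement" bumped the counter instead, which is consistent
with the ref_count: 1, shadow_count: 3 seen in the panic above.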

It boots now.

This time it crashed on shutdown.
panic: object 0x..., shadow_count = -1
vm_object_zdtor
uma_zfree_arg
vnode_destroy_vobject
vgonel
vflush
ffs_flushfiles
softdep_flushfiles
ffs_unmount
dounmount
vfs_unmountall
bufshutdown
kern_reboot
sys_reboot
amd64_syscall
fast_syscall_common

Fix shadow_count on collapse.

Works fine now for the NFS-related testing. (Of course, the unpatched
sources worked as well.)

Since I know nothing about the VM, I can't really review it.
ps: I'm guessing that vm_object_is_active() should also use atomic_load_int()?

Works fine now for the NFS-related testing. (Of course, the unpatched
sources worked as well.)

It would probably make sense to finish this discussion before switching NFS to vm_object_is_active(), but since it is committed, I think it should be left as is for now.
Also, if we go this route, I will ask Peter to test before committing this patch.

Since I know nothing about the VM, I can't really review it.
ps: I'm guessing that vm_object_is_active() should also use atomic_load_int()?

Yes, although the atomic_load_int() does not add any value to the code. It is only a formality; I updated the patch.

Use atomic_load(shadow_count) in vm_object_is_active().
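The annotated check could then read roughly as follows (a sketch; the real
function lives in sys/vm/vm_meter.c, and details here are approximate):

static bool
vm_object_is_active(vm_object_t obj)
{
        /* shadow_count may be read without the object lock held. */
        return (obj->ref_count > atomic_load_int(&obj->shadow_count));
}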

Reviving this review.

I just tried to build python on an NFS mount, and it failed miserably. Almost all test C sources generated by the configure script contained bursts of spurious NUL bytes, and the config.log file was randomly littered with NULs as well. Applying this patch is not enough to fix the breakage. Reverting all three commits

867c27c23a5c469b27611cf53cc2390b5a193fa5
150da1e3cd51e552f7d6bc159b387e540c5a373
b70042adfebbc4ee90d6a88dd6dc34d3f8ed5c37

that do direct I/O for IO_APPEND fixed things nicely, to my relief.

In fact, doing a quick read of ncl_write(), I see that vinvalbuf() is only done when the NMODIFIED nfsnode flag is already set. This means that we are not only incoherent with the page cache, but also with the buffer cache, which explains the problem.

So, vm_object_is_active() infers that an object is mapped if ref_count > shadow_count, but as you noted in an earlier review, this is not generally correct since a reference may be held by the pager. So for, e.g., tmpfs objects it always returns true. It happens to work for vnode objects because vnode_create_vobject() explicitly releases the reference.
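To restate that inference schematically (no new claims, just the point
above in comment form):

/*
 * The heuristic:
 *
 *      is_mapped(obj)  ~  obj->ref_count > obj->shadow_count
 *
 * Every shadowing object holds one reference, so any surplus
 * reference is presumed to be a mapping.  A pager reference breaks
 * this: a tmpfs object carries one for its entire life, so the test
 * is always true there, while vnode objects only work because
 * vnode_create_vobject() drops that extra reference.
 */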

To detect possible mappings, the NFS client could implement VOP_MMAPPED to set a flag indicating that the vnode might be mapped; it would not be cleared when the last mapping is removed, but I think this is ok for the purposes of the IO_APPEND optimization.
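A hypothetical sketch of that approach: VOP_MMAPPED(9) exists, but the
handler body and the NNFS_MMAPPED flag below are invented for illustration
(NFS client headers elided):

/* Within sys/fs/nfsclient; mark the nfsnode as possibly mapped. */
static int
nfs_mmapped(struct vop_mmapped_args *ap)
{
        struct nfsnode *np = VTONFS(ap->a_vp);

        NFSLOCKNODE(np);
        np->n_flag |= NNFS_MMAPPED;     /* invented flag; never cleared */
        NFSUNLOCKNODE(np);
        return (0);
}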

This diff looks ok to me. I do think that vm_object_is_active() should not be exported, and that maintaining shadow_count for all objects is not very useful. vmtotal() is buggy regardless, e.g., it skips vnode objects with exactly one mapping, no?

It turned out that NFS did not need to know if a file
was mmap'd, since it had to invalidate pages
unconditionally for the Append case.

I actually started out with a VOP_MMAPPED() call
and flag, but Kostik suggested this as a better
alternative. And then it turned out it was not
needed at all.

vmmeter(): Fix detection of the named swap objects

This revision is now accepted and ready to land. Feb 1 2022, 3:27 PM