
Export a list of VM objects via a sysctl.
ClosedPublic

Authored by jhb on Apr 10 2015, 8:22 PM.

Details

Summary

Export a list of VM objects in the system via a sysctl. The list
can be examined via 'vmstat -o'. It can be used to determine which
files are using physical pages of memory and how much each is using.

This is a port of the vm_objects kld/program I have had at
www.freebsd.org/~jhb/vm_object/ to the base system.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

jhb retitled this revision to "Export a list of VM objects via a sysctl."
jhb updated this object.
jhb edited the test plan for this revision.
jhb added reviewers: kib, alc.

I don't know why arc thinks that I copied those two manpages on top of each other. I just added the new page as a new file in git.

lib/libutil/kinfo_getvmobject.c
51 ↗(On Diff #4784)

Allocate 2*len in advance?

sys/sys/user.h
498 ↗(On Diff #4784)

Probably it makes sense to change all residency counters to uint64 in advance, not waiting for vm_object to update the types. The current counters can already overflow on machines available at an affordable price.

503 ↗(On Diff #4784)

NumbeR

504 ↗(On Diff #4784)

Add both int32 and int64 for spares, please. Making an int64 from int spares in an MI way is too complicated.
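The two requests above (64-bit page counters, plus separate 32-bit and 64-bit spares) can be illustrated with a small layout sketch. The struct and field names below are hypothetical and are not copied from sys/sys/user.h; the spare counts are arbitrary. The point is only that a record built this way has the same size on 32-bit and 64-bit consumers, and that a future 64-bit field can replace one 64-bit spare without reshuffling int-sized slots.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical exported record. Names and spare counts are illustrative
 * assumptions, not the layout from the diff under review.
 */
struct example_vmobject {
	int32_t		ev_structsize;	/* size of this record */
	int32_t		ev_type;	/* object type */
	uint64_t	ev_resident;	/* page counts are 64-bit up front */
	uint64_t	ev_active;
	uint64_t	ev_inactive;
	uint64_t	_ev_qspare[8];	/* spares for future 64-bit fields */
	uint32_t	_ev_ispare[8];	/* spares for future 32-bit fields */
};

/* No hidden padding: 2*4 + 3*8 + 8*8 + 8*4 bytes. */
_Static_assert(sizeof(struct example_vmobject) == 128,
    "unexpected padding in example_vmobject");

/* Convert a page count to bytes using 64-bit arithmetic throughout. */
static uint64_t
pages_to_bytes(uint64_t pages, uint32_t page_size)
{
	return (pages * page_size);
}
```

A count of 2^32 pages cannot be represented in a 32-bit counter at all, which is the overflow concern raised for the residency fields.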

sys/vm/vm_object.c
2322 ↗(On Diff #4784)

Note that the loop in vm_meter.c:vmtotal() only does trylock under vm_object_list_mtx. I am not sure why; this might be a fossil.

Still, I think that an OBJT_DEAD sentinel is a better way to remember the current position in the list.

2329 ↗(On Diff #4784)

This empty line does not make sense.

2377 ↗(On Diff #4784)

Why is MGTDEVICE excluded?

lib/libutil/kinfo_getvmobject.c
51 ↗(On Diff #4784)

The in-kernel sysctl already overestimates a bit when given a NULL length.

As a side note, it would be nice if the KERN_PROC_FILEDESC and KERN_PROC_VM<foo> sysctls would optimize the req->oldptr == NULL case to just give an estimate rather than doing all the work to look up pathnames, etc.
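For readers unfamiliar with the pattern being discussed, here is a minimal userland sketch of the probe-then-fetch loop. It does not call sysctl(3); `query_fn`, `fetch_table`, and the `fake_source` test double are hypothetical names introduced for illustration, and the one-third slack is an arbitrary choice rather than the value used in kinfo_getvmobject.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a sysctl call: with buf == NULL it stores a
 * size estimate in *lenp; otherwise it copies the table and fails with
 * errno == ENOMEM when *lenp is too small.
 */
typedef int (*query_fn)(void *buf, size_t *lenp, void *arg);

/* Fetch a variable-sized table, retrying if it outgrew the estimate. */
static void *
fetch_table(query_fn query, void *arg, size_t *lenp)
{
	void *buf, *nbuf;
	size_t len;

	assert(lenp != NULL);
	buf = NULL;
	for (;;) {
		len = 0;
		if (query(NULL, &len, arg) != 0)	/* size probe only */
			break;
		len = len + len / 3 + 1;		/* slack for growth */
		nbuf = realloc(buf, len);
		if (nbuf == NULL)
			break;
		buf = nbuf;
		if (query(buf, &len, arg) == 0) {
			*lenp = len;
			return (buf);
		}
		if (errno != ENOMEM)			/* only retry on "too small" */
			break;
	}
	free(buf);
	return (NULL);
}

/* Test double: a table that grows once, right after the first probe. */
struct fake_source {
	size_t	size;
	size_t	growth;
	int	calls;
};

static int
fake_query(void *buf, size_t *lenp, void *arg)
{
	struct fake_source *src = arg;

	src->calls++;
	if (buf == NULL) {
		*lenp = src->size;
		src->size += src->growth;
		src->growth = 0;
		return (0);
	}
	if (*lenp < src->size) {
		errno = ENOMEM;
		return (-1);
	}
	memset(buf, 0xab, src->size);
	*lenp = src->size;
	return (0);
}
```

If the producer already pads its estimate, as described above for the in-kernel handler, the retry path should rarely be taken, but the loop still covers a table that grows faster than the padding.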

sys/sys/user.h
498 ↗(On Diff #4784)

So all the size counts (resident, active, inactive)?

503 ↗(On Diff #4784)

Fixed.

504 ↗(On Diff #4784)

Will do.

sys/vm/vm_object.c
2322 ↗(On Diff #4784)

Hmm, Alan added the trylock in r124083. The message just says there was a LOR, but I don't see it looking at the current vm_object.c code. Can we create a new VM object while an existing one is locked? It looks like vm_object_shadow() and vm_object_split() are both careful to drop their object lock while allocating a new one. We also always use M_WAITOK to allocate an object in vm_object_allocate(), and it looks like we don't hold any locks when dequeueing the object when freeing it.

2329 ↗(On Diff #4784)

Removed.

2377 ↗(On Diff #4784)

Whoops. I first wrote this on 8.x before MGTDEVICE existed.

  • Typo, add 64-bit spares, use 64-bit for page counts.
  • Remove blank, add OBJT_MGTDEVICE.

I'll make a tangential remark. Given that vm objects are allocated from a NOFREE zone, it actually strikes me as silly that we keep adding and removing them to and from the global list. Either we decide that iteration over the list is not performance critical and accept the cost of iterating over a lot of dead objects or, perhaps, implement some kind of lazy removal, i.e., during iteration over the list remove dead objects and during allocation insert if necessary.
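The lazy-removal idea can be sketched with a toy list. This is not kernel code: `struct obj` and the three functions are hypothetical, the list is a hand-rolled singly linked list rather than a TAILQ, and all locking is omitted. It relies on the same property as a NOFREE zone, namely that an object's memory stays valid after it is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified object; not the kernel's struct vm_object. */
struct obj {
	struct obj	*next;		/* global list linkage */
	bool		 on_list;	/* still linked into the list? */
	bool		 dead;		/* freed, but memory remains valid */
};

static struct obj *obj_list;		/* head of the global list */

/* Allocation: link the object only if a sweep unlinked it earlier. */
static void
obj_alloc_init(struct obj *o)
{
	assert(o != NULL);
	o->dead = false;
	if (!o->on_list) {
		o->next = obj_list;
		obj_list = o;
		o->on_list = true;
	}
}

/* Free: mark the object dead and leave it on the list. */
static void
obj_free(struct obj *o)
{
	o->dead = true;
}

/* Iteration: count live objects and unlink dead ones along the way. */
static size_t
obj_sweep_and_count(void)
{
	struct obj **prevp, *o;
	size_t live;

	live = 0;
	prevp = &obj_list;
	while ((o = *prevp) != NULL) {
		if (o->dead) {
			*prevp = o->next;
			o->next = NULL;
			o->on_list = false;
		} else {
			live++;
			prevp = &o->next;
		}
	}
	return (live);
}
```

In this sketch the free path never touches the list, so list maintenance happens only during allocation and iteration. A real implementation would still need a lock around those two paths.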

sys/vm/vm_object.c
2322 ↗(On Diff #4784)

My recollection is that the execution path for destroying an object still held the object lock when the object was removed from the global list. That is no longer the case, so the trylock could be eliminated.

sys/sys/user.h
498 ↗(On Diff #4784)

I think yes.

sys/vm/vm_object.c
2298–2307 ↗(On Diff #4807)

Could you add a comment explaining what this block is trying to do? Unless someone has read the user-space code, this seems puzzling at first.

2313 ↗(On Diff #4807)

"it's" -> "its"

2314 ↗(On Diff #4807)

"... stable while the list is unlocked. To ..."

2322 ↗(On Diff #4807)

How about acquiring the reference on the object first and then downgrading the lock to a read lock for the iteration over the memq?

sys/vm/vm_object.c
2322 ↗(On Diff #4784)

Even at the time the commit was made to add the try lock that was still the case, so I'm worried there might be some other case (e.g. calling vm_object_deallocate() on one object while holding the lock of a different VM object).

I also decided to skip OBJ_DEAD objects entirely. They don't really serve any useful purpose to list and this would allow us to consider Alan's proposal in the future without need for any further changes.

sys/vm/vm_object.c
2298–2307 ↗(On Diff #4807)

Will do. It's actually a somewhat common idiom used in other sysctl handlers that return variable-sized tables.
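As a rough illustration of that idiom from the producer's side, the sketch below shows a handler that answers a size probe with a padded estimate and otherwise copies the table. It is plain C rather than a real sysctl handler: `table_handler` and `struct record` are hypothetical, and the 10% padding is an assumed value chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct record {
	int	id;
};

/*
 * Hypothetical handler shape. out == NULL means "size probe": report
 * how many bytes a later call should provide. Returns 0 or an errno
 * value and stores the byte count in *outlen.
 */
static int
table_handler(const struct record *tbl, size_t count, void *out,
    size_t *outlen)
{
	size_t need;

	assert(outlen != NULL);
	need = count * sizeof(*tbl);
	if (out == NULL) {
		/* Overestimate so a slightly larger table still fits. */
		*outlen = need * 11 / 10;
		return (0);
	}
	if (*outlen < need)
		return (ENOMEM);
	memcpy(out, tbl, need);
	*outlen = need;
	return (0);
}
```

The probe branch does no per-record work, which is what makes it cheap compared with building the full table just to learn its size.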

2313 ↗(On Diff #4807)

Fixed.

2314 ↗(On Diff #4807)

Done.

2322 ↗(On Diff #4807)

Good idea. Done.

jhb edited edge metadata.
  • Fixes from alc@ and skip OBJ_DEAD objects.
kib edited edge metadata.
This revision is now accepted and ready to land. Apr 15 2015, 3:08 PM
alc edited edge metadata.

I got a LOR when I tried this due to vm_object_list and vm_object. I added the reverse order to WITNESS to tease out where it occurs and this is what I got:

lock order reversal:
1st 0xfffff80011c36a00 vm object (vm object) @ vm/vm_object.c:576
2nd 0xffffffff818b5908 vm object_list (vm object_list) @ vm/vm_object.c:677
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe085cf602a0
witness_checkorder() at witness_checkorder+0xe26/frame 0xfffffe085cf60330
mtx_lock_flags() at mtx_lock_flags+0xa8/frame 0xfffffe085cf60380
vm_object_destroy() at vm_object_destroy+0x23/frame 0xfffffe085cf603a0
vm_object_collapse() at vm_object_collapse+0x32d/frame 0xfffffe085cf603f0
vm_object_deallocate() at vm_object_deallocate+0x2b/frame 0xfffffe085cf60450
vm_map_process_deferred() at vm_map_process_deferred+0x89/frame 0xfffffe085cf60480
vm_map_remove() at vm_map_remove+0xc8/frame 0xfffffe085cf604b0
exec_new_vmspace() at exec_new_vmspace+0x180/frame 0xfffffe085cf60510
exec_elf64_imgact() at exec_elf64_imgact+0x700/frame 0xfffffe085cf605e0
kern_execve() at kern_execve+0x4e6/frame 0xfffffe085cf60940
sys_execve() at sys_execve+0x37/frame 0xfffffe085cf609a0
amd64_syscall() at amd64_syscall+0x27f/frame 0xfffffe085cf60ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe085cf60ab0

--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x42402a, rsp = 0x7fffffffe738, rbp = 0x7fffffffe740 ---

The problem here being that vm_object_collapse() destroys backing_object while 'object' is locked. I'm not sure how best to rectify this. Even if I use a marker object to walk the list, I believe I still need the vm_object_list mutex to keep each object stable so I can grab a reference, but that alone would trigger the LOR.

I assume it wouldn't really be safe to change vm_object_collapse() to bump object's ref count and drop the lock around destroy.

OTOH, given that it is a NOFREE zone, I could adopt Alan's suggestion of leaving DEAD objects on the list and only adding objects to the list when they are originally allocated. That would remove the TAILQ_REMOVE from vm_object_destroy() and thus remove the LOR.
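The reversal itself can be reproduced with a toy order checker. This is only loosely modeled on what WITNESS does: the lock identifiers, `toy_lock`, and `toy_unlock` are hypothetical, and nothing here blocks or is thread-safe. It records which lock class was acquired while another was held and flags an acquisition that contradicts a recorded order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes standing in for the two locks in the trace. */
enum { LK_OBJECT, LK_OBJECT_LIST, LK_COUNT };

/* seen_before[a][b] is true once b was acquired while a was held. */
static bool seen_before[LK_COUNT][LK_COUNT];
static bool held[LK_COUNT];

/* Acquire lk; return true if this reverses a previously seen order. */
static bool
toy_lock(int lk)
{
	bool reversal;

	assert(lk >= 0 && lk < LK_COUNT);
	reversal = false;
	for (int h = 0; h < LK_COUNT; h++) {
		if (!held[h] || h == lk)
			continue;
		if (seen_before[lk][h])
			reversal = true;	/* lk-then-h was seen earlier */
		seen_before[h][lk] = true;
	}
	held[lk] = true;
	return (reversal);
}

static void
toy_unlock(int lk)
{
	held[lk] = false;
}
```

Taking the list lock and then an object lock (the order the sysctl handler wants) records one order; taking an object lock and then the list lock (the collapse-then-destroy path in the backtrace) is then reported as a reversal. Dropping the list manipulation from the destroy path removes the second sequence entirely.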

So you are going to restore the trylock for the commit? I think that the change to keep even unallocated objects on the list should be separate.

John, how do you want to proceed? I see two possibilities. 1. Commit a revised version of this patch that uses trylock, and implement lazy dead object removal later. 2. Wait for the implementation of lazy dead object removal to be committed first, and then update this patch.

I'm swamped right now. If you want to take option #2, could one of you implement the lazy dead object removal?

Ah, I've just been preempted by other stuff. I already have a patch I haven't yet uploaded (or tested) that changes the list to never remove objects and only add new ones to the list when they are init'd. I can split that out into a separate change and have that go in first before this one.

jhb edited edge metadata.

Rebased on newer HEAD after the object list changes.

  • Update copyright holder.
  • Simplify the locking to take advantage of type stability, etc.
  • OBJ_DEAD != OBJT_DEAD.
This revision now requires review to proceed. May 27 2015, 3:37 PM
kib edited edge metadata.
This revision is now accepted and ready to land. May 27 2015, 4:55 PM
This revision was automatically updated to reflect the committed changes.