Page MenuHomeFreeBSD

Extend swap pager to put tmpfs vnodes on lazy list.
ClosedPublic

Authored by mjg on Apr 30 2021, 9:07 PM.

Details

Summary

This fixes tmpfs scanning all vnodes in order to manage mtime for memory mappings, almost always finding nothing to do. This is of huge importance when e.g., running poudriere with tmpfs-based worlds.

I don't know if decoupling OBJT_SWAP from tmpfs is viable.

The new OBJ_TMPFS_VREF flag is added to allow asserting on the reference not being there.

Sample result from stock kernel running tinderbox in a tmpfs-only environment:

tmpfs_update_mtime: called for mp 0xfffffe032c6ce080 ; 0 lazy size; 8 inspected; 0 were dirty
tmpfs_update_mtime: called for mp 0xfffffe064c2fe040 ; 0 lazy size; 152056 inspected; 0 were dirty
tmpfs_update_mtime: called for mp 0xfffffe032c6ce080 ; 0 lazy size; 8 inspected; 0 were dirty
tmpfs_update_mtime: called for mp 0xfffffe064c2fe040 ; 0 lazy size; 224955 inspected; 0 were dirty
tmpfs_update_mtime: called for mp 0xfffffe032c6ce080 ; 0 lazy size; 8 inspected; 0 were dirty
tmpfs_update_mtime: called for mp 0xfffffe064c2fe040 ; 0 lazy size; 254417 inspected; 0 were dirty
tmpfs_update_mtime: called for mp 0xfffffe032c6ce080 ; 0 lazy size; 8 inspected; 0 were dirty
tmpfs_update_mtime: called for mp 0xfffffe064c2fe040 ; 0 lazy size; 284780 inspected; 0 were dirty

Patched kernel almost never catches anything.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

mjg requested review of this revision.Apr 30 2021, 9:07 PM
mjg created this revision.

Don't you need to check writemapping > 0 in tmpfs_alloc_vp() and put the vnode on lazy list as needed?

I don't know if decoupling OBJT_SWAP from tmpfs is viable.

I do not know what do you mean.

sys/vm/swap_pager.c
3125

Don't you need to assert that TMPFS_VREF is set? Or even better, check that the flag is set, and not vrele() if it is not. I am not sure that we never do update_writecount with effective value of 0.

3162

I think for swap objects, release_writecount() can just call update_writecount() and decide what to do on the new writemapping value being 0 (clear flag) or not (set flag).

  • dedup tmpfs handling into one routine
  • more asserts
In D30065#674641, @kib wrote:

Don't you need to check writemapping > 0 in tmpfs_alloc_vp() and put the vnode on lazy list as needed?

I don't see how this can be needed, as in by the time the vnode gets a mapping it already has to be already allocated.

I don't know if decoupling OBJT_SWAP from tmpfs is viable.

I do not know what do you mean.

I mean there is no OBJT_TMPFS, instead there is OBJT_SWAP + OBJ_TMPFS special cases sprinkled all over.

In D30065#674658, @mjg wrote:
In D30065#674641, @kib wrote:

Don't you need to check writemapping > 0 in tmpfs_alloc_vp() and put the vnode on lazy list as needed?

I don't see how this can be needed, as in by the time the vnode gets a mapping it already has to be already allocated.

Imagine that vnode gets mapped, then reclaimed. And then the vnode for the mapped node is reallocated.

I don't know if decoupling OBJT_SWAP from tmpfs is viable.

I do not know what do you mean.

I mean there is no OBJT_TMPFS, instead there is OBJT_SWAP + OBJ_TMPFS special cases sprinkled all over.

Initially it was only two cases, one in vm_object_dereference(), and another in vm_object_might_be_dirty(). Now there are under 10, counting your addition.
We would need some kind of loadable vm pager first, and then add some number of new pager methods.

sys/vm/swap_pager.c
3158–3169

'new' is arguably useless

  • justify swap_pager_update_tmpfs with a comment
In D30065#674668, @kib wrote:
In D30065#674658, @mjg wrote:
In D30065#674641, @kib wrote:

Don't you need to check writemapping > 0 in tmpfs_alloc_vp() and put the vnode on lazy list as needed?

I don't see how this can be needed, as in by the time the vnode gets a mapping it already has to be already allocated.

Imagine that vnode gets mapped, then reclaimed. And then the vnode for the mapped node is reallocated.

Getting any writable mappings bumps v_usecount which only goes away if all of them disappear, so I think this creates an invariant that the vnode is not reclaimed in this scenario.

I don't know if decoupling OBJT_SWAP from tmpfs is viable.

I do not know what do you mean.

I mean there is no OBJT_TMPFS, instead there is OBJT_SWAP + OBJ_TMPFS special cases sprinkled all over.

Initially it was only two cases, one in vm_object_dereference(), and another in vm_object_might_be_dirty(). Now there are under 10, counting your addition.
We would need some kind of loadable vm pager first, and then add some number of new pager methods.

mjg marked 2 inline comments as done.May 1 2021, 12:00 AM
mjg added inline comments.
sys/vm/swap_pager.c
3158–3169

It does not add anything per se, but I think it reads better.

In D30065#674672, @mjg wrote:
In D30065#674668, @kib wrote:
In D30065#674658, @mjg wrote:
In D30065#674641, @kib wrote:

Don't you need to check writemapping > 0 in tmpfs_alloc_vp() and put the vnode on lazy list as needed?

I don't see how this can be needed, as in by the time the vnode gets a mapping it already has to be already allocated.

Imagine that vnode gets mapped, then reclaimed. And then the vnode for the mapped node is reallocated.

Getting any writable mappings bumps v_usecount which only goes away if all of them disappear, so I think this creates an invariant that the vnode is not reclaimed in this scenario.

v_usecount does not prevent reclaim. It stops non-forced unmounts and vnlru, but the VFS interface is that only vnode lock prevents reclamation.

I presume this will be reworked with D30070.

  • rebase
  • handle forced unmount
sys/fs/tmpfs/tmpfs_subr.c
142

I don't know if this assert is of any use, cargo culted this bit from the original swap pager.

In D30065#674675, @kib wrote:

v_usecount does not prevent reclaim. It stops non-forced unmounts and vnlru, but the VFS interface is that only vnode lock prevents reclamation.

I don't see any mechanisms to recover from even failed forced unmount. Unless I'm misreading something, usecount from VI_TEXT_REF is also expected to be sufficient.

There was a bug though where forced unmount would tip on the assert that the vnode is no longer referenced, which I fixed.

In D30065#679647, @mjg wrote:
In D30065#674675, @kib wrote:

v_usecount does not prevent reclaim. It stops non-forced unmounts and vnlru, but the VFS interface is that only vnode lock prevents reclamation.

I don't see any mechanisms to recover from even failed forced unmount. Unless I'm misreading something, usecount from VI_TEXT_REF is also expected to be sufficient.

VI_TEXT_REF sufficient for what?

Anyway, if you insist that the situation with object->un_pager.swp.writemappings > 0 is impossible in tmpfs_alloc_vp(), then please put the corresponding assert there.

There was a bug though where forced unmount would tip on the assert that the vnode is no longer referenced, which I fixed.

sys/fs/tmpfs/tmpfs_subr.c
142

It does make sense, but perhaps replace the word 'Splittable' with 'tmpfs vm'

597

There should be a blank line after the last line of the text where the multi-line comment applies.

sys/fs/tmpfs/tmpfs_vfsops.c
113

This should be spelled as vm_object_mightbedirty(obj) or vm_object_mightbedirty_(obj).

143

You did checked for obj == NULL in tmpfs_update_mtime_lazy_filter(), but not there. tmpfs_reclaim() only owns the vnode lock when clearing v_object to NULL. So either the vnode cannot appear on lazy list while being reclaimed (I doubt it), or this place needs the check.

  • more asserts
  • remove stale obj access
markj added inline comments.
sys/fs/tmpfs/tmpfs_subr.c
109
113
This revision is now accepted and ready to land.May 14 2021, 10:45 PM

tested by pho, no problems seen

sys/fs/tmpfs/tmpfs_subr.c
113

fixed locally