
5/7 ufs: use lazy list instead of active list for syncer
ClosedPublic

Authored by mjg on Jan 1 2020, 10:57 PM.

Details

Summary

Quota code is temporarily regressed to do a full vnode scan. I find it rather iffy and would prefer not to convert it right now. I don't think it's a big deal.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mjg created this revision.Jan 1 2020, 10:57 PM
jeff added inline comments.
sys/ufs/ufs/inode.h
154 ↗(On Diff #66225)

This is big enough to want to be a function.

mjg added inline comments.Jan 2 2020, 8:45 PM
sys/ufs/ufs/inode.h
154 ↗(On Diff #66225)

I don't have a strong opinion, though I see I wrote the macro a little poorly.

Note that the passed flags are known at compilation time, so if the code reads:

if (_flags & UFS_INODE_FLAG_DIRTY_MASK)
        vdirty(_vp);

the compiler can *always* elide the branch and either do nothing or emit an unconditional vdirty call.

mjg updated this revision to Diff 66294.Jan 3 2020, 12:33 PM
  • check newly set flags to see if vdirty is needed
  • properly test if there are any flags missing
kib added inline comments.Jan 4 2020, 4:52 AM
sys/ufs/ffs/ffs_vfsops.c
1485 ↗(On Diff #66294)

So this is basically written on the assumption that a dirty vnode must have an IN_ flag set. I do not think that is true; IN_ flags might be cleared by ffs_update() regardless of the dirtiness of the data blocks.

And in reverse: although most places that dirty data blocks set IN_MODIFIED, it was never the intent to ensure that _each_ data modification is accompanied by a later flag set. Flags could be coalesced, set later, and so on. In fact, you might have better luck looking at the length of the vnode's dirty buffer queue.

mjg added inline comments.Jan 4 2020, 10:42 AM
sys/ufs/ffs/ffs_vfsops.c
1485 ↗(On Diff #66294)

dirty vnodes are explicitly allowed to not have any IN_ flags set; this is part of why pre-filtering was added (here with ffs_sync_lazy_filter). The longer-term intent is to allow dirty vnodes to get very lazily dequeued (not necessarily on the last vdrop). Note that the stock kernel also walks vnodes which might have no relevant flags set.

ffs_sync_lazy only acts on vnodes depending on a subset of IN_ flags. With the patchset the invariant is that any vnode which has any flag from the subset is also present on the dirty list. In other words, the stock kernel may iterate over more vnodes (since the active list is likely to be bigger), but all the vnodes it would decide to work on are the same vnodes which on a patched kernel are both on the dirty list and have the appropriate IN_* flags set. It is illegal to have the appropriate IN_* flags set and not be on the dirty list.

That said, with the invariant in place, I consider the behavior to be equivalent in terms of what gets worked on. If for whatever reason the flags are not sufficient, head is already buggy.

mjg retitled this revision from 3/5 ufs: use dirty list instead of active list for syncer to 5/7 ufs: use dirty list instead of active list for syncer.Jan 5 2020, 4:06 AM

The heavy use of qsync() is in ffs_sync_lazy() and ffs_sync(). These should be using qsyncvp() in their loops rather than separately calling qsync(), as shown in my attached diff.


This change is independent of your changes, so I should just check it in as a separate change.

kib added inline comments.Jan 7 2020, 2:41 PM
sys/ufs/ffs/ffs_vfsops.c
1485 ↗(On Diff #66294)

In which sense is 'head already buggy'? A vnode may be dirty, that is, require either an {a,m,c}time update, a generic inode update, or any data or extdata block updates. The latter two cases are not covered by flags.

Maybe it is a terminology issue: what you call the dirty list is not really that, but rather a list of vnodes requiring some syncing. E.g. you still need a global scan to sync all data blocks. But then I would suggest avoiding the 'dirty list' term.

mjg added a comment.Jan 7 2020, 5:22 PM

The heavy use of qsync() is in ffs_sync_lazy() and ffs_sync(). These should be using qsyncvp() in their loops rather than separately calling qsync(), as shown in my attached diff.


This change is independent of your changes, so I should just check it in as a separate change.

I noted I don't know what guarantees are needed for quota files, thus for safety I regressed the quota-induced scan to go over all vnodes in the mount point. Getting rid of the extra loop is a laudable goal of course, but I don't know how it should look on a kernel with my patch. If the current places which put the vnode on the dirty list are sufficient, that's great, but chances are excellent the quota code needs extra calls.

That said, can you please hold off with this patch until this stuff gets into the tree?

mjg added inline comments.Jan 7 2020, 5:42 PM
sys/ufs/ffs/ffs_vfsops.c
1485 ↗(On Diff #66294)

I have no reservations against renaming this, but I don't have good alternatives. To perhaps keep it clearer I'll refer to the list as NEW in this comment.

I maintain that the following part of the loop:

if ((error = vget(vp, LK_EXCLUSIVE | LK_NOWAIT | LK_INTERLOCK,
    td)) != 0)
        continue;
if (sync_doupdate(ip))
        error = ffs_update(vp, 0);

will be executed for the same vnodes on both stock and patched kernels. Two caveats, given the racy nature of these checks:

  • prefiltering may make it more likely to skip the vnode than on stock kernel
  • VI_OWEINACT vnodes are not immediately guaranteed to be on the NEW list, but even if they are not, whoever is doing vput will take care of them OR they will defer inactive processing, at which point said vnode will be added to the NEW list

Thus, if this is not sufficient to do everything the syncer should do, stock head is already buggy in this regard and I consider the matter to be orthogonal to the patch.

I noted I don't know what guarantees are needed for quota files, thus for safety I regressed the quota-induced scan to go over all vnodes in the mount point. Getting rid of the extra loop is a laudable goal of course, but I don't know how it should look on a kernel with my patch. If the current places which put the vnode on the dirty list are sufficient, that's great, but chances are excellent the quota code needs extra calls.

Quota guarantees are the same as the promises for a file's data being on disk. Specifically, a quota on-disk update should get to disk with the same promise as file data. So, if a lazy sync pushes file data it should push quota data. An fsync (which is a hard promise of data on disk) should ensure that quota data is also written. My proposed change meets these criteria.

That said, can you please hold off with this patch until this stuff gets into the tree?

I will delay making my change until you have completed this one (assuming that you complete this one in the next few weeks).

mjg added a comment.Jan 7 2020, 6:59 PM

Quota guarantees are the same as the promises for a file's data being on disk. Specifically, a quota on-disk update should get to disk with the same promise as file data. So, if a lazy sync pushes file data it should push quota data. An fsync (which is a hard promise of data on disk) should ensure that quota data is also written. My proposed change meets these criteria.

I know. I meant that stock head will inspect the vnode merely on the basis of v_holdcnt > 0. With my patchset this will no longer be the case. It may be that the vdirty() calls added by the patchset guarantee that all vnodes which need to be inspected will be found on the dirty list, but I have no easy means of verifying that (thus the regression to scan the entire list). Should the calls as they are be sufficient, great. If they are not, I'm sure someone familiar with this code can add the missing calls and then switch this to scan the dirty list.

I will delay making my change until you have completed this one (assuming that you complete this one in the next few weeks).

Thank you.

kib added a comment.Jan 7 2020, 7:16 PM
In D22996#505630, @mjg wrote:

Quota guarantees are the same as the promises for a file's data being on disk. Specifically, a quota on-disk update should get to disk with the same promise as file data. So, if a lazy sync pushes file data it should push quota data. An fsync (which is a hard promise of data on disk) should ensure that quota data is also written. My proposed change meets these criteria.

I know. I meant that stock head will inspect the vnode merely on the basis of v_holdcnt > 0. With my patchset this will no longer be the case. It may be that the vdirty() calls added by the patchset guarantee that all vnodes which need to be inspected will be found on the dirty list, but I have no easy means of verifying that (thus the regression to scan the entire list). Should the calls as they are be sufficient, great. If they are not, I'm sure someone familiar with this code can add the missing calls and then switch this to scan the dirty list.

Quota data is not per-file; the minimum chunk is an aggregated per-uid or per-gid summary. So even if you write out the qvp data for a given vnode, it should not be expected that after a system crash the quota data is reasonable, because metadata of other files from the same uid might not be synced. That means that effectively quotas are only on disk on a full sync (syncer vnode fsync), and there we can give any guarantees only when the volume is either being unmounted or suspended.

I believe it is more reasonable to treat quotas for the purpose of sync the same as any other file, with the caveat that writing out quota file data should be preceded by a flush of qvp's data to the quota file data blocks.

mjg added a comment.Jan 7 2020, 11:22 PM

I don't want to deal with the quota code myself and I don't think it's a blocker for this patch given the trivial workaround (of temporarily regressing it to do a full vnode scan).

Do you have further objections when it comes to the dirty list?

Perhaps I should clarify that the longer-term intent is to let vnodes stay on it even after the hold count drops to 0, so that traffic on this list does not become a bottleneck. They would be dequeued on recycle and perhaps sometimes by code iterating the list. I think "dirty" with an implied "possibly" is an OK name.

mjg added a comment.Jan 9 2020, 3:43 PM

Would s/dirty/worklist/ be fine? Name stolen from the syncer.

kib added a comment.Jan 11 2020, 3:20 PM
In D22996#506421, @mjg wrote:

Would s/dirty/worklist/ be fine? Name stolen from the syncer.

worklist is already used by the SU code for something different in the context of UFS.

Might be lazylist or lazysynclist.

mjg updated this revision to Diff 66626.Jan 11 2020, 4:26 PM
mjg retitled this revision from 5/7 ufs: use dirty list instead of active list for syncer to 5/7 ufs: use lazy list instead of active list for syncer.
  • rename dirty to lazy
jeff accepted this revision.Jan 12 2020, 9:25 PM

I believe mjg is correct and this provides the same guarantees that existed before.

This revision is now accepted and ready to land.Jan 12 2020, 9:25 PM
This revision was automatically updated to reflect the committed changes.