Page MenuHomeFreeBSD

SU+J: all writes to SU journal must be exempt from runningbufspace throttling
ClosedPublic

Authored by kib on Nov 12 2024, 6:36 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Oct 12, 7:08 AM
Unknown Object (File)
Sun, Oct 12, 6:38 AM
Unknown Object (File)
Sep 17 2025, 8:09 AM
Unknown Object (File)
Sep 15 2025, 11:16 PM
Unknown Object (File)
Sep 10 2025, 8:32 AM
Unknown Object (File)
Sep 5 2025, 2:53 PM
Unknown Object (File)
Jul 29 2025, 8:23 AM
Unknown Object (File)
Jul 11 2025, 12:24 PM
Subscribers

Details

Reviewers
mckusick
markj
Summary
bufwrite(): style

Use bool for vp_md.  Compactify the calculation.
Explicitly check for non-zero when testing flags.

bufwrite(): adjust the comment

The statement about 'do not deadlock there' is false, since this write
might need other writes to finish, which cannot be started due to
runningbufspace.

SU+J: all writes to SU journal must be exempt from runningbufspace throttling

regardless whether they come from the system thread or initiated from a
normal thread helping the system.  If we block waiting for other writes,
that writes might not finish because our journal updates block that.

Set TDP_NORUNNINGBUF around softdep_process_journal().

PR:     282449

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Nov 12 2024, 6:36 AM

I ran all of the stress2 SU+J tests without seeing any issues with this patch.

I think this change is probably ok.

The larger invariant to preserve is that the filesystem cannot issue async I/O to implement bstrategy(). If it does, then the maximum number of recursive I/Os must be less than lorunningspace/maxphys (on small memory systems this can be 0), OR, the code issuing the async writes must set TDP_NORUNNINGSPACE.

I tried implementing your suggestion to make bufwait(B_ASYNC) block if the write exceeds the hirunningspace threshold. I think this solves the problem, but it's hard to be sure, as the bawrite() in softdep_process_journal() is hard to trigger (normally this work is handled by the flushing thread). It is a bit complex (requires a new B_ASYNCWAIT buf flag and some locking to synchronize the sleep/wakeup), but I believe it relaxes this invariant. I'm not sure whether to continue working on it.

This revision is now accepted and ready to land.Nov 13 2024, 2:49 PM

If we block waiting for other writes, that writes might not finish because our journal updates block that.

Another solution is to use bwrite() instead of bawrite() if the current thread is subject to the runningbufspace limit. I don't know if it's a good idea.

I believe the commit message should mention the runningbufspace mechanism somehow.

I think this change is probably ok.

The larger invariant to preserve is that the filesystem cannot issue async I/O to implement bstrategy(). If it does, then the maximum number of recursive I/Os must be less than lorunningspace/maxphys (on small memory systems this can be 0), OR, the code issuing the async writes must set TDP_NORUNNINGSPACE.

I tried implementing your suggestion to make bufwait(B_ASYNC) block if the write exceeds the hirunningspace threshold. I think this solves the problem, but it's hard to be sure, as the bawrite() in softdep_process_journal() is hard to trigger (normally this work is handled by the flushing thread). It is a bit complex (requires a new B_ASYNCWAIT buf flag and some locking to synchronize the sleep/wakeup), but I believe it relaxes this invariant. I'm not sure whether to continue working on it.

Well, I initally mean to simply convert B_ASYNC to sync by clearing the flag, if runningbufspace limit would be hit. My reasoning to switch to TDP_NORBS is to be more like existing solution for the buffer daemon instead, to not provide two separate hacks for the same problem. I do not think that B_ASYNCWAIT is needed.

If we block waiting for other writes, that writes might not finish because our journal updates block that.

Another solution is to use bwrite() instead of bawrite() if the current thread is subject to the runningbufspace limit. I don't know if it's a good idea.

I believe the commit message should mention the runningbufspace mechanism somehow.

Isn't the commit herald line already mentions runningbufspace? But I added the note about alternate solution into the commit msg.

In D47523#1084446, @kib wrote:

I think this change is probably ok.

The larger invariant to preserve is that the filesystem cannot issue async I/O to implement bstrategy(). If it does, then the maximum number of recursive I/Os must be less than lorunningspace/maxphys (on small memory systems this can be 0), OR, the code issuing the async writes must set TDP_NORUNNINGSPACE.

I tried implementing your suggestion to make bufwait(B_ASYNC) block if the write exceeds the hirunningspace threshold. I think this solves the problem, but it's hard to be sure, as the bawrite() in softdep_process_journal() is hard to trigger (normally this work is handled by the flushing thread). It is a bit complex (requires a new B_ASYNCWAIT buf flag and some locking to synchronize the sleep/wakeup), but I believe it relaxes this invariant. I'm not sure whether to continue working on it.

Well, I initally mean to simply convert B_ASYNC to sync by clearing the flag, if runningbufspace limit would be hit. My reasoning to switch to TDP_NORBS is to be more like existing solution for the buffer daemon instead, to not provide two separate hacks for the same problem. I do not think that B_ASYNCWAIT is needed.

This is what I tried first, it doesn't work. bufdone() needs to know about this special case, since it handles completion differently depending on whether B_ASYNC was set or not.

In D47523#1084446, @kib wrote:

Well, I initally mean to simply convert B_ASYNC to sync by clearing the flag, if runningbufspace limit would be hit. My reasoning to switch to TDP_NORBS is to be more like existing solution for the buffer daemon instead, to not provide two separate hacks for the same problem. I do not think that B_ASYNCWAIT is needed.

This is what I tried first, it doesn't work. bufdone() needs to know about this special case, since it handles completion differently depending on whether B_ASYNC was set or not.

For me, bufdone() does brelse() itself for async case. If the buffer is switched to sync, the wait and brelse happen in bufwait(), while bufdone does only the wakeup. Let me be more clean: I propose to clear B_ASYNC as if bawrite() did not set the flag, so all processing in bufwrite() is for bwrite().