On a large machine (600 GB RAM, 128 cores), I ran into an issue at boot with INVARIANTS: the VFS code was locking and unlocking 32K bucket and vnode locks and tripping the KASSERT in TD_LOCKS_DEC(), because the number of locks held exceeded the range of a 16-bit signed integer. Making td_locks unsigned (and adjusting the KASSERT) was not enough, but making td_locks an int fixes it. Note that, at least on 64-bit platforms, this change does not bloat the thread struct, as the increased width of td_locks is absorbed by an implicit alignment pad between td_stopsched and td_blocked.
Diff Detail
Repository: rG FreeBSD src repository
Lint: Skipped
Unit Tests: Skipped
Event Timeline
I think the need to bump it comes from the pattern in the VFS code of taking all the bucket and vnode locks, which are autoscaled based on the size of the machine. Are there other patterns like this with rwlocks, sx locks, and lockmgr locks?
I'm reluctant to make large changes to core data structures which could impact cacheline alignment, and hence performance, this close to a release. If you're brave enough to do that, I'm happy to close this review and let you make changes to the other fields in addition to this one.
The current data placement is a total cluster-f-word as it is, with no real coherency to any of it, so I don't think it requires any /bravery/.
All that said, if you insist on not doing the change yourself, I'm happy to do it myself.
So I just pushed two commits, including https://cgit.FreeBSD.org/src/commit/?id=9a19595cad9247902fbdadbb2b8fe61bb3a1dab1
This can be closed then.