Page MenuHomeFreeBSD

Separate callbacks into Giant locked and non-Giant locked.
Needs ReviewPublic

Authored by hselasky on Jul 5 2021, 1:39 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Apr 7, 5:59 PM
Unknown Object (File)
Mar 31 2024, 9:00 AM
Unknown Object (File)
Mar 6 2024, 3:25 AM
Unknown Object (File)
Dec 24 2023, 11:38 AM
Unknown Object (File)
Dec 13 2023, 11:50 AM
Unknown Object (File)
Nov 17 2023, 11:56 PM
Unknown Object (File)
Nov 17 2023, 8:42 PM
Unknown Object (File)
Sep 16 2023, 8:59 AM
Subscribers
None

Details

Summary

Giant is a lock which may be locked for a longer period of time.
Unfortunately some important subsystems like the console,
still use it, invoking regular timers.

Giant is also used to protect device attach and detach. Sometimes when
doing attach and detach, the Giant lock may be locked for a longer period
of time, due to tight polling loops on the hardware and other heavy
initialization.

When Giant is locked, innocent callbacks using the Giant lock, blocks
the whole queue of completed callbacks for an unacceptable amount of
time. To solve this, two measures are taken:

  1. Avoid blocking other callouts which do not use Giant, move all Giant

locked callbacks to a separate completion queue.

  1. Trylock Giant, to avoid dragging the callout subsystem down into the

congestion, when Giant is locked.

MFC after: 1 week
Sponsored by: NVIDIA Networking

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

hselasky created this revision.
hselasky retitled this revision from Separate callbacks into Giant locked and non-Giant locked. to Separate callbacks into Giant locked and non-Giant locked..Jul 5 2021, 1:42 PM

"Giant is going away" and "Giant is not going away"

I suggest taking a different approach to it. For instance, we can replace all callouts with Giant as the interlock, by taskqueue_enqueue_timeout() for some special single-purpose taskqueue.

I believe this will give us the same effect, but also avoids growing more Giant-aware code.

I agree with Konstantin. Just recently without seeing this review I've fixes several most annoying to me cases of Giant callouts. After that quick search shows me only 38 remaining instances of callout_reset(..., 0), out of which only 6 seem like really use Giant, while most others I bet just need quick code look to be marked mpsafe.

Looking on this particular patch though I don't see what will reschedule SWI again after it fail to acquire Giant lock. If instead of tris approach we created separate Giant clock SWI, it could safely wait for Giant. Besides, it looks like this patch tries to take Gian while CC_LOCK is held, which is a spin lock.

@mav : The best is to get rid of Giant.

The trylock is not the problem. If you cannot get Giant locked, then we simply try again at the next tick or event, assuming periodic=1 .

There is also another thing which is called adaptive locking, which makes Giant a terrible thing blocking callout worker threads at random:

See D31053, if you are interested.

@mav : The best is to get rid of Giant.

I am for it. ;) At least remaining non-MP-safe callouts don't look scary.

Are you familiar with sys/dev/usb/wlan code, or we need somebody from wlan area, same as for rtwn and wtap?

The trylock is not the problem. If you cannot get Giant locked, then we simply try again at the next tick or event, assuming periodic=1 .

Ah, yes. Just possibly half second later. BTW, what is the point of replacing TAILQ_EMPTY() with TAILQ_FIRST()?

In D31052#717779, @mav wrote:

@mav : The best is to get rid of Giant.

I am for it. ;) At least remaining non-MP-safe callouts don't look scary.

Are you familiar with sys/dev/usb/wlan code, or we need somebody from wlan area, same as for rtwn and wtap?

Yes, but mostly the USB part. All drivers have a softc and each softc has a mutex which you can use for the callouts, but I don't know about LOR which might happen inside the WLAN core code.

The trylock is not the problem. If you cannot get Giant locked, then we simply try again at the next tick or event, assuming periodic=1 .

Ah, yes. Just possibly half second later. BTW, what is the point of replacing TAILQ_EMPTY() with TAILQ_FIRST()?

Do you like HPS-style ? I always thought it was overhead to use "!TAILQ_EMPTY()" and then "TAILQ_FIRST()" to get the first entry.

I always thought it was overhead to use "!TAILQ_EMPTY()" and then "TAILQ_FIRST()" to get the first entry.

For compilers they should be identical, while for humans I think !EMPTY is more explicit. I agree about not using both, but here it is not the case.

In D31052#717779, @mav wrote:

Are you familiar with sys/dev/usb/wlan code, or we need somebody from wlan area, same as for rtwn and wtap?

Yes, but mostly the USB part. All drivers have a softc and each softc has a mutex which you can use for the callouts, but I don't know about LOR which might happen inside the WLAN core code.

I mean just quickly review whether Giant help there at all, or they just need to be declared mpsafe.