Page MenuHomeFreeBSD

led: Eliminate sleepable mutex
Changes PlannedPublic

Authored by kbowling on Oct 9 2021, 3:10 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 9, 4:28 PM
Unknown Object (File)
Mon, Jan 9, 8:10 AM
Unknown Object (File)
Jan 1 2023, 6:54 PM
Unknown Object (File)
Dec 14 2022, 6:08 AM

Details

Reviewers
jhb
markj
emaste
Summary

This causes a LOR with the iflib ctx sx when creating led devices in the framework.

Lock order reversal: (sleepable after non-sleepable)                                                                                
 1st 0xffffffff81be8dd8 LED mtx (LED mtx, sleep mutex) @ /usr/src/sys/dev/led/led.c:298                                             
 2nd 0xfffff800018dc980 iflib ctx lock (iflib ctx lock, sx) @ /usr/src/sys/net/iflib.c:4544                                         
lock order LED mtx -> iflib ctx lock attempted at:                                                                                  
#0 0xffffffff80c9a63f at witness_checkorder+0xd8f                                                                                   
#1 0xffffffff80c34077 at _sx_xlock+0x67                                                                                             
#2 0xffffffff80d6c1ad at iflib_led_func+0x2d                                                                                        
#3 0xffffffff80716e39 at led_create_state+0x189                                                                                     
#4 0xffffffff80d66a22 at iflib_device_register+0x1792

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Thank you!

Side note:

Also see: D31215

Would this solve the problem on those buses for you?

Separately I wonder if there is a race in destroy and the callout that could hit here, I may need to solve liveliness between the two.

Would this solve the problem on those buses for you?

I'll test with an RPi and let you know.

sys/dev/led/led.c
59

Unfortunately this is not right either: led_timeout() is a callout function, and is thus not allowed to sleep.

302

I wonder if we can simply stop holding the mutex across the sc->func call. The led device won't be destroyed concurrently since nothing has a reference to it yet.

Thanks @markj what do you think of this approach using epoch?

It was pointed out to me there is still an issue in that we pick up a new sx within the epoch, both in create and timeout with my patch as is For some reason the epoch _preempt variety doesn't produce a diagnostic so I will look into that code.

It's not immediately obvious to me what the best path forward is, will a thread like D31215 be necessary?

sys/dev/led/led.c
302

Wouldn't the locking still be problematic in the _timeout?

I was away most of the weekend, sorry.

It was pointed out to me there is still an issue in that we pick up a new sx within the epoch, both in create and timeout with my patch as is For some reason the epoch _preempt variety doesn't produce a diagnostic so I will look into that code.

Yes, epoch readers must not sleep. I believe we log a warning only when a thread in a non-sleepable section actually goes to sleep. In this case, you'll only see it if there's contention for led_sx, causing a thread to block. Probably we should modify the sx_* implementation to check THREAD_CAN_SLEEP() on every operation (only when INVARIANTS is on). Callout threads (see softclock_call_cc()) and epoch_enter_preempt() both use TD_NO_SLEEPING() to signal their requirement to other subsystems.

It's not immediately obvious to me what the best path forward is, will a thread like D31215 be necessary?

led(4) assumes that the driver's callback does not sleep. That was ok pre-iflib because the driver lock was a mtx(9). To restore led(4) support in em, either the callback needs to become non-sleepable, or led(4) needs to schedule a taskqueue thread from the callout as is suggested in D31215. I guess the main question is, what should happen if something tries to blink the LED while the interface is being reset? I note for example that em_if_stop() turns off the LED. One strategy might be to have the callback try to acquire the ctx lock, and upon failure record the request in the ctx (perhaps with the state mutex held) so that iflib_init_locked() can toggle the LED as requested once the reset is finished.