Page MenuHomeFreeBSD

ucom(4): synchronously execute param changes
AcceptedPublic

Authored by kevans on May 17 2019, 5:46 PM.

Details

Reviewers
hselasky
ian
trasz
Summary

Param change command is queued into a separate thread for completion and control immediately returns to ucom_param to schedule the transfer enable and line state change commands, along with a quick return to the tty layer. As as side effect, tcsetattr (for instance) may return long before params reach the hardware and things can go a bit belly up without introducing artificial delays.

While here, push the knowledge of what gets executed synchronously out of ucom_queue_command. While it's likely an OK assumption that ucom_cfg_close should always be executed synchronously, there's only one caller for it and this makes it a bit easier to determine at the call-site whether the command will be executed asynchronously or not.

I also tend to wonder if ucom_cfg_open and/or ucom_cfg_line_state should also be synchronous changes, but I don't have as strong of feelings about these.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 24284

Event Timeline

kevans created this revision.May 17 2019, 5:46 PM
hselasky accepted this revision.May 18 2019, 10:25 AM

Just make sure that ucom_param() won't race when you drop the lock.

This revision is now accepted and ready to land.May 18 2019, 10:25 AM

Just make sure that ucom_param() won't race when you drop the lock.

Can you elaborate on which lock you're talking about? My understanding is that ucom_param is serialized by the ucom mtx, but only locking change here should be that the usb process lock may (will) get dropped while we wait for the queue to drain.

What happens if there are two threads calling the function where you drop the lock?

What happens if there are two threads calling the function where you drop the lock?

As far as I can tell, nothing- we're still holding the ucom lock and we drop the process lock so the proc thread can drain the queue. ucom lock effectively stops everything else that could queue and screw up our ordering.

The part that runs off the USB process threads is fine. But what contexts can ucom_param() be called from. Is there any chance of race here?

The part that runs off the USB process threads is fine. But what contexts can ucom_param() be called from. Is there any chance of race here?

Ah, yeah, no- it can only be called with the ucom mtx held and we don't drop that one.

ian added a comment.Jun 6 2019, 6:15 PM

I was going through the code putting "I think this one needs to be sync too" comments here and there. Then I realized I was pretty tagging every call to ucom_queue_command() as "needs to be sync".

So I guess that leads to the base question: why is ANYTHING done async here? In pretty much every case where a call to a tty-layer function results in a call down into this code, this code should not return to the tty layer until the requested action is completed. That's especially true of things like line state and param changes... if you configure a baud rate then send data, you expect that new data to go out at the new line speed.

sys/dev/usb/serial/usb_serial.c
744

Does anybody know what this 100ms wait is for?

798

This comment says to queue the transfer enable last, but that's not what happens... the following 4 function calls also queue async calls to the lower layer. I think that means transfers get enabled before the line state has been properly programmed in the hardware.

810

IMO, there needs to be some kind of wait here so that we don't return to the tty layer until all the things related to processing the open call have completed. Maybe that means every one of the preceeding queue calls need to use the sync flag, or maybe there's some value in queueing them all async and then doing one wait at the end? I don't know, since I've never understood the purpose of doing anything async with multiple threads in this code (there's a comment somewhere that says the *cfg* names are run on a different thread, but I've never seen a comment that says why that is).

1274

Another unexplained 100ms wait... why?

I understand you want synchronous behaviour, but how about implementing a drain command, which is called unlocked from the TTY layer, just before the end of the IOCTL return. Won't that fix the problems you see?

sys/dev/usb/serial/usb_serial.c
1274

Sometimes if you program values in a loop, the firmware in the USB chip might overflow or choke. Make sure we nice programming a bit.

ian added a comment.Jun 6 2019, 6:44 PM

I understand you want synchronous behaviour, but how about implementing a drain command, which is called unlocked from the TTY layer, just before the end of the IOCTL return. Won't that fix the problems you see?

I don't understand what you're saying at all. It's just a normal sane expectation that open/close/read/write/ioctl calls all behave synchronously, and it's especially imporant in serial comms where there is interaction between the read/write data and the other calls that configure the comms or manipulate line state. Also, I've never seen anything documented in relation to ttydevsw that even hints that it's okay to return before the action is complete.

sys/dev/usb/serial/usb_serial.c
1274

Then this should be in the lower-layer driver of a chip that has that problem. It's crazy to sleep in open() long enough to open/close the device hundreds of times if the sleep weren't there.

In D20293#443794, @ian wrote:

I understand you want synchronous behaviour, but how about implementing a drain command, which is called unlocked from the TTY layer, just before the end of the IOCTL return. Won't that fix the problems you see?

I don't understand what you're saying at all. It's just a normal sane expectation that open/close/read/write/ioctl calls all behave synchronously, and it's especially imporant in serial comms where there is interaction between the read/write data and the other calls that configure the comms or manipulate line state. Also, I've never seen anything documented in relation to ttydevsw that even hints that it's okay to return before the action is complete.

I agree about that, but as you know the TTY layer holds a mutex while calling us, so sleeping is not allowed or am I wrong? You don't solve mutex problems by just dropping the TTY lock, that leaves races wide open.

I suggest a new method in the ttydevsw which drain any pending commands off the underlying layer, which can be called outside the tty_lock() so we don't mess with the TTY's internal state.

hselasky added inline comments.Jun 6 2019, 6:53 PM
sys/dev/usb/serial/usb_serial.c
744

You can try removing the delay if you want.

ian added a comment.Jun 7 2019, 3:19 PM

I agree about that, but as you know the TTY layer holds a mutex while calling us, so sleeping is not allowed or am I wrong? You don't solve mutex problems by just dropping the TTY lock, that leaves races wide open.
I suggest a new method in the ttydevsw which drain any pending commands off the underlying layer, which can be called outside the tty_lock() so we don't mess with the TTY's internal state.

Oh right, I had forgotten about tty_lock() being held during calls to ttydevsw stuff. Now I remember looking into all this once before, a few years ago. What I discovered (and re-discovered yesterday) is that lots of places now are dropping and reacquiring the tty lock, unsafely. At a minimum, you have to check tty_gone() after recaquiring it. In addition you have to consider the other parts of tty state that the lock protects and what might have changed there, and that's hard to do because what the lock protects is not documented.

But now at least I have a better idea of what kind of trouble to look for in evaluating these changes.

imp added a comment.Jun 7 2019, 3:22 PM
In D20293#444094, @ian wrote:

I agree about that, but as you know the TTY layer holds a mutex while calling us, so sleeping is not allowed or am I wrong? You don't solve mutex problems by just dropping the TTY lock, that leaves races wide open.
I suggest a new method in the ttydevsw which drain any pending commands off the underlying layer, which can be called outside the tty_lock() so we don't mess with the TTY's internal state.

Oh right, I had forgotten about tty_lock() being held during calls to ttydevsw stuff. Now I remember looking into all this once before, a few years ago. What I discovered (and re-discovered yesterday) is that lots of places now are dropping and reacquiring the tty lock, unsafely. At a minimum, you have to check tty_gone() after recaquiring it. In addition you have to consider the other parts of tty state that the lock protects and what might have changed there, and that's hard to do because what the lock protects is not documented.

We've learned a long time ago that this sort of locking is bad for performance, so we do things like reference counting, epoch guarantees, etc to ensure proper lifetime and concurrent protection. It's long past time to revisit the tty_lock(), but I fear there's too little RoI on that for someone to do it on their own.

But now at least I have a better idea of what kind of trouble to look for in evaluating these changes.