Page MenuHomeFreeBSD

Attempt to fix race conditions and improve flexibility of iic(4), fix locking in iicbus_request_bus() and iicbus_release_bus()
ClosedPublic

Authored by jah on Mar 25 2015, 5:00 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Nov 28, 6:51 AM
Unknown Object (File)
Nov 13 2024, 4:08 PM
Unknown Object (File)
Nov 3 2024, 1:59 PM
Unknown Object (File)
Oct 15 2024, 2:59 PM
Unknown Object (File)
Oct 2 2024, 5:31 PM
Unknown Object (File)
Oct 2 2024, 4:15 PM
Unknown Object (File)
Oct 2 2024, 2:42 PM
Unknown Object (File)
Sep 29 2024, 5:12 PM
Subscribers

Details

Summary

Fix numerous issues in iic(4) and iicbus(4):
--Allow multiple open iic fds by storing addressing state in cdevpriv
--Fix, as much as possible, the baked-in race conditions in the iic ioctl interface by requesting bus ownership on I2CSTART, releasing it on I2CSTOP/I2CRSTCARD, and requiring bus ownership by the current cdevpriv to use the I/O ioctls
--Reduce internal iic buffer size and remove 1K read/write limit by iteratively calling iicbus_read/iicbus_write
--Eliminate dynamic allocation in I2CWRITE/I2CREAD
--Move handling of I2CRDWR to separate function and improve error handling
--Add new I2CSADDR ioctl to store address in current cdevpriv so that I2CSTART is not needed for read(2)/write(2) to work
--Redesign iicbus_request_bus() and iicbus_release_bus():

--iicbus_request_bus() no longer falls through if the bus is already owned by the requesting device.  Multiple threads on the same device may want exclusive access.  Also, iicbus_release_bus() was never device-recursive anyway.
--Previously, if IICBUS_CALLBACK failed in iicbus_release_bus(), but the following iicbus_poll() call succeeded, IICBUS_CALLBACK would not be issued again
--Do not hold iicbus mtx during IICBUS_CALLBACK call.  There are several drivers that may sleep in IICBUS_CALLBACK, if IIC_WAIT is passed.
Test Plan

Tested with various i2c devices on cx88 capture cards, also used i2c(8) to read chip ID of i2c demodulator

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

jah retitled this revision from to Attempt to fix race conditions and improve flexibility of iic(4), fix locking in iicbus_request_bus() and iicbus_release_bus().
jah updated this object.
jah edited the test plan for this revision. (Show Details)

I agree with most of your changes (and thank you for tackling this), but maybe it is time to get rid of the busy loop on EWOULDBLOCK case.

EWOULDBLOCK could only happen when the caller request the iicbus with IIC_DONTWAIT, wouldn't be better to just return EWOULDBLOCK in this case ?

Then the caller could deal with retrying if necessary.

I think this would also make the lock dropping on iicbus_poll() unnecessary.

What do you think ?

sys/dev/iicbus/iiconf.c
151 ↗(On Diff #4408)

The brace should be on if() line.

In D2140#5, @loos wrote:

I agree with most of your changes (and thank you for tackling this), but maybe it is time to get rid of the busy loop on EWOULDBLOCK case.

EWOULDBLOCK could only happen when the caller request the iicbus with IIC_DONTWAIT, wouldn't be better to just return EWOULDBLOCK in this case ?

Then the caller could deal with retrying if necessary.

I think this would also make the lock dropping on iicbus_poll() unnecessary.

What do you think ?

Haha, I was wondering if someone would bring that up. That approach seems more logical to me, and in fact my first attempt (yesterday) to rewrite iicbus_request_bus() did exactly what you describe: the same logic as now, but no outer for loop, iicbus_poll was called while (!error && owner != NULL), and didn't drop the lock.

I'm a little worried about what might break if I did that, which is why I chickened out of doing it that way. A quick grep of sys/ tells me:
arm/xscale/ixp425/cambria/cambria_gpio.c: always passes IIC_DONTWAIT
dev/iicbus/if_ic.c: always passes IIC_WAIT
dev/etherswitch/rtl8366/rtl8366rb.c: passes IIC_WAIT in one case, can pass either in the other case (based on sleep param to smi_acquire())

Everything else that calls it is either acting as a passthrough (iicsmb.c), or acting on behalf of usermode (iic.c).

--For iicsmb.c, it looks like the only callers of smbus_request_bus() either pass SMB_WAIT, or are smb.c. Not surprisingly, smb(4) and smbus_request_bus() look like they have the same problems as iic.

--If acting on behalf of usermode, it would make sense to honor O_NONBLOCK as much as possible and just return EWOULDBLOCK. There's maybe some theoretical risk that would make some existing application fail, but that seems unlikely to me.

So there's some risk, but maybe it's easy enough to deal with?

Don't loop on EWOULDBLOCK.
I think we can deal with any fallout. There might be none.

sys/dev/iicbus/iic.c
48 ↗(On Diff #4435)

Why 128 and not 1024? There were some changes to smbus recently to allow 1024, and smbus is just a fancy iic bus.

170 ↗(On Diff #4435)

priv can't be NULL here. clang will warn about this someday.
Coverity warns about it today.

201 ↗(On Diff #4435)

How do you know nothing can or will sleep on this lock?

301 ↗(On Diff #4435)

buf can't be NULL.

313 ↗(On Diff #4435)

usrbufs can't be NULL.

322 ↗(On Diff #4435)

m->buf can't be NULL here.

sys/dev/iicbus/iiconf.c
108 ↗(On Diff #4435)

why wakeup_one and not just wakeup?

136 ↗(On Diff #4435)

Some more explanation of why this is necessary is needed. It would be good to enshrine this requirement in the IICBUS_CALLBACK wrapper somehow, but since that's generated by the newbus stuff, perhaps that would be hard since the only fallback is to testing this in each bus' callback routine.

146 ↗(On Diff #4435)

Same question as above.

sys/dev/iicbus/iic.c
48 ↗(On Diff #4435)

Why use up 1K in the softc when it isn't necessary to do so?
This size is no longer a hard upper limit on the transfer length. If more than 128 bytes need to be moved, iicuio_move will call the necessary I/O function iteratively. I2C is slow enough that the smaller chunk size shouldn't matter for performance.

170 ↗(On Diff #4435)

Oops, I didn't realize M_WAITOK mallocs aren't *ever* allowed to fail.
But, that brings up another question: should we try to honor O_NONBLOCK by passing M_NOWAIT in that case?

201 ↗(On Diff #4435)

What do you mean by this? This lock is part of cdevpriv, so it can only be used in cdev calls. It's being destroyed in the cdevpriv dtor, which isn't executed until any outstanding cdev calls are purged. That would include anything that might contend or sx_sleep() on it.

sys/dev/iicbus/iiconf.c
108 ↗(On Diff #4435)

Why schedule every waiting thread when only (at most) one of them will be able to take ownership of the bus?

--Shrink chunk size to 32 bytes (same as smbus blocksize), move buffer to stack
--Respect O_NONBLOCK for mallocs
--Document IICBUS_CALLBACK locking assumptions in iiconf.c and iicbus interface

Change chunk size back to 128 bytes. That should still be OK to keep on the kernel stack. 32 might be too stingy for faster iic controllers.

imp, loos: can you guys look over this again when you have time?
I also need some clarification on Warner's comment about the lock.

Most code uses wakeup() instead of wakeup_one(), though in this case you might be fine to use wakeup_one().

Also, I think it is probably worth cleaning up the smbus code as well. However, there's another pending change to that code that you will want to avoid colliding with in D1955

sys/dev/iicbus/iic.c
133 ↗(On Diff #4638)

You can drop these casts while you are here (I realize they are in the original code) since C allows implicit conversion from void *.

213 ↗(On Diff #4638)

Please format block comments as:

/*
 * This is a block comment.
 */
223 ↗(On Diff #4638)

Should you loop on a short write or perhaps convert that to an explicit error? (That is check for num_bytes == transferred_bytes after this completes.)

225 ↗(On Diff #4638)

Please put the 'else' on the line as the '}'

300 ↗(On Diff #4638)

I don't know if other drivers interpret O_NONBLOCK to mean M_NOWAIT for malloc(). That might be fine, but it strikes me as unusual.

303 ↗(On Diff #4638)

You could collapse:

error = ENOMEM;
return (error);

to:

return (ENOMEM);

The latter seems more common in the kernel. error variables are typically used either to use goto to jump to common cleanup handling, or to save the return value of a function before dropping a lock.

398 ↗(On Diff #4638)

Please move this '{' up to the previous line. (You follow this convention already in most other places)

411 ↗(On Diff #4638)

"release release" ? :)

416 ↗(On Diff #4638)

For non-boolean checks, testing against zero is preferred. For example, priv->started is a boolean, so '!priv->started' is fine, but error can have a range of values, so please use 'if (error == 0)' instead. Note that 'if (error)' is a commonly (ab)used violation of this rule (should be 'if (error != 0)', but I believe that most code is better about using '== 0' to check for lack of error.

sys/dev/iicbus/iic.c
223 ↗(On Diff #4638)

Hmm, good point. I already allow short reads by honoring transferred_bytes in uiomove. I think I'd like to go for maximum flexibility and loop on short writes. I am operating on the assumption that the underlying bus code won't be stupid enough to, for example, perpetually return 0 bytes read/written without also returning some error.

I think in most, if not all, cases a short read or write will correspond with an iic bus error of some kind.

300 ↗(On Diff #4638)

I dug around the mailing lists a little, and the consensus seems to be that O_NONBLOCK shouldn't imply M_NOWAIT, because malloc isn't expected to block for any length of time, and if it does you probably have bigger problems. I'm fine with just deleting that code & assuming the mallocs won't fail.

303 ↗(On Diff #4638)

Oops, that was leftover from a goto I realized I didn't need.

411 ↗(On Diff #4638)

Hey, at least I'm only double-releasing in a comment :)

addressing jhb's feedback:

  • looping on short writes
  • always using M_WAITOK
  • style(9) fixes
jhb edited edge metadata.

I think this looks fine. I'm not really an i2c expert though I probably did the damage of the initial locking here.

This revision is now accepted and ready to land.Apr 9 2015, 9:00 PM

@imp, @loos: can you guys look over this again as soon as you get a chance and see if you find it acceptable from an iic perspective?

I'll commit this change as soon as everyone's OK with it. After that, my plan is to update iic.4 to fix some stale info. I'd like for that to be a separate review.
Once that's done, I'll look at smb(4) to see what needs to be done there, after D1955 is committed. I don't know if I'll have time to do anything there before the 11.0 freeze though.

imp edited edge metadata.

I think we're good.

jah updated this revision to Diff 4937.

Closed by commit rS281828 (authored by @jah).