Page MenuHomeFreeBSD

Fix data race in scsi cd driver
ClosedPublic

Authored by smalukav_gmail.com on Aug 30 2021, 11:36 AM.

Details

Summary

There is a data race between cdsysctlinit https://reviews.freebsd.org/source/src/browse/main/sys/cam/scsi/scsi_cd.c$508 and cdcheckmedia https://reviews.freebsd.org/source/src/browse/main/sys/cam/scsi/scsi_cd.c$2680 . Both functions change softc->flags without synchronization.

We encountered the following scenario:

  1. One thread enters cdsysctlinit
  2. Another thread enters g_raid_taste → g_access → cdopen → cdcheckmedia
  3. Both threads change softc->flags at the same time
  4. cdcheckmedia fails to set CD_FLAG_MEDIA_WAIT and hangs off there https://reviews.freebsd.org/source/src/browse/main/sys/cam/scsi/scsi_cd.c$2701

To fix we lock periph in cdsysctlinit

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

While it looks OK to me, I think it would be good for different drivers to be as close as possible. And I see da got their locking slightly different in af1823cde891.

In D31726#715924, @mav wrote:

While it looks OK to me, I think it would be good for different drivers to be as close as possible. And I see da got their locking slightly different in af1823cde891.

Do you mean using cam_periph_acquire and cam_periph_release_locked? I suppose we should ref the periph in cdsysctlinit. We use the same pattern in cdopen https://reviews.freebsd.org/source/src/browse/main/sys/cam/scsi/scsi_cd.c$756 .

In D31726#715924, @mav wrote:

While it looks OK to me, I think it would be good for different drivers to be as close as possible. And I see da got their locking slightly different in af1823cde891.

Do you mean using cam_periph_acquire and cam_periph_release_locked? I suppose we should ref the periph in cdsysctlinit. We use the same pattern in cdopen https://reviews.freebsd.org/source/src/browse/main/sys/cam/scsi/scsi_cd.c$756 .

I meant taking cam_periph_lock() just around softc->flags |= DA_FLAG_SCTX_INIT; to make drivers textually close. You can not add sysctls while holding periph lock, since the functions use M_WAITOK, that is why taskqueue is used at all. Plus while there, it would be good to review what else may need to be protected, alike to done in the mentioned patch.

Move using periph lock to around softc->flags change. Add owning asserts.

This revision is now accepted and ready to land.Sep 11 2021, 4:21 PM
This revision was automatically updated to reflect the committed changes.