Page MenuHomeFreeBSD

g_vfs_done: Only report ENXIO once
ClosedPublic

Authored by imp on Apr 23 2022, 2:46 PM.

Details

Summary

The contract with the lower layers is that once ENXIO is reported, all
further I/O to the device is not possible. This is reported when the
device departs for good or changes in some material manner out from
underneath the system. Since the lower layers terminate all pending I/O
when this is detected with ENXIO, reporting more than one provides no
extra value.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

imp requested review of this revision.Apr 23 2022, 2:46 PM
imp added reviewers: phk, mav, chs, mckusick, kib, jhb, markj.

I don't have strong feelings about the behaviour or its default. I can't remember if I ever found those messages helpful.

Did you consider printing only a message for the first ENXIO to a given g_vfs_softc? With that approach, I think you could avoid adding a sysctl.

share/man/man4/geom.4
449 ↗(On Diff #105345)

Maybe clarify a bit: "reports" means "prints to the console".

sys/geom/geom_kern.c
232 ↗(On Diff #105345)
sys/geom/geom_vfs.c
153–154

I think, if such option is ever present, you should also add a logging of the sc_enxio_active = 1 event. In other words, we should clearly report that all errors are converted to ENXIO from now on and then g_report_enxio rules.

sys/geom/geom_vfs.c
153–154

So add something like

if (sc->sc_enxio_active == 0)
    g_print("g_vfs_done() converting all future errors to ENXIO for %s", mumble);

before we set sc_enxio_active = 1? Or do it conditionally based on g_report_enxio? I think we just want to do that always on the 0->1 transition, independent of this option. There's no other place to do this only once, and UFS, the current only user of CVTENXIO doesn't report anything.

You say 'if such an option'. does that mean you'd rather not see an option and just never report ENXIO or are you saying you are just luke warm on this notion?

sys/geom/geom_vfs.c
153–154

Yes, I think that new lines you pasted above are fine, together with existing addition to g_vfs_done() in this review. By 'if such option' I mean that the line above ("... converting ...") becomes quite useful if you also add the existing silencing change already in the review.

I do not object to or propose to abandon g_report_enxio.

imp retitled this revision from geom: kern.geom.report_enxio to g_vfs_done: Only report ENXIO once.Apr 23 2022, 9:05 PM
imp edited the summary of this revision. (Show Details)
imp edited the test plan for this revision. (Show Details)

Rework per markj's suggestion

I like his approach of reporting the first one.
A new review for entering ENXIO mode, which is independent of this
but a little related, will be forthcoming

imp marked 2 inline comments as done.Apr 23 2022, 9:10 PM
sys/geom/geom_vfs.c
154–160

Is the check for sc_enxio_reported racy? I mean, could two threads execute this condition and then assign true, simultaneously?

If yes, I do not think it is worth fixing, but at least noting it in a comment, for the reader' sake, would be useful.

This revision is now accepted and ready to land.Apr 23 2022, 10:22 PM
sys/geom/geom_vfs.c
154–160

I'll add a comment. It's only for a printf...

The race is tricky to trigger. Normally, we hold the periph lock during most of the completion process, but we drop it just before we call biodone(). This biodone might have a ENXIO return code. The first ENXIO error will trigger posting an AC_LOST_DEVICE to xpt_async_td, which will eventually wind up calling cam_periph_async() which will invalidate the device. That code will grab the periph lock and invalidate all the software queued I/O. These two things might race, but it's unlikely (but not impossible) they'd wind up racing each other through g_vfs_done(). If we lose the race, we'd lose the printf, which isn't the end of the world.

A simple cmpset would fix this. This would cause a flood of atomic activity (one per pending I/O when the device fails), but since this is only the error path for a dying device that happens rarely, it wouldn't be terrible. I'm torn on fixing it, though.

This revision now requires review to proceed.Apr 24 2022, 12:08 AM
sys/geom/geom_vfs.c
154–160

since it's easy enough to fix, I just fixed it (I hope)

This seems like a useful change.

This revision is now accepted and ready to land.Apr 24 2022, 12:28 AM
sys/geom/geom_vfs.c
154

Isn't it spelled atomic_cmpset_int ?

This revision now requires review to proceed.Apr 24 2022, 1:30 AM

Remove manpages?

Done. I had a man page change, but then went in a different direction that eliminated that need. Thanks for the reminder.

This revision is now accepted and ready to land.Apr 24 2022, 1:30 PM
This revision was automatically updated to reflect the committed changes.