Don't try to write out bufs that have already failed with ENXIO. This
fixes some panics after disconnecting mounted disks.
Submitted by: imp (slightly different version, which I've then lost)
Differential D9674
Handle ENXIO bufs. trasz on Feb 19 2017, 5:09 PM. Authored by Tags None Referenced Files
Details Don't try to write out bufs that have already failed with ENXIO. This Submitted by: imp (slightly different version, which I've then lost)
Diff Detail
Event TimelineComment Actions Do you also have patches in flight for fixes for the msdos thing we talked about on irc?
Comment Actions That's true, softdeps code has an explicit workaround for ENXIO. That said, I don't think we need to worry about softdep at all at this time - if it panics, it panics; it's not a regression, because it always panicked. It would be nice if UFS without softdep, or msdosfs, didn't panic, and for that, we need to handle those bufs somewhere.
Comment Actions Perhaps we could just add an additional check to not do this in case of softdep, ie if the buf has dependencies? Or rather, just add a conditional a bit lower to not call buf_deallocate() of them? Comment Actions Here is a change that will at least clean up the soft updates associated with the buffer, though as noted if the filesystem keeps running it will put itself into an inconsistent and possibly unrecoverable state. This is no worse than what would happen with a filesystem running without soft updates if you continue operating after failing to do the I/O. Index: sys/ufs/ffs/ffs_softdep.c =================================================================== --- sys/ufs/ffs/ffs_softdep.c (revision 313991) +++ sys/ufs/ffs/ffs_softdep.c (working copy) @@ -14241,6 +14241,10 @@ } } +static int DoNotCorruptFilesystemOnError = 1; +SYSCTL_INT(_vfs_ffs, OID_AUTO, DoNotCorruptFilesystemOnError, CTLFLAG_RW, + &DoNotCorruptFilesystemOnError, 0, + "Panic when unfixable I/O error occurs rather continuing to run"); /* * Called whenever a buffer that is being invalidated or reallocated * contains dependencies. This should only happen if an I/O error has @@ -14258,8 +14262,18 @@ else printf("softdep_deallocate_dependencies: " "got error %d while accessing filesystem\n", bp->b_error); - if (bp->b_error != ENXIO) + /* + * Call softdep_disk_write_complete below to clean up the dependencies + * on this buffer. This will leave the filesystem in an inconsistent + * state, since soft updates thinks the I/O has completed successfully + * when in fact it has not. So we should mark the filesystem as + * read-only to prevent further damage, but at this time do not have + * a way to do so. + */ + if (bp->b_error != ENXIO || DoNotCorruptFilesystemOnError) panic("softdep_deallocate_dependencies: unrecovered I/O error"); + else + softdep_disk_write_complete(bp); } /* Comment Actions This might work, but I liked the trasz' proposal to not terminate the buffer on ENXIO for which dependencies exist. May be, both the optional call to softdep_disk_write_complete() as above, and trasz' idea, should be committed, managed by the common user-settable knob. Also, some explicit explanation to users what are pro and contra of the available setting, need to be provided. Comment Actions I think there's a difference in our notions of what happens next. When the driver starts returning ENXIO, no further I/O is possible and all future I/Os will return ENXIO. You can never read or write further from the device in that situation. The filesystem cannot become any more or less corrupted than it was when the device was given up for lost. Completing the buffers allow them to be recycled. Ideally, we'd be invalidating all buffers associated with the mount point and transitioning the filesystem to a 'deadfs' mode where it fails all future user requests. The state of softupdates doesn't matter, so long as it doesn't cause panics while the cleanup is going on. If we aren't doing this now, I'd love to work on code to make that happen. So while we may want to do the harder 'read-only' thing for other errors, when you get ENXIO, the underlying storage is gone and we should free up the resources ASAP. This is quite different than what would happen with EIO or other errors that might be transient (which currently don't take this path, and won't after). This change would only convert ENXIO errors into this failure path. It would leave all other errors as they currently are: retry because it might work if you try again. So we're only calling the invalidation path on ENXIO when no further I/O is possible to the device. As such, I'm not sure I think that the proposed change that Kirk has is needed. We're already in a situation where whatever we have in memory is all we will ever have and we need to evict it as soon as possible because only some I/Os will complete for the user, and the ones that will are unpredictable. Perhaps we should chat about this next week at the storage summit.
Comment Actions I concur with imp's analysis. Once ENXIO is returned no more I/O will be possible, so the disk will remain in whatever state it is in. So it is always safe to just call softdep_disk_write_complete(bp); to clean up the dependencies and free their storage. There is still the issue that clearing some dependencies will allow others to proceed and those other may try to read data from the disk (such as the update of an inode). Almost always the update will be for something that is already cached, but occationally there will be a cache miss and so to ensure success we will need to augment soft updates to handle those cases. This is probably best done by setting a state flag (MNT_DISKGONE?) that indicates that the disk is gone and we are shutting down. In the (rather few) places in the soft depedencies code that we do reads, we check for that flag and provide a usable default which in most cases will just be a zeroed out buffer. My user setable knob would not be needed here (i.e., we just add the else softdep_disk_write_complete(bp); code) . But as kib points out, it might be useful to incorporate into trasz's proposed change. Comment Actions SU items reference other items, and processing of items requires that the blocks are available. All proposals that argue that it is fine to drop buffers with pending dependencies assume that it is fine to add complexity to the SU code to handle more cases. This is what I strongly disagree with. SU is already extremely complex piece of code, and making all code pathes handle additional cases does not look too promising. And these cases are hard to test since we have basically only one 'shot' after the device is removed. In my opinion, pretending that device is still there to allow the normal SU code to get out of the dependencies by completing writes is the only workable scenario to handle device loss under SU mount. If a buffer has dependencies, then they did not completed and this is because there are other deps which reference this buffer. Comment Actions
Allow me to restate to make sure I understand. You'd like to complete buffers with dependencies on them as if they had actually completed w/o error, but keep them valid until the references from the other buffers go away? And you'd like to do this with as minimal impact to the SU code as possible (ideally with no impact). The SU code needs to be able to peek into the data in these buffers to make decisions about what to do with them. Were we to slam the door shut prematurely, they would try to access the buffers which may lead to a crash if they have been reclaimed. Any change in this area should not only make things better, but should ideally help prevent crashes that are easily foreseeable even if such crashes were possible before my (Warner's) change that took out bp->b_errror == EIO and that might have been 'fixed' by that but would now be in play again. Is that what you'd like to see? Comment Actions Yes, and the indication of such buffer is the bp->b_dep being non-empty.
SU needs to undo some changes to data in buffers with dependencies, when doing writes and dependencies are not complete yet (i.e. because other buffers are not yet cleaned). Normally, any buffer with incomplete dependencies is dirty, and cannot be cleared until deps are complete, SU code re-dirties such buffer after write and putting back the reverted data.
As explained above, SU code assumes that buffers with dependencies cannot go away because the buffers are dirty until dependencies are complete.
I am not sure what do you mean there. From the high-level view, my opinion on fixing buffer cache interaction with the device errors should consider the main buffer cache consumer UFS and its non-trivial integration with the buffer cache (see above about b_dep and dirty buffers as example). I.e. in my opinion, we should not e.g. fix msdosfs until we know that the fix does not make UFS situation worse or completely paint us into corner for fixing UFS. Because fixing msdosfs is 'just finding corner cases and coding' so to say, while fixing UFS is an architectural issue.
What you explained before the last sentence, is probably a workable strategy for dealing with SU and might be even with SU+J (but +J would certainly require special code to handle journal write errors). I do not claim that this is the only possible or even the best strategy for UFS, but so far nobody thought publicitely about UFS and no other approach was provided. I.e. I do not insist on it, but I consider it as viable and making buffer cache incompatible with this approach without providing either arguments why the approach is not viable or alternative appoach is not good. Comment Actions Update to incorporate the idea of adding an exception for softupdates. Note that this is only compile-tested for now. |