alq(9): Record any write failures and return the last in alq_close(9).
AcceptedPublic
Actions

Authored by bdrewery on Mar 22 2016, 8:34 PM.

Details

Reviewers

kib
markj
lstewart

Group Reviewers

manpages

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 3048
Build 3080: arc lint + arc unit

Event Timeline

bdrewery updated this revision to Diff 14530.Mar 22 2016, 8:34 PM

bdrewery retitled this revision from to alq(9): Record any write failures and return the last in alq_close(9)..

bdrewery updated this object.

bdrewery edited the test plan for this revision. (Show Details)

Herald added a reviewer: manpages. · View Herald TranscriptMar 22 2016, 8:34 PM

Herald added a subscriber: imp. · View Herald Transcript

bdrewery added reviewers: markj, kib, lstewart.Mar 22 2016, 8:35 PM

markj added inline comments.Mar 23 2016, 12:03 AM

sys/kern/kern_alq.c
390	Should we overwrite previous errors?
483	This comment is misleading - there can't be any pending I/O since no one has a reference to this alq yet. All this does is close the vnode.

kib added inline comments.Mar 23 2016, 6:27 AM

sys/kern/kern_alq.c
297	Please explain this change.
381–382	If mac_vnode_check_write() failed, should we return this error ? Otherwise code returns zero, with no write.

bdrewery added inline comments.Mar 23 2016, 4:28 PM

sys/kern/kern_alq.c
381–382	Ah true, I missed this.
390	I don't see a problem with it. It depends on how complicated the API should become and what behavior should happen when a write fails. Should the entire log be closed and reject further writes? Should the error be returned from alq_flush() as well? Should flushing it cause the error to reset? I have no opinion on any of those, I'll leave it to Lawrence who maintains this more. My only interest is determining if any write failed so I can explore using this elsewhere.
483	Yes you're right. I didn't read the code here, I just moved the call to the parent since I wanted it out of alq_destroy().

bdrewery marked 4 inline comments as done.Mar 23 2016, 6:06 PM

bdrewery added inline comments.

sys/kern/kern_alq.c
297	alq_shutdown() flushes pending IO before destroy. I need to flush any pending IO in alq_close() before freeing the alq pointer so that alq->error can be retrieved. Calling something like alq_flush() here instead would just be redundant since alq_shutdown() would then try flushing again.

Fix findings

Herald edited edge metadata. · View Herald TranscriptMar 23 2016, 6:12 PM

kib added inline comments.Mar 24 2016, 9:39 AM

sys/kern/kern_alq.c
297	Then, could you assert that no penind io is queued, there ?
381–382	It is much easier to read restructured code as jhb prefers it #ifdef MAC error = mac_vnode_check_write(); if (error == 0) #endif error = VOP_WRITE(); and also it shows cleanly that the addition of 'error = 0;' assignment above is redundant.

Fix unrelated memory leak in ald_shutdown
Add more asserts
Retrieve error from vn_close
Return error rather than 0 from alq_close() on shutting down, which seems to be only possible with an invalid pointer anyhow

Herald edited edge metadata. · View Herald TranscriptMar 28 2016, 7:28 PM

bdrewery marked 4 inline comments as done.Mar 28 2016, 7:29 PM

bdrewery added inline comments.

sys/kern/kern_alq.c
245	FYI this is an unrelated memory leak fix that can happen when unloading the module.

markj accepted this revision.Mar 28 2016, 7:37 PM

markj edited edge metadata.

markj added inline comments.

sys/kern/kern_alq.c
390	Hm, I'm not suggesting any API changes - just that we might wish to save the first write error rather than the most recent error: if (error != 0 && alq->aq_error == 0) alq->aq_error = error; I think this is generally more useful for debugging in the face of cascading errors. But I can't think of a specific alq failure mode where this will make a difference.

This revision is now accepted and ready to land.Mar 28 2016, 7:37 PM

It looks fine to me from the patch, but I did not read the alq code.

share/man/man9/alq.9
422	I do not think that this is either correct or useful explanation. E.g. it does not mention mac checks. It would be more useful to say that errors encountered while final file operations are returned, without descending into specific ops.

Apologies for the delay in getting to this, still heads down wrapping up my PhD thesis. The comments from Kib and Mark all appear to have been addressed and the changes look good. I don't really have an opinion on the "report first or most recent" error issue.

sys/kern/kern_alq.c
245	Good catch.
390	No strong opinion either way from me - there are reasonable arguments for both approaches.
934	This conflates two unrelated sources of "error". Any reason not to return alq->aq_error here even though it's possible the value may change by the time the shutdown actually completes (which this change doesn't currently capture anyway)

Note also that my use case will require an alq_open_file (as in struct file*) or alq_open_fd or alq_open_vnode which I have a separate patch coming for soon. Currently I am using alq_open_vnode for testing.

sys/kern/kern_alq.c
390	I am leaning towards changing it to the first error as Mark suggested.
934	Agreed about conflating. I wavered on this line a lot. I changed it probably 5 times before submitting because I was unsure what to do with it. As far as I can tell it is only possible with an invalid pointer (which is documented as "undefined behavior"). This only returned EBUSY if the module is unloading which is only possible to get into if the alq_queues is empty which presumably means that this pointer passed in alq_close is invalid. If somehow the pointer passed in is "valid" but just not in the queue anymore (thus allowing shutdown to start since the queue is empty), then the pointer will be made invalid soon anyhow since it will be freed soon by the "unrelated memory leak" fix above. I might be missing something.