Page MenuHomeFreeBSD

Close iSCSI sessions on shutdown
ClosedPublic

Authored by smh on Dec 7 2015, 3:06 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Apr 24, 12:53 AM
Unknown Object (File)
Dec 22 2023, 9:47 PM
Unknown Object (File)
Aug 23 2023, 8:08 AM
Unknown Object (File)
Jun 24 2023, 9:07 PM
Unknown Object (File)
May 3 2023, 5:46 PM
Unknown Object (File)
Apr 20 2023, 1:56 AM
Unknown Object (File)
Jan 10 2023, 10:14 PM
Unknown Object (File)
Jan 5 2023, 8:30 AM
Subscribers

Details

Summary

Ensure that all iSCSI sessions are correctly terminated during shutdown.

This:

  • Enhances the changes done by r286226 (D3052) .
  • Add shutdown post sync event to run after filesystem shutdown (SHUTDOWN_PRI_FIRST) but before CAM shutdown (SHUTDOWN_PRI_DEFAULT).
  • Changes iscsi_maintenance_thread to processes terminate in preference to reconnect.
Test Plan

Test #1

  1. Connect iSCSI session
  2. newfs <dev>
  3. mount <dev> /mnt
  4. ensure fail_on_disconnection is disabled: sysctl kern.iscsi.fail_on_disconnection=0
  5. disconnect session e.g. ifconfig <if> down
  6. reboot
  7. Confirm reboot succeeds

Test #2

  1. configure multipath to seperate san iSCSI sessions
  2. disconnect some of the iSCSI sessions
  3. reboot
  4. Confirm reboot succeeds

Both tests run with idle and busy (outstanding BIOs) under DEBUG kernel which can now perform a shutdown without panic, although BIOs are failed.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

smh retitled this revision from to Close iSCSI sessions on shutdown.
smh updated this object.
smh edited the test plan for this revision. (Show Details)

style(9) fix for iscsi_initiator

I don't quite see how this avoids the shutdown hang, which ought to happen during filesystem sync - basically, sync causes writes, which hang indefinitely waiting for reconnection; this happens before iscsi_shutdown() gets run. Previously it worked because iscsi_shutdown() was called earlier, and disabled reconnections.

Overall, I suspect we just need two eventhandlers here - a pre_sync one that disables reconnections, just like it did before, and post_sync one that terminates the sessions.

Oh, and the iscsi_initiator/iscsi.c change is fine, reviewed.

smh edited edge metadata.

After much testing its clear that both a pre-sync and post-sync handler are required to achieve a clean reboot when session is disconnected.

The pre-sync handler fails disconnected sessions early and the post-sync handler cleans up the sessions fully.

It should be noted that when the pre-sync handler runs all queued BIOs will be failed, potentially causing loss of date which may have been avoided should we let the session reconnect.

The current behaviour for fail on shutdown is maintained but it can be disabled by setting the new sysctl kern.iscsi.fail_on_shutdown=0

!is_connected is now used as the condition instead of is_waiting_for_iscsid to ensure cleanup always happens as is_waiting_for_iscsid was never set due to the handler actioning quickly and instead entering the is_login_phase wait.

An additional check was also added to the reconnect handler to ensure it doesn't retry if shutdown is in progress.

In order to ensure that everything is cleaned up the wait from the unload has been extracted an reused for shutdown. During testing of this it was noticed that some sessions didn't finish cleanup before reboot due to early removal from the sessions list. To fix this session removal was moved as late a possible.

Outstanding questions:

  1. Is it safe to move the session removal in iscsi_maintenance_thread_terminate to the end?
  2. Should the default for kern.iscsi.fail_on_shutdown be 0?
In D4429#93465, @trasz wrote:

I don't quite see how this avoids the shutdown hang, which ought to happen during filesystem sync - basically, sync causes writes, which hang indefinitely waiting for reconnection; this happens before iscsi_shutdown() gets run. Previously it worked because iscsi_shutdown() was called earlier, and disabled reconnections.

Sync actually happens before any shutdown, at least if you use shutdown, reboot etc. The only way to avoid that is reboot -n (no sync) which unfortunately not only prevents the sync run by reboot directly but also prevents the kernel level sync as well, which I'll look to address separately.

My testing with the current version unfortunately shows that it fails to cleanup disconnected sessions straight away and just continues to retry due, which I've also addressed in the new revision.

Overall, I suspect we just need two eventhandlers here - a pre_sync one that disables reconnections, just like it did before, and post_sync one that terminates the sessions.

Agreed and implemented.

smh edited the test plan for this revision. (Show Details)
sys/dev/iscsi/iscsi.c
103 ↗(On Diff #10944)

I don't get this. In which situation the user would set it to 0? I mean, what effect would it have apart from hanging the shutdown? When the handler gets run, there is zero chance the session could reconnect, because we no longer have the userspace (ie iscsid(8)) running.

456 ↗(On Diff #10944)

It's a good idea, but the code assumes sessions that are on the sessions list can be safely dereferenced. Could you try moving sx_xlock() back to where it was, so that it's held the whole time?

2193 ↗(On Diff #10944)

Could you commit this separately? Looks unrelated to the rest of this commit.

2356 ↗(On Diff #10944)

I prefer the previous comments here, to be honest. :-)

2378 ↗(On Diff #10944)

Duplicated empty line.

2409 ↗(On Diff #10944)

Punctuation; "shutdown - otherwise" perhaps? I'd also move this to iscsi_shutdown_post(), iscsi_shutdown_pre() already has a comment explaining its purpose.

Address trasz's latest set of feedback.

smh marked 4 inline comments as done.Dec 12 2015, 6:57 PM
smh added inline comments.
sys/dev/iscsi/iscsi.c
103 ↗(On Diff #10944)

In the case where its a temporary interruption and they don't want to loose the data that's currently queued.

When testing I had it recover when set to 0, clearly scsid(8) must have still been running. It could well have been when I was testing without the -N option to reboot, I'll do some more testing to confirm.

456 ↗(On Diff #10944)

That's what I suspected, will do.

2356 ↗(On Diff #10944)

The previous comment described what its doing but not why, which is important to capture, I'll revise it.

2409 ↗(On Diff #10944)

Will correct the punctuation but the comment is regarding the use of SHUTDOWN_PRI_DEFAULT -1 and not the action the shutdown post handler does so its more appropriate to keep it here.

Sorry for the lack of feedback; I've been ill. I'll get this done tomorrow, I promise.

trasz edited edge metadata.

Looks good and fixes many more problems I thought there are. Please let me know what's the purpose of the change in iscsi_callout() - or remove it, if it's not actually neccessary. Thanks!

sys/dev/iscsi/iscsi.c
103 ↗(On Diff #11182)

Ok, I've talked to kib@ and looks like my assumption that at this point the userspace processes definitely won't be running was actually wrong. So, yeah, it's the right solution.

457 ↗(On Diff #11182)

Just to make sure, you've tested this with WITNESS and it doesn't introduce any locking warnings, right?

616 ↗(On Diff #11182)

Why is this needed? We have a check for this earlier, at the beginning of the function; if it's for handling cases where is_terminating and is_reconnecting are both set then it should already be handled by the change in iscsi_maintenance_thread()?

This revision is now accepted and ready to land.Dec 21 2015, 10:06 PM
smh marked 4 inline comments as done.Dec 31 2015, 12:10 PM
smh added inline comments.
sys/dev/iscsi/iscsi.c
457 ↗(On Diff #11182)

Confirmed.

616 ↗(On Diff #11182)

This is to handle the case that terminating is set while this is running. Without this additional check it can be rescheduled for reconnect when its not needed. Its doesn't eliminate the race on this but it does help.

This revision was automatically updated to reflect the committed changes.
smh marked an inline comment as done.