Page MenuHomeFreeBSD

ctld: kernel-sourced portal groups are not dummies
Needs ReviewPublic

Authored by kevans on Thu, Aug 7, 4:47 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Aug 22, 12:34 PM
Unknown Object (File)
Mon, Aug 18, 3:36 PM
Unknown Object (File)
Fri, Aug 15, 10:36 PM
Unknown Object (File)
Thu, Aug 14, 9:45 AM
Unknown Object (File)
Mon, Aug 11, 10:27 PM
Unknown Object (File)
Fri, Aug 8, 6:11 PM
Unknown Object (File)
Thu, Aug 7, 5:48 PM
Subscribers

Details

Reviewers
asomers
jhb
mav
Summary

The current and historical versions of ctld would flag our initial set
of kernel ports as dummies, because their portal groups were empty since
portals come from the configuration on-disk.

As a result, we would never try to remove a kernel port at startup that
didn't exist in the configuration (possibly a feature if you wanted
concurrent ctld(8)), and we would always try to port->kernel_add() on
ports in the configuration (even if they actually did have an existing
kernel port).

Flag these portal groups as kernel groups so that we avoid trying to add
ports that already exist. It may be the case that the kernel_remove()
loop in conf::apply() needs to do something other than the current
oldport->is_dummy() to avoid removing ports that it isn't supposed to
be managing, but that wuld also seem to apply to LUNs that would be
removed today.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 66051
Build 62934: arc lint + arc unit

Event Timeline

kevans requested review of this revision.Thu, Aug 7, 4:47 AM

Is there a way to reproduce this bug using unpatched FreeBSD?

Is there a way to reproduce this bug using unpatched FreeBSD?

Sure, though it's maybe debatable if this is a supported use-case. Consider this configuration:

root@ifrit:/usr/src # cat /etc/ctl.conf 
portal-group pg0 {
        discovery-auth-group no-authentication
        listen 0.0.0.0
}

target iqn.2012-06.dev.kevans:target0 {
        auth-group no-authentication
        portal-group pg0

        lun 0 {
                path /dev/zvol/zroot/blk
                size 20G
        }
}

And pre-creating the port with ctladm port:

root@ifrit:/usr/src # ctladm port -c -d iscsi -O cfiscsi_portal_group_tag=10 -O ctld_portal_group_name=pg0 -O cfiscsi_target=iqn.2012-06.dev.kevans:target0
Port created successfully
frontend: iscsi
port:     5
root@ifrit:/usr/src # ctladm port -p 5 -o on
Front End Ports enabled

Then running ctld:

root@ifrit:/usr/src # ctld -d                                                                                   
ctld: obtaining configuration from /etc/ctl.conf
ctld: constructed port pg0-iqn.2012-06.dev.kevans:target0\012
ctld: auth-group "default" not defined; going with defaults
ctld: portal-group "default" not defined; going with defaults
ctld: transport-group "default" not defined; going with defaults
ctld: opening pidfile /var/run/ctld.pid
ctld: obtaining previously configured CTL luns from the kernel
ctld: CTL port 0 "camsim" wasn't managed by ctld; 
ctld: CTL port 1 "ioctl" wasn't managed by ctld; 
ctld: CTL port 2 "tpc" wasn't managed by ctld; 
ctld: CTL port 3 "ioctl/1" wasn't managed by ctld; 
ctld: adding lun "iqn.2012-06.dev.kevans:target0,lun,0" 
ctld: adding port "pg0-iqn.2012-06.dev.kevans:target0"
ctld: error returned from port creation request: target "iqn.2012-06.dev.kevans:target0" for portal group tag 10 already exists
ctld: failed to add port pg0-iqn.2012-06.dev.kevans:target0
ctld: listening on 0.0.0.0, portal-group "pg0"
ctld: not listening on portal-group "default", not assigned to any target
ctld: not listening on transport-group "default", not assigned to any target

With the patch applied, ctld(8) will happily update the port instead of trying to re-add it.

Alternatively, let ctld create it and find yourself in a situation where ctld gets SIGKILLed or otherwise terminated uncleanly:

root@ifrit:/usr/src # ctld -d                  
ctld: obtaining configuration from /etc/ctl.conf   
ctld: auth-group "default" not defined; going with defaults 
ctld: portal-group "default" not defined; going with defaults                              
ctld: transport-group "default" not defined; going with defaults
ctld: opening pidfile /var/run/ctld.pid               
ctld: obtaining previously configured CTL luns from the kernel                                                                                                                                                                 
ctld: CTL port 0 "camsim" wasn't managed by ctld;                                                              
ctld: CTL port 1 "ioctl" wasn't managed by ctld; 
ctld: CTL port 2 "tpc" wasn't managed by ctld;                                                                 
ctld: CTL port 3 "ioctl/1" wasn't managed by ctld;                                                             
ctld: adding lun "iqn.2012-06.dev.kevans:target0,lun,0" 
ctld: adding port "pg0-iqn.2012-06.dev.kevans:target0"
ctld: listening on 0.0.0.0, portal-group "pg0"
ctld: not listening on portal-group "default", not assigned to any target
ctld: not listening on transport-group "default", not assigned to any target
Killed
root@ifrit:/usr/src # ctld -d
ctld: obtaining configuration from /etc/ctl.conf
ctld: auth-group "default" not defined; going with defaults
ctld: portal-group "default" not defined; going with defaults
ctld: transport-group "default" not defined; going with defaults
ctld: opening pidfile /var/run/ctld.pid
ctld: obtaining previously configured CTL luns from the kernel
ctld: CTL port 0 "camsim" wasn't managed by ctld; 
ctld: CTL port 1 "ioctl" wasn't managed by ctld; 
ctld: CTL port 2 "tpc" wasn't managed by ctld; 
ctld: CTL port 3 "ioctl/1" wasn't managed by ctld; 
ctld: found CTL lun 0 "iqn.2012-06.dev.kevans:target0,lun,0"
ctld: device-id for lun "iqn.2012-06.dev.kevans:target0,lun,0", CTL lun 0 changed; removing
ctld: adding lun "iqn.2012-06.dev.kevans:target0,lun,0"
ctld: adding port "pg0-iqn.2012-06.dev.kevans:target0"
ctld: error returned from port creation request: target "iqn.2012-06.dev.kevans:target0" for portal group tag 256 already exists
ctld: failed to add port pg0-iqn.2012-06.dev.kevans:target0
ctld: listening on 0.0.0.0, portal-group "pg0"
ctld: not listening on portal-group "default", not assigned to any target
ctld: not listening on transport-group "default", not assigned to any target

Instead of this, I think a better solution would be for ctld to ignore EEXIST errors. That would not only fix your situation, but would also go some way towards recovering from a ctld that crashed (currently, if ctld crashes the only way to recover is to reboot or to manually remove every target with ctladm). The port::kernel_add() method would need to be be adjusted to return an errno, of course. But the ioctls already do. I'm not sure what error code they currently return; that might need to be adjusted.

Instead of this, I think a better solution would be for ctld to ignore EEXIST errors. That would not only fix your situation, but would also go some way towards recovering from a ctld that crashed (currently, if ctld crashes the only way to recover is to reboot or to manually remove every target with ctladm). The port::kernel_add() method would need to be be adjusted to return an errno, of course. But the ioctls already do. I'm not sure what error code they currently return; that might need to be adjusted.

I had considered that, but this dummy check wasn't clear: https://cgit.freebsd.org/src/tree/usr.sbin/ctld/ctld.cc#n2038 -> this loop effectively does nothing at startup because all kernel ports are dummies. If the intention was to allow them to persist under the assumption they're being managed elsewhere, it seems like we should also have some mechanism to avoid destroying pre-existing LUNs from the kernel in the next loop -> thus I arrive at maybe they *should* be removed as well.

I'm more than happy to fix this to however people (like you) that actively work on ctld see fit, though.

I note that this seems to revert us to pre https://cgit.freebsd.org/src/commit/usr.sbin/ctld/ctld.c?h=stable/14&id=9215e501b9e7d3f425677ffd1d3f1542c9fbb4dc behavior

Instead of this, I think a better solution would be for ctld to ignore EEXIST errors. That would not only fix your situation, but would also go some way towards recovering from a ctld that crashed (currently, if ctld crashes the only way to recover is to reboot or to manually remove every target with ctladm). The port::kernel_add() method would need to be be adjusted to return an errno, of course. But the ioctls already do. I'm not sure what error code they currently return; that might need to be adjusted.

I had considered that, but this dummy check wasn't clear: https://cgit.freebsd.org/src/tree/usr.sbin/ctld/ctld.cc#n2038 -> this loop effectively does nothing at startup because all kernel ports are dummies. If the intention was to allow them to persist under the assumption they're being managed elsewhere, it seems like we should also have some mechanism to avoid destroying pre-existing LUNs from the kernel in the next loop -> thus I arrive at maybe they *should* be removed as well.

I'm more than happy to fix this to however people (like you) that actively work on ctld see fit, though.

I note that this seems to revert us to pre https://cgit.freebsd.org/src/commit/usr.sbin/ctld/ctld.c?h=stable/14&id=9215e501b9e7d3f425677ffd1d3f1542c9fbb4dc behavior

Hmm. Yes, that seems inconsistent. I've often wondered if anybody was actually using this "leave alone ports that are managed outside of ctld" behavior. If not, the cleanest solution would be to eliminate it. That way, /etc/ctl.conf would be treated as the source of truth, and restarting ctld would always "correct" the configuration.

Instead of this, I think a better solution would be for ctld to ignore EEXIST errors. That would not only fix your situation, but would also go some way towards recovering from a ctld that crashed (currently, if ctld crashes the only way to recover is to reboot or to manually remove every target with ctladm). The port::kernel_add() method would need to be be adjusted to return an errno, of course. But the ioctls already do. I'm not sure what error code they currently return; that might need to be adjusted.

I had considered that, but this dummy check wasn't clear: https://cgit.freebsd.org/src/tree/usr.sbin/ctld/ctld.cc#n2038 -> this loop effectively does nothing at startup because all kernel ports are dummies. If the intention was to allow them to persist under the assumption they're being managed elsewhere, it seems like we should also have some mechanism to avoid destroying pre-existing LUNs from the kernel in the next loop -> thus I arrive at maybe they *should* be removed as well.

I'm more than happy to fix this to however people (like you) that actively work on ctld see fit, though.

I note that this seems to revert us to pre https://cgit.freebsd.org/src/commit/usr.sbin/ctld/ctld.c?h=stable/14&id=9215e501b9e7d3f425677ffd1d3f1542c9fbb4dc behavior

Hmm. Yes, that seems inconsistent. I've often wondered if anybody was actually using this "leave alone ports that are managed outside of ctld" behavior. If not, the cleanest solution would be to eliminate it. That way, /etc/ctl.conf would be treated as the source of truth, and restarting ctld would always "correct" the configuration.

Having taken a little more time to put my thoughts together, in no particular order:

re: accepting EEXISTS and updating the port:

  • For concurrent ctld(8) instances, I'd be worried that we would miss some scenario where there's overlap that should have been detected; only accepting it for ports we saw at startup minimizes the chances that we're doing something wrong, and still allows recovery from a previous crash since they're in effectively the same state as the pre-created ports noted in my previous comments
  • For "one ctld instance to rule them all", I think it'd be fine to paper over the weird EEXISTS because we're supposed to own our iscsi ports

re: concurrent ctd(8), I'd note that we're really not that far off from where we would want to be for that to work, at a glance. Maybe it makes more sense to mark the kernel-sourced oldconf instead and avoid doing port/LUN removal. We could special-case the port addition there and ignore EEXISTS if it's the initial configuration, but I'd kind of wonder why the port randomly appeared after we started up and enumerated the existing ports

re: mav's 9215e501b9e7 / r316677, I suspect the fact that it had this side effect of ignoring initial kernel ports is a bug rather than something he actually wanted, based on the commit message