Page MenuHomeFreeBSD

bhyve: 'xhci,tablet' snapshot fixes
ClosedPublic

Authored by gusev.vitaliy_gmail.com on Sep 1 2020, 10:29 AM.

Details

Summary
There were some errors during suspend:

  - pci_xhci_snapshot: invalid address: sc->opregs.cr_p
  - pci_xhci_snapshot: invalid address: sc->opregs.dcbaa_p
  - pci_xhci_snapshot: invalid address: sc->rtsregs.erst_p
  - pci_xhci_snapshot: invalid address: sc->rtsregs.erstba_p
  - pci_xhci_snapshot: invalid aaddress: dev->dev_ctx
Test Plan

"bhyve suspend" command finishes successfully and then resumed VM works.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

So is the case here that the device hasn't been initialized by the guest OS so registers are left as NULL and the guest2hostaddr fails on the NULL pointers?

It feels like the better solution is that on resume, the guest should just rederive these pointers from the registers, e.g. set erstba_p to XHCI_GADDR(sc, sc->rtsregs.intrreg.erstba & ~0x3FUL) rather than doing a save/restore of it explicitly.

usr.sbin/bhyve/pci_xhci.c
3094 ↗(On Diff #76459)
In D26264#593183, @jhb wrote:

So is the case here that the device hasn't been initialized by the guest OS so registers are left as NULL and the guest2hostaddr fails on the NULL pointers?

Yes, it all are not initialised, i.e. NULL

It feels like the better solution is that on resume, the guest should just rederive these pointers from the registers, e.g. set erstba_p to XHCI_GADDR(sc, sc->rtsregs.intrreg.erstba & ~0x3FUL) rather than doing a save/restore of it explicitly.

Yes, it could be done so but there are some consideration why I would leave it as is:

  1. another variables like xfer_block->buf must be saved directly, and use one approach makes the code simpler.
  2. re-deriving pointers introduces complex logic and doubling/copy-paste code (what if some setting of sc->opregs.cr_p, sc->opregs.dcbaa_p, sc->rtsregs.erstba_p, ..., will be changed?)
  3. re-deriving would be not just plain assignment var = XHCI_GADDR(). For example

    sc->opregs.dcbaa_p = XHCI_GADDR(sc, sc->opregs.dcbaap & ~0x3FUL);

    it also requires additional check on error if a register is not initialised, otherwise dcbaa_p would be set to -1.

Anyway, @jhb If you still think, that XHCI_GADDR() approach is better, I can correct it and update diff.

Correct code style for if (dev->dev_ctx)

In D26264#593183, @jhb wrote:

So is the case here that the device hasn't been initialized by the guest OS so registers are left as NULL and the guest2hostaddr fails on the NULL pointers?

Yes, it all are not initialised, i.e. NULL

Ok.

It feels like the better solution is that on resume, the guest should just rederive these pointers from the registers, e.g. set erstba_p to XHCI_GADDR(sc, sc->rtsregs.intrreg.erstba & ~0x3FUL) rather than doing a save/restore of it explicitly.

Yes, it could be done so but there are some consideration why I would leave it as is:

  1. another variables like xfer_block->buf must be saved directly, and use one approach makes the code simpler.
  2. re-deriving pointers introduces complex logic and doubling/copy-paste code (what if some setting of sc->opregs.cr_p, sc->opregs.dcbaa_p, sc->rtsregs.erstba_p, ..., will be changed?)
  3. re-deriving would be not just plain assignment var = XHCI_GADDR(). For example

    sc->opregs.dcbaa_p = XHCI_GADDR(sc, sc->opregs.dcbaap & ~0x3FUL);

    it also requires additional check on error if a register is not initialised, otherwise dcbaa_p would be set to -1.

I think in general we want to move to saving less of this type of state and instead rederiving as a general rule. While it was ok for the initial version, we really should be moving to a model that saves / restores "architectural" registers and other state defined by a specification (e.g. in this case what XHCI defines), and avoid saving implementation details. This gives us a format that is more robust against future implementation changes that might remove these helper variables or give them different meanings.

I do agree that the duplicate logic is not ideal. In some ways what would be nice is if the restore operation actually invoked the 'write_register' callback to avoid duplicating the logic. For now your diff is ok (I wanted to understand why it needed to permit NULL, and it's probably worth stating that reason in the commit log). I think even though it is more complex to do this "right" way it is worth moving towards that model where possible. It does mean that we might end up with separate save/restore callbacks instead of a macro that changes behavior based on the operation though. However, such changes should probably be done on a driver-wide basis and one driver at a time. They should also not be mixed in with the other fixes you are making here.

A better log might include something like "Permit suspend/resume of a XHCI device model that has not been attached to by a driver in a guest OS."

This revision is now accepted and ready to land.Oct 5 2020, 5:26 PM
This revision was automatically updated to reflect the committed changes.