Page MenuHomeFreeBSD

hyperv: storvsc: Fix multiple invalid disk issue for FreeBSD on windows 10/2016
ClosedPublic

Authored by honzhan_microsoft.com on Jan 14 2016, 1:02 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Apr 26, 12:58 AM
Unknown Object (File)
Fri, Apr 26, 12:58 AM
Unknown Object (File)
Fri, Apr 26, 12:58 AM
Unknown Object (File)
Fri, Apr 26, 12:58 AM
Unknown Object (File)
Fri, Apr 26, 12:58 AM
Unknown Object (File)
Fri, Apr 26, 12:58 AM
Unknown Object (File)
Fri, Apr 26, 12:22 AM
Unknown Object (File)
Fri, Apr 26, 12:22 AM

Details

Summary

The problem can be reproduced for FreeBSD installed on windows 10 and 2016 server.

The valid disk was randomly switched, for example, from da0 to da1 when restarting
the VM. As a result, the boot process will hang.

camcontrol command lists 16 disk, but only one is valid. See the output
of 'camcontrol devlist' on windows 10

camcontrol devlist

root@honzhan-dev2:/usr/src # camcontrol devlist
<Msft Virtual CD/ROM 1.0> at scbus0 target 0 lun 0 (cd0,pass0)
<Msft Virtual Disk 1.0> at scbus1 target 0 lun 0 (da1,pass2)
< > at scbus1 target 1 lun 1 (da0,pass1)
< > at scbus2 target 0 lun 1 (da2,pass3)
< > at scbus2 target 0 lun 2 (da4,pass5)
< > at scbus2 target 0 lun 3 (da6,pass7)
< > at scbus2 target 0 lun 4 (da8,pass9)
< > at scbus2 target 0 lun 5 (da10,pass11)
< > at scbus2 target 0 lun 6 (da12,pass13)
< > at scbus2 target 0 lun 7 (da14,pass15)
< > at scbus2 target 1 lun 1 (da3,pass4)
< > at scbus2 target 1 lun 2 (da5,pass6)
< > at scbus2 target 1 lun 3 (da7,pass8)
< > at scbus2 target 1 lun 4 (da9,pass10)
< > at scbus2 target 1 lun 5 (da11,pass12)
< > at scbus2 target 1 lun 6 (da13,pass14)
< > at scbus2 target 1 lun 7 (da15,pass16)

dmesg also shows us the similar information:

dmesg

da0 at blkvsc0 bus 0 scbus1 target 1 lun 1
da0: < > Fixed Direct Access SCSI device
da0: 300.000MB/s transfers
da0: 0MB (0 512 byte sectors: 0H 0S/T 0C)
da0: Delete methods: <NONE(*)>
da1 at blkvsc0 bus 0 scbus1 target 0 lun 0
da1: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da1: 300.000MB/s transfers
da1: 20480MB (41943040 512 byte sectors: 255H 63S/T 2610C)
da2: < > Fixed Direct Access SCSI device
da2: 300.000MB/s transfers
da2: 0MB (0 512 byte sectors: 0H 0S/T 0C)
da2: Delete methods: <NONE(*)>
da3 at storvsc1 bus 0 scbus2 target 1 lun 1
da3: < > Fixed Direct Access SCSI device
da3: 300.000MB/s transfers
da3: 0MB (0 512 byte sectors: 0H 0S/T 0C)
da3: Delete methods: <NONE(*)>
da5: < > Fixed Direct Access SCSI device
da5: 300.000MB/s transfers
da5: 0MB (0 512 byte sectors: 0H 0S/T 0C)
da5: Delete methods: <NONE(*)>
da4: < > Fixed Direct Access SCSI device
da4: 300.000MB/s transfers
da4: 0MB (0 512 byte sectors: 0H 0S/T 0C)
da4: Delete methods: <NONE(*)>

Submitted by: Hongjiang Zhang <honzhan microsoft com>

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

honzhan_microsoft.com retitled this revision from to hyperv: storvsc: Fix multiple invalid disk issue for FreeBSD on windows 10/2016.
honzhan_microsoft.com updated this object.
sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
1931

inline?

1937

shall we use better and cheap way to determine? like check the disk size > 0?

sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
1931

It has already been inlined in binary. Anyway, I will "inline" it explicitly.

1937

Before this fix, I have tried to check disk size. But I did not find the interface for getting the size.
Of course, I can get it by myself. The code snippet is like:

struct cam_periph *periph = xpt_path_periph(ccb->ccb_h.path);
struct da_softc *softc = (struct da_softc *)periph->softc;
struct disk_params *dp = &(softc->params);
// disk size
uintmax_t disk_size = ((uintmax_t)dp->secsize *
        dp->sectors) / (1024*1024);

The problem is "struct da_softc" and "struct disk_params" are both defined in sys/cam/scsi/scsi_da.c. I have to copy those structures to my own file. It is more expensive than the current way.

sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
1937

There is no capacity information, when the INQUIRY is send to the device (I think the hyperv storvsc is doing a 1:1 SCSI sim).

And as about directly using da/disk, it is also wrong, since they may not be created, e.g. in the probing phase.

I am fine w/ this function, inline or not inline; does not matter, since it will not be used on I/O path as your patch. See my comment in the later section.

BTW, what's in the inquiry data vendor/product exactly? All zero or all white space?

1997

We don't need this here.

2049

Leave the SCSI_STATUS_OK check unchanged

if (status == SCSI_STATUS_OK) {
    const struct scsi_generic *cmd;
    cmd = (const struct scsi_generic *)((ccb->ccb_h.flags & CAM_CDB_POINTER) ?
        csio->cdb_io.cdb_ptr : csio->cdb_io.cdb_bytes)
    if (cmd->opcode != INQUIRY || is_scsi_valid(csio->data_ptr))
         ccb->ccb_h.status |= CAM_REQ_CMP;
}

In this way, you don't need to do the inquiry data check upon every I/O request

2054

Haha, nice catch!

sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
1937

vendor data is all zero.
cam_strvis' implementation wants to remove leading whitespace and tailing zero. Here I just use it to extract vendor/product/version information.

2049

ok. That is elegant.

sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
1961

why not return ret = 1 when type is not in the list? so that we can just check if (type != T_NODEVICE). the code here will be headache when we add a new type we have to touch here and add the new value in thie enum

sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
1961

You are right, listing enumeration is not a good way. But I did not find a better solution.

"type != T_NODEVICE" is not enough. See "default" also returns 0.

Here the "type" is 5-bit value (0x0~0x1F). This "switch" statement filters 16 valid type values, the other 15 values are invalid.

Maybe next time we could add -U999999 to generate the patch? So we could have context here. Thanks.

sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
1932

const struct scsi_inquiry_data *inq

1944

Why not remove the temp var 'ret' and just return here? Make the code simpler and easy to follow.

1970–1974

Same, as the above suggestion, we could just return instead of mark-check-retrun.

1988

Please just return from here, no need to have a 'ret'

2067

I'd prefer to have the stack variable declaration at the nearest block, i.e. here the beginning of the function.

Maybe next time we could add -U999999 to generate the patch? So we could have context here. Thanks.

Use arc diff to upload patch will generate the context.

This revision is now accepted and ready to land.Jan 22 2016, 1:15 AM
adrian edited edge metadata.
This revision was automatically updated to reflect the committed changes.