User Since: Dec 7 2017, 1:03 PM (184 w, 1 d)
Wed, Jun 16
Tue, Jun 15
New version with change to strndup
Mon, Jun 14
Fri, Jun 11
Thank you for finding this bug!
Thu, Jun 10
May 17 2021
I have several systems with zpools on top of SAS drives behind an Adaptec Smart Storage PQI SAS (rev 01) controller. These zpools host Zvols used by VMs running under bhyve. On 13.0-RELEASE, the Linux guests have a workload which consistently hangs. On the FreeBSD host, dmesg contains errors like:
May 17 09:20:17 xxxxxxxx kernel: [ERROR]::[92:655.0][0,70,0][CPU 58][pqi_map_request]:bus_dmamap_load_ccb failed = 36 count = 1036288
May 17 09:20:17 xxxxxxxx kernel: [WARN]:[92:655.0][CPU 58][pqisrc_io_start]:In Progress on 70
Applying this patch to releng/13.0 fixes the hangs seen in the Linux guests.
May 3 2021
Apr 20 2021
FWIW, applying this patch via git apply D29584.diff results in a number of trailing whitespace complaints:
/home/tuffli/D29584.diff:70: trailing whitespace.
/home/tuffli/D29584.diff:100: trailing whitespace.
/home/tuffli/D29584.diff:1126: trailing whitespace.
/home/tuffli/D29584.diff:1850: trailing whitespace.
/home/tuffli/D29584.diff:1905: trailing whitespace.
/home/tuffli/D29584.diff:2113: trailing whitespace.
/home/tuffli/D29584.diff:3307: trailing whitespace.
/home/tuffli/D29584.diff:4254: trailing whitespace.
/home/tuffli/D29584.diff:4326: trailing whitespace.
/home/tuffli/D29584.diff:5418: trailing whitespace.
/home/tuffli/D29584.diff:6860: trailing whitespace.
Apr 13 2021
This makes sense to me, but I'm not an expert on these scripts.
Apr 12 2021
Apr 11 2021
Apr 8 2021
Apr 5 2021
Mar 11 2021
Mar 8 2021
I've updated zpool to use root_hold_wait only after import fails, but I wasn't quite sure how to make similar changes for dumpon. Any suggestions?
- only set root_hold_wait if zpool import fails
Thanks for the update. If I'm understanding this compiler flag correctly, it is not warning of an actual misalignment but rather that this pattern may lead to unaligned accesses.
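To illustrate the class of warning under discussion (assuming something like clang's -Waddress-of-packed-member; the struct and function below are made up, not taken from pci_nvme.c):

#include <stdint.h>

/* Hypothetical packed structure; 'count' lands at offset 1. */
struct __attribute__((packed)) example_hdr {
        uint8_t         type;
        uint32_t        count;
};

uint32_t *
example_count_ptr(struct example_hdr *hdr)
{
        /*
         * The warning fires here: the pointer itself may be unaligned
         * for uint32_t; whether a later dereference actually faults
         * depends on the architecture.
         */
        return (&hdr->count);
}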
I'm having trouble matching pci_nvme.c:903 to current, stable/12 or stable/13. Which branch is this? I'm curious to see why the compiler doesn't like this instance as (naively) everything should be nicely aligned.
I'm happy to help out on the NVMe part but don't see this warning while building x86_64 on current. Can you point me in the direction of how to reproduce this?
Mar 5 2021
No, I hadn't considered that. Are you thinking of something like the following for zpool?
Feb 12 2021
From past experience, I've noticed that HGST/WDC drives set CFS on power loss before being NVME_GONE. In this case, would the state machine recover, given that it doesn't check NVME_GONE again?
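For reference, CFS is the Controller Fatal Status bit (bit 1) of the NVMe CSTS register; a minimal, hypothetical check of the condition being described:

#include <stdint.h>

#define EXAMPLE_CSTS_CFS        (1u << 1)       /* CSTS bit 1 per the NVMe spec */

/*
 * Hypothetical helper, not the actual nvme(4) state machine: a
 * controller reporting fatal status needs a reset (or removal)
 * rather than further I/O.
 */
static inline int
example_csts_fatal(uint32_t csts)
{
        return ((csts & EXAMPLE_CSTS_CFS) != 0);
}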
Jan 19 2021
Jan 8 2021
Jan 4 2021
Jan 2 2021
Any feedback on the change of failing_lba to uint8_t? Does anything else stick out?
Dec 23 2020
Dec 14 2020
Fixed various endianness issues (both patterns are sketched below):
- made the log page structure element failing_lba a byte array
- added an htole32() conversion in the selftest command
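A minimal sketch of the two patterns, using illustrative names rather than the actual structures from the diff: NVMe log pages are little-endian, so multi-byte fields need conversion on big-endian hosts, while a byte array sidesteps both endianness and alignment entirely.

#include <stdint.h>
#include <sys/endian.h>         /* htole32() on FreeBSD */

/* Illustrative log page layout, not the one from the diff. */
struct example_selftest_result {
        uint8_t         status;
        uint8_t         segment;
        uint8_t         failing_lba[8]; /* byte array: nothing to swap or align */
        uint32_t        poh;            /* stored little-endian per the spec */
} __attribute__((packed));

static void
example_set_result(struct example_selftest_result *r, uint64_t lba, uint32_t poh)
{
        /* Serialize the LBA least-significant byte first. */
        for (int i = 0; i < 8; i++)
                r->failing_lba[i] = (uint8_t)(lba >> (8 * i));
        /* Convert from host byte order to the little-endian wire format. */
        r->poh = htole32(poh);
}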
Dec 9 2020
Addressed the new-line issue in the man page and added a byte-swap routine for the device self-test log page.
Oops, I missed that. Thanks!
Dec 8 2020
Dec 4 2020
Dec 2 2020
Dec 1 2020
Sep 20 2020
After thinking about this and talking it over with some of the other bhyve developers, the better approach would be to create a new device model specifically for passing commands and data between a real NVMe device and a guest. The new device model (perhaps pci_nvme_proxy.c) might have a small amount of overlap with the NVMe device model with respect to hooking PCI reads and writes and probably queue creation, but the majority of the functionality would be handled by either nvme(4) or cam(3) requests. Using nvme(4) ioctls as you have is a good model for Admin commands, but you will want to experiment with how best to implement the I/O path. Let me know if you have questions.
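For illustration, here is a rough sketch of that Admin-command path, issuing Identify Controller to a physical device via the nvme(4) passthrough ioctl (the function name, device path, and error handling are placeholders; the I/O path would need its own design, as noted):

#include <sys/ioctl.h>
#include <sys/endian.h>
#include <dev/nvme/nvme.h>      /* struct nvme_pt_command, NVME_PASSTHROUGH_CMD */
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static void
example_identify(const char *path, struct nvme_controller_data *cdata)
{
        struct nvme_pt_command pt;
        int fd;

        if ((fd = open(path, O_RDWR)) < 0)      /* e.g. "/dev/nvme0" */
                err(1, "open %s", path);

        memset(&pt, 0, sizeof(pt));
        pt.cmd.opc = NVME_OPC_IDENTIFY;
        pt.cmd.cdw10 = htole32(1);      /* CNS=1: Identify Controller */
        pt.buf = cdata;
        pt.len = sizeof(*cdata);        /* Identify data is 4 KiB */
        pt.is_read = 1;                 /* controller-to-host transfer */

        if (ioctl(fd, NVME_PASSTHROUGH_CMD, &pt) < 0)
                err(1, "NVME_PASSTHROUGH_CMD");
        close(fd);
}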
Sep 18 2020
I'd love to see the addition of a warning, but am happy with these changes.
FWIW, I checked with our NVMe standards person who says:
Sep 10 2020
Aug 24 2020
Aug 12 2020
I'm happy with this change but need to wait for maintainer approval before I commit it.
Aug 10 2020
We've successfully tested a slightly modified version of this patch with a CentOS-based VM that used RDTSCP on 12.1-RELEASE. Without the patch, the appliance image core dumps. With this patch, it runs correctly. Checked that module unload works. Thank you for fixing this!
Aug 2 2020
Looks good. Thank you for contributing this!
Aug 1 2020
While this is an interesting approach, the fix you proposed in D24202 is more in line with the goal of emulating an NVMe device, and I'd be more comfortable committing those changes.
This looks good to me. Please rebase against the latest and I'd be happy to commit this.
Jul 31 2020
Thank you for determining why some guest OSes (I'm guessing Windows?) believe that the device isn't healthy!
A couple of observations:
- If the backing storage for the namespace isn't an NVMe device, what will this code do?
- Were you able to determine which fields are important to the OS in question? If so, a better approach might be to fix the missing fields in the current implementation (a rough sketch follows this list). This would have the added benefit of working with a ZVol or file-based backing storage.
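If the second approach were taken, the shape of the fix might look something like this sketch, where the helper and its parameters are hypothetical and only the field names come from the NVMe Identify Namespace data structure:

#include <stdint.h>
#include <sys/endian.h>
#include <dev/nvme/nvme.h>      /* struct nvme_namespace_data */

/*
 * Hypothetical sketch: derive Identify Namespace fields from the
 * backing store's size in bytes, so the result is the same whether
 * the backing is a ZVol, a file, or a physical device.
 */
static void
example_ns_identify(struct nvme_namespace_data *nd, uint64_t size, int lba_shift)
{
        uint64_t nlb = size >> lba_shift;       /* number of logical blocks */

        nd->nsze = htole64(nlb);        /* Namespace Size */
        nd->ncap = nd->nsze;            /* thick provisioned: capacity == size */
        nd->nuse = nd->nsze;            /* ... and fully utilized */
        nd->nlbaf = 0;                  /* one LBA format, entry 0 */
        nd->flbas = 0;                  /* formatted with LBA format 0 */
        /* LBADS (LBA data size, as a power of two) lives in bits 23:16. */
        nd->lbaf[0] = htole32((uint32_t)lba_shift << 16);
}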
Jul 20 2020
Jul 19 2020
Jul 6 2020
As an update, there is work underway to significantly enhance bhyve configuration. Once that lands, it should make many of the concerns here easier to address.