Have you tested the current patch set? If not, could you please complete the testing ASAP, since this has been idle for a month.
Jun 16 2021
May 28 2021
In D30182#685386, @imp wrote:
I've staged this commit at
https://github.com/bsdimp/freebsd/tree/D30182
to make sure I've got the details right. This branch builds for me. I have no way to test it, but it's identical here.
Please check to make sure that I've given the proper credit (for author) and the folks that have reviewed it here. I'm unsure if they are co-authors of this or not, but would be happy to credit them as such.
Once this one is sorted out, I'll do the merge using the other review that's going on for 12.2.
This review applies to both stable/13 and to -current as far as I can tell, though I've only test compiled on -current.
May 26 2021
Hi Yuripv,
May 21 2021
In D30299#682319, @trasz wrote:
Sure, I'll hold off.
What's the relationship between https://reviews.freebsd.org/D24428 and https://reviews.freebsd.org/D30182?
May 20 2021
Hi Yuripv,
Fixed lockup issues and trailing white space errors.
The FreeBSD 12.2 and 13.0 code review patches don't have the lockup code fix yet; it
will be pulled in once it's verified in the main branch.
Fixed lockup issue
May 17 2021
In D30299#680677, @trasz wrote:
Indeed; I've added the author to the reviewers list.
(Also: tinderboxed.)
May 13 2021
Hi Rainer,
May 10 2021
In D24428#677744, @rainer_ultra-secure.de wrote:
FYI.
The server hasn't crashed anymore so far (20d uptime).
Maybe this could be committed to 12-stable, so that it will at some point make it into a release?
Not sure how and when I'm going to test 13, TBH.
Added lockup code detection in driver
Added lockup code detection in driver
May 4 2021
In D29584#669304, @yuripv wrote:
In D29584#669299, @papani.srikanth_microchip.com wrote:
In D29584#668301, @yuripv wrote:
With this patch applied, it fails much faster for me, completely locking up ZFS, usually after some minutes of light load (e.g. I was doing git gc when this happened):
[ERROR]::[4:655.0][CPU 0][pqisrc_heartbeat_timer_handler][178]:controller is offline
(da1:smartpqi0:0:65:0): WRITE(10). CDB: 2a 00 05 82 4f e0 00 07 e8 00
(da2:smartpqi0:0:66:0): WRITE(10). CDB: 2a 00 05 82 7d 98 00 00 58 00
(da1:smartpqi0:0:65:0): CAM status: Unable to abort CCB request
(da1:smartpqi0:0:65:0): Error 5, Unretryable error
(da1:smartpqi0:0:65:0): WRITE(10). CDB: 2a 00 05 82 67 98 00 07 e8 00
(da1:smartpqi0:0:65:0): CAM status: Unable to abort CCB request
(da1:smartpqi0:0:65:0): Error 5, Unretryable error
...
da0 at smartpqi0 bus 0 scbus0 target 64 lun 0
da0: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WX32D7088CCV detached
(da1:smartpqi0:0:65:0): Error 5, Unretryable error
(da1:smartpqi0:0:65:0): WRITE(10). CDB: 2a 00 05 82 38 28 00 07 e8 00
(da1:smartpqi0:0:65:0): CAM status: Unable to abort CCB request
(da1:smartpqi0:0:65:0): Error 5, Unretryable error
da1 at smartpqi0 bus 0 scbus0 target 65 lun 0
da1: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WX42D70CHZS7 detached
(da2:smartpqi0:0:66:0): CAM status: Unable to abort CCB request
(da2:smartpqi0:0:66:0): Error 5, Unretryable error
(da2:smartpqi0:0:66:0): WRITE(10). CDB: 2a 00 05 82 75 b0 00 07 e8 00
(da2:smartpqi0:0:66:0): CAM status: Unable to abort CCB request
(da2:smartpqi0:0:66:0): Error 5, Unretryable error
da2 at smartpqi0 bus 0 scbus0 target 66 lun 0
da2: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WXC2D90D7YAX detached
da3 at smartpqi0 bus 0 scbus0 target 67 lun 0
da3: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WX12DB0N8F4X detached
ses0 at smartpqi0 bus 0 scbus0 target 68 lun 0
ses0: <Adaptec Smart Adapter 3.53> s/n 7A4263EAB3E detached
pass5 at smartpqi0 bus 0 scbus0 target 1088 lun 1
pass5: <Adaptec 1100-8i 3.53> s/n 7A4263EAB3E detached
(ses0:smartpqi0:0:68:0): Periph destroyed
(pass5:smartpqi0:0:1088:1): Periph destroyed
Solaris: WARNING: Pool 'data' has encountered an uncorrectable I/O failure and has been suspended.
Apr 16 11:36:45 sun ZFS[45956]: catastrophic pool I/O failure, zpool=data
Now every FS access is stuck.
I was able to save the kernel dump using NMI (in case we can get something interesting from it).
This same system works just fine using the same HBA, same disks, same cabling under illumos (using our internal smartpqi driver) transferring TBs worth of data, so I don't expect this to be hardware issue.
HBA is 1100-8i, disks are 4x WDC WD40PURZ SATA3 HDDs, connected using breakout cable.
Thanks for testing the changes. The lockup issue is new to us. Could you please provide the steps to reproduce it with your ZFS pool, so that we can replicate the setup in our lab for testing?
Pool configuration is simple -- raidz of 4 sata disks connected using breakout cable, i.e.:
# for i in $(seq 0 3); do gpart create -s gpt da$i; gpart add -t freebsd-zfs da$i; done
da0 created
da0p1 added
da1 created
da1p1 added
da2 created
da2p1 added
da3 created
da3p1 added
# zpool create -O atime=off -O compression=zstd data raidz da0p1 da1p1 da2p1 da3p1
# zpool status data
  pool: data
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da0p1   ONLINE       0     0     0
            da1p1   ONLINE       0     0     0
            da2p1   ONLINE       0     0     0
            da3p1   ONLINE       0     0     0

errors: No known data errors
I don't know what exactly triggers the issue; usually it's some simple operation like git clone, git pull, git gc and so on.
In D29584#670456, @chuck wrote:
FWIW, applying this patch via git apply D29584.diff results in a number of trailing whitespace complaints:
/home/tuffli/D29584.diff:70: trailing whitespace.
/home/tuffli/D29584.diff:100: trailing whitespace.
/home/tuffli/D29584.diff:1126: trailing whitespace.
/home/tuffli/D29584.diff:1850: trailing whitespace.
/home/tuffli/D29584.diff:1905: trailing whitespace.
/home/tuffli/D29584.diff:2113: trailing whitespace.
/home/tuffli/D29584.diff:3307: trailing whitespace.
/home/tuffli/D29584.diff:4254: trailing whitespace.
/home/tuffli/D29584.diff:4326: trailing whitespace.
/home/tuffli/D29584.diff:5418: trailing whitespace.
/home/tuffli/D29584.diff:6860: trailing whitespace.
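The whitespace complaints quoted above can be caught before a diff is uploaded. The following is a minimal sketch, not part of the review itself: `find_whitespace_errors` is a hypothetical helper that scans the added lines of a unified diff for the two problems reported in these reviews (trailing whitespace, and a space before a tab in the indent). It only approximates git's own checks.

```python
import re


def find_whitespace_errors(diff_text):
    """Return (line_number, message) pairs for added lines with whitespace problems.

    Line numbers refer to lines of the diff file itself, as in git's warnings.
    Limitation: an added line whose content starts with "++" is mistaken for a
    file header and skipped.
    """
    problems = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; skip the "+++ b/file" header line.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        body = line[1:]
        if body != body.rstrip(" \t"):
            problems.append((number, "trailing whitespace"))
        # Leading run of spaces and tabs that forms the indent.
        indent = re.match(r"[ \t]*", body).group()
        if " \t" in indent:
            problems.append((number, "space before tab in indent"))
    return problems


if __name__ == "__main__":
    sample = (
        "--- a/x.c\n"
        "+++ b/x.c\n"
        "@@ -1 +1,3 @@\n"
        " context\n"
        "+int a; \n"
        "+ \tint b;\n"
        "+int c;\n"
    )
    for number, message in find_whitespace_errors(sample):
        print(f"{number}: {message}.")
```

When applying rather than authoring a patch, `git apply --whitespace=fix` is the stock option for having git correct such lines itself.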
- Added lockup code info in the driver.
- Retry the I/Os when there is a lack of DMA resources instead of deferring them.
Apr 19 2021
In D29584#668301, @yuripv wrote:
With this patch applied, it fails much faster for me, completely locking up ZFS, usually after some minutes of light load (e.g. I was doing git gc when this happened):
[ERROR]::[4:655.0][CPU 0][pqisrc_heartbeat_timer_handler][178]:controller is offline
(da1:smartpqi0:0:65:0): WRITE(10). CDB: 2a 00 05 82 4f e0 00 07 e8 00
(da2:smartpqi0:0:66:0): WRITE(10). CDB: 2a 00 05 82 7d 98 00 00 58 00
(da1:smartpqi0:0:65:0): CAM status: Unable to abort CCB request
(da1:smartpqi0:0:65:0): Error 5, Unretryable error
(da1:smartpqi0:0:65:0): WRITE(10). CDB: 2a 00 05 82 67 98 00 07 e8 00
(da1:smartpqi0:0:65:0): CAM status: Unable to abort CCB request
(da1:smartpqi0:0:65:0): Error 5, Unretryable error
...
da0 at smartpqi0 bus 0 scbus0 target 64 lun 0
da0: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WX32D7088CCV detached
(da1:smartpqi0:0:65:0): Error 5, Unretryable error
(da1:smartpqi0:0:65:0): WRITE(10). CDB: 2a 00 05 82 38 28 00 07 e8 00
(da1:smartpqi0:0:65:0): CAM status: Unable to abort CCB request
(da1:smartpqi0:0:65:0): Error 5, Unretryable error
da1 at smartpqi0 bus 0 scbus0 target 65 lun 0
da1: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WX42D70CHZS7 detached
(da2:smartpqi0:0:66:0): CAM status: Unable to abort CCB request
(da2:smartpqi0:0:66:0): Error 5, Unretryable error
(da2:smartpqi0:0:66:0): WRITE(10). CDB: 2a 00 05 82 75 b0 00 07 e8 00
(da2:smartpqi0:0:66:0): CAM status: Unable to abort CCB request
(da2:smartpqi0:0:66:0): Error 5, Unretryable error
da2 at smartpqi0 bus 0 scbus0 target 66 lun 0
da2: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WXC2D90D7YAX detached
da3 at smartpqi0 bus 0 scbus0 target 67 lun 0
da3: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WX12DB0N8F4X detached
ses0 at smartpqi0 bus 0 scbus0 target 68 lun 0
ses0: <Adaptec Smart Adapter 3.53> s/n 7A4263EAB3E detached
pass5 at smartpqi0 bus 0 scbus0 target 1088 lun 1
pass5: <Adaptec 1100-8i 3.53> s/n 7A4263EAB3E detached
(ses0:smartpqi0:0:68:0): Periph destroyed
(pass5:smartpqi0:0:1088:1): Periph destroyed
Solaris: WARNING: Pool 'data' has encountered an uncorrectable I/O failure and has been suspended.
Apr 16 11:36:45 sun ZFS[45956]: catastrophic pool I/O failure, zpool=data
Now every FS access is stuck.
I was able to save the kernel dump using NMI (in case we can get something interesting from it).
This same system works just fine using the same HBA, same disks, same cabling under illumos (using our internal smartpqi driver) transferring TBs worth of data, so I don't expect this to be hardware issue.
HBA is 1100-8i, disks are 4x WDC WD40PURZ SATA3 HDDs, connected using breakout cable.
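For comparing failures like the one quoted above across test runs, it can help to reduce the console output to per-device counts. The sketch below is an illustration only: `summarize_cam_log` and both regular expressions are hypothetical and were written against the message shapes visible in this log (`WRITE(10)`, `CAM status`, `Error 5`, `... detached`); other CAM messages are ignored. It searches the whole text, so it also works when the log has lost its line breaks.

```python
import re
from collections import Counter

# "(da1:smartpqi0:0:65:0): CAM status: ..." -> ("da1", "CAM status")
EVENT = re.compile(
    r"\((\w+):smartpqi\d+(?::\d+){3}\): (CAM status|Error \d+|[A-Z]+\(\d+\))"
)
# "da0: <ATA WDC ...> s/n WD-... detached" -> "da0"
DETACHED = re.compile(r"(\w+): <[^>]*> s/n \S+ detached")


def summarize_cam_log(log_text):
    """Count (device, event kind) pairs and list devices reported as detached."""
    events = Counter(EVENT.findall(log_text))
    detached = DETACHED.findall(log_text)
    return events, detached


if __name__ == "__main__":
    sample = (
        "(da1:smartpqi0:0:65:0): WRITE(10). CDB: 2a 00 05 82 4f e0 00 07 e8 00 "
        "(da1:smartpqi0:0:65:0): CAM status: Unable to abort CCB request "
        "(da1:smartpqi0:0:65:0): Error 5, Unretryable error "
        "da0: <ATA WDC WD40PURZ-85A 0A80> s/n WD-WX32D7088CCV detached"
    )
    events, detached = summarize_cam_log(sample)
    for (device, kind), count in sorted(events.items()):
        print(device, kind, count)
    print("detached:", ", ".join(detached))
```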
Apr 12 2021
In D24428#666449, @rainer_ultra-secure.de wrote:Hi,
I have this slated for production in one week.
I will have to wait a couple of days to see if it fixes the hangs I have.
In D24428#664005, @papani.srikanth_microchip.com wrote:
In D24428#663851, @imp wrote:
In D24428#663568, @papani.srikanth_microchip.com wrote:
Can someone approve/review the code?
I'll look at it this afternoon.
Do you also need someone to commit it to the FreeBSD tree too?
I would be happy if someone committed it to the repository.
Apr 7 2021
In D24428#663851, @imp wrote:
In D24428#663568, @papani.srikanth_microchip.com wrote:
Can someone approve/review the code?
I'll look at it this afternoon.
Do you also need someone to commit it to the FreeBSD tree too?
Apr 6 2021
Can someone approve/review the code?
Can someone approve/review the code?
Apr 5 2021
Updated copyright section
In D24428#662938, @yuripv wrote:
In D24428#661566, @papani.srikanth_microchip.com wrote:
In D24428#661402, @yuripv wrote:
In D24428#658934, @papani.srikanth_microchip.com wrote:
In D24428#657918, @imp wrote:
There are issues with applying this patch. Something seems to have gone amiss in its generation.
It almost applied cleanly to the stable/12 branch, but not to the main branch:
% find . -name \*.rej
./sys/dev/smartpqi/smartpqi_mem.c.rej
./sys/dev/smartpqi/smartpqi_queue.c.rej
./sys/dev/smartpqi/smartpqi_defines.h.rej
./sys/dev/smartpqi/smartpqi_cam.c.rej
./sys/dev/smartpqi/smartpqi_misc.c.rej
./sys/dev/smartpqi/smartpqi_main.c.rej
./sys/dev/smartpqi/smartpqi_request.c.rej
In the main branch, it crashed patch :(.
A quick sample of the .rej files shows the diffs are likely easy to resolve by hand, but with such a large patch I'm leery of doing so. Adding '-l' to patch to cope with whitespace changes didn't seem to help.
So it looks like this patch needs to be regenerated and/or moved to git, where patch generation and uploading is a bit more reliable.
(Also commented on a couple of nits that didn't look quite right in the copyright stuff, but that can wait for the patch to get done.)
Hi,
- The 12.0 stable branch and the 12.2 main branch have different source code. I've pulled the 12.0 source code and applied the patch. (The 12.0 stable branch has bug fixes contributed by the community, but I do not see the same changes in the 12.2 main branch; because of that, the patch fails on the 12.2 main branch.)
"main" mentioned is literally main git branch, where this change should go first (and it's 14.0-CURRENT at the moment). I have HBA 1100-8i that is misbehaving under load, so I'd really like to try this patch -- could you please rebase this against main?
This patch is for 12.2 only, I will push a new patch separately to the main branch.
Is there a review/patch for main branch yet to test?
Here is the review patch for main branch.
https://reviews.freebsd.org/D29584
Mar 31 2021
In D24428#661402, @yuripv wrote:
In D24428#658934, @papani.srikanth_microchip.com wrote:
In D24428#657918, @imp wrote:
There are issues with applying this patch. Something seems to have gone amiss in its generation.
It almost applied cleanly to the stable/12 branch, but not to the main branch:
% find . -name \*.rej
./sys/dev/smartpqi/smartpqi_mem.c.rej
./sys/dev/smartpqi/smartpqi_queue.c.rej
./sys/dev/smartpqi/smartpqi_defines.h.rej
./sys/dev/smartpqi/smartpqi_cam.c.rej
./sys/dev/smartpqi/smartpqi_misc.c.rej
./sys/dev/smartpqi/smartpqi_main.c.rej
./sys/dev/smartpqi/smartpqi_request.c.rej
In the main branch, it crashed patch :(.
A quick sample of the .rej files shows the diffs are likely easy to resolve by hand, but with such a large patch I'm leery of doing so. Adding '-l' to patch to cope with whitespace changes didn't seem to help.
So it looks like this patch needs to be regenerated and/or moved to git, where patch generation and uploading is a bit more reliable.
(Also commented on a couple of nits that didn't look quite right in the copyright stuff, but that can wait for the patch to get done.)
Hi,
- The 12.0 stable branch and the 12.2 main branch have different source code. I've pulled the 12.0 source code and applied the patch. (The 12.0 stable branch has bug fixes contributed by the community, but I do not see the same changes in the 12.2 main branch; because of that, the patch fails on the 12.2 main branch.)
"main" mentioned is literally main git branch, where this change should go first (and it's 14.0-CURRENT at the moment). I have HBA 1100-8i that is misbehaving under load, so I'd really like to try this patch -- could you please rebase this against main?
Mar 26 2021
Updated driver.
Mar 25 2021
In D24428#657918, @imp wrote:
There are issues with applying this patch. Something seems to have gone amiss in its generation.
It almost applied cleanly to the stable/12 branch, but not to the main branch:
% find . -name \*.rej
./sys/dev/smartpqi/smartpqi_mem.c.rej
./sys/dev/smartpqi/smartpqi_queue.c.rej
./sys/dev/smartpqi/smartpqi_defines.h.rej
./sys/dev/smartpqi/smartpqi_cam.c.rej
./sys/dev/smartpqi/smartpqi_misc.c.rej
./sys/dev/smartpqi/smartpqi_main.c.rej
./sys/dev/smartpqi/smartpqi_request.c.rej
In the main branch, it crashed patch :(.
A quick sample of the .rej files shows the diffs are likely easy to resolve by hand, but with such a large patch I'm leery of doing so. Adding '-l' to patch to cope with whitespace changes didn't seem to help.
So it looks like this patch needs to be regenerated and/or moved to git, where patch generation and uploading is a bit more reliable.
(Also commented on a couple of nits that didn't look quite right in the copyright stuff, but that can wait for the patch to get done.)
Mar 15 2021
Feb 19 2021
In D24428#644782, @rainer_ultra-secure.de wrote:
Please excuse me, I am only a user, I only try to get this working.
(f-hosting </root>) 130 # cd /usr/src
(f-hosting <src>) 0 # git apply /root/D24428.diff
/root/D24428.diff:1875: trailing whitespace.
_softs->pci_mem_handle.pqi_bhandle, _offset)
/root/D24428.diff:3930: space before tab in indent.
DBG_FUNC("IN\n");
/root/D24428.diff:3944: space before tab in indent.
DBG_FUNC("OUT\n");
/root/D24428.diff:6642: trailing whitespace.
- Populate hostwellness time variables in bcd format from FreeBSD format
/root/D24428.diff:7949: space before tab in indent.
sizeof(raid_req->lun_number));
error: dev/smartpqi/smartpqi_cam.c: No such file or directory
error: dev/smartpqi/smartpqi_cmd.c: No such file or directory
error: dev/smartpqi/smartpqi_defines.h: No such file or directory
error: dev/smartpqi/smartpqi_discovery.c: No such file or directory
error: dev/smartpqi/smartpqi_event.c: No such file or directory
error: dev/smartpqi/smartpqi_helper.c: No such file or directory
error: dev/smartpqi/smartpqi_includes.h: No such file or directory
error: dev/smartpqi/smartpqi_init.c: No such file or directory
error: dev/smartpqi/smartpqi_intr.c: No such file or directory
error: dev/smartpqi/smartpqi_ioctl.h: No such file or directory
error: dev/smartpqi/smartpqi_ioctl.c: No such file or directory
error: dev/smartpqi/smartpqi_main.c: No such file or directory
error: dev/smartpqi/smartpqi_mem.c: No such file or directory
error: dev/smartpqi/smartpqi_misc.c: No such file or directory
error: dev/smartpqi/smartpqi_prototypes.h: No such file or directory
error: dev/smartpqi/smartpqi_queue.c: No such file or directory
error: dev/smartpqi/smartpqi_request.c: No such file or directory
error: dev/smartpqi/smartpqi_response.c: No such file or directory
error: dev/smartpqi/smartpqi_sis.c: No such file or directory
error: dev/smartpqi/smartpqi_structures.h: No such file or directory
error: dev/smartpqi/smartpqi_tag.c: No such file or directory
I know it's not your job - but can you walk me through what I have got to do to get this to compile on 12.2-RELEASE-p3?
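The `No such file or directory` errors above name paths under `dev/smartpqi/`, while the `.rej` files listed elsewhere in these reviews put the driver under `sys/dev/smartpqi/`. One plausible cause, stated here as an assumption because the diff headers are not shown, is a mismatch between the paths in the diff and the directory the patch was applied from. The sketch below uses two hypothetical helpers, `resolved_path` and `find_working_options`, to try combinations of git's `-p<n>` (strip leading path components) and `--directory=<root>` (prepend a directory after stripping) and report which ones make every file resolve inside a source tree.

```python
from pathlib import Path, PurePosixPath


def resolved_path(header_path, strip=1, directory=""):
    """Map a diff header path the way `git apply -p<strip> --directory=<directory>` would."""
    parts = PurePosixPath(header_path).parts[strip:]
    return str(PurePosixPath(directory, *parts))


def find_working_options(header_paths, tree_root, strips=(0, 1), directories=("", "sys")):
    """Return the (strip, directory) pairs for which every patched file exists."""
    root = Path(tree_root)
    hits = []
    for strip in strips:
        for directory in directories:
            if all(
                (root / resolved_path(path, strip, directory)).is_file()
                for path in header_paths
            ):
                hits.append((strip, directory))
    return hits


if __name__ == "__main__":
    # Assumed header path; the real D24428.diff headers may differ.
    paths = ["sys/dev/smartpqi/smartpqi_cam.c"]
    for strip, directory in find_working_options(paths, "/usr/src"):
        option = f"-p{strip}"
        if directory:
            option += f" --directory={directory}"
        print("candidate: git apply", option)
```

If the headers really are of the assumed form, the default `-p1` would strip `sys/` and produce exactly the `dev/smartpqi/...` paths in the errors, and the helper would suggest `-p0` or `--directory=sys`.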
In D24428#644381, @rainer_ultra-secure.de wrote:
In D24428#641982, @papani.srikanth_microchip.com wrote:
Fixed a system crash while creating and deleting a logical volume in a continuous loop.
Fixed an issue where the volume size was not exposed to the OS when the volume expands.
Added HC3 PCI IDs.
Fixed compiler issues in 12.2 kernel.
I can't seem to apply the patch cleanly on 12.2-RELEASE-p3.
See my comment in the PR above.
Fixed white space errors
Feb 15 2021
Updated year in Copyright section.
Fixed a system crash while creating and deleting a logical volume in a continuous loop.
Fixed an issue where the volume size was not exposed to the OS when the volume expands.
Added HC3 PCI IDs.
Fixed compiler issues in 12.2 kernel.
Feb 12 2021
In D24428#640413, @rainer_ultra-secure.de wrote:
This does not compile with 12.2.
See
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240145
Can someone look at this?
Yes, we're working on the issues. I will update the latest driver soon.
Jul 13 2020
Updated Inbox Driver version to match with latest Out-of-box driver 1x.4014.0.105
Added fixes for recently found system crashes
Jul 10 2020
Hi All,
May 14 2020
comments resolved
May 7 2020
Added recent bug fixes