Hello.
I have 5 JBODs, each attached through its own LSI 3008 HBA.
Yesterday I lost one JBOD because mpt3sas_cm2 reset my LSI card. After a few seconds I was able to use the JBOD again.
I want to know why this happened.
Where is the problem? Where should I focus?
I do not want to go through this again, because the pool now needs a resilver of about 120 hours.
Kernel log= https://gist.github.com/morphinz/26812c … 300ec76bd0
The adapter I lost is index 2.
LSISAS3008: FWVersion(15.00.00.00), ChipRevision(0x02), BiosVersion(08.35.00.00)
My Kernel= Linux 4.13.11-1-ARCH #1 SMP PREEMPT Thu Nov 2 10:25:56 CET 2017 x86_64 GNU/Linux
sas3ircu list
Avago Technologies SAS3 IR Configuration Utility.
Version 15.00.00.00 (2016.11.21)
Copyright (c) 2009-2016 Avago Technologies. All rights reserved.
        Adapter       Vendor  Device                     SubSys  SubSys
Index   Type          ID      ID      Pci Address        Ven ID  Dev ID
-----   ------------  ------  ------  -----------------  ------  ------
0       SAS3008       1000h   97h     00h:01h:00h:00h    15d9h   0808h
1       SAS3008       1000h   97h     00h:03h:00h:00h    1028h   1f46h
2       SAS3008       1000h   97h     00h:81h:00h:00h    1000h   30a0h
3       SAS3008       1000h   97h     00h:82h:00h:00h    1028h   1f46h
4       SAS3008       1000h   97h     00h:83h:00h:00h    1028h   1f46h
5       SAS3008       1000h   97h     00h:84h:00h:00h    1028h   1f46h
SAS3IRCU: Utility Completed Successfully.
Last edited by morphin (2017-11-15 07:10:26)
This is still happening on the latest kernel. This is a completely different system, but the problem is the same.
I'm going to lose my mind.
Why does one disk failure ruin my storage? I have 290 disks. I'm really scared, because this problem puts me at risk of data loss.
Please help me.
Linux FKM1 4.14.35-1-lts #1 SMP Thu Apr 19 10:38:59 CEST 2018 x86_64 GNU/Linux
DMESG: https://paste.ubuntu.com/p/vFs92n6JXz/
___________________________________________________________
sas3flash -listall
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.
Adapter Selected is a Avago SAS: SAS3008(C0)
Num  Ctlr         FW Ver       NVDATA       x86-BIOS     PCI Addr
--------------------------------------------------------------------
0    SAS3008(C0)  15.00.02.00  0e.00.00.08  08.29.01.00  00:01:00:00
1    SAS3008(C0)  13.00.00.00  0b.02.00.2c  08.31.00.00  00:03:00:00
2    SAS3008(C0)  13.00.00.00  0b.02.00.2c  08.31.00.00  00:81:00:00
3    SAS3008(C0)  15.00.02.00  0e.00.00.08  08.35.00.00  00:82:00:00
4    SAS3008(C0)  15.00.02.00  0e.00.00.08  08.35.00.00  00:83:00:00
Last edited by morphin (2018-05-02 22:43:35)
What does upstream zfs make of the issue?
zio_wait+0x10f/0x260 [zfs]
I don't know. When a disk disappears, ZFS should handle it as expected.
But the diag reset comes exactly after "zio_wait+0x10f/0x260 [zfs]".
Do you think this issue is related to ZFS?
I was going to open a ticket with Avago because I had started to think the problem is in the Avago driver.
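To see which controller instance is being reset and how often, the kernel log can be summarised per mpt3sas instance. This is only a rough sketch: the function name `count_mpt3sas_resets` is made up for illustration, and it assumes that reset-related driver messages contain the word "reset" and the `mpt3sas_cmN` prefix. The exact message wording differs between driver versions, so the filter may need adjusting.

```shell
# Hypothetical helper (not part of any package): counts log lines that
# mention both "reset" and an mpt3sas controller instance (mpt3sas_cmN).
# Reads kernel log text on stdin and prints "<instance> <count>" per line.
count_mpt3sas_resets() {
    grep -i 'reset' \
        | grep -o 'mpt3sas_cm[0-9]*' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Typical use would be `dmesg | count_mpt3sas_resets` or `journalctl -k | count_mpt3sas_resets`. If one instance dominates the counts, that is the card (and the JBOD behind it) to look at first.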
Are you using kernel modules supplied by Avago instead of the in-tree mpt2sas?
I don't use mpt2sas, only mpt3sas. It's just a stock kernel running ZFS.
I have also updated my LSI cards to the latest firmware.
In this setup I have 3 "Dell LSI 3008" and 2 "LSI Logic 3008" cards. This is why the firmware versions differ: the "13.0*" ones are the LSI Logic cards.
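A quick way to see the firmware mix at a glance is to group the `sas3flash -listall` rows by the "FW Ver" column. This is a minimal sketch, assuming the column layout shown in the output above (number, controller, firmware version, ...). The function name `fw_versions` is made up for illustration.

```shell
# Hypothetical helper: reads `sas3flash -listall` output on stdin and prints
# each distinct firmware version with the number of adapters running it.
# Assumes data rows start with a numeric index followed by a "SAS..." name.
fw_versions() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^SAS/ { count[$3]++ }
         END { for (v in count) print v, count[v] }' \
        | sort
}
```

Used as `sas3flash -listall | fw_versions`, the listing above would give two lines, one for 13.00.00.00 (2 adapters) and one for 15.00.02.00 (3 adapters).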
Last edited by morphin (2018-05-03 19:43:17)
The point I failed to make was that if the mpt3sas module comes from kernel.org rather than directly from LSI/AVAGO/BROADCOM, I would be surprised if they support it.
kernel.org does not support ZFS, and it's not a bug for the Arch bug tracker unless it's a packaging issue, so that leaves upstream ZFS as a source of information that might have encountered the issue before.
Yes, I'm using the in-kernel mpt3sas. I didn't do anything special for mpt3sas; I just added it to mkinitcpio.conf.
MODULES=(ehci_pci xhci_pci hid vmw_balloon vmw_pvscsi vsock vmw_vsock_vmci_transport ahci megaraid_sas mpt3sas hpsa mptsas dm_mod)
After 3 days the problem showed up again. I still can't find the cause.
Increasing the disk timeout saved my life for 2-3 months.
for drive in /sys/block/sd*; do echo 180 > "$drive/device/timeout"; done
But last week I got the same problem again. At least this is something: you can recover your pool by setting a bigger timeout on your disks.
I'm still looking for better solutions.
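For anyone who wants to reuse the timeout workaround, here is the same loop as a small function that also reads each value back so you can confirm it was applied. It is only a sketch: the name `set_disk_timeouts` and the optional second argument (an alternative sysfs root, handy for trying it on a scratch directory) are my own additions. Keep in mind that values written to sysfs are not persistent. They return to the kernel default (30 seconds for SCSI disks) after a reboot, and most likely also when a device is removed and rediscovered, so the function has to be run again in those cases.

```shell
# Hypothetical helper: set the SCSI command timeout for every sd* disk and
# print the value read back from sysfs. Needs root for the real /sys/block.
#   $1: timeout in seconds
#   $2: block device root (optional, defaults to /sys/block)
set_disk_timeouts() {
    for f in "${2:-/sys/block}"/sd*/device/timeout; do
        # Skip the unexpanded glob when no sd* device exists.
        [ -e "$f" ] || continue
        echo "$1" > "$f" && printf '%s %s\n' "$f" "$(cat "$f")"
    done
}
```

Called as `set_disk_timeouts 180`, it does the same thing as the one-liner above and prints one line per disk.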