
#1 2017-11-15 07:09:00

morphin
Member
Registered: 2017-11-15
Posts: 20

mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

Hello.

I have 5 JBODs, each attached through its own LSI 3008 HBA.
Yesterday I lost one JBOD because mpt3sas_cm2 reset my LSI card. After a few seconds I was able to use the JBOD again.
I want to know why this happened.
Where is the problem? Where should I focus?
I do not want to go through this again, because I now need a resilver that will take about 120 hours.

Kernel log= https://gist.github.com/morphinz/26812c … 300ec76bd0

The controller I lost is index 2.
LSISAS3008: FWVersion(15.00.00.00), ChipRevision(0x02), BiosVersion(08.35.00.00)

My kernel: Linux 4.13.11-1-ARCH #1 SMP PREEMPT Thu Nov 2 10:25:56 CET 2017 x86_64 GNU/Linux


sas3ircu list
Avago Technologies SAS3 IR Configuration Utility.
Version 15.00.00.00 (2016.11.21)
Copyright (c) 2009-2016 Avago Technologies. All rights reserved.


         Adapter      Vendor  Device                       SubSys  SubSys
Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
-----  ------------  ------  ------  -----------------    ------  ------
   0     SAS3008       1000h   97h    00h:01h:00h:00h      15d9h   0808h

         Adapter      Vendor  Device                       SubSys  SubSys
Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
-----  ------------  ------  ------  -----------------    ------  ------
   1     SAS3008       1000h   97h    00h:03h:00h:00h      1028h   1f46h

         Adapter      Vendor  Device                       SubSys  SubSys
Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
-----  ------------  ------  ------  -----------------    ------  ------
   2     SAS3008       1000h   97h    00h:81h:00h:00h      1000h   30a0h

         Adapter      Vendor  Device                       SubSys  SubSys
Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
-----  ------------  ------  ------  -----------------    ------  ------
   3     SAS3008       1000h   97h    00h:82h:00h:00h      1028h   1f46h

         Adapter      Vendor  Device                       SubSys  SubSys
Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
-----  ------------  ------  ------  -----------------    ------  ------
   4     SAS3008       1000h   97h    00h:83h:00h:00h      1028h   1f46h

         Adapter      Vendor  Device                       SubSys  SubSys
Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
-----  ------------  ------  ------  -----------------    ------  ------
   5     SAS3008       1000h   97h    00h:84h:00h:00h      1028h   1f46h
SAS3IRCU: Utility Completed Successfully.

Last edited by morphin (2017-11-15 07:10:26)


#2 2018-05-02 22:20:30

morphin
Member
Registered: 2017-11-15
Posts: 20

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

This is still happening on the latest kernel. This is a completely different system, but the problem is the same.
I'm going to lose my mind.
Why does a single disk failure ruin my whole storage? I have 290 disks, and I'm really scared because this problem puts me at risk of data loss.
Please help me.


Linux FKM1 4.14.35-1-lts #1 SMP Thu Apr 19 10:38:59 CEST 2018 x86_64 GNU/Linux

DMESG: https://paste.ubuntu.com/p/vFs92n6JXz/

___________________________________________________________

sas3flash -listall
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

    Adapter Selected is a Avago SAS: SAS3008(C0)

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------

0  SAS3008(C0)  15.00.02.00    0e.00.00.08    08.29.01.00     00:01:00:00
1  SAS3008(C0)  13.00.00.00    0b.02.00.2c    08.31.00.00     00:03:00:00
2  SAS3008(C0)  13.00.00.00    0b.02.00.2c    08.31.00.00     00:81:00:00
3  SAS3008(C0)  15.00.02.00    0e.00.00.08    08.35.00.00     00:82:00:00
4  SAS3008(C0)  15.00.02.00    0e.00.00.08    08.35.00.00     00:83:00:00

Last edited by morphin (2018-05-02 22:43:35)


#3 2018-05-02 23:48:52

loqs
Member
Registered: 2014-03-06
Posts: 17,372

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

What does upstream zfs make of the issue?

zio_wait+0x10f/0x260 [zfs]


#4 2018-05-03 07:54:50

morphin
Member
Registered: 2017-11-15
Posts: 20

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

loqs wrote:

What does upstream zfs make of the issue?

zio_wait+0x10f/0x260 [zfs]

I don't know. When a disk disappears, ZFS should handle it as expected.
But the diag reset comes right after "zio_wait+0x10f/0x260 [zfs]".
Do you think this issue is related to ZFS?
I was going to open a ticket with Avago, because I had started to think the problem is related to the Avago driver.


#5 2018-05-03 10:31:02

loqs
Member
Registered: 2014-03-06
Posts: 17,372

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

Are you using kernel modules supplied by Avago instead of the in-tree mpt2sas?


#6 2018-05-03 19:26:56

morphin
Member
Registered: 2017-11-15
Posts: 20

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

loqs wrote:

Are you using kernel modules supplied by Avago instead of the in-tree mpt2sas?

I don't use mpt2sas, only mpt3sas. It's just a stock kernel running ZFS.
I have also updated my LSI cards to the latest firmware.
In this setup I have 3 "Dell LSI 3008" and 2 "LSI Logic 3008" cards. That is why the firmware versions differ: the "13.0*" ones are the LSI Logic cards.

Last edited by morphin (2018-05-03 19:43:17)


#7 2018-05-03 20:03:47

loqs
Member
Registered: 2014-03-06
Posts: 17,372

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

The point I failed to make was that if the mpt3sas module comes from kernel.org rather than directly from LSI/Avago/Broadcom, I would be surprised if they supported it.
kernel.org does not support ZFS, and it is not a bug for the Arch bug tracker unless it is a packaging issue, so that leaves upstream ZFS as a source of information that might have encountered the issue before.


#8 2018-05-03 20:29:23

morphin
Member
Registered: 2017-11-15
Posts: 20

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

loqs wrote:

The point I failed to make was that if the mpt3sas module comes from kernel.org rather than directly from LSI/Avago/Broadcom, I would be surprised if they supported it.
kernel.org does not support ZFS, and it is not a bug for the Arch bug tracker unless it is a packaging issue, so that leaves upstream ZFS as a source of information that might have encountered the issue before.

Yes, I'm using the in-kernel mpt3sas. I didn't do anything special for mpt3sas; I just added it to mkinitcpio.conf.

MODULES=(ehci_pci xhci_pci hid vmw_balloon vmw_pvscsi vsock vmw_vsock_vmci_transport ahci megaraid_sas mpt3sas hpsa mptsas dm_mod)


#9 2018-05-05 11:37:15

morphin
Member
Registered: 2017-11-15
Posts: 20

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

After 3 days the problem showed up again. I still can't find the cause.

https://paste.ubuntu.com/p/9yMkp5N5HR/
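
To see which controller keeps resetting, a saved kernel log can be tallied per mpt3sas instance. This is only a rough sketch: it assumes the log was saved to a file (for example with dmesg) and that every reset line contains the same "mpt3sas_cmN: attempting host reset!" text as in the thread title. The function name count_host_resets is made up for illustration.

```shell
#!/bin/sh
# count_host_resets LOGFILE   (hypothetical helper, not part of any tool)
# Prints "<count> <controller>" for every mpt3sas instance that logged
# "attempting host reset!" in LOGFILE. Prints nothing if there are none.
count_host_resets() {
    # Keep only the matching fragment of each line, strip everything after
    # the controller name, then count identical names.
    grep -o 'mpt3sas_cm[0-9]*: attempting host reset!' "$1" |
        cut -d: -f1 |
        sort |
        uniq -c |
        awk '{ print $1, $2 }'
}
```

Usage would be something like "dmesg > /tmp/kernel.log" followed by "count_host_resets /tmp/kernel.log". If one controller index dominates the output, that card, its cable or its JBOD is probably the place to look first.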


#10 2018-08-01 08:23:11

morphin
Member
Registered: 2017-11-15
Posts: 20

Re: mpt3sas_cm2: attempting host reset! scmd(ffff9109c4689548)

Increasing the disk timeout saved my life for 2-3 months.
for drive in /sys/block/sd*; do echo 180 > "$drive/device/timeout"; done

But last week I got the same problem again. At least this is something: you can recover your pool by setting a bigger timeout on your disks.
I'm still looking for better solutions.
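
For anyone who wants to reuse the timeout workaround, here is a slightly more careful version of the one-liner as a function. It is a minimal sketch, not a fix for the resets themselves: the function name set_disk_timeout and its optional second argument (an alternative sysfs directory, only useful for dry runs) are made up for illustration. It skips devices that have no writable timeout attribute and prints the value it reads back.

```shell
#!/bin/sh
# set_disk_timeout SECONDS [BLOCK_DIR]   (hypothetical helper)
# Writes SECONDS into the SCSI command timeout of every sd* device found
# under BLOCK_DIR (default: /sys/block) and prints the value read back.
set_disk_timeout() {
    timeout_s=$1
    block_dir=${2:-/sys/block}
    for drive in "$block_dir"/sd*; do
        attr=$drive/device/timeout
        # Skip an unmatched glob and devices without a writable attribute.
        [ -w "$attr" ] || continue
        echo "$timeout_s" > "$attr"
        printf '%s: %s\n' "${drive##*/}" "$(cat "$attr")"
    done
}
```

Run as root it would be called as "set_disk_timeout 180". Values written to sysfs do not survive a reboot, so it has to be run again after each boot.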

