
#1 2017-10-08 20:00:43

Cobra_Fast
Member
Registered: 2016-06-25
Posts: 7

Having trouble with a SAS2008 controller and hard disk standby/wakeup

Hello! I'm having trouble with timeouts during disk spin-up/wakeup from standby on my SAS2008-based controller.
I've been looking around for hours the past few days but couldn't find any definitive solution, so now I'm trying here (also since the affected machine is actually running Arch).

I'm collecting all my research at https://serverfault.com/q/876750/81089 but I'll outline the most important aspects here again:

Every time a hard drive wakes up from standby and has to spin up, it causes errors:

[77517.340649] sd 0:0:6:0: attempting task abort! scmd(ffff8838c3e0cd48)
[77517.340653] sd 0:0:6:0: [sdg] tag#96 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
[77517.340656] scsi target0:0:6: handle(0x0010), sas_address(0x5001438020b9ee17), phy(23)
[77517.340657] scsi target0:0:6: enclosure_logical_id(0x5001438020b9ee25), slot(52)
[77519.975049] sd 0:0:6:0: task abort: SUCCESS scmd(ffff8838c3e0cd48)

sometimes accompanied by several repetitions of

[97148.490308] mpt2sas_cm0: log_info(0x31110101): originator(PL), code(0x11), sub_code(0x0101)

or lately

[76776.492024] mpt2sas_cm0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
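
For readers trying to interpret the log lines above, here is a minimal decoding sketch. It assumes the standard ATA PASS-THROUGH (16) CDB layout (opcode 0x85) and the log_info bit split that the driver itself prints (bus type, originator, code, sub_code). The function names are hypothetical and only for illustration.

```python
def decode_ata_passthrough16(cdb_hex):
    """Pick the ATA fields out of a 16-byte ATA PASS-THROUGH CDB (opcode 0x85)."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not a 16-byte ATA PASS-THROUGH CDB")
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,  # 3 means non-data
        "features": cdb[4],                # low byte of the FEATURES field
        "lba_mid": cdb[10],
        "lba_high": cdb[12],
        "command": cdb[14],                # ATA command register
    }


def decode_log_info(word):
    """Split an mpt2sas log_info word the same way the driver prints it."""
    originators = {0: "IOP", 1: "PL", 2: "IR"}
    return {
        "bus_type": (word >> 28) & 0xF,
        "originator": originators.get((word >> 24) & 0xF, "unknown"),
        "code": (word >> 16) & 0xFF,
        "sub_code": word & 0xFFFF,
    }


cdb = decode_ata_passthrough16("85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00")
print({k: hex(v) for k, v in cdb.items()})
print(decode_log_info(0x31110101))
```

Decoded this way, the aborted command is ATA command 0xb0 (SMART) with feature 0xd8 (SMART ENABLE OPERATIONS) and the usual 0x4f/0xc2 signature, so it is a pass-through command sent by a userspace tool rather than ordinary filesystem I/O.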

In an effort to fix this problem I've increased all timeouts I could find to 90 or 120 seconds.
In the kernel driver:

/sys/block/sd?/device/timeout
/sys/block/sd?/device/eh_timeout

And also various I/O timeout settings in the controller's BIOS config utility.
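
A minimal sketch of how the two sysfs timeouts above could be raised for every sd? device in one pass. The function name and the sys_block parameter are hypothetical (the parameter exists only so the code can be tried against a scratch directory). Writing to the real /sys/block needs root, and the values do not survive a reboot.

```python
from pathlib import Path


def set_scsi_timeouts(seconds, sys_block="/sys/block", pattern="sd?"):
    """Write `seconds` to device/timeout and device/eh_timeout of each match.

    Returns the list of attribute files that were written.
    """
    changed = []
    for dev in sorted(Path(sys_block).glob(pattern)):
        for name in ("timeout", "eh_timeout"):
            attr = dev / "device" / name
            if attr.exists():
                attr.write_text(f"{seconds}\n")
                changed.append(str(attr))
    return changed
```

Calling set_scsi_timeouts(120) as root would correspond to the 120-second setting mentioned above.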

However, almost like clockwork, any access that causes a drive to spin up times out after 10 to 12 seconds, which makes me assume I've missed a timeout setting somewhere. But I can't find any more.

I can provoke it consistently by spinning down a drive with hdparm -y /dev/sdx, waiting a moment, and then running time hddtemp /dev/sdx.
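
The reproduction steps above can be wrapped in a small script so the wake-up time is measured the same way on every run. This is only a sketch: the function name, the 10-second default wait, and the injectable run/sleep/clock parameters are assumptions made for illustration, and it must run as root on a machine with hdparm and hddtemp installed.

```python
import subprocess
import time


def time_wakeup(device, settle_seconds=10.0,
                run=subprocess.run, sleep=time.sleep, clock=time.monotonic):
    """Spin the drive down, wait, then time an hddtemp call that wakes it.

    Returns (elapsed_seconds, hddtemp_exit_code).
    """
    # hdparm -y puts the drive into standby immediately.
    run(["hdparm", "-y", device], check=True)
    sleep(settle_seconds)
    start = clock()
    result = run(["hddtemp", device], check=False)
    return clock() - start, result.returncode
```

For example, time_wakeup("/dev/sdx") returns the elapsed seconds, which can then be compared with the timestamps of the task abort messages in dmesg.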

My hardware configuration is as follows (copied from previously linked serverfault.com thread):

Dell Perc H310 (LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(07.39.02.00)) (flashed to 9211-8i IT-mode)
    `- HP SAS Expander card (FW 2.10)
        |- Hitachi HDS72404 } md0
        |- Hitachi HDS72404 } md0
        |- HGST HDN724040AL } md0
        |- HGST HDN724040AL } md0
        |- ST8000AS0002-1NA (btrfs)
        |- ST8000AS0002-1NA (btrfs)
        `- ST8000AS0002-1NA (xfs)

I tried downgrading to firmware 19 on the HBA and 2.08 on the expander, but that made no difference at all.

Is there anything I have missed or done wrong?
Anything else I can try?

Last edited by Cobra_Fast (2017-10-08 23:10:23)


#2 2017-10-16 03:01:19

severach
Member
Registered: 2015-05-23
Posts: 198

Re: Having trouble with a SAS2008 controller and hard disk standby/wakeup

If you're using IT mode you can swap it with any of the cards listed here.

From 32 to 2 ports: Ideal SATA/SAS Controllers for ZFS & Linux MD RAID


#3 2017-10-16 11:36:40

Cobra_Fast
Member
Registered: 2016-06-25
Posts: 7

Re: Having trouble with a SAS2008 controller and hard disk standby/wakeup

Yeah thanks, but no thanks.

In the meantime I discovered that I can fix the issue by compiling my own kernel with a changed constant.
If I increase

#define BLK_MIN_SG_TIMEOUT  (7 * HZ)

in include/linux/blkdev.h (set it to 45*HZ) I practically don't have the problem anymore.
But I'd still like to know why some commands get a significantly shorter timeout than I configured in /sys/block/sd?/device/timeout.
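
One possible explanation, sketched here as a user-space model rather than actual kernel code: commands submitted through the SG_IO ioctl (such as the ATA pass-through command in the first post) appear to carry a timeout chosen by the calling program, which the kernel only clamps from below with BLK_MIN_SG_TIMEOUT, while /sys/block/sd?/device/timeout applies to ordinary block requests. The model assumes the fallback order used by kernels of that era (caller value, then the queue's sg_timeout, then a 60-second default) and a hypothetical HZ of 250. The function name and parameters are illustrative only.

```python
def sg_io_timeout_jiffies(hdr_timeout_ms, queue_sg_timeout=0,
                          min_sg_seconds=7, hz=250):
    """Model of how an SG_IO request's timeout seems to be chosen.

    hdr_timeout_ms   timeout the calling program put in its sg_io_hdr
    queue_sg_timeout per-queue default in jiffies (0 means unset)
    min_sg_seconds   the BLK_MIN_SG_TIMEOUT floor, in seconds
    hz               jiffies per second (assumed value, depends on kernel config)
    """
    default_sg = 60 * hz  # assumed BLK_DEFAULT_SG_TIMEOUT
    # Round up when converting milliseconds to jiffies.
    timeout = (hdr_timeout_ms * hz + 999) // 1000
    if not timeout:
        timeout = queue_sg_timeout
    if not timeout:
        timeout = default_sg
    # The caller's value is only raised to the floor, never to the sysfs value.
    return max(timeout, min_sg_seconds * hz)


# A hypothetical tool asking for 3 s gets the 7 s floor with the stock constant,
# and 45 s once the floor is patched to 45*HZ.
print(sg_io_timeout_jiffies(3000) / 250)
print(sg_io_timeout_jiffies(3000, min_sg_seconds=45) / 250)
```

If this model is right, the sysfs timeout never enters the calculation for pass-through commands, which would be consistent with raising the constant helping where the sysfs settings did not.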

