You are not logged in.

#1 2016-11-28 10:56:30

Bronek
Member
From: London
Registered: 2014-02-14
Posts: 119

zfs hook occasionally attempting to import ZFS root too soon

My system (root filesytem on ZFS) is occasionally failing to start, and instead bails out with a prompt like the one below

:: running hook [udev]
:: Triggering uevents...
:: running hook [zfs]
ZFS: Importing " -d /dev/disk/by-id -N zroot "
cannot import 'zroot': no such pool available
/init: line 48: die: not found
cannot open 'zroot': dataset does not exist
:: running cleanup hook [udev]
ERROR: Failed to mount the real root device.
Bailing out, you are on your own. Good luck.

sh: can't access tty; job control turned off
[rootfs ]# 

Eventually after removing "change-default-console-loglevel.patch" from my own kernel build (i.e. allowing the kernel to show messages when no "quiet" present in command line) I found that this happens because zfs hook is either a) started too soon, i.e. before the kernel had a chance to actually scan disks and partitions or b) attempts to import zfs root too soon. Here is example of a failing boot (with kernel messages from console interleaved with boot messages)

[    5.239068] mpt2sas_cm0: Allocated physical memory: size(17390 kB)
[    5.239069] mpt2sas_cm0: Current Controller Queue Depth(7931),Max Controller Queue Depth(8192)
[    5.239069] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[    5.250321] usb 1-1: new high-speed USB device number 2 using ehci-pci
[    5.277372] usb 2-1: new high-speed USB device number 2 using ehci-pci
[    5.284375] mpt2sas_cm0: LSISAS2308: FWVersion(19.00.00.00), ChipRevision(0x05), BiosVersion(07.37.00.00)
[    5.284376] mpt2sas_cm0: Protocol=(
[    5.284376] Initiator
[    5.284376] ,Target
[    5.284376] ),
[    5.284377] Capabilities=(
[    5.284377] TLR
[    5.284377] ,EEDP
[    5.284377] ,Snapshot Buffer
[    5.284378] ,Diag Trace Buffer
[    5.284378] ,Task Set Full
[    5.284379] ,NCQ
[    5.284379] )
[    5.284422] scsi host0: Fusion MPT SAS Host
[    5.284724] mpt2sas_cm0: sending port enable !!
[    5.365020] hub 1-1:1.0: USB hub found
[    5.365115] hub 1-1:1.0: 6 ports detected
[    5.375492] isci: Intel(R) C600 SAS Controller Driver - version 1.2.0
[    5.375542] isci 0000:04:00.0: driver configured for rev: 6 silicon
[    5.375548] isci 0000:04:00.0: OEM parameter table found in OROM
[    5.375549] isci 0000:04:00.0: OEM SAS parameters (version: 1.0) loaded (platform)
[    5.375772] isci 0000:04:00.0: SCU controller 0: phy 3-0 cables: {short, short, short, short}
[    5.378145] scsi host7: isci
[    5.391878] hub 2-1:1.0: USB hub found
[    5.391985] hub 2-1:1.0: 8 ports detected
[    5.474351] usb 5-2: new high-speed USB device number 2 using xhci_hcd
:: running hook [    5.477588] usb 6-2: new SuperSpeed USB device number 2 using xhci_hcd
[    5.490583] ZFS: Loaded module v0.6.5.8-1, ZFS pool version 5000, ZFS filesystem version 5
[    5.495567] hub 6-2:1.0: USB hub found
[zfs]
ZFS: Impo[    5.496203] hub 6-2:1.0: 4 ports detected
[    5.507660] mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x5003048013f5f000), phys(8)
rting " -d /dev/[    5.518115] ata2: SATA link down (SStatus 0 SControl 300)
[    5.518141] ata6: SATA link down (SStatus 0 SControl 300)
[    5.518360] ata4: SATA link down (SStatus 0 SControl 300)
disk/by-id -N zr[    5.518397] ata5: SATA link down (SStatus 0 SControl 300)
[    5.518421] ata3: SATA link down (SStatus 0 SControl 300)
[    5.519373] mpt2sas_cm0: port enable: SUCCESS
oot "
cannot im[    5.526109] ata1: SATA link down (SStatus 0 SControl 300)
[    5.538505] scsi 0:0:0:0: Direct-Access     ATA      INTEL SSDSC2BB16 0370 PQ: 0 ANSI: 6
[    5.538509] scsi 0:0:0:0: SATA: handle(0x000a), sas_addr(0x4433221103000000), phy(3), device_name(0x55cd2e404bdd3bdf)
port 'zroot': no[    5.538510] scsi 0:0:0:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(0)
 such pool avail[    5.538589] scsi 0:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)

For comparison, here is shortened console log from successful boot:

[    5.159721] mpt2sas_cm0: Allocated physical memory: size(17390 kB)
[    5.159722] mpt2sas_cm0: Current Controller Queue Depth(7931),Max Controller Queue Depth(8192)
[    5.159722] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[    5.205547] mpt2sas_cm0: LSISAS2308: FWVersion(19.00.00.00), ChipRevision(0x05), BiosVersion(07.37.00.00)
[    5.205548] mpt2sas_cm0: Protocol=(
[    5.205549] Initiator
[    5.205549] ,Target
[    5.205549] ),
[    5.205549] Capabilities=(
[    5.205549] TLR
[    5.205550] ,EEDP
[    5.205550] ,Snapshot Buffer
[    5.205550] ,Diag Trace Buffer
[    5.205550] ,Task Set Full
[    5.205551] ,NCQ
[    5.205551] )
[    5.205689] scsi host0: Fusion MPT SAS Host
[    5.205995] mpt2sas_cm0: sending port enable !!
[    5.208740] mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x5003048013f5f000), phys(8)
[    5.217425] mpt2sas_cm0: port enable: SUCCESS
[    5.218321] scsi 0:0:0:0: Direct-Access     ATA      INTEL SSDSC2BB16 0370 PQ: 0 ANSI: 6
[    5.218332] scsi 0:0:0:0: SATA: handle(0x000a), sas_addr(0x4433221103000000), phy(3), device_name(0x55cd2e404bdd3bdf)
[    5.218333] scsi 0:0:0:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(0)
[    5.218430] scsi 0:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    5.219702] scsi 0:0:1:0: Direct-Access     ATA      INTEL SSDSC2BB16 0370 PQ: 0 ANSI: 6
[    5.219705] scsi 0:0:1:0: SATA: handle(0x0009), sas_addr(0x4433221102000000), phy(2), device_name(0x55cd2e404c42b288)
[    5.219707] scsi 0:0:1:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(1)
[    5.219772] scsi 0:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    5.221521] scsi 0:0:2:0: Direct-Access     ATA      WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 6
[    5.221524] scsi 0:0:2:0: SATA: handle(0x000e), sas_addr(0x4433221107000000), phy(7), device_name(0x50014ee20913380f)
[    5.221526] scsi 0:0:2:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(4)
[    5.221591] scsi 0:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    5.224170] scsi 0:0:3:0: Direct-Access     ATA      WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 6
[    5.224173] scsi 0:0:3:0: SATA: handle(0x000b), sas_addr(0x4433221106000000), phy(6), device_name(0x50014ee25e68bd78)
[    5.224174] scsi 0:0:3:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(5)
[    5.224393] scsi 0:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    5.226770] scsi 0:0:4:0: Direct-Access     ATA      WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 6
[    5.226773] scsi 0:0:4:0: SATA: handle(0x000c), sas_addr(0x4433221105000000), phy(5), device_name(0x50014ee25e68c365)
[    5.226774] scsi 0:0:4:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(6)
[    5.226857] scsi 0:0:4:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    5.229381] scsi 0:0:5:0: Direct-Access     ATA      WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 6
[    5.229390] scsi 0:0:5:0: SATA: handle(0x000d), sas_addr(0x4433221104000000), phy(4), device_name(0x50014ee209134d1d)
[    5.229391] scsi 0:0:5:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(7)
[    5.229468] scsi 0:0:5:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    5.322427] usb 1-1: new high-speed USB device number 2 using ehci-pci
[    5.326390] usb 4-1: new high-speed USB device number 2 using ehci-pci
[    5.339617] ZFS: Loaded module v0.6.5.8-1, ZFS pool version 5000, ZFS filesystem version 5
[    5.423376] usb 5-2: new high-speed USB device number 2 using xhci_hcd
[    5.426703] usb 6-2: new SuperSpeed USB device number 2 using xhci_hcd
[    5.437038] hub 1-1:1.0: USB hub found
[    5.437141] hub 1-1:1.0: 6 ports detected
[    5.441036] hub 4-1:1.0: USB hub found
[    5.441141] hub 4-1:1.0: 8 ports detected
[    5.444402] hub 6-2:1.0: USB hub found
[    5.444946] hub 6-2:1.0: 4 ports detected
[    5.451387] ata2: SATA link down (SStatus 0 SControl 300)
[    5.451426] ata1: SATA link down (SStatus 0 SControl 300)
[    5.451466] ata3: SATA link down (SStatus 0 SControl 300)
[    5.452405] ata5: SATA link down (SStatus 0 SControl 300)
[    5.453382] ata6: SATA link down (SStatus 0 SControl 300)
[    5.453423] ata4: SATA link down (SStatus 0 SControl 300)
[    5.471718] sd 0:0:0:0: [sda] 312581808 512-byte logical blocks: (160 GB/149 GiB)
[    5.471719] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    5.471749] sd 0:0:1:0: [sdb] 312581808 512-byte logical blocks: (160 GB/149 GiB)
[    5.471750] sd 0:0:1:0: [sdb] 4096-byte physical blocks
[    5.471974] sd 0:0:2:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    5.471975] sd 0:0:2:0: [sdc] 4096-byte physical blocks
[    5.472001] sd 0:0:3:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    5.472002] sd 0:0:3:0: [sdd] 4096-byte physical blocks
[    5.472070] sd 0:0:4:0: [sde] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    5.472071] sd 0:0:4:0: [sde] 4096-byte physical blocks
[    5.472081] sd 0:0:5:0: [sdf] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    5.472082] sd 0:0:5:0: [sdf] 4096-byte physical blocks
[    5.473004] sd 0:0:0:0: [sda] Write Protect is off
[    5.473144] sd 0:0:1:0: [sdb] Write Protect is off
[    5.473317] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    5.473504] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    5.476668] sd 0:0:2:0: [sdc] Write Protect is off
[    5.476742] sd 0:0:3:0: [sdd] Write Protect is off
[    5.476849] sd 0:0:5:0: [sdf] Write Protect is off
[    5.476898] sd 0:0:4:0: [sde] Write Protect is off
[    5.477103]  sdb: sdb1 sdb2 sdb3 sdb4 sdb5
[    5.477108]  sda: sda1 sda2 sda3 sda4 sda5
[    5.477431] sd 0:0:2:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    5.477494] sd 0:0:3:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    5.477757] sd 0:0:4:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    5.477811] sd 0:0:5:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    5.479855] sd 0:0:0:0: [sda] Attached SCSI disk
[    5.479907] sd 0:0:1:0: [sdb] Attached SCSI disk
[    5.544859] hub 5-2:1.0: USB hub found
[    5.545575] hub 5-2:1.0: 4 ports detected
[    5.716357] usb 4-1.1: new high-speed USB device number 3 using ehci-pci
[    5.735900] usb 6-2.2: new SuperSpeed USB device number 3 using xhci_hcd
[    5.753404] hub 6-2.2:1.0: USB hub found
[    5.753956] hub 6-2.2:1.0: 4 ports detected
[    5.803432] hub 4-1.1:1.0: USB hub found
[    5.803783] hub 4-1.1:1.0: 4 ports detected
[    5.834390] usb 5-2.2: new high-speed USB device number 3 using xhci_hcd
[    5.879404] usb 4-1.2: new low-speed USB device number 4 using ehci-pci
[    6.018414] clocksource: Switched to clocksource tsc
[    6.018433] isci 0000:04:00.0: driver configured for rev: 6 silicon
[    6.018438] isci 0000:04:00.0: OEM parameter table found in OROM
[    6.018440] isci 0000:04:00.0: OEM SAS parameters (version: 1.0) loaded (platform)
[    6.018544] isci 0000:04:00.0: SCU controller 0: phy 3-0 cables: {short, short, short, short}
[    6.018868] hub 5-2.2:1.0: USB hub found
[    6.019596] hub 5-2.2:1.0: 4 ports detected
[    6.020936] scsi host7: isci
[    6.063060] hidraw: raw HID events driver (C) Jiri Kosina
[    6.078191] usbcore: registered new interface driver usbhid
[    6.083806] usbhid: USB HID core driver
[    6.093315] usb 4-1.1.1: new low-speed USB device number 5 using ehci-pci
[    6.237707]  sdc: sdc1 sdc9
[    6.248919] sd 0:0:2:0: [sdc] Attached SCSI disk
[    6.250943]  sdd: sdd1 sdd9
[    6.251640]  sde: sde1 sde9
[    6.258157] random: nonblocking pool is initialized
[    6.259389] sd 0:0:3:0: [sdd] Attached SCSI disk
[    6.260116] sd 0:0:4:0: [sde] Attached SCSI disk
[    6.273450]  sdf: sdf1 sdf9
[    6.284775] sd 0:0:5:0: [sdf] Attached SCSI disk
:: running hook [zfs]
ZFS: Importing " -d /dev/disk/by-id -N zroot "
[    6.824671] SPL: using hostid 0xa8c03302
:: running cleanup hook [udev]
[    7.124416] ip_tables: (C) 2000-2006 Netfilter Core Team
[    7.132660] systemd[1]: systemd 231 running in system mode. (+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
[    7.150981] systemd[1]: Detected architecture x86-64.

Welcome to Arch Linux!

My question is, should I alter the zfs hook to wait some time after zfs module is loaded but before "zfs import"? Adding a fixed "sleep 5" into zfs hook is trivial and might just work around the problem, but perhaps there is more robust fix which I could share with archzfs maintainer ? Ideally I'd like to put a command in zfs hook which will block until all disks have been scanned by kernel (would "lsblk" fit the bill ?), and one with few enough dependencies to actually make sens there. Or perhaps the fix belongs to udev hook, which might appear to call "udevadm settle" too soon?

Offline

#2 2016-11-28 12:19:27

Bronek
Member
From: London
Registered: 2014-02-14
Posts: 119

Re: zfs hook occasionally attempting to import ZFS root too soon

Doing some research I learned that actually this is quite common problem and all distributions had to find their own way of handling with the removal of "scsi_wait_scan" module in kernel 3.6, which would have been used previously to block until current scan had completed. Following suggestion from https://bugs.launchpad.net/ironic-pytho … ug/1551300 (but without knowing much about udevadm - advice welcome!) I decided to try adding the following to zfs hook right before wait for zfs module

    # Force udev to list all devices, will block until scsi scan is completed
    udevadm trigger --verbose --dry-run --type=devices --subsystem-match=scsi_disk
    udevadm settle

Offline

#3 2016-12-01 19:12:43

Bronek
Member
From: London
Registered: 2014-02-14
Posts: 119

Re: zfs hook occasionally attempting to import ZFS root too soon

Well, that did not help. Got this today

[    5.707365] mpt2sas_cm0: Allocated physical memory: size(17390 kB)
[    5.713614] mpt2sas_cm0: Current Controller Queue Depth(7931),Max Controller Queue Depth(8192)
[    5.722316] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[    5.771694] mpt2sas_cm0: LSISAS2308: FWVersion(19.00.00.00), ChipRevision(0x05), BiosVersion(07.37.00.00)
[    5.781349] mpt2sas_cm0: Protocol=(
[    5.784709] Initiator,Target
[    5.787671] ), Capabilities=(
[    5.790703] TLR,EEDP
[    5.792947] ,Snapshot Buffer,Diag Trace Buffer
[    5.797485] ,Task Set Full,NCQ
[    5.800645] )
[    5.802479] scsi host7: Fusion MPT SAS Host
[    5.807025] mpt2sas_cm0: sending port enable !!
:: running hook [zfs]
[    5.814402] mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x5003048013f5f000), phys(8)
ZFS: Importing " -d /dev/disk/by-id -N zroot "
cannot import 'z[    5.834307] mpt2sas_cm0: port enable: SUCCESS
root': no such p[    5.839863] scsi 7:0:0:0: Direct-Access     ATA      INTEL SSDSC2BB16 0370 PQ: 0 ANSI: 6
ool available
/[    5.848787] scsi 7:0:0:0: SATA: handle(0x000a), sas_addr(0x4433221103000000), phy(3), device_name(0x55cd2e404bdd3bdf)
init: line 48: d[    5.848790] scsi 7:0:0:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(0)
[    5.848864] scsi 7:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
ie: not found
c[    5.850042] scsi 7:0:1:0: Direct-Access     ATA      INTEL SSDSC2BB16 0370 PQ: 0 ANSI: 6
[    5.850065] scsi 7:0:1:0: SATA: handle(0x0009), sas_addr(0x4433221102000000), phy(2), device_name(0x55cd2e404c42b288)
[    5.850066] scsi 7:0:1:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(1)
[    5.850128] scsi 7:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
annot open 'zroo[    5.851868] scsi 7:0:2:0: Direct-Access     ATA      WDC WD40EFRX-68W 0A80 PQ: 0 ANSI: 6
t': dataset does[    5.851870] scsi 7:0:2:0: SATA: handle(0x000c), sas_addr(0x4433221107000000), phy(7), device_name(0x50014ee20913380f)
 not exist
:: r[    5.851871] scsi 7:0:2:0: SATA: enclosure_logical_id(0x5003048013f5f000), slot(4)
[    5.851934] scsi 7:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)

Will now try simply adding

options scsi_mod scan=sync

to /etc/modprobe.d/

Offline

#4 2016-12-11 20:33:03

Bronek
Member
From: London
Registered: 2014-02-14
Posts: 119

Re: zfs hook occasionally attempting to import ZFS root too soon

Just an update that so far the change in my last post (i.e. setting "options scsi_mod scan=sync") works fine. This is instead of "udevadm settle"

Last edited by Bronek (2016-12-11 21:12:00)

Offline

#5 2017-01-02 20:51:17

Bronek
Member
From: London
Registered: 2014-02-14
Posts: 119

Re: zfs hook occasionally attempting to import ZFS root too soon

This seems to have cured the problem for good

bronek@gdansk ~ % cat /etc/modprobe.d/zfs.conf
# Enforce synchronous scsi scan, to prevent zfs driver loading before disks are available
options scsi_mod scan=sync

I created issue #94 in archzfs to make this option a default, so others can avoid the problems I've had

Offline

Board footer

Powered by FluxBB