
#1 2015-08-28 22:08:00

Sedrunum
Member
Registered: 2015-08-28
Posts: 3

host bus error during arch installation on encrypted ssd

Two days ago, I started installing a brand-new Arch Linux on my laptop. As I wanted full disk encryption (including /boot), I decided to run Arch with GRUB installed on a removable flash drive that holds an encrypted 500 MiB boot partition. This boot partition contains the GRUB config, kernel and initramfs. Later in the boot process, the root partition is decrypted; it resides on a LUKS/dm-crypt encrypted Samsung 830 SSD with TRIM support enabled. As it turned out, the actual problem seemed to be my bus controller rather than this configuration: after partitioning, encrypting, decrypting, formatting and mounting my partitions, I fixed some broken GPG signature errors and continued with

pacstrap /mnt

This caused several read/write errors, ending with the claim that the filesystem was read-only. I had checked this in advance with a plain

mount

and it was not the case. After

mount -o remount,rw /mnt

and deleting the partial package files as well as the pacman db lock, the installation seemed successful. However, after system configuration, kernel compilation, bootloader installation and reboot, my system complained:

ERROR: Root device mounted successfully, but /sbin/init does not exist.
Bailing out, you are on your own now. Good luck.

A quick check by mounting manually showed that no files had been saved on my SSD partition. A fresh installation produced the same result. Does anyone have an idea how to continue?
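For the read-only question, a minimal Python sketch of how the mount state could be checked without reading the `mount` output by eye. It assumes the standard six-column /proc/mounts layout (device, mount point, type, options, dump, pass); the function names are made up for illustration, and escaped characters in mount points (such as \040 for a space) are not handled.

```python
def mount_options(mounts_text, mountpoint):
    """Return the option set of the last entry mounted at `mountpoint`, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Later entries shadow earlier ones mounted at the same path.
            options = set(fields[3].split(","))
    return options


def is_read_only(mounts_text, mountpoint):
    """True if the filesystem at `mountpoint` is currently mounted read-only."""
    options = mount_options(mounts_text, mountpoint)
    if options is None:
        raise LookupError("nothing mounted at %s" % mountpoint)
    return "ro" in options


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        text = handle.read()
    try:
        print("/mnt read-only:", is_read_only(text, "/mnt"))
    except LookupError as exc:
        print(exc)
```

Running it before and after pacstrap would show whether the kernel flipped the mount to read-only in between.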

Some logs....

dmesg: (dumped after first pacstrap run and filtered with 'grep ata')

[    0.000000] ACPI: SSDT 0x00000000CA0E4C30 000315 (v01 SataRe SataTabl 00001000 INTL 20091112)
[    0.000000] Memory: 8042632K/8275960K available (5699K kernel code, 893K rwdata, 1732K rodata, 1180K init, 1152K bss, 233328K reserved, 0K cma-reserved)
[    1.147649] ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[    3.660276] Write protecting the kernel read-only data: 8192k
[    3.714976] libata version 3.00 loaded.
[    3.784617] ata1: SATA max UDMA/133 abar m2048@0xf7c16000 port 0xf7c16100 irq 28
[    3.784619] ata2: SATA max UDMA/133 abar m2048@0xf7c16000 port 0xf7c16180 irq 28
[    3.784620] ata3: SATA max UDMA/133 abar m2048@0xf7c16000 port 0xf7c16200 irq 28
[    3.784622] ata4: SATA max UDMA/133 abar m2048@0xf7c16000 port 0xf7c16280 irq 28
[    3.784623] ata5: DUMMY
[    3.784624] ata6: DUMMY
[    4.103429] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    4.103989] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[    4.103992] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[    4.103994] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[    4.104205] ata1.00: ATA-9: SAMSUNG SSD 830 Series, CXM03B1Q, max UDMA/133
[    4.104206] ata1.00: 250069680 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    4.104515] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[    4.104517] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[    4.104519] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[    4.104721] ata1.00: configured for UDMA/133
[    4.423364] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    4.424774] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[    4.424781] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[    4.424785] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[    4.426038] ata2.00: ATA-8: WDC WD5000LPVT-22G33T0, 01.01A01, max UDMA/133
[    4.426045] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    4.427545] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[    4.427553] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[    4.427557] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[    4.428803] ata2.00: configured for UDMA/133
[    4.746478] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    4.747043] ata3.00: ATAPI: Optiarc BD RW BD-5750H, 1.00, max UDMA/100
[    4.747947] ata3.00: configured for UDMA/100
[    5.079646] ata4: SATA link down (SStatus 0 SControl 300)
[   18.521629] systemd[1]: Listening on LVM2 metadata daemon socket.
[   20.037257] systemd[1]: Starting Rebuild Hardware Database...
[  892.427726] ata1.00: exception Emask 0x60 SAct 0x3c0000 SErr 0x800 action 0x6 frozen
[  892.431510] ata1.00: irq_stat 0x20000000, host bus error
[  892.435061] ata1: SError: { HostInt }
[  892.438572] ata1.00: failed command: WRITE FPDMA QUEUED
[  892.442085] ata1.00: cmd 61/00:90:08:17:0f/08:00:00:00:00/40 tag 18 ncq 1048576 out
[  892.448671] ata1.00: status: { DRDY }
[  892.451962] ata1.00: failed command: WRITE FPDMA QUEUED
[  892.455094] ata1.00: cmd 61/00:98:08:27:0f/06:00:00:00:00/40 tag 19 ncq 786432 out
[  892.461331] ata1.00: status: { DRDY }
[  892.464236] ata1.00: failed command: WRITE FPDMA QUEUED
[  892.467202] ata1.00: cmd 61/00:a0:08:2d:0f/06:00:00:00:00/40 tag 20 ncq 786432 out
[  892.473045] ata1.00: status: { DRDY }
[  892.475823] ata1.00: failed command: WRITE FPDMA QUEUED
[  892.478558] ata1.00: cmd 61/00:a8:08:1f:0f/08:00:00:00:00/40 tag 21 ncq 1048576 out
[  892.484001] ata1.00: status: { DRDY }
[  892.486509] ata1: hard resetting link
[  892.804156] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  892.805062] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[  892.805071] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[  892.805075] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[  892.806023] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[  892.806032] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[  892.806036] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[  892.806260] ata1.00: configured for UDMA/133
[  892.806317] ata1: EH complete

(this pattern continues for a while...)
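For anyone reading the log above, here is a minimal Python sketch that decodes the "cmd" lines and the SAct mask. It assumes the field order libata prints in its error handler (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count sits in the feature registers while the tag sits in bits 7:3 of the count register. The function names are made up for illustration.

```python
import re

# Field order assumed from the libata "cmd ..." error line format.
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
          "device")


def decode_ncq_taskfile(cmd, sector_size=512):
    """Decode an NCQ read/write taskfile string into tag, LBA and length."""
    values = [int(part, 16) for part in re.split(r"[/:]", cmd.strip())]
    if len(values) != len(FIELDS):
        raise ValueError("expected 12 hex fields, got %d" % len(values))
    tf = dict(zip(FIELDS, values))
    # NCQ commands carry the sector count in the feature registers;
    # a count of 0 stands for 65536 sectors.
    sectors = (tf["hob_feature"] << 8) | tf["feature"]
    if sectors == 0:
        sectors = 65536
    # 48-bit LBA, most significant byte first.
    lba = 0
    for name in ("hob_lbah", "hob_lbam", "hob_lbal", "lbah", "lbam", "lbal"):
        lba = (lba << 8) | tf[name]
    return {
        "tag": tf["nsect"] >> 3,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


def tags_from_sact(sact):
    """List the NCQ tags whose bits are set in an SAct mask."""
    return [bit for bit in range(32) if (sact >> bit) & 1]


if __name__ == "__main__":
    print(decode_ncq_taskfile("61/00:90:08:17:0f/08:00:00:00:00/40"))
    print(tags_from_sact(0x3c0000))
```

For the first failed command this yields tag 18, LBA 988936 and 2048 sectors, i.e. 1048576 bytes, which agrees with the "tag 18 ncq 1048576" printed on the same line. SAct 0x3c0000 decodes to tags 18 through 21, the four writes listed as failed, and their LBA ranges are contiguous, so this looks like one large sequential write in flight when the host bus error hit.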

smartctl -a /dev/sda: (/dev/sda is the SSD.)

smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.3-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG SSD 830 Series
Serial Number:    S0XYNEAC778775
LU WWN Device Id: 5 002538 043584d30
Firmware Version: CXM03B1Q
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 2
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Aug 28 21:27:59 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  540) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (   9) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       4096
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       6955
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       2638
177 Wear_Leveling_Count     0x0013   097   097   000    Pre-fail  Always       -       91
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   099   099   010    Pre-fail  Always       -       2
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   099   099   010    Old_age   Always       -       1
183 Runtime_Bad_Block       0x0013   099   099   010    Pre-fail  Always       -       1
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   068   058   000    Old_age   Always       -       32
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   253   253   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       260
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       10909167982

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6955         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
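To put the attribute table above into numbers, here is a rough Python sketch that parses the ten-column rows and applies the usual smartmontools rule that an attribute counts as failed when its normalized value is at or below a non-zero threshold. Treating Total_LBAs_Written as 512-byte units is an assumption based on the reported logical sector size; the function names are made up for illustration.

```python
def parse_smart_attributes(text):
    """Parse 'smartctl -A' style rows into a dict keyed by attribute name."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG column.
        if len(parts) >= 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            raw = parts[9]
            attrs[parts[1]] = {
                "id": int(parts[0]),
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": int(raw) if raw.isdigit() else raw,
            }
    return attrs


def failed_attributes(attrs):
    """Names of attributes whose normalized value is at or below the threshold."""
    return [name for name, a in attrs.items()
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


def lbas_to_bytes(lbas, sector_size=512):
    """Convert an LBA count to bytes (sector size is an assumption)."""
    return lbas * sector_size


if __name__ == "__main__":
    sample = (
        "  5 Reallocated_Sector_Ct   0x0033   099   099   010    "
        "Pre-fail  Always       -       4096\n"
        "241 Total_LBAs_Written      0x0032   099   099   000    "
        "Old_age   Always       -       10909167982\n"
    )
    parsed = parse_smart_attributes(sample)
    print(failed_attributes(parsed))
    print(lbas_to_bytes(parsed["Total_LBAs_Written"]["raw"]))
```

With the values posted here, no attribute is at or below its threshold, and 10909167982 LBAs at 512 bytes each come to 5585494006784 bytes, roughly 5.6 TB written over the life of the drive. The same arithmetic on the 250069680 sectors from dmesg gives 128035676160 bytes, matching the capacity smartctl reports.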

lspci:

00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wimbledon XT [Radeon HD 7970M] (rev ff)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 5289 (rev 01)
03:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0a)
04:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
05:00.0 FireWire (IEEE 1394): JMicron Technology Corp. IEEE 1394 Host Controller (rev 30)

Constructive solutions are welcome. I'll try to work it out by myself in the meantime.

Thanks in advance and Yours sincerely
Sedrunum

EDIT 1: Three observations:

1.

mount -o remount,rw /mnt

doesn't help. The filesystem appears writable afterwards, but checking the actual content of the LUKS container after unmounting, closing the container, rebooting, reopening and remounting reveals a total mess. No files have been written since the errors appeared. Presumably Arch switches to read-only mode to avoid losing data on the filesystem due to corrupt I/O instructions. Maybe; I'm not sure.

2. Selecting another SATA port doesn't help. My laptop has two SATA slots, one holding the SSD and one the HDD. I swapped the two, but nothing changed. Still the same errors.

3. Furthermore I tried writing these two bitstreams to files on the mentioned filesystem:

dd if=/dev/urandom
dd if=/dev/zero

I can't prove it, but it felt as if the errors appeared later with /dev/urandom than with /dev/zero. That could mean there is some kind of buffer that can take the load up to a certain point and produces errors once the incoming load is too much and too fast... I don't know.
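To make the comparison in observation 3 a bit more measurable, here is a rough Python sketch of the same test. It is not the dd invocation used above: it writes zeros or random bytes in chunks, forces each chunk to disk with fsync, and reports how many bytes were confirmed before the first error. The target path and sizes are placeholders.

```python
import os


def write_pattern(path, total_bytes, random_data=False, chunk=1024 * 1024):
    """Write zeros or random bytes to `path`, syncing after every chunk.

    Returns the number of bytes written and synced. On an I/O error, raises
    OSError with a message stating how many bytes were confirmed before it.
    """
    written = 0
    zero_block = bytes(chunk)
    with open(path, "wb") as handle:
        while written < total_bytes:
            size = min(chunk, total_bytes - written)
            block = os.urandom(size) if random_data else zero_block[:size]
            try:
                handle.write(block)
                handle.flush()
                # fsync so errors surface at this offset, not at close time.
                os.fsync(handle.fileno())
            except OSError as exc:
                raise OSError(
                    exc.errno,
                    "write failed after %d bytes: %s" % (written, exc.strerror),
                ) from exc
            written += size
    return written


if __name__ == "__main__":
    # Placeholder target; point this at a file on the filesystem under test.
    target = "/mnt/write-test.bin"
    for use_random in (False, True):
        try:
            done = write_pattern(target, 512 * 1024 * 1024, random_data=use_random)
            print("random=%s: %d bytes written" % (use_random, done))
        except OSError as exc:
            print("random=%s: %s" % (use_random, exc))
```

Comparing the confirmed byte counts for both patterns over a few runs would show whether the errors really depend on the data or only on timing. Random data is typically produced more slowly than zeros, so a later failure with /dev/urandom could simply reflect a lower write rate, which would fit the load theory.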

Last edited by Sedrunum (2015-08-29 02:00:58)

Offline

#2 2015-08-29 02:53:29

Sedrunum
Member
Registered: 2015-08-28
Posts: 3

Re: host bus error during arch installation on encrypted ssd

PROBLEM SOLVED:

I found the culprit. I have to apologize to Samsung and the other developers of F2FS, but I consider it UNSTABLE. If something like my problem can happen, no top benchmark score compensates for this behaviour. If anybody is in touch with the F2FS developers, please tell them about this bug. I'm using ext4 now.

Bye and good night
Sedrunum

Offline

#3 2015-08-29 17:41:02

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: host bus error during arch installation on encrypted ssd

Sedrunum wrote:

I found the culprit. I have to apologize to Samsung and the other developers of F2FS, but I consider it UNSTABLE. If something like my problem can happen, no top benchmark score compensates for this behaviour. If anybody is in touch with the F2FS developers, please tell them about this bug.

Nothing stops you from filing a bug in the kernel's bug tracker. In fact, you should report it there, otherwise things might never get fixed.

On the other hand, I've been using F2FS on a system installed on a USB drive and on a Raspberry Pi, and so far I've had zero problems (me knocks on wood).


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline
