I have an old machine with a Western Digital WD2500AAJB HDD that is a few years old. It has a single partition with /home on it. After a blackout I was dropped to a maintenance shell and started running fsck on the disk; then there was another blackout while fsck was running. When rebooting, systemd stopped with "A start job is running for dev-disk-by..." and then dropped me to a maintenance shell. There is nothing in /dev for the HDD. Looking at the output of dmesg, I see that there is a model number mismatch. Here is the interesting part of dmesg:
[ 0.887231] ata_piix 0000:00:1f.1: version 2.13
[ 0.887244] ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
[ 0.893415] scsi host0: ata_piix
[ 0.899380] scsi host1: ata_piix
[ 0.899521] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x14c0 irq 14
[ 0.899526] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x14c8 irq 15
[ 0.899910] ata_piix 0000:00:1f.2: MAP [ P0 -- P1 -- ]
[ 0.971586] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[ 1.050982] scsi host2: ata_piix
[ 1.051321] scsi host3: ata_piix
[ 1.051431] ata3: SATA max UDMA/133 cmd 0x14f8 ctl 0x1810 bmdma 0x14d0 irq 18
[ 1.051435] ata4: SATA max UDMA/133 cmd 0x1800 ctl 0x1814 bmdma 0x14d8 irq 18
[ 1.067451] ata2.00: ATA-8: WDC WD2500AAJB-00WGA0, 00.02C01, max UDMA/100
[ 1.067458] ata2.00: 488397168 sectors, multi 16: LBA48
[ 1.080345] ata2.00: configured for UDMA/100
[ 1.110285] ata1.00: ATA-6: ST340014A, 3.06, max UDMA/100
[ 1.110289] ata1.00: 78165360 sectors, multi 16: LBA
[ 1.123496] ata1.00: configured for UDMA/100
[ 1.123667] scsi 0:0:0:0: Direct-Access ATA ST340014A 3.06 PQ: 0 ANSI: 5
[ 1.124491] scsi 1:0:0:0: Direct-Access ATA WDC WD2500AAJB-0 2C01 PQ: 0 ANSI: 5
[ 1.190041] usb usb1-port7: over-current condition
[ 1.245244] sd 0:0:0:0: [sda] 78165360 512-byte logical blocks: (40.0 GB/37.2 GiB)
[ 1.245332] sd 0:0:0:0: [sda] Write Protect is off
[ 1.245337] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.245375] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.245982] sd 1:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[ 1.246065] sd 1:0:0:0: [sdb] Write Protect is off
[ 1.246070] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 1.246106] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.247489] sda: sda1 sda2
[ 1.248052] sd 0:0:0:0: [sda] Attached SCSI disk
[ 1.263381] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 1.263444] ata2.00: BMDMA stat 0x24
[ 1.263494] ata2.00: failed command: READ DMA
[ 1.263548] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
res 51/84:00:07:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
[ 1.263641] ata2.00: status: { DRDY ERR }
[ 1.263689] ata2.00: error: { ICRC ABRT }
[ 1.263784] ata2: soft resetting link
[ 1.300018] tsc: Refined TSC clocksource calibration: 2659.999 MHz
[ 1.300024] clocksource tsc: mask: 0xffffffffffffffff max_cycles: 0x2657a34898c, max_idle_ns: 440795323804 ns
[ 1.437669] ata2.00: configured for UDMA/100
[ 1.437737] ata2: EH complete
[ 1.453367] ata2.00: limiting speed to UDMA/66:PIO4
[ 1.453373] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 1.453429] ata2.00: BMDMA stat 0x24
[ 1.453479] ata2.00: failed command: READ DMA
[ 1.453531] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
res 51/84:00:07:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
[ 1.453625] ata2.00: status: { DRDY ERR }
[ 1.453674] ata2.00: error: { ICRC ABRT }
[ 1.453749] ata2: soft resetting link
[ 1.636822] ata2.00: model number mismatch 'WDC WD2500AAJB-00WGA0' != 'WDC WD2500AAJB-00WGQ0'
[ 1.636826] ata2.00: revalidation failed (errno=-19)
[ 1.636878] ata2.00: limiting speed to UDMA/66:PIO3
[ 2.300066] Switched to clocksource tsc
[ 6.606710] ata2: soft resetting link
[ 6.786825] ata2.00: model number mismatch 'WDC WD2500AAJB-00WGA0' != 'WDC WD2500AAJB-00WGQ0'
[ 6.786829] ata2.00: revalidation failed (errno=-19)
[ 6.786884] ata2.00: disabled
[ 11.760035] ata2: soft resetting link
[ 11.929610] ata2.00: ATA-8: WDC WD2500AAJB-00WGA0, 00.02C01, max UDMA/100
[ 11.929615] ata2.00: 488397168 sectors, multi 16: LBA48
[ 11.940150] ata2.00: model number mismatch 'WDC WD2500AAJB-00WGA0' != 'WDC WD2500AAJB-00WGQ0'
[ 11.940154] ata2.00: revalidation failed (errno=-19)
[ 11.940207] ata2.00: limiting speed to UDMA/100:PIO3
[ 16.913367] ata2: soft resetting link
[ 17.093491] ata2.00: model number mismatch 'WDC WD2500AAJB-00WGA0' != 'WDC WD2500AAJB-00WGQ0'
[ 17.093495] ata2.00: revalidation failed (errno=-19)
[ 17.093548] ata2.00: disabled
[ 17.093575] sd 1:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[ 17.093581] sd 1:0:0:0: [sdb] tag#0 Sense Key : 0xb [current] [descriptor]
[ 17.093585] sd 1:0:0:0: [sdb] tag#0 ASC=0x47 ASCQ=0x0
[ 17.093590] sd 1:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[ 17.093593] blk_update_request: I/O error, dev sdb, sector 0
[ 17.093647] Buffer I/O error on dev sdb, logical block 0, async page read
[ 17.093748] sd 1:0:0:0: rejecting I/O to offline device
[ 17.093802] sd 1:0:0:0: killing request
[ 17.093809] sd 1:0:0:0: rejecting I/O to offline device
[ 17.093867] ldm_validate_partition_table(): Disk read failed.
[ 17.093877] sd 1:0:0:0: rejecting I/O to offline device
[ 17.093936] sd 1:0:0:0: rejecting I/O to offline device
[ 17.093996] sd 1:0:0:0: rejecting I/O to offline device
[ 17.094049] sdb: unable to read partition table
[ 17.094211] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 17.095402] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095476] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095533] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095589] sd 1:0:0:0: [sdb] Read Capacity(16) failed: Result: hostbyte=0x01 driverbyte=0x00
[ 17.095594] sd 1:0:0:0: [sdb] Sense not available.
[ 17.095600] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095656] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095711] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095766] sd 1:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=0x01 driverbyte=0x00
[ 17.095770] sd 1:0:0:0: [sdb] Sense not available.
[ 17.095776] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095832] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095890] sd 1:0:0:0: rejecting I/O to offline device
[ 17.095947] sd 1:0:0:0: rejecting I/O to offline device
[ 17.096752] ata2: EH complete
[ 17.096786] ata2.00: detaching (SCSI 1:0:0:0)
[ 17.097092] sd 1:0:0:0: [sdb] Stopping disk
[ 17.097134] sd 1:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[ 18.157496] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 18.363768] random: nonblocking pool is initialized
[ 19.227857] ip_tables: (C) 2000-2006 Netfilter Core Team
The whole output is here. I only found a few things about similar issues, like this and this. My situation is similar to the second link in that 'WDC WD2500AAJB-00WGA0' and 'WDC WD2500AAJB-00WGQ0' differ in one bit: literally one bit is flipped. The other posts mention that it is (or might be) a kernel bug, so I tried booting with the LTS kernel, with the same results. Since the HDD does not show up in /dev, I can't try the other recommended things like smartctl. In case it helps, here are the outputs of dmidecode and lshw. Any help is appreciated.
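The one-bit claim is easy to check. The dmesg lines show the mismatch being reported during revalidation, i.e. the model string read back later no longer matches the one read at probe time. A minimal sketch (the helper names bit_diff and hamming_bits are hypothetical, written only for this illustration) that XORs the two strings character by character:

```python
from typing import List, Tuple


def bit_diff(a: str, b: str) -> List[Tuple[int, str, str, int]]:
    """Return (offset, char_a, char_b, xor) for every character that differs."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    diffs = []
    for i, (ca, cb) in enumerate(zip(a, b)):
        x = ord(ca) ^ ord(cb)  # non-zero bits are the flipped ones
        if x:
            diffs.append((i, ca, cb, x))
    return diffs


def hamming_bits(a: str, b: str) -> int:
    """Total number of flipped bits between two equal-length strings."""
    return sum(bin(x).count("1") for _, _, _, x in bit_diff(a, b))


probed = "WDC WD2500AAJB-00WGA0"   # model string from the first IDENTIFY
reread = "WDC WD2500AAJB-00WGQ0"   # model string seen during revalidation
for i, ca, cb, x in bit_diff(probed, reread):
    print(f"offset {i}: {ca!r} (0x{ord(ca):02x}) vs {cb!r} (0x{ord(cb):02x}), xor 0x{x:02x}")
print("bits flipped:", hamming_bits(probed, reread))
```

For these two strings the only difference is 'A' (0x41) versus 'Q' (0x51), an XOR of 0x10, so exactly one bit differs.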
So you are having bit-flip errors on the ATA link. Try a different cable and different ports, make sure that the cable isn't bent tightly, and try to route it away from other signals. If possible, test the disk in another machine.
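The "res 51/84" field in the failed READ DMA entries holds the ATA Status (0x51) and Error (0x84) registers. A small decoder, assuming the standard ATA bit assignments (decode and decode_res are hypothetical helper names), shows why the link is the suspect: ICRC is the interface CRC error bit, set when an Ultra DMA transfer over the cable fails its CRC check, and ABRT means the command was aborted.

```python
# Standard ATA Status register bits (only the ones relevant here).
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
               (0x08, "DRQ"), (0x01, "ERR")]
# Standard ATA Error register bits (only the ones relevant here).
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"),
              (0x04, "ABRT"), (0x01, "AMNF")]


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def decode_res(res_field):
    """Decode the leading 'SS/EE' (status/error) pair of a libata 'res' field."""
    status_hex, error_hex = res_field.split(":", 1)[0].split("/")
    status, error = int(status_hex, 16), int(error_hex, 16)
    return decode(status, STATUS_BITS), decode(error, ERROR_BITS)


status, error = decode_res("51/84:00:07:00:00/00:00:00:00:00/e0")
print("status:", status)
print("error: ", error)
```

This yields DRDY, DSC and ERR for the status and ICRC, ABRT for the error register. dmesg appears to print only a subset of the status bits, which would explain why DSC (0x10) is missing from "status: { DRDY ERR }".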
Unfortunately I don't have spare cables or another machine to try right now, but across multiple reboots the model names are always the same, which suggests it's not some random cable error.
Then at least try swapping these disks between ports.
If you performed an update before it stopped working, try reverting to an older kernel from /var/cache/pacman/pkg.
You can also attempt running smartctl using SCSI generic interface:
modprobe sg
smartctl -a /dev/sgN # check dmesg|tail to find N, probably it's 1
Last edited by mich41 (2015-09-26 18:21:37)
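As an alternative to reading dmesg|tail, the sgN number can be matched to the SCSI address shown in dmesg (the WD disk is 1:0:0:0 above) through sysfs. This sketch assumes each /sys/class/scsi_generic/sgN entry has a "device" symlink pointing at a directory named after the H:C:T:L address; sg_devices is a hypothetical helper. Note that once the kernel has detached the disk (the "detaching (SCSI 1:0:0:0)" line), there may be no sg entry for it at all.

```python
import os


def sg_devices(sysfs_root="/sys/class/scsi_generic"):
    """Map sgN names to the SCSI address (host:channel:target:lun) behind them."""
    mapping = {}
    if not os.path.isdir(sysfs_root):
        return mapping  # sg module not loaded, or not a Linux system
    for name in sorted(os.listdir(sysfs_root)):
        # Assumed layout: sgN/device is a symlink to the SCSI device directory,
        # whose basename is the H:C:T:L address printed in dmesg.
        target = os.path.realpath(os.path.join(sysfs_root, name, "device"))
        mapping[name] = os.path.basename(target)
    return mapping


if __name__ == "__main__":
    for sg, hctl in sg_devices().items():
        print(f"/dev/{sg} -> SCSI {hctl}")
```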
Swapping cables did not change anything. Changing to the LTS kernel or downgrading to earlier regular kernels (3 or 4) did not change anything either. The output of smartctl:
smartctl 6.4 2015-06-04 r4109 [i686-linux-4.1.6-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus
Device Model: ST340014A
Serial Number: 3JX4R80Y
Firmware Version: 3.06
User Capacity: 40,020,664,320 bytes [40.0 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA/ATAPI-6 T13/1410D revision 2
Local Time is: Sat Sep 26 20:58:32 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 31) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 060 052 006 Pre-fail Always - 72701212
3 Spin_Up_Time 0x0003 098 098 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 1
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 087 060 030 Pre-fail Always - 614763870
9 Power_On_Hours 0x0032 064 064 000 Old_age Always - 31990
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 099 099 020 Old_age Always - 1986
194 Temperature_Celsius 0x0022 033 054 000 Old_age Always - 33
195 Hardware_ECC_Recovered 0x001a 060 052 000 Old_age Always - 72701212
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
It seems OK to me.
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus
Device Model: ST340014A
Really?
How old is the power supply in that machine? Do you have any mods, add-on lights or water pumps inside the case?
This [1] seems to point to either a bad cable or a bad power supply as the most common problems. Try swapping cables if you have more than one IDE cable, or swap the drives' configuration between primary and secondary. If you can, try to test the WD drive in another machine.
R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K
Oh, sorry, it seems I was very tired: the HDD in the smartctl output is the other one, mounted at /, old but working properly. The WD drive doesn't show up with sg either. The power supply is 8-10 years old, like the whole machine except the WD drive, which is just 4-5 years old. I dug up another old PC and switched the cables and drives around, but the error remained in every setup.
Well, failure to work with known-good kernels and in another machine with another PSU suggests a malfunction of the disk electronics.
Depending on the exact nature of the problem, you may be able to rescue your data by reducing the communication speed. See libata.force in kernel-parameters.txt. I'd start with libata.force=udma/33, maybe udma/16 if that mode really exists. Or the PIO modes, why not.
If this fails, you will need to replace the disk's PCB or send it to a data recovery company.
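The libata.force suggestion above can be narrowed to just the WD disk instead of every ATA device. This sketch assumes the "[ID:]VAL" syntax described in kernel-parameters.txt, where ID is PORT.DEVICE as printed by libata (ata2.00 becomes 2.00). force_param is a hypothetical helper; the mode string is passed through unchanged, so check kernel-parameters.txt for the values your kernel accepts.

```python
import re

# libata device IDs as they appear in dmesg, e.g. "ata2.00:" -> port 2, device 00.
ATA_ID = re.compile(r"\bata(\d+)\.(\d+):")


def force_param(dmesg_line, mode):
    """Build a libata.force= option targeting the device named in a dmesg line."""
    m = ATA_ID.search(dmesg_line)
    if m is None:
        raise ValueError("no ataN.DD device ID in line")
    port, dev = m.groups()
    return f"libata.force={port}.{dev}:{mode}"


line = "[ 1.067451] ata2.00: ATA-8: WDC WD2500AAJB-00WGA0, 00.02C01, max UDMA/100"
print(force_param(line, "udma/33"))  # libata.force=2.00:udma/33
```

The resulting string goes on the kernel command line through the bootloader; limiting only 2.00 leaves the Seagate system disk on ata1 at its normal speed.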
Does the WD disk show up during the BIOS POST? It should appear on the first screen, where drives are detected. It might be as mich41 says and the disk just gave up the ghost. I have one 30GB WD disk that died while unplugged: it spins, but it's as if it isn't connected to the IDE cable. The same might have happened to yours during the second power failure.
R00KIE
Since the drive had no crucial information, I've given up on it. Thanks for all the help.