Hello,
I want to recover my RAID5 array. No /dev/md* device shows up, and mdadm --examine /dev/sdb (my array was made of drives d, e, f and g) shows:
/dev/sdb:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
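A quick arithmetic check on that protective MBR entry may be informative. This is a sketch that assumes the UEFI rule for protective MBRs: the type 0xEE entry covers the whole disk minus LBA 0, capped at 0xFFFFFFFF when the size does not fit in 32 bits. A correct protective MBR on a bare 2 TB member would report 3907029167 sectors (sector count from the fdisk -l output later in the thread), so the capped value 4294967295 hints that this MBR was written for a device with more than 2^32 sectors, plausibly the RAID volume rather than the disk itself. The volume size used below adds 34 sectors to gdisk's "last usable sector", which assumes the default 128-entry GPT layout, so treat it as approximate.

```python
# Sketch: the size a protective MBR (type 0xEE) entry should report.
# Rule assumed from the UEFI spec: disk size minus one sector (LBA 0),
# capped at 0xFFFFFFFF when it does not fit in the 32-bit size field.
MAX_MBR_SECTORS = 0xFFFFFFFF


def expected_protective_size(total_sectors: int) -> int:
    return min(total_sectors - 1, MAX_MBR_SECTORS)


bare_disk = 3907029168       # /dev/sdb sector count, from fdisk -l
volume = 11720765406 + 34    # gdisk "last usable sector" + 34 (assumes default 128-entry GPT)
observed = 4294967295        # "Partition[0] : 4294967295 sectors at 1"

print("bare disk would give:", expected_protective_size(bare_disk))   # 3907029167
print("RAID volume would give:", expected_protective_size(volume))    # 4294967295
print("observed matches volume:", observed == expected_protective_size(volume))  # True
```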
Investigating further, I found that smartctl -a /dev/sd* prints this:
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.7-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD20EFRX-68EUZN0
Serial Number: WD-WMC4M2192751
LU WWN Device Id: 5 0014ee 6043e6510
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Sep 1 20:55:39 2014 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (26280) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 266) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 195 174 021 Pre-fail Always - 3250
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 54
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3164
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 54
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 44
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 23
194 Temperature_Celsius 0x0022 113 103 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
SMART Error Log Version: 1
ATA Error Count: 198 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 198 occurred at disk power-on lifetime: 3164 hours (131 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 02 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 10 02 00 00 00 a0 08 00:02:26.516 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:02:26.515 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:02:26.515 SET FEATURES [Set transfer mode]
ef 10 02 00 00 00 a0 08 00:02:26.514 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:02:26.513 IDENTIFY DEVICE
Error 197 occurred at disk power-on lifetime: 3164 hours (131 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 00:02:26.515 SET FEATURES [Set transfer mode]
ef 10 02 00 00 00 a0 08 00:02:26.514 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:02:26.513 IDENTIFY DEVICE
ef 10 02 00 00 00 a0 08 00:02:26.493 SET FEATURES [Enable SATA feature]
Error 196 occurred at disk power-on lifetime: 3164 hours (131 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 02 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 10 02 00 00 00 a0 08 00:02:26.514 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:02:26.513 IDENTIFY DEVICE
ef 10 02 00 00 00 a0 08 00:02:26.493 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:02:26.493 IDENTIFY DEVICE
Error 195 occurred at disk power-on lifetime: 3164 hours (131 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 02 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 10 02 00 00 00 a0 08 00:02:26.493 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:02:26.493 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:02:26.492 SET FEATURES [Set transfer mode]
ef 10 02 00 00 00 a0 08 00:02:26.492 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:02:26.491 IDENTIFY DEVICE
Error 194 occurred at disk power-on lifetime: 3164 hours (131 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 00:02:26.492 SET FEATURES [Set transfer mode]
ef 10 02 00 00 00 a0 08 00:02:26.492 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:02:26.491 IDENTIFY DEVICE
ef 10 02 00 00 00 a0 08 00:02:26.471 SET FEATURES [Enable SATA feature]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Completed without error 00% 19 -
# 2 Short offline Interrupted (host reset) 10% 6 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
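To judge whether the media itself looks worn out, it helps to read the attribute table mechanically. This is a minimal sketch that assumes the common rule of thumb that attributes 5, 196, 197, 198 and 199 are the ones to watch; that list is a convention, not a vendor pass/fail criterion. In the table above every one of those raw counts is 0, so by these measures the platters do not look degraded, even though the error log holds 198 entries.

```python
# Sketch: pull the attribute table out of `smartctl -a` text and flag the
# counters most often associated with failing media. The watched IDs are a
# common rule of thumb (an assumption), not a vendor pass/fail list.
WATCHED = {5: "Reallocated_Sector_Ct", 196: "Reallocated_Event_Count",
           197: "Current_Pending_Sector", 198: "Offline_Uncorrectable",
           199: "UDMA_CRC_Error_Count"}


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for numeric attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def flag_suspicious(attrs):
    """Watched attributes whose raw count is non-zero."""
    return {i: raw for i, (_, raw) in attrs.items() if i in WATCHED and raw > 0}


# Rows copied from the table above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always - 0
  9 Power_On_Hours          0x0032 096 096 000 Old_age  Always - 3164
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always - 0
199 UDMA_CRC_Error_Count    0x0032 200 200 000 Old_age  Always - 0
"""
attrs = parse_attributes(sample)
print(attrs[9])                 # ('Power_On_Hours', 3164)
print(flag_suspicious(attrs))   # {}
```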
Does "Device Fault" mean my disk is dead? Or do I have any chance of restoring anything?
Can smartctl's self-test feature recover from these errors?
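Decoding the register bytes in those error-log entries helps with the "Device Fault" question. This sketch uses the classic ATA register bit layout and lists only the bits relevant here. All five logged errors show ST=61 and ER=04, which decodes to DRDY + DF + ERR with ABRT: the drive was ready, raised its device-fault bit, and aborted the command. The commands involved are SET FEATURES and IDENTIFY DEVICE, issued about 2.5 minutes after power-up (Powered_Up_Time 00:02:26), so they happened during drive/link setup rather than data reads. The errors are also recent: their power-on lifetime of 3164 hours equals the current Power_On_Hours. This pattern alone cannot say whether the disk is dead; it only shows the drive reported a fault while being initialised.

```python
# Sketch: decode the ST (status) and ER (error) bytes from the smartctl
# error log using the classic ATA register bit layout (relevant bits only).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode(byte, table):
    """Names of the bits set in `byte`, highest bit first."""
    return [name for bit, name in table.items() if byte & bit]


st, er = 0x61, 0x04   # "ER 04 ST 61" in every logged error above
print(decode(st, STATUS_BITS))   # ['DRDY', 'DF', 'ERR']
print(decode(er, ERROR_BITS))    # ['ABRT']
```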
partprobe gives me:
Error: Invalid argument during seek for read on /dev/sdb
gdisk says my GPT partition table is damaged.
I'm not sure where to go from here. Can anyone help, please?
Last edited by fireout (2014-09-01 23:47:55)
Please do not bump your thread when no-one else has replied -- use the edit feature to add information to your first post.
You cannot recover a RAID5 array with only 1 disk. I'm not sure exactly what those errors in your SMART output indicate, but in my experience they're not normal.
Are you familiar with our Forum Rules, and How To Ask Questions The Smart Way?
BlueHackers // fscanary // resticctl
All 4 drives show up in /dev (sd[bdef]). I guess it's just the superblock that can't be read, since no /dev/md* device is listed.
Sorry about the bump; I updated my first post. Feel free to remove my subsequent post if needed.
Again, all the drives seem to be available. Judging by the smartctl output, I'm guessing one or more of them might be corrupted.
I did look into those, but I keep backing off the "recreating the array" step; I'd like to gather as much detail as I can before I write anything to the disks, maybe something from the long smartctl test, but that still has more than 4 hours to go before it completes.
keep backing off the "recreating the array" step
Recreating is not the same as reassembling. What is this array from? What actually happened that led you to need to recover the array? Is there any information in /proc/mdstat?
/proc/mdstat is not even there, although I booted off the Arch ISO; I'm not sure it should show up with the install ISO. My /var was mounted on the array, so I could not boot otherwise; I changed my fstab but haven't rebooted yet.
A power failure caused all this: the power cord wasn't pushed all the way in and got knocked out.
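/proc/mdstat is created by the kernel's md driver, so on a live ISO its absence may simply mean the driver is not loaded yet, rather than that the array is gone. Once the file exists, a minimal parser like the one below lists each array's state, RAID level and member disks. The helper names are hypothetical and the sample text is illustrative only, not output from this machine.

```python
import re
from pathlib import Path

# "md126 : active raid5 sdb[3] sdd[2] ..." or "md127 : inactive sdb[0](S) ..."
ARRAY_LINE = re.compile(r"^(md\w+)\s*:\s*(active|inactive)\s+(.*)$")


def parse_mdstat(text):
    """Return {array: {"state", "level", "members"}} from mdstat text."""
    arrays = {}
    for line in text.splitlines():
        m = ARRAY_LINE.match(line)
        if not m:
            continue  # personalities, block-count and footer lines
        name, state, rest = m.groups()
        # Drop flags such as "(read-only)" that can precede the level.
        tokens = [t for t in rest.split() if not t.startswith("(")]
        # Active arrays name a personality (e.g. raid5) before the members.
        level = tokens.pop(0) if tokens and "[" not in tokens[0] else None
        members = [t.split("[")[0] for t in tokens]
        arrays[name] = {"state": state, "level": level, "members": members}
    return arrays


def read_mdstat(path="/proc/mdstat"):
    p = Path(path)
    if not p.exists():
        return None  # the md driver has not created the file
    return parse_mdstat(p.read_text())


# Illustrative sample, not output from this machine.
sample = """\
Personalities : [raid6] [raid5] [raid4]
md126 : active raid5 sdb[3] sdd[2] sde[1] sdf[0]
md127 : inactive sdf[3](S) sde[2](S) sdd[1](S) sdb[0](S)
unused devices: <none>
"""
print(parse_mdstat(sample)["md127"])
```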
Any other input before I resort to data retrieval services (and later, plan a backup strategy)?
Looking at the drives further, I found that one drive is indeed defective/not detected.
No partitions are present on any of the drives (/dev/sdb shows up, but not /dev/sdb1).
This array was created by the Intel RAID chipset; can I recreate it without losing data even if one drive is missing?
Thanks
This array was created by the Intel RAID chipset; can I recreate it without losing data even if one drive is missing?
Post the output of:
fdisk -l /dev/sd[bdef]
My partition table was GPT. fdisk shows:
fdisk -l /dev/sdb
Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000
Partition 1 does not start on physical sector boundary.
Device Boot Start End Blocks Id System
/dev/sdb1 1 4294967295 2147483647+ ee GPT
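The "does not start on physical sector boundary" warning comes down to simple arithmetic: with 512-byte logical and 4096-byte physical sectors, a partition is aligned only if its starting byte offset is a multiple of 4096. A protective MBR entry always starts at LBA 1 (per the UEFI spec), so this warning is expected for the type ee entry and is harmless on its own. The partition gdisk lists further down starts at sector 2048, which is aligned. The sketch below assumes a zero alignment offset, as reported by the "I/O size" line above.

```python
# Sketch: the alignment test behind fdisk's "does not start on physical
# sector boundary" warning, using this disk's 512/4096-byte sector sizes
# and assuming a zero alignment offset.
def starts_on_physical_boundary(start_lba, logical=512, physical=4096):
    return (start_lba * logical) % physical == 0


print(starts_on_physical_boundary(1))     # False: protective MBR entry starts at LBA 1
print(starts_on_physical_boundary(2048))  # True: gdisk's partition 1 starts at 2048
```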
And for the two other drives:
Disk /dev/sdd: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
gdisk shows:
gdisk -l /dev/sdb
GPT fdisk (gdisk) version 0.8.10
Warning! Disk size is smaller than the main header indicates! Loading
secondary header from the last sector of the disk! You should use 'v' to
verify disk integrity, and perhaps options on the experts' menu to repair
the disk.
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
Warning! One or more CRCs don't match. You should repair the disk!
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: damaged
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/sdb: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): F486503E-7A80-46A7-A351-9BF0686221E9
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 11720765406
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
Number Start (sector) End (sector) Size Code Name
1 2048 11720765406 5.5 TiB FD00 Linux RAID
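The gdisk numbers are worth a quick sanity check. A 4-disk RAID5 stores three disks' worth of data, and the GPT's last usable sector is almost exactly three times the size of /dev/sdb. This suggests (an inference, not something the tools state) that this GPT belongs to the RAID volume, whose first sectors sit at the start of a member disk, which would explain "Disk size is smaller than the main header indicates". If that reading is right, letting gdisk "repair" this table against the bare disk would probably be the wrong fix.

```python
# Sketch: compare the disk size with the extent the GPT header claims.
# Figures are copied from the gdisk -l /dev/sdb output above.
SECTOR = 512
disk_sectors = 3907029168    # "Disk /dev/sdb: 3907029168 sectors"
last_usable = 11720765406    # "last usable sector is 11720765406"


def tib(sectors):
    """Convert a count of 512-byte sectors to TiB."""
    return sectors * SECTOR / 2**40


ratio = (last_usable + 1) / disk_sectors
print(f"disk: {tib(disk_sectors):.2f} TiB")
print(f"GPT expects at least: {tib(last_usable + 1):.2f} TiB")
print(f"ratio: {ratio:.2f}")  # close to 3, i.e. (4 - 1) member disks of a RAID5
```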
Yeah OK, I have no idea what's going on there. Sorry.
Can you show the output of gdisk -l for each disk, not just sdb, and also the output of:
mdadm --misc --examine /dev/sd[bdef]
and also:
for i in b d e f; do smartctl -a /dev/sd$i; done
Hopefully it's just one disk, but I have seen disks fall out of sync before due to a failed SATA cable, and I assume the same could happen after a sudden loss of power. If that is the case, it is possible to lower the event count on the higher drive(s) and force the array to reassemble. Hopefully the output of those commands will tell me enough to help.
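The event-count comparison can be made concrete: each member's superblock records how many metadata updates it has seen, so a member that missed the last updates shows a lower count than the others. This sketch assumes the "Events : N" line that mdadm --examine prints for native md (v1.x) superblocks; this array uses Intel (IMSM) metadata, which may report this differently, so treat it as illustrative. The helper names, device names and counts below are hypothetical. It only compares numbers and writes nothing to disk.

```python
import re

# Matches the "Events : N" line printed for native md (v1.x) superblocks;
# other metadata formats may report this differently (assumption).
EVENTS = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$", re.MULTILINE)


def event_counts(examine_outputs):
    """Map device -> event count, given {device: examine text}."""
    counts = {}
    for dev, text in examine_outputs.items():
        m = EVENTS.search(text)
        if m:
            counts[dev] = int(m.group(1))
    return counts


def lagging_devices(counts):
    """How many events each device is behind the most recently updated one."""
    if not counts:
        return {}
    newest = max(counts.values())
    return {dev: newest - n for dev, n in counts.items() if n < newest}


# Hypothetical excerpts; real text would come from mdadm --misc --examine.
outputs = {
    "/dev/sdb": "          Events : 1520\n",
    "/dev/sdd": "          Events : 1520\n",
    "/dev/sde": "          Events : 1517\n",
}
print(lagging_devices(event_counts(outputs)))  # {'/dev/sde': 3}
```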