I'm declaring this post not solved, but closed, since I'm changing the drive.
Thanks everyone!
]]>Edit: the only thing left to try with this drive is SeaTools (google it). Try performing a low-level format with it. If that doesn't help: see above.
]]>I've run these steps so far (from an Arch live system):
# parted
mklabel msdos
# cfdisk /dev/sda
new linux partition, whole drive (don't care too much for a separate /home or /boot)
# mkfs.ext4 /dev/sda1
# mkfs.ext4 -cc /dev/sda1
And that is going to take a long time; I'll update when it finishes. Thanks for all the answers. I didn't have even a little hope, this being a very specific hardware issue with an old IDE disk, but once again the Arch community is surprising me!
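For reference, the interactive steps above could be scripted roughly as follows. This is only a sketch under some assumptions: `DISK`, `DRY_RUN`, `run` and `prepare_disk` are names made up here, the `parted -s ... mkpart` call stands in for the interactive cfdisk session, and the plain mkfs.ext4 run is left out because the -cc run recreates the file system anyway. Everything here destroys the contents of the target disk, so by default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: DISK, DRY_RUN, run() and prepare_disk() are illustrative names.
DISK="${DISK:-/dev/sdX}"      # placeholder; set to the real device, e.g. /dev/sda
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the commands (default), 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

prepare_disk() {
    # New msdos (MBR) partition table, as with "mklabel msdos" in parted.
    run parted -s "$DISK" mklabel msdos
    # One primary partition spanning the drive (replaces the cfdisk step).
    run parted -s "$DISK" mkpart primary ext4 1MiB 100%
    # Create ext4 after a read-write bad block scan (-c given twice); slow.
    run mkfs.ext4 -cc "${DISK}1"
}

prepare_disk
```

Running it with DRY_RUN=0 as root would actually repartition and format the disk, so double-check DISK first.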
]]>This drive has suffered some media damage - 1277 bad sectors have already been reallocated to spare area and there are another 675 unreadable sectors waiting for a write which will move them there as well.
I'd start with backing up all important data from this drive. Then overwrite the whole drive with dd to force reallocation of all pending sectors, and check that no new ones crop up and that read/write speed comes back to something reasonable.
If it is still dog slow, or "Reallocated_Sector_Ct" keeps growing into the thousands or millions, there's probably no hope for it. Otherwise, run mkfs -cc (takes some time), and if this also doesn't increase "Reallocated_Sector_Ct", there's a chance that these bad blocks were a single event and you can keep using this HDD for less critical stuff (well backed up data, OS files, read-once warez, etc.).
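To compare the counters before and after such an overwrite, something like the following could be used. It is a minimal sketch: `smart_raw` is a helper name invented here, it assumes the usual ten-column attribute table printed by `smartctl -A`, and the sample rows are copied from the smartctl output in this thread.

```shell
#!/bin/sh
# smart_raw NAME: read `smartctl -A` style output on stdin and print the
# RAW_VALUE column (10th field) of the attribute called NAME.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Sample rows copied from the smartctl output in this thread.
sample='  5 Reallocated_Sector_Ct   0x0033   069   069   036    Pre-fail  Always       -       1277
197 Current_Pending_Sector  0x0012   099   099   000    Old_age   Always       -       675'

printf '%s\n' "$sample" | smart_raw Reallocated_Sector_Ct    # prints 1277
printf '%s\n' "$sample" | smart_raw Current_Pending_Sector   # prints 675

# On the real drive (as root), before and after the overwrite:
#   smartctl -A /dev/sdX | smart_raw Reallocated_Sector_Ct
# The overwrite itself destroys all data on the target, for example:
#   dd if=/dev/zero of=/dev/sdX bs=1M
```

If the pending count drops to zero and the reallocated count then stays put across a few runs, that matches the "single event" case described above.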
]]>[mz@mother ~]$ sudo smartctl -a /dev/sda
[sudo] password for mz:
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.6-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus
Device Model: ST380011A
Serial Number: 5JVFB46V
Firmware Version: 8.01
User Capacity: 80.026.361.856 bytes [80,0 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA/ATAPI-6 T13/1410D revision 2
Local Time is: Mon Apr 15 21:00:37 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 58) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   053   045   006    Pre-fail  Always       -       227358446
  3 Spin_Up_Time            0x0003   099   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   069   069   036    Pre-fail  Always       -       1277
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       4561475723
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6985
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always       -       3068
194 Temperature_Celsius     0x0022   041   046   000    Old_age   Always       -       41
195 Hardware_ECC_Recovered  0x001a   053   045   000    Old_age   Always       -       227358446
197 Current_Pending_Sector  0x0012   099   099   000    Old_age   Always       -       675
198 Offline_Uncorrectable   0x0010   099   099   000    Old_age   Offline      -       675
199 UDMA_CRC_Error_Count    0x003e   200   188   000    Old_age   Always       -       194
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   091   244   000    Old_age   Always       -       9
SMART Error Log Version: 1
ATA Error Count: 905 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 905 occurred at disk power-on lifetime: 6982 hours (290 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 07 00 cb 42 e0 Error: UNC at LBA = 0x0042cb00 = 4377344
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 08 ff ca 42 e0 00 12:05:04.994 READ MULTIPLE
c4 00 08 bf bc 42 e0 00 12:05:04.993 READ MULTIPLE
c4 00 08 0f b8 42 e0 00 12:05:04.980 READ MULTIPLE
c5 00 08 1f 07 00 e2 00 12:05:02.611 WRITE MULTIPLE
c5 00 70 f7 68 01 e0 00 12:04:58.660 WRITE MULTIPLE
Error 904 occurred at disk power-on lifetime: 6978 hours (290 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 07 58 28 5e e6 Error: UNC at LBA = 0x065e2858 = 106834008
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 08 57 28 5e e6 00 07:43:30.083 READ MULTIPLE
c4 00 08 4f 28 5e e6 00 07:43:30.082 READ MULTIPLE
c4 00 08 47 28 5e e6 00 07:43:30.065 READ MULTIPLE
c4 00 08 3f 28 5e e6 00 07:43:30.057 READ MULTIPLE
c4 00 08 37 28 5e e6 00 07:43:30.048 READ MULTIPLE
Error 903 occurred at disk power-on lifetime: 6977 hours (290 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 07 58 e6 5b e6 Error: UNC at LBA = 0x065be658 = 106686040
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 08 57 e6 5b e6 00 07:34:36.236 READ MULTIPLE
c4 00 08 4f e6 5b e6 00 07:34:36.177 READ MULTIPLE
c4 00 08 47 e6 5b e6 00 07:34:36.143 READ MULTIPLE
c4 00 08 3f e6 5b e6 00 07:34:36.118 READ MULTIPLE
c4 00 08 37 e6 5b e6 00 07:34:36.117 READ MULTIPLE
Error 902 occurred at disk power-on lifetime: 6976 hours (290 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 07 a0 7b bd e5 Error: UNC at LBA = 0x05bd7ba0 = 96304032
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 08 9f 7b bd e5 00 05:49:36.263 READ MULTIPLE
c4 00 08 97 7b bd e5 00 05:49:36.255 READ MULTIPLE
c4 00 08 8f 7b bd e5 00 05:49:36.008 READ MULTIPLE
c4 00 08 87 7b bd e5 00 05:49:35.585 READ MULTIPLE
c4 00 08 7f 7b bd e5 00 05:49:35.042 READ MULTIPLE
Error 901 occurred at disk power-on lifetime: 6976 hours (290 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 07 f8 02 b7 e5 Error: UNC at LBA = 0x05b702f8 = 95879928
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 08 f7 02 b7 e5 00 05:36:50.339 READ MULTIPLE
c4 00 08 ef 02 b7 e5 00 05:36:50.305 READ MULTIPLE
c4 00 08 e7 02 b7 e5 00 05:36:50.178 READ MULTIPLE
c4 00 08 df 02 b7 e5 00 05:36:50.119 READ MULTIPLE
c4 00 08 d7 02 b7 e5 00 05:36:49.254 READ MULTIPLE
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I'm running # smartctl --test=long /dev/sda; it says it will take about an hour.
Does anybody know where to check the log of the test? (I'm back at the command line, and smartctl says the test is running in the background.)
]]># smartctl -a /dev/<drive>
Self-tests can be run with
# smartctl -t <test> /dev/<drive>
This tool is part of the smartmontools package.
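Once a test has been started, its progress and result can be read back with smartctl as well: `smartctl -c` includes the "Self-test execution status" while a test is running, and `smartctl -l selftest` prints the self-test log afterwards. Below is a small sketch for pulling out the remaining percentage. The `selftest_remaining` helper and the sample text are assumptions for illustration, and the exact wording of the status line may differ between smartmontools versions.

```shell
#!/bin/sh
# selftest_remaining: read `smartctl -c` output on stdin and print the
# percentage of the running self-test that is still remaining, if any.
selftest_remaining() {
    awk '/% of test remaining/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+%$/) { sub(/%/, "", $i); print $i }
    }'
}

# Illustrative sample of the status lines while a long test is running.
sample='Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.'

printf '%s\n' "$sample" | selftest_remaining   # prints 90

# On a real drive (as root):
#   smartctl -c /dev/sdX | selftest_remaining    # progress while running
#   smartctl -l selftest /dev/sdX                # results once finished
```

When no test is running the helper prints nothing, which is a simple way to tell that the log is ready to be read.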
]]>Have you tested for SMART errors? I would recommend using a Parted Magic live CD for this.
I tested the disk with fsck from a live Arch system, and it found thousands of block errors. The fsck ran for almost the whole weekend. After it finished, I ran fsck 2 or 3 more times, just to be sure, and it said the file system was clean.
Are there any other tools for an ext4 partition (or to check for bad blocks or anything else) apart from fsck.ext4?
]]># blockdev --setra 16384 /dev/sda
I think I'd better get rid of that old IDE disk and get a more common SATA disk, but I still don't understand why this is happening (maybe the disk is broken?)
]]>$ cat /proc/scsi/scsi
[mz@mother ~]$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST380011A Rev: 8.01
Type: Direct-Access ANSI SCSI revision: 05
$ dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
[mz@mother ~]$ dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 20.9698 s, 51.2 MB/s
# hdparm -Tt /dev/sda
[mz@mother ~]$ sudo hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 2552 MB in 2.00 seconds = 1276.16 MB/sec
Timing buffered disk reads: 12 MB in 4.18 seconds = 2.87 MB/sec
On another computer with a SATA disk, the same test gives around 100 MB/sec instead of 2.87 MB/sec. I'm sure the figure should be higher here too: maybe not 100, but at least 40 MB/sec for my IDE drive (just guessing).
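As a sanity check on the figures above, the reported rates are just the amount of data divided by the elapsed time. A minimal sketch: `mb_per_s` is a helper name made up here, and it assumes decimal megabytes (1 MB = 1,000,000 bytes) for both tools, which is only an approximation for the hdparm figure.

```shell
#!/bin/sh
# mb_per_s BYTES SECONDS: print throughput in MB/s (1 MB = 1,000,000 bytes).
# LC_ALL=C keeps the decimal point independent of the system locale.
mb_per_s() {
    LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1000000 }'
}

mb_per_s 1073741824 20.9698   # dd write test: prints 51.20
mb_per_s 12000000 4.18        # hdparm buffered read, 12 MB in 4.18 s: prints 2.87
```

So the write test and the buffered read test differ by a factor of roughly 18 on the same drive.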
# hdparm -i /dev/sda
[mz@mother ~]$ sudo hdparm -i /dev/sda
/dev/sda:
Model=ST380011A, FwRev=8.01, SerialNo=5JVFB46V
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156301488
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2: ATA/ATAPI-1,2,3,4,5,6
* signifies the current active mode
This makes my Arch experience (actually, my mother's) extremely frustrating.
Any tips? Thanks!
EDIT: forgot to post the kernel, using "linux" from core.
$ uname -a
[mz@mother ~]$ uname -a
Linux mother 3.8.6-1-ARCH #1 SMP PREEMPT Sat Apr 6 07:27:01 CEST 2013 x86_64 GNU/Linux