This isn't too urgent since I'm not having this problem anymore, but I would like to know what caused it in the first place.
A few months ago, I think it was in May, when I was booting into KDE, the desktop wouldn't load. After logging in it went to a black screen with a cursor. I checked my tty consoles and they were all spammed with DRDY and I/O errors.
My log files don't go back that far, but I apparently wrote the errors down:
[timestamp] ata1.00: irq_stat 0x40000001
[timestamp] ata1: SError: { CommWake }
[timestamp] ata1.00: failed command: READ FPDMA QUEUED
[timestamp] ata1.00: cmd 60/08:00:79:c5:e9/00:00:38:00:00/40 tag 0 ncq 4096 in
[timestamp] res 41/40:08:7c:c5:e9/00:00:38:00:00/60 Emask 0x409 (media error) <F>
[timestamp] ata1.00: status {DRDY ERR }
[timestamp] error: { UNC }
[timestamp] end_request: I/O error, dev sda, sector 9540445540
It repeats this over and over again on all 7 consoles. I believe the hex numbers and sector # change slightly each time it repeats.
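For anyone who wants to turn those hex fields into a sector number, here is a minimal Python sketch. It assumes the field order libata uses when it prints these lines (command or status, then feature or error, sector count, three low LBA bytes, then five high-order bytes, then device). The function name decode_lba48 is made up for illustration.

```python
import re

# Assumed libata layout for both "cmd" and "res" lines:
#   <command|status>/<feature|error>:<nsect>:<lbal>:<lbam>:<lbah>/
#   <hob_feature>:<hob_nsect>:<hob_lbal>:<hob_lbam>:<hob_lbah>/<device>
TASKFILE = re.compile(
    r"(?P<kind>cmd|res)\s+"
    r"(?P<first>[0-9a-f]{2})/"
    r"(?P<low>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"(?P<high>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"(?P<device>[0-9a-f]{2})"
)


def decode_lba48(line):
    """Return the 48-bit LBA encoded in a libata cmd/res log line."""
    m = TASKFILE.search(line)
    if m is None:
        raise ValueError("no cmd/res taskfile found in line")
    # Low-order group: feature, sector count, then LBA bits 0-23.
    _feature, _nsect, lbal, lbam, lbah = (int(b, 16) for b in m["low"].split(":"))
    # High-order group: previous feature and count bytes, then LBA bits 24-47.
    _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m["high"].split(":")
    )
    return (
        hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
        | lbah << 16 | lbam << 8 | lbal
    )


res_line = "res 41/40:08:7c:c5:e9/00:00:38:00:00/60 Emask 0x409 (media error) <F>"
cmd_line = "ata1.00: cmd 60/08:00:79:c5:e9/00:00:38:00:00/40 tag 0 ncq 4096 in"
print(decode_lba48(cmd_line), decode_lba48(res_line))
```

Under that assumption the res line decodes to LBA 0x38e9c57c = 954844540, three sectors past the start of the failed 8-sector read at 954844537. That is close to, but not the same as, the hand-copied "sector 9540445540", so the number I wrote down may contain a slip.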
I restart the computer and it does the same thing. Then one day it booted into the desktop after sitting at a black screen with a cursor for 10 minutes.
When I saw Media and I/O error, I immediately thought it was my hard drive failing (which sucks because I had this computer for less than a year). But the weird thing was that once it finally booted into a desktop environment, everything worked fine. I could access any file without any error, I could boot into my Windows partition perfectly, and everything ran as fast as it did before. I ran fsck and chkdsk several times and passed with 0 bad sectors. SMART tests came out perfect.
The strangest thing about this is that the errors only came up when I booted into KDE. If I booted into any other environment I did not get these errors. Completely reinstalling KDE did not fix it. I don't remember updating or changing anything before these errors started appearing. I did do a full system update once I was actually able to get into my KDE session, but it didn't do anything. I eventually solved it by using another desktop environment, which was Xfce at the time but now it's Cinnamon. I haven't gotten those errors since.
I posted something similar to this on KDE's forum, and no one could figure it out. One of the admins told me it's very unusual that something like this would happen only with KDE, and the READ FPDMA/DRDY ERRs "are sourced from the kernel - and are disk/kernel related according to https://bugs.launchpad.net/ubuntu/+sour ... bug/550559."
As of now I can install KDE perfectly without any sort of issue, but I know several kernel updates occurred since then. So could this be an issue with the kernel? Or is my HDD really starting to fail? I backed up the computer already, but I record music constantly and don't "aggressively" run backups (i.e. not every day, but every month or so). Should I start doing so?
Thanks
Offline
I'm afraid to say that I think it's the HDD. I've had a similar problem. Or wait, I dunno... now that I've actually read your post I'm not so sure. That's a good thing, though! It may mean that there's still hope for my poor Compaq laptop. I only encountered the error(s) one time after my re-installation of Arch Linux, which I did just to make sure it wasn't my installation. (And 'cause I wanted to use the new Grub and didn't feel like upgrading.)
https://bbs.archlinux.org/viewtopic.php?id=147189
Yeah, I would back up religiously.... well sort of. Just do a backup every time you do something important to you, like record a song. Do you already have a backup script or something like that set up?
Personally, I'm due for a new backup within a week or two simply because it's been about a month or so since my last one. So I guess as long as you do a backup once a month, you should.... be fine. I'm not saying that doing more regular backups would necessarily be a bad thing, although it could prove to be quite a pain.
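If you don't have a script yet, something along these lines could be a starting point. It's only a rough Python sketch: the function name backup_newer and the example paths are made up, and it just copies files that are missing or older in the destination, so it is no substitute for a real backup tool.

```python
import shutil
from pathlib import Path


def backup_newer(src_dir, dst_dir):
    """Copy files that are missing or older in dst_dir; return their relative paths."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # destination copy is already up to date
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; point these at your recordings and your backup drive.
    source = Path.home() / "recordings"
    target = Path("/mnt/backup/recordings")
    if source.is_dir() and target.parent.is_dir():
        print(backup_newer(source, target))
```

Running it right after a recording session only copies what changed, which keeps the "back up after anything important" habit cheap.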
See if you find any of these useful.
http://www.linuxquestions.org/questions … ing-838499
http://mikeys-ranting.blogspot.com/2010 … olved.html
https://answers.launchpad.net/ubuntu/+question/122588
Here's a bit from a comment on a post @ http://superuser.com/questions/121391/s … -icrc-abrt
DRDY ERR messages actually seems to be reported as a kernel bug in a lot of systems which seems to relate a lot with Ubuntu and to a smaller extent Debian. I am investigating this because this is something that has started happening with me recently. I would recommend the following (You will require a bootable CD for some of this and you may need it due to disk issues for all of this. The Ubuntu desktop install CD works well without making you install anything):
Put "options libata noacpi=1" in /etc/modprobe.d/options.conf
Run "e2fsck -f -c -v /dev/sda1" but replace /dev/sda1 with the partitions causing the error. As far as I know, e2fsck needs a partition with the file system so this probably won't work on the whole disk. If it does work on the whole disk, you still need to run it on the partitions anyways. You need a bootable CD for this.
Edit the file /boot/grub/menu.lst and on the line that starts with "# kopt" add "noapic" to the end of the line. The # at the start is important and does not act like a comment. Do not remove the #.
This does not affect the disk but if you change "splash" to "nosplash" and remove the word "quiet" from /boot/grub/menu.lst on the line that starts with "# defoptions" Then it will not have an image when you boot ubuntu but instead will give you more verbose output.
On Ubuntu, after you change anything inside /boot/grub/menu.lst you must run /usr/sbin/update-grub
Personally, I'm too new here and really don't know enough about the cryptic codes of the SATA stuff so here's what I found as a comment to one of the above links.
The error is related to SATA Native Command Queueing (NCQ). FPDMA = First Party DMA. This is a newish performance feature on SATA drives.
I'd recommend a couple checks:
1) Update to the latest driver if you haven't already (seems several ubuntu users have also seen this - https://bugs.launchpad.net/ubuntu/+bug/550559)
2) If you're using new drives with a motherboard that's an older generation, there might be a SATA spec compatibility issue. You can sometimes jumper the SATA drives to legacy mode (sometimes called SATA2 or 1.5). Also, you may be able to set your hardware (in BIOS) to a legacy mode. This should fix the above error, but might impose a perf penalty. There may also be a BIOS update to better support SATA.
3) Check if there's any driver options to disable NCQ support. Though the queueing provides a perf boost, its not the end of the world to go without it.
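On point 3, as far as I can tell the kernel exposes the NCQ queue depth through sysfs, and a depth of 1 means queueing is effectively off. Here's a small Python sketch for checking it. The function names are made up, and the sysfs_root parameter only exists so it can be tried against a fake directory tree.

```python
from pathlib import Path


def ncq_queue_depth(disk="sda", sysfs_root="/sys"):
    """Read the queue depth the kernel is using for the given disk."""
    path = Path(sysfs_root) / "block" / disk / "device" / "queue_depth"
    return int(path.read_text().strip())


def ncq_enabled(disk="sda", sysfs_root="/sys"):
    """A depth of 1 means commands are issued one at a time, i.e. no NCQ."""
    return ncq_queue_depth(disk, sysfs_root) > 1


if __name__ == "__main__":
    try:
        print("NCQ enabled:", ncq_enabled("sda"))
    except OSError as exc:
        print("could not read queue depth:", exc)
```

Writing 1 into that same queue_depth file as root should turn NCQ off until the next reboot, which makes it a cheap experiment to see whether the READ FPDMA QUEUED errors go away.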
Last edited by lspci (2012-08-27 07:24:04)
Offline
Some of these issues are related to interference, and Spread Spectrum Clocking (SSC) is a way of reducing that interference. Some BIOSes let you tweak this setting, and some hard disks, like WD's, have a jumper to enable or disable spread spectrum clocking. But since it is more of an RF issue, rearranging your SATA cables or replacing them with better-quality ones might do the trick. If you have overclocked your PC, try dialing it back to a more sane level.
Hope this helps
Offline
@lspci
Those links you sent me were VERY helpful! I had used the online SMART test in SpeedFan on Windows; I never thought of using smartctl --all /dev/sda.
Here's what it gave me:
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.4.9-1-ARCH] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Toshiba 2.5" HDD MK..76GSX
Device Model: TOSHIBA MK5076GSX
Serial Number: 91OHT43CT
LU WWN Device Id: 5 000039 3818077b6
Firmware Version: GS002D
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Tue Aug 28 00:22:51 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 163) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 128
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 1756
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 0
9 Power_On_Minutes 0x0032 092 092 000 Old_age Always - 3277h+14m
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 1072
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 2191
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 133
193 Load_Cycle_Count 0x0032 094 094 000 Old_age Always - 61036
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 45 (Min/Max 15/53)
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 25758112
200 Multi_Zone_Error_Rate 0x0032 100 100 000 Old_age Always - 56684070
240 Head_Flying_Hours 0x0032 094 094 000 Old_age Always - 153507
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 5471510246
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 7806093714
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 558 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 558 occurred at disk power-on lifetime: 2148 hours (89 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 02 7c c5 e9 68 Error: UNC at LBA = 0x08e9c57c = 149538172
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 79 c5 e9 40 00 00:06:08.673 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 00:06:08.673 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:06:08.673 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:06:08.672 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:06:08.672 SET FEATURES [Set transfer mode]
Error 557 occurred at disk power-on lifetime: 2148 hours (89 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 02 7c c5 e9 68 Error: UNC at LBA = 0x08e9c57c = 149538172
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 79 c5 e9 40 00 00:06:04.673 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 00:06:04.673 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:06:04.673 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:06:04.672 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:06:04.672 SET FEATURES [Set transfer mode]
Error 556 occurred at disk power-on lifetime: 2148 hours (89 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 02 7c c5 e9 68 Error: UNC at LBA = 0x08e9c57c = 149538172
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 79 c5 e9 40 00 00:06:00.673 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 00:06:00.673 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:06:00.673 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:06:00.672 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:06:00.672 SET FEATURES [Set transfer mode]
Error 555 occurred at disk power-on lifetime: 2148 hours (89 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 02 7c c5 e9 68 Error: UNC at LBA = 0x08e9c57c = 149538172
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 79 c5 e9 40 00 00:05:56.673 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 00:05:56.673 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:05:56.672 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:05:56.672 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:05:56.672 SET FEATURES [Set transfer mode]
Error 554 occurred at disk power-on lifetime: 2148 hours (89 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 02 7c c5 e9 68 Error: WP at LBA = 0x08e9c57c = 149538172
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 18 d1 d8 e7 40 00 00:05:52.624 WRITE FPDMA QUEUED
61 08 10 89 d8 e7 40 00 00:05:52.624 WRITE FPDMA QUEUED
61 08 40 51 d8 e7 40 00 00:05:52.624 WRITE FPDMA QUEUED
61 10 38 01 d8 e7 40 00 00:05:52.623 WRITE FPDMA QUEUED
61 08 30 e1 d7 e7 40 00 00:05:52.623 WRITE FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2320 -
# 2 Extended offline Aborted by host 90% 2075 -
# 3 Short offline Completed without error 00% 1900 -
# 4 Short offline Completed without error 00% 1574 -
# 5 Short offline Completed without error 00% 1415 -
# 6 Short offline Completed without error 00% 1081 -
# 7 Short offline Completed without error 00% 637 -
# 8 Short offline Completed without error 00% 248 -
# 9 Short offline Completed without error 00% 0 -
#10 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The READ FPDMA QUEUED errors come up in SMART's log, so the problems I had have to be HDD related, not kernel. Other than that I really don't know what I'm looking at here. I don't like the raw numbers for UDMA_CRC_Error_Count or Multi_Zone_Error_Rate, but I don't know what they mean. Until someone explains to me what I'm dealing with here, I'm going to assume it is an HDD problem and start furiously backing stuff up.
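For what it's worth, here is a rough Python sketch of how the attribute table can be read mechanically. It assumes the usual smartctl rule that an attribute counts as failed only when its normalized VALUE drops to or below a non-zero THRESH, and it treats RAW_VALUE as vendor-specific, so a large raw number is not flagged by itself. The function names are made up for illustration; the sample rows are copied from the output above.

```python
def parse_attributes(text):
    """Parse smartctl attribute rows into dicts, skipping every other line."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 9)
        # Attribute rows have 10 columns, a numeric ID and a 0x.... flag field.
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "when_failed": parts[8],
            "raw": parts[9].strip(),
        })
    return rows


def failing(rows):
    """Names of attributes whose normalized value reached a non-zero threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


sample = """\
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 25758112
200 Multi_Zone_Error_Rate 0x0032 100 100 000 Old_age Always - 56684070
"""
rows = parse_attributes(sample)
print(failing(rows))
```

By that rule none of the posted attributes is failing: both of the scary-looking ones sit at a normalized 100 with a threshold of 000. The error log is the more interesting part. The newest entry is at 2148 power-on hours, and attribute 9 reads 3277h, so about 1129 power-on hours have passed without a new logged error.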
bart_b wrote: Some of these issues are related to interference, and Spread Spectrum Clocking (SSC) is a way of reducing that interference. Some BIOSes let you tweak this setting, and some hard disks, like WD's, have a jumper to enable or disable spread spectrum clocking. But since it is more of an RF issue, rearranging your SATA cables or replacing them with better-quality ones might do the trick. If you have overclocked your PC, try dialing it back to a more sane level.
I forgot to mention that my computer is a Dell Inspiron N5110. It's a piece of garbage; it's the only computer I am aware of that has programs that PREVENT you from connecting to the internet when you don't buy their stupid extended warranty. I'm saving up for a ThinkPad T420, but that's another story.
Whoa. Radio signals interfering with data passing through the SATA cables? I've had poorly shielded guitar cables that pick up odd AM stations, but SATA cables? That's bizarre. I don't own a PC yet, but I'm going to build one for my brothers once the parts come in, and I'll definitely make a note of that. I didn't even know you could overclock a CPU to the point where it GENERATES RF signals powerful enough to interfere with the HDD. That's really interesting.
Another thing I should mention is that the motherboard on my Dell died maybe a month or two after the I/O errors. Gotta love Dell. Every single Dell computer I owned died of mobo failure without any warning. Even their printers. Good god.
Anyway, could the mobo have been a factor? I know the tech that replaced the mobo told me that the LCD cable wasn't seated correctly, so could the SATA cables have been bad as well? I really doubt it though, but who knows.
---
Edited for embarrassing spelling mistakes
Last edited by 68flag (2012-08-28 05:31:15)
Offline
I think it's rather interesting that both you and I have Toshiba hard drives. I've heard that they're supposed to be as delicate as glass, but mine's held up all right, until these weird errors and stuff.
As for motherboards: if the motherboard is messed up, everything will seem messed up, whether or not it actually is.
Offline
SSC is intended as a solution for the failure. It's not that you have big radio towers in your PC; think of it more like an echo bouncing around on your SATA cable. Modern SATA drivers in the Linux kernel are intended for high-speed SATA 600, and if your machine does not like this speed you can try to set your drive to SATA 150.
If your toshi-drive has a jumper to force it to SATA 150, try that for the moment.
Another possible solution is to install "sdparm" and set up your device with the --flexible option; see man sdparm.
Last edited by bart_b (2012-08-28 08:59:34)
Offline
@bart
I would rather not mess around with the HDD itself; that usually creates problems rather than solving them. And this computer's HDD was not designed to be user-serviceable; I would have to rip the whole computer apart to gain access to it.
How would I know if my computer uses sata 600? Laptops usually don't have SATA cables, the HDD plugs directly into the mobo.
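One way to check without opening the case, as far as I know: the kernel prints the negotiated speed in dmesg with a line of the form "ataN: SATA link up X Gbps" (1.5 for SATA 150, 3.0 for SATA 300, 6.0 for SATA 600). Below is a minimal Python sketch that pulls those out. The function name is made up, and the sample text is purely illustrative, not taken from my machine.

```python
import re

# Matches kernel lines such as "ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)".
LINK_UP = re.compile(r"(ata\d+(?:\.\d+)?): SATA link up ([\d.]+) Gbps")


def sata_link_speeds(dmesg_text):
    """Map each ATA port to the last link speed (in Gbps) reported for it."""
    return {port: float(gbps) for port, gbps in LINK_UP.findall(dmesg_text)}


# Illustrative sample only; feed it the real output of `dmesg` instead.
sample = (
    "[    1.234567] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)\n"
    "[    1.345678] ata2: SATA link down (SStatus 0 SControl 300)\n"
)
print(sata_link_speeds(sample))
```

If the link turns out to be the problem, I believe the speed can also be capped from the boot loader with a kernel parameter like libata.force=1.5Gbps, which would avoid touching any jumpers.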
sdparm appears to be for SCSI drives only. When I run sdparm --f I get this:
Read write error recovery mode page:
AWRE 1
ARRE 0
PER 0
Caching (SBC) mode page:
WCE 1
RCD 0
Control mode page:
SWP 0
It looks to me like sdparm/hdparm are performance tuners, which I don't think I need because I doubt that the problem is performance related. If it was, I would have problems with Windows and other Linux desktop environments as well. I just want to know if the errors were signs that the drive would be failing soon.
I just realized that KDE was the only environment I used that had a paging utility that ran constantly. Could those errors be caused by a problem with the page files?
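One way to test that idea would be to see which partition the failing sector lives in. Here's a rough Python sketch: it assumes each partition's first sector and length (in 512-byte sectors) can be read from the start and size files under /sys/block/sda/, and that the res line in my first post decodes to LBA 0x38e9c57c = 954844540 when the high byte 38 is included. The function names and the example layout are made up; I haven't checked them against my real partition table.

```python
from pathlib import Path


def partitions(disk="sda", sysfs_root="/sys"):
    """Return {partition name: (first sector, sector count)} read from sysfs."""
    base = Path(sysfs_root) / "block" / disk
    parts = {}
    for entry in sorted(base.glob(disk + "*")):
        start_file = entry / "start"
        if start_file.is_file():
            start = int(start_file.read_text())
            size = int((entry / "size").read_text())
            parts[entry.name] = (start, size)
    return parts


def partition_for_lba(lba, parts):
    """Return the name of the partition containing lba, or None."""
    for name, (start, size) in parts.items():
        if start <= lba < start + size:
            return name
    return None


# Hypothetical layout for illustration only; use partitions("sda") on a real system.
example_layout = {"sda1": (2048, 409600), "sda5": (900000000, 76000000)}
print(partition_for_lba(954844540, example_layout))
```

If the sector turns out to sit in swap or in a file that only KDE reads at login, that would explain why no other environment ever tripped over it.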
Offline