Hi there --
I've got a Seagate Momentus XT hybrid drive (ST95005620AS) in my desktop machine, dual-booting Arch Linux and Windows. Whenever I turn the machine on, the drive seeks loudly for a bit; then, once it reaches the Windows loading screen, it quiets down and stays quiet. If I boot into Linux, however, it sounds like it's seeking 100% of the time and I can't get it to stop. (It's super annoying.) I've tried the following, without success:
- turning off Advanced Power Management (hdparm -B 255)
- tweaking Automatic Acoustic Management settings (hdparm -M 0/128/254)
- a variety of spin-down commands (hdparm --idle-immediate, hdparm --idle-unload, hdparm -y)
smartctl reports that ID# 184 ("End-to-End_Error") is FAILING_NOW with a normalized value of 097 (threshold 099), but SeaTools under Windows says the drive is fine. Searching around the web suggests that SeaTools is the definitive diagnostic tool for Seagate drives and that its results trump smartctl's.
I've run "iostat -d 1" and the majority of intervals show zero I/O activity on the drive in question, despite the chronic noise.
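iostat builds its per-interval numbers from the kernel's block-device counters in /proc/diskstats, so the same check can be repeated by hand. This is a minimal sketch, assuming the /proc/diskstats column layout from the kernel's iostats documentation (device name in the 3rd column, reads completed in the 4th, writes completed in the 8th); `io_counts`, `io_delta`, and `sample` are hypothetical helper names, and "sda" is a placeholder device. These counters only cover requests Linux has sent to the drive and seen complete, so a zero delta means the OS issued no reads or writes during that interval.

```python
import time
from typing import Tuple

# 0-based column indices in a /proc/diskstats line (assumed layout, per the
# kernel's iostats docs): 0 major, 1 minor, 2 name, 3 reads completed, ...,
# 7 writes completed, ...
NAME, READS_COMPLETED, WRITES_COMPLETED = 2, 3, 7


def io_counts(diskstats_text: str, device: str) -> Tuple[int, int]:
    """Return (reads_completed, writes_completed) for `device` in one snapshot."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > WRITES_COMPLETED and fields[NAME] == device:
            return int(fields[READS_COMPLETED]), int(fields[WRITES_COMPLETED])
    raise KeyError(f"device {device!r} not found in diskstats")


def io_delta(before: str, after: str, device: str) -> Tuple[int, int]:
    """Reads and writes `device` completed between two snapshots."""
    r0, w0 = io_counts(before, device)
    r1, w1 = io_counts(after, device)
    return r1 - r0, w1 - w0


def read_diskstats() -> str:
    with open("/proc/diskstats") as f:
        return f.read()


def sample(device: str = "sda", interval: float = 1.0, rounds: int = 10) -> None:
    """Print per-interval completed requests, similar in spirit to `iostat -d 1`."""
    prev = read_diskstats()
    for _ in range(rounds):
        time.sleep(interval)
        cur = read_diskstats()
        reads, writes = io_delta(prev, cur, device)
        print(f"{device}: {reads} reads, {writes} writes completed")
        prev = cur
```

Calling `sample("sda")` (substituting the noisy drive's kernel name) while the drive is audibly seeking prints one line per interval; /proc/diskstats is normally readable without root.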
I don't know what to investigate next. The fact that the drive quiets down in Windows implies to me that I should be able to make it run quietly in Linux. Halp?
Check file indexing. I had it on and it made my HDD behave like yours. If it's not indexing, I have no idea what it could be.
You can run `iotop -Pao` to see whether any userland process is causing it. Is it ext4? A freshly formatted ext4? I seem to recall threads describing the same problem where the cause was some background indexing of the partition, or something like that... might be worth searching to check whether my memory is right.
Check iotop. What desktop environment are you on?
As far as I know, I don't run anything that would be indexing files, and lsof doesn't show any likely candidates.
I ran "iotop -Pao" for a while and jbd2 was moderately active on my root partition. After jumping through a bunch of hoops, I turned off the journal on that partition, but the constant seek noise persists.
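To confirm the journal really is gone, one option is to check the feature list that `tune2fs -l` prints. (The journal itself is normally removed with `tune2fs -O ^has_journal`, which tune2fs only allows on an unmounted or read-only filesystem.) This is a minimal sketch, assuming the `Filesystem features:` line of `tune2fs -l` output; `filesystem_features`, `has_journal`, and `superblock_listing` are hypothetical helpers, and "/dev/sdXN" below is a placeholder.

```python
import subprocess
from typing import Set


def filesystem_features(tune2fs_output: str) -> Set[str]:
    """Extract feature names from the 'Filesystem features:' line of `tune2fs -l`."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            return set(line.split(":", 1)[1].split())
    raise ValueError("no 'Filesystem features:' line found")


def has_journal(tune2fs_output: str) -> bool:
    """True if the listing still includes the has_journal feature."""
    return "has_journal" in filesystem_features(tune2fs_output)


def superblock_listing(device: str) -> str:
    """Run `tune2fs -l`, which only lists superblock fields (usually needs root)."""
    result = subprocess.run(["tune2fs", "-l", device],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

After the journal has been removed, `has_journal(superblock_listing("/dev/sdXN"))` should return False.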
It is indeed an ext4 filesystem, though not at all freshly formatted. I found the threads you mentioned, but since the problem continued after I turned off the journal, my reading is that they don't apply.
I'm using i3 and otherwise running a very light load (ps reports fewer than 30 non-kernel threads).
While the drive is making those noises, does the HDD activity light come on? Have you tried:
hdparm -B 254
...or other values? I seem to recall owning an HDD that ignored '255' but was happy with '254' (in fact, I just checked hdparm's man page, and it confirms that not all drives support disabling APM with '255'). Setting lower values might be worth a try, too (e.g. 1, 127, 128).
Last edited by Pse (2014-04-05 01:57:21)
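For working through these values systematically, the ranges from the hdparm man page can be encoded in a small command builder. This is only a sketch: `hdparm_args` and `apply` are hypothetical helpers, "/dev/sdX" is a placeholder, and the ranges are my reading of the man page (APM: 1-127 permit spin-down, 128-254 do not, 255 asks the drive to disable APM; AAM: 0-254, with 128 quietest and 254 fastest).

```python
import subprocess
from typing import List, Optional


def hdparm_args(device: str, apm: Optional[int] = None,
                aam: Optional[int] = None) -> List[str]:
    """Build an hdparm command line that sets APM (-B) and/or AAM (-M)."""
    args = ["hdparm"]
    if apm is not None:
        # 1-127 permit spin-down, 128-254 do not; 255 requests disabling APM,
        # which not every drive supports.
        if not 1 <= apm <= 255:
            raise ValueError(f"APM level must be in 1..255, got {apm}")
        args += ["-B", str(apm)]
    if aam is not None:
        # 128 is the quietest (slowest) setting, 254 the fastest (loudest).
        if not 0 <= aam <= 254:
            raise ValueError(f"AAM level must be in 0..254, got {aam}")
        args += ["-M", str(aam)]
    if len(args) == 1:
        raise ValueError("specify at least one of apm or aam")
    args.append(device)
    return args


def apply(device: str, apm: Optional[int] = None, aam: Optional[int] = None) -> None:
    """Run the built command; hdparm needs root for this."""
    subprocess.run(hdparm_args(device, apm=apm, aam=aam), check=True)
```

For example, `hdparm_args("/dev/sdX", apm=254, aam=128)` yields `['hdparm', '-B', '254', '-M', '128', '/dev/sdX']`, which matches the `hdparm -B 254` form suggested above.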
I just tried a bunch of different values with "hdparm -B", to no avail.
Have you checked systemd's journal to see whether it's being flooded by error messages, for example?
Post the output of 'smartctl -a /dev/sdX'.
It might be that the drive has background data collection (SMART automatic offline data collection) enabled.
I had one of these drives a while back. It was kind of finicky about what machine it was in, what operating system it ran, and most importantly what firmware it had. It seemed to do great in my old MacBook with OSX and pretty well with Arch. But when I put it in my ThinkPad it was not as happy. It still seemed to work okay, but it definitely made a bit more noise.
You should make sure you have the latest firmware for this drive. The early versions were very buggy, while the newer ones seem to be less so. I remember upgrading to something that ended in '28'...
Okay, weirdness. I turned the machine on today and it's silent in Linux now, too. Maybe it needed a hard power-off after I turned the journal off or tweaked the APM/AAM values? I have no idea.
There is/was nothing in systemd's journal to indicate problems.
Here's what smartctl has to say about it:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus XT
Device Model:     ST95005620AS
Serial Number:    5YX1NQJV
LU WWN Device Id: 5 000c50 04541a608
Firmware Version: SD28
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sat Apr 5 14:00:25 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x00)
    Offline data collection activity was never started.
    Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0)
    The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 634) seconds.
Offline data collection capabilities: (0x73)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    No Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 103) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x1001)
    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       190375647
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       704
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       32287264
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       3125
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       746
184 End-to-End_Error        0x0032   097   097   099    Old_age   Always   FAILING_NOW 3
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   253   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   083   063   045    Old_age   Always       -       17 (Min/Max 15/17)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       49
193 Load_Cycle_Count        0x0032   089   089   000    Old_age   Always       -       22300
194 Temperature_Celsius     0x0022   017   040   000    Old_age   Always       -       17 (0 5 0 0 0)
195 Hardware_ECC_Recovered  0x001a   047   044   000    Old_age   Always       -       190375647
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3117         -
# 2  Short offline       Completed without error       00%      3050         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
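As far as I understand smartmontools, the WHEN_FAILED column is derived purely from the normalized columns: an attribute with a nonzero THRESH is shown as FAILING_NOW when its VALUE is at or below THRESH, and In_the_past when only WORST has dropped that far. The sketch below reproduces that rule (my reading of it, not smartctl's actual code; `parse_row` and `when_failed` are hypothetical helpers) on rows from the attribute table above. It shows why attribute 184 is flagged (097 is below its threshold of 099), but it says nothing about what that vendor-specific attribute actually measures on this drive.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


def parse_row(row: str) -> Attribute:
    """Parse one attribute row laid out as:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    f = row.split()
    # The raw value may contain spaces, e.g. "17 (Min/Max 15/17)".
    return Attribute(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), " ".join(f[9:]))


def when_failed(attr: Attribute) -> str:
    """Classify normalized values the way WHEN_FAILED appears to (assumed rule)."""
    if attr.thresh == 0:
        return "-"  # no threshold set, so the attribute cannot "fail"
    if attr.value <= attr.thresh:
        return "FAILING_NOW"
    if attr.worst <= attr.thresh:
        return "In_the_past"
    return "-"
```

Applied to the 184 row this returns "FAILING_NOW", while rows such as 7 (Seek_Error_Rate, 075/060/030) and 10 (Spin_Retry_Count, 100/100/097) come out as "-".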
(See the comment in my original post as to why I think the FAILING_NOW entry is a red herring.)
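If the filesystem was new and it was ext4, it could be that it was just finishing up the mkfs. mke2fs has options called lazy_itable_init and lazy_journal_init that skip zeroing out the inode tables and the journal at creation time. With lazy_itable_init in particular, mkfs writes out only enough of the filesystem to get going, and the kernel finishes initializing the inode tables in the background once the filesystem is mounted.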
The filesystem is at least a year old by this point.
Okay... I guess that rules that out.
I'm leaning towards the hypothesis that the drive needed a power-down or reboot before the settings made with hdparm took effect. Apparently, after my last round of tinkering, Advanced Power Management was disabled and Automatic Acoustic Management was set to 0. I may run further experiments to see whether this hypothesis holds; on the other hand, I might just not look the proverbial gift horse in the mouth and enjoy my nice, quiet system.
Thanks for all the advice!
The line "Auto Offline Data Collection: Disabled." in the smartctl output shows my idea was wrong. The other SMART values look OK, except for the one flagged as failing.
One thing that catches my eye is the Min/Max Airflow Temperature (15/17): the range seems a bit low unless you are taking active measures to control the drive's temperature, so I'd take the other SMART values with a grain of salt. Of course, regular backups are always a good idea, just in case something decides to give up the ghost.
Regarding APM/AAM settings: usually they are lost after a power-down, and as far as I know there is no simple way to save "custom" values to the firmware, but your drive might behave differently for some reason. In other words, the settings are volatile: they take effect immediately and last only until the next power-down, at least on the drives I have had my hands on so far. I don't have experience with hybrid drives, though, so something could be different.
Last edited by R00KIE (2014-04-06 13:06:58)
Thanks, R00KIE. I don't have any links handy, but wandering around the web gave me the impression that the names associated with the vendor-specific attributes are not to be trusted. That is, there is no guarantee that any given vendor's implementation of S.M.A.R.T. uses those values for their named purposes, which is why the FAILING_NOW entry is (reputedly) nothing to worry about. Well, not quite: it's "nothing to worry about" because the official SeaTools check says the drive is okay, and my web searches told me that's what Seagate says is authoritative.
But, um, the problem is back. Sort of. It seems that cold-booting directly to Linux results in a quiet drive, but that warm-booting (definitely from Windows, maybe from Linux, too) results in a loud drive. This is only a preliminary hypothesis and needs more experimentation to confirm.
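One way to firm up the cold-boot/warm-boot hypothesis is to record, right after each boot, the APM level the drive actually reports, along with whether it was a cold or warm boot. This is a sketch under an assumption: it expects `hdparm -B /dev/sdX` (query form, no value) to print a line like ` APM_level = 254`, with `off` or `not supported` in place of the number; check that against your own hdparm output. `parse_apm_level` and `current_apm_level` are hypothetical helpers, and "/dev/sdX" is a placeholder.

```python
import re
import subprocess
from typing import Optional

# Assumed shape of the query output, e.g.:
#   /dev/sdX:
#    APM_level   = 254
# where the value may instead read "off" or "not supported".
APM_LINE = re.compile(r"^\s*APM_level\s*=\s*(.+?)\s*$", re.MULTILINE)


def parse_apm_level(hdparm_output: str) -> Optional[str]:
    """Return the reported APM level ("254", "off", ...) or None if absent."""
    match = APM_LINE.search(hdparm_output)
    return match.group(1) if match else None


def current_apm_level(device: str) -> Optional[str]:
    """Query (not set) the drive's APM level; hdparm needs root for this."""
    result = subprocess.run(["hdparm", "-B", device],
                            capture_output=True, text=True, check=True)
    return parse_apm_level(result.stdout)
```

Logging `current_apm_level("/dev/sdX")` after each boot, next to a note on how the machine was started and whether the drive was noisy, would show whether the quiet and loud cases line up with different reported settings.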