Could you make a bug report for this, maybe even mark it as critical?
Well, I guess it's really a hardware bug rather than an arch bug. Why does the "medium" power management level for the drive cause it to park the drive head at a rate likely to cause failure after a fraction of the MTBF? I'm not really sure it's arch's job to save the user from him/herself.
Offline
shining wrote:Could you make a bug report for this, maybe even mark it as critical?
Well, I guess it's really a hardware bug rather than an arch bug. Why does the "medium" power management level for the drive cause it to park the drive head at a rate likely to cause failure after a fraction of the MTBF? I'm not really sure it's arch's job to save the user from him/herself.
What's wrong about informing the user? Couldn't it be arch's job?
I read the arch forums regularly, but I missed this thread, and I wouldn't have found it without searching specifically for it.
I don't read anything about what's going on in Ubuntu, and I only rarely read slashdot. There is too much there.
But if it was announced in the forums (announcement section), on archlinux.org News, or on the ML, I wouldn't have missed it.
It's very easy to check if a laptop is affected, just a smartctl command. And then the workaround is just one hdparm line. The only problem is knowing about the issue in the first place.
But if everyone else disagrees, I'll just give up.
pacman roulette : pacman -S $(pacman -Slq | LANG=C sort -R | head -n $((RANDOM % 10)))
Offline
$ sudo smartctl -a /dev/sda |egrep '(Cycle|Hours)'
9 Power_On_Hours 0x0012 092 092 000 Old_age Always - 3530
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 501
193 Load_Cycle_Count 0x0012 054 054 000 Old_age Always - 462017
$ sudo hdparm -I /dev/sda |grep 'power management level'
Advanced power management level: 128 (0x80)
Okay.
I have set 'hdparm -B 254' in rc.local right now (and will move to 255 if necessary).
But it comes back to 128 after suspend. What can I do about that? Is there any callback or HAL trick to set it back on resume?
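To put the counters in the smartctl output above in perspective, the average number of load cycles per power-on hour can be worked out with plain shell arithmetic. This is only a rough sketch: `cycles_per_hour` is a helper name made up for illustration, and it uses integer division, so the result is rounded down.

```shell
#!/bin/sh
# cycles_per_hour is a hypothetical helper, not part of smartmontools or hdparm.
# Usage: cycles_per_hour LOAD_CYCLE_COUNT POWER_ON_HOURS
cycles_per_hour() {
    # Refuse a zero or negative hour count to avoid dividing by zero.
    if [ "$2" -le 0 ]; then
        echo "power-on hours must be positive" >&2
        return 1
    fi
    # Integer division: the result is rounded down.
    echo $(( $1 / $2 ))
}

# Raw values from the smartctl output above:
# Load_Cycle_Count 462017, Power_On_Hours 3530.
cycles_per_hour 462017 3530   # prints 130
```

Averaged over the life of this drive, that is a head park a little more often than once every 30 seconds of power-on time.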
Last edited by lloeki (2007-11-23 11:41:09)
To know recursion, you must first know recursion.
Offline
What's wrong about informing the user? Couldn't it be arch's job?
Absolutely, I agree with you 100% here. But is it arch's job to stick
/sbin/hdparm -B 254
in everybody's rc.sysinit? That's up to the developers to decide, I guess.
Last edited by loserMcloser (2007-11-24 05:36:59)
Offline
shining wrote:What's wrong about informing the user? Couldn't it be arch's job?
Absolutely, I agree with you 100% here. But is it arch's job to stick
/sbin/hdparm -B 254
in everybody's rc.sysinit? That's up to the developers to decide, I guess.
Well, it would be easier to do just that, but that's not what I asked.
Mostly because there might be several reasons for not doing it :
- if it's Arch's policy to not override default values
- if setting this value for people that don't have this problem only has downsides
and maybe others.
Offline
it comes back to 128 after suspend. what can I do about that? is there any callback, or hal trick to set it back on resume?
Answering my own question:
$ cat /etc/pm/sleep.d/50-hdparm_pm
#!/bin/sh
# pm-utils passes the current action as $1; re-apply the APM level on wake-up
if [ "$1" = "resume" ] || [ "$1" = "thaw" ]; then
    hdparm -B 254 /dev/sda
fi
and make sure it's chmod +x, and called from /etc/rc.local
see http://wiki.archlinux.org/index.php/Pm-utils for details.
Last edited by lloeki (2007-11-26 08:51:38)
Offline
What's wrong about informing the user? Couldn't it be arch's job?
I'm on your side here, but I do need to say - this value of 128 that's supposedly "bad" is the value that is set if no one touches anything.
I guess I just don't understand what you'd like announced? Could you explain, or possibly throw some text at me?
Offline
I'm a bit confused about what this magic value is actually doing. Does setting it to 255 turn off the "head parking", or whatever you might call it, altogether? If so, this sounds much less safe than any wear that leaving it on might cause. Unless you park your laptop at a desk (at which point it becomes a desktop, right?), I would think you're much more likely to break a hard drive from impact between the head and the platters than from mere fatigue on the head arms.
Before I run off and play with my power management, does anyone have an explanation (or a link to one) that really explains what's going on here? I looked through the Ubuntu bug report, but there are few actual details there either.
Offline
shining wrote:What's wrong about informing the user? Couldn't it be arch's job?
I'm on your side here, but I do need to say - this value of 128 that's supposedly "bad" is the value that is set if no one touches anything.
I guess I just don't understand what you'd like announced? Could you explain, or possibly throw some text at me?
I tried to address that with my first post in response to you:
http://bbs.archlinux.org/viewtopic.php? … 07#p294507
So, I believe the announcement would be something like:
The default power management parameters for certain hard disks may be too aggressively set by the manufacturer. These default settings may result in premature HDD failure.
Some operating systems or distributions of operating systems adjust these settings to mitigate the issue. Arch Linux, however, leaves them set at the default level. We therefore advise that you check your settings and modify them if they appear to meet the 'aggressive' criteria set forth in the following documentation of the issue:
<links provided>
Last edited by MrWeatherbee (2007-11-26 17:40:58)
Offline
I'm a bit confused about what this magic value is actually doing. Does setting it to 255 turn off the "head parking", or whatever you might call it, altogether? If so, this sounds much less safe than any wear that leaving it on might cause. Unless you park your laptop at a desk (at which point it becomes a desktop, right?), I would think you're much more likely to break a hard drive from impact between the head and the platters than from mere fatigue on the head arms.
Before I run off and play with my power management, does anyone have an explanation (or a link to one) that really explains what's going on here? I looked through the Ubuntu bug report, but there are few actual details there either.
You got it about right. The default of 128 sits right in the middle of the 0-255 range, and there is a reason for that: it's the happy medium. By setting it to a higher number you keep the heads out over the platters and have a higher chance of causing damage to the HD, but it wears less on the HD and battery by doing so; vice versa the other way around. 128 is the happy medium for a reason, and it is set as the default by manufacturers.
With a 5-year warranty and default settings, I doubt any manufacturer can refuse replacement if the drive fails due to this natural occurrence. That is, if the drive fails in the first place; you would actually have to wait and see before you start whining too much!
I am more interested in how many drives will fail due to neglect from keeping the heads 'parked' over the platters more. This requires more user input than one might first expect.
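For reference, the hdparm man page describes the -B levels in three bands: 1 to 127 permit spin-down, 128 to 254 do not, and 255 asks the drive to disable Advanced Power Management altogether (not every drive supports that). The sketch below simply encodes those documented bands; `apm_class` is a made-up helper name, and what a given drive's firmware actually does at each level may differ, as other posts in this thread show.

```shell
#!/bin/sh
# apm_class is a hypothetical helper that labels an hdparm -B level
# according to the bands described in the hdparm man page.
apm_class() {
    level=$1
    if [ "$level" -ge 1 ] 2>/dev/null && [ "$level" -le 127 ]; then
        echo "power management on, spin-down permitted"
    elif [ "$level" -ge 128 ] 2>/dev/null && [ "$level" -le 254 ]; then
        echo "power management on, spin-down not permitted"
    elif [ "$level" -eq 255 ] 2>/dev/null; then
        echo "APM disabled"
    else
        echo "not a valid -B level: $level" >&2
        return 1
    fi
}

apm_class 128   # the default discussed in this thread
apm_class 254   # the workaround value
apm_class 255
```

Note that these bands say nothing about head parking specifically; how often the heads unload at a given level is up to the drive.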
Last edited by jacko (2007-11-26 18:03:39)
Offline
With a 5-year warranty and default settings, I doubt any manufacturer can refuse replacement if the drive fails due to this natural occurrence. That is, if the drive fails in the first place; you would actually have to wait and see before you start whining too much!
I am more interested in how many drives will fail due to neglect from keeping the heads 'parked' over the platters more. This requires more user input than one might first expect.
I think the whole point is to direct users to the information and let them decide how to proceed.
If one decides to leave the settings untouched after informing himself of the issue, then fine. Or, one may still have some questions after reading over the documentation and come here for help or debate. None of that is possible if the issue isn't raised to an appropriate level.
Offline
jacko wrote:With a 5-year warranty and default settings, I doubt any manufacturer can refuse replacement if the drive fails due to this natural occurrence. That is, if the drive fails in the first place; you would actually have to wait and see before you start whining too much!
I am more interested in how many drives will fail due to neglect from keeping the heads 'parked' over the platters more. This requires more user input than one might first expect.
I think the whole point is to direct users to the information and let them decide how to proceed.
If one decides to leave the settings untouched after informing himself of the issue, then fine. Or, one may still have some questions after reading over the documentation and come here for help or debate. None of that is possible if the issue isn't raised to an appropriate level.
Yes, that's exactly my opinion.
As for the source link for the announcement, I first thought the link from the first post to the ubuntu wiki would be enough.
But it indeed probably doesn't highlight enough that it might be a dangerous workaround and cause bigger problems than it solves.
As barebones pointed out, it probably depends mostly on the usage.
If it's really used as a laptop, it might be safer to keep the default 128 value.
If it's rather used as a desktop, it might be safer to increase that value to decrease the number of unnecessary head parks.
But it also seems that, depending on the hardware and usage (not sure if both are equally important), the number of load cycles can be very different.
That's why it should be left to the user to deal with the situation once he's aware of the problem.
Offline
i also have the hdd problem with the "western digital caviar GP" (3,5") on my homeserver - an increase of load_cycle_count of about ~100/hour :-(
the problem is that this hdd does not support "hdparm -B XYZ". Is there any other possibility to stop the hdd-head-parking?
Offline
pretty easy:
disable APM.
assuming modern hardware is using ACPI, you don't need APM at all
Offline
thanks for your answer. i tried this already: i disabled apm in the bios and i also added apm=off to the kernel options in grub, but it didn't help
Offline
interesting, because after two yrs without APM my laptop shows:
sudo smartctl -a /dev/sda |egrep '(Cycle|Hours)'
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 1253
193 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 177693
240 Head_Flying_Hours 0x003e 200 200 000 Old_age Always - 0
without changing hdparm settings.
Offline
there seems to be a problem with "western digital caviar GP" harddisks on linux. i tried different distros and kernels... but i could not resolve the problem. I changed the disks to Samsung SP and no more "HDD clicking". Strange.
Offline
it comes back to 128 after suspend. what can I do about that? is there any callback, or hal trick to set it back on resume?
Answering my own question:
$ cat /etc/pm/sleep.d/50-hdparm_pm
#!/bin/sh
# pm-utils passes the current action as $1; re-apply the APM level on wake-up
if [ "$1" = "resume" ] || [ "$1" = "thaw" ]; then
    hdparm -B 254 /dev/sda
fi
and make sure it's chmod +x, and called from /etc/rc.local
see http://wiki.archlinux.org/index.php/Pm-utils for details.
Here is my personal solution (tnx to lloeki) for setting default -B values after suspend, depending on battery state:
#!/bin/sh
# On resume/thaw, pick the APM level from the battery state
if [ "$1" = "resume" ] || [ "$1" = "thaw" ]; then
    if grep -q "charged" /proc/acpi/battery/BAT1/state; then
        hdparm -B 254 /dev/sda
    else
        hdparm -B 128 /dev/sda
    fi
fi
254 and 128 are my choice, you could change to whatever you have in laptop-mode-tools config
Last edited by ludmiloff (2008-04-04 12:28:46)
Stable ArchLinux
http://archstable.blogspot.com/
Offline
I read this thread and the related one in Ubuntu forums and two weeks ago I started to check what was the status of my laptop HD, a Samsung HM121HC, 120GB IDE, bought in November last year. The click sound could be heard every few seconds so I was not surprised when I saw that the Load_Cycle_Count was above 200000. This seemed to me very high for a HD bought only a few months ago, so I checked the values of the power management. I discovered that this HD has only 4 values:
- 1: The most aggressive, the HD spins up and down every few seconds
- 128: default value, but also very aggressive, the click is heard every few seconds
- 254: the clicks stop completely
- 255: apparently no power management, but the clicks can be heard at the same rate as with 128
So I decided to include this line in my rc.local:
hdparm -B254 /dev/sda > /dev/null
Everything seemed fine, but the temperature of the HD was now 3 or 4 degrees higher and the laptop started to freeze for a few seconds when the HD was being heavily used. I checked the logs and I discovered errors like this:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: soft resetting link
ata1.00: configured for UDMA/100
ata1: EH complete
sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Furthermore,
hdparm -v /dev/sda
didn't show the values of multcount, IO_support or unmaskirq, and complained about a not valid IO. I looked for information about these problems and somewhere, I don't really remember where, I found that they may be caused by libata, and it may be solved using the legacy IDE drivers. I tried it.
The problem with hdparm was solved:
# hdparm -v /dev/hda
/dev/hda:
multcount = 16 (on)
IO_support = 3 (32-bit w/sync)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 16383/255/63, sectors = 234441648, start = 0
For a few days everything was fine: no clicks, no few seconds freeze, no errors either with hdparm or smartctl. But yesterday the freezings started again, this time with this error:
hda: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
ide0: reset: success
or this
hda: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: 0xb0
It seems that if I go back to a value of 128 in power management no freezings occur, but I can not say for sure.
As I write this the smart values are these:
# smartctl -A /dev/hda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 3
3 Spin_Up_Time 0x0007 252 252 025 Pre-fail Always - 2187
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 365
5 Reallocated_Sector_Ct 0x0033 099 099 010 Pre-fail Always - 11
7 Seek_Error_Rate 0x000e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 2319
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 200
191 G-Sense_Error_Rate 0x0032 099 099 000 Old_age Always - 12848
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 44
194 Temperature_Celsius 0x0022 094 088 000 Old_age Always - 48 (Lifetime Min/Max 12/50)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 9798
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 12
201 Soft_Read_Error_Rate 0x0032 252 252 000 Old_age Always - 0
223 Load_Retry_Count 0x0032 098 098 000 Old_age Always - 2567
225 Load_Cycle_Count 0x0032 074 074 000 Old_age Always - 264646
So I don't know what to do. If I don't change the power management value, the clicks are heard every 15 or 20 seconds and the Load_Cycle_Count grows accordingly, at a rate of more than 100 per hour; if I change it, the HD may suffer random freezes.
Does anyone have any new idea?
Sorry for the long post.
Offline
Further to my previous post, it seems that if I disable smartd no freezes occur. Could smartd be accessing the HD so often that it hinders normal read/write operations?
Offline
hi
I just checked my /etc/rc.local and I have
hdparm -B 254 /dev/sda
hdparm -S 0 /dev/sda
I know the first is the USEFUL one, is the second useless?
hdparm --help reports " -S set standby (spindown) timeout" so I think it may help..
thanks for the clarification
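On the -S question: according to the hdparm man page, -S sets the standby (spin-down) timeout using a special encoding, and a value of 0 means the timeout is disabled, so the drive does not spin down on its own. The sketch below decodes that encoding as the man page documents it; `standby_timeout` is a made-up helper name, and individual drives may not honour every value.

```shell
#!/bin/sh
# standby_timeout is a hypothetical helper that decodes an hdparm -S value
# following the encoding described in the hdparm man page.
standby_timeout() {
    v=$1
    if [ "$v" -eq 0 ] 2>/dev/null; then
        echo "disabled"
    elif [ "$v" -ge 1 ] 2>/dev/null && [ "$v" -le 240 ]; then
        # 1..240: multiples of 5 seconds (5 seconds to 20 minutes)
        echo "$(( v * 5 )) seconds"
    elif [ "$v" -ge 241 ] 2>/dev/null && [ "$v" -le 251 ]; then
        # 241..251: 1 to 11 units of 30 minutes
        echo "$(( (v - 240) * 30 )) minutes"
    elif [ "$v" -eq 252 ] 2>/dev/null; then
        echo "21 minutes"
    elif [ "$v" -eq 253 ] 2>/dev/null; then
        echo "vendor-defined (8 to 12 hours)"
    elif [ "$v" -eq 254 ] 2>/dev/null; then
        echo "reserved"
    elif [ "$v" -eq 255 ] 2>/dev/null; then
        echo "21 minutes 15 seconds"
    else
        echo "not a valid -S value: $v" >&2
        return 1
    fi
}

standby_timeout 0     # prints: disabled
standby_timeout 120   # prints: 600 seconds
```

So `hdparm -S 0` only controls the spin-down timer; it is a separate setting from the -B level.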
Offline
Hi everybody.
Sorry to bring back this post, but I found something that could be useful.
I set hdparm -B to 254 and the famous ticking noise went away, but when I monitor the hard drive temperature with that value, it rises a lot. Even the laptop surface gets hot.
So I was wondering why this never happens on M$.
On M$ the top temperature of my HD was 37 ºC, and on Arch it stays at 45 ºC all the time, so why is this?
Here is the answer:
I found that the LOAD_CYCLE_COUNT on M$ was 199, and there are a lot of values that make the HD work even more quietly (I always felt that on _any Linux_ my HD was noisier).
Well, I hope this helps. If I'm on the wrong track, please let me know, or tell me what would be a better configuration for the HD's life.
See ya.
"Any question has two points of view: the wrong one and ours."
www.alexertech.com
Offline
alexertech,
In fact your load_cycle_count is not 199, but 0x1083 (hex). It's the raw value you should read; otherwise your temperature would be 110°C, when you can read that it's 0x25 (again hex, which is 37 in decimal).
As for noise, there's an hdparm setting called acoustic management, which balances performance against noise by adjusting head speed. It seems that by default it's 254; it can be between 0 and 254, but only in discrete steps (see the hdparm man page). Try to look at what hdparm -iI /dev/sda says, and try to set the value to 128.
Last edited by lloeki (2008-06-17 21:03:05)
Offline
Hi lloeki.
Well, I thought that the value appears in the "value" column; that's where I expected the actual value of the parameter to be.
And I don't know if the 110 was in Celsius, because if you treat it as Fahrenheit and convert, it gives 43º, which I thought was correct because at that moment the machine was doing heavy work.
But I really don't know about this, so...
Thanks anyway, I'm looking at hdparm to see what else I can set right.
One thing that is true: since I set 199, the HD on Arch doesn't get that hot. It tops out at 38º, but if I set 254 it tops out at 45º, so I don't know.
Offline
if you set -M (acoustic management) to 254, it will allow the head to move as fast as possible, resulting in clicks and frequent noise as the head jumps from point to point. setting it to 128 will lessen the noise and reduce performance since the head moves slower, but I suppose it'll certainly improve head lifetime, since it doesn't suffer from such harder accelerations which cause the clicks.
if you set -B (advanced power management) to 254 the hd will never spin down, so the motor will constantly generate heat. if you set it to 199, it will spin down, allowing it to cool down as the motor is off.
as for the value, use smartmontools and see the values with smartctl --all /dev/sda (you may want to resize the terminal to be wider than 80 chars). you'll see them in the 'RAW_VALUE' column, in readable format.
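Since the raw value is the last of the standard columns in the attribute table (the tenth field), it can be pulled out with a short awk filter instead of reading the table by eye. A minimal sketch, assuming the column layout shown in the smartctl listings earlier in this thread; `smart_raw` is a made-up helper name.

```shell
#!/bin/sh
# smart_raw is a hypothetical helper: it reads `smartctl -A` style output on
# stdin and prints the RAW_VALUE field (column 10) of the named attribute.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Sample line copied from earlier in this thread. On a real system you would
# pipe in the live table instead:
#   sudo smartctl -A /dev/sda | smart_raw Load_Cycle_Count
printf '%s\n' \
    '193 Load_Cycle_Count 0x0012 054 054 000 Old_age Always - 462017' |
    smart_raw Load_Cycle_Count   # prints 462017
```

The fourth column (VALUE) is a normalised health figure, which is why reading it as a temperature or a cycle count gives misleading numbers.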
Offline