Hard drive advanced power management level can kill your laptop drive?

loserMcloser · 2007-11-23 03:16:05

shining wrote:

Could you make a bug report for this, maybe even mark it as critical?

Well, I guess it's really a hardware bug rather than an arch bug. Why does the "medium" power management level for the drive cause it to park the drive head at a rate likely to cause failure after a fraction of the MTBF? I'm not really sure it's arch's job to save the user from him/herself.

shining · 2007-11-23 07:10:36

loserMcloser wrote:

shining wrote:
Could you make a bug report for this, maybe even mark it as critical?
Well, I guess it's really a hardware bug rather than an arch bug. Why does the "medium" power management level for the drive cause it to park the drive head at a rate likely to cause failure after a fraction of the MTBF? I'm not really sure it's arch's job to save the user from him/herself.

What's wrong with informing the user? Couldn't it be arch's job?
I read the arch forums regularly, but I missed this thread, and I wouldn't have found it without searching specifically for it.
I don't read anything about what's going on in Ubuntu, and I only rarely read slashdot. There is too much there.
But if it was announced in the forums (announcement section), on archlinux.org News, or on the ML, I wouldn't have missed it.

It's very easy to check if a laptop is affected, just a smartctl command. And then the workaround is just one hdparm line. The only problem is knowing about the issue in the first place.
But if everyone else disagrees, I'll just give up.
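The smartctl check described here can be scripted so the answer comes out as a rate rather than two raw counters. This is a minimal sketch, assuming smartctl's usual attribute table where RAW_VALUE is the tenth column; load_cycles_per_hour is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# load_cycles_per_hour (hypothetical helper): read `smartctl -A` output on
# stdin and print how many head load/unload cycles the drive has done per
# power-on hour. Attributes are matched by name because the ID differs
# between vendors (193 on lloeki's drive, 225 on Laertes' Samsung).
# Assumes RAW_VALUE is the 10th whitespace-separated field of each row.
load_cycles_per_hour() {
    awk '
        $2 == "Power_On_Hours"   { hours  = $10 }
        $2 == "Load_Cycle_Count" { cycles = $10 }
        END {
            if (hours > 0 && cycles != "") {
                printf "%d load cycles in %d hours = %.1f per hour\n",
                       cycles, hours, cycles / hours
            } else {
                print "Power_On_Hours or Load_Cycle_Count not found" > "/dev/stderr"
                exit 1
            }
        }'
}
```

Usage: sudo smartctl -A /dev/sda | load_cycles_per_hour. With lloeki's figures from the next post (3530 hours, 462017 cycles) it prints 130.9 per hour; Laertes' figures further down (2319 hours, 264646 cycles) come out at about 114 per hour.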

lloeki · 2007-11-23 11:39:06

$ sudo smartctl -a /dev/sda |egrep '(Cycle|Hours)'
  9 Power_On_Hours          0x0012   092   092   000    Old_age   Always       -       3530
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       501
193 Load_Cycle_Count        0x0012   054   054   000    Old_age   Always       -       462017
$ sudo hdparm -I /dev/sda |grep 'power management level'
        Advanced power management level: 128 (0x80)

okay.
I've just set 'hdparm -B 254' in rc.local (will move to 255 if necessary).

but it comes back to 128 after suspend. what can I do about that? is there any callback, or hal trick to set it back on resume?
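One way to confirm whether the level survived a suspend is to read it back from hdparm -I before and after. This is a minimal sketch, assuming the output line looks like the one above ("Advanced power management level: 128 (0x80)"); current_apm_level is a hypothetical helper name.

```shell
#!/bin/sh
# current_apm_level (hypothetical helper): read `hdparm -I` output on stdin
# and print the current Advanced Power Management level in decimal.
# Expects a line like "Advanced power management level: 128 (0x80)";
# prints nothing if no numeric level is reported.
current_apm_level() {
    sed -n 's/.*Advanced power management level: *\([0-9][0-9]*\).*/\1/p'
}
```

Usage: sudo hdparm -I /dev/sda | current_apm_level, run once after boot and once after resuming.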

Last edited by lloeki (2007-11-23 11:41:09)

loserMcloser · 2007-11-24 05:34:29

shining wrote:

What's wrong with informing the user? Couldn't it be arch's job?

Absolutely, I agree with you 100% here. But is it arch's job to stick

/sbin/hdparm -B 254

in everybody's rc.sysinit? That's up to the developers to decide, I guess.

Last edited by loserMcloser (2007-11-24 05:36:59)

shining · 2007-11-24 11:26:16

loserMcloser wrote:

shining wrote:
What's wrong with informing the user? Couldn't it be arch's job?
Absolutely, I agree with you 100% here. But is it arch's job to stick
/sbin/hdparm -B 254
in everybody's rc.sysinit? That's up to the developers to decide, I guess.

Well, it would be easier to do just that, but that's not what I asked.
Mostly because there might be several reasons for not doing it:
- if it's Arch's policy to not override default values
- if setting this value for people that don't have this problem only has downsides
and maybe others.

lloeki · 2007-11-26 08:50:29

it comes back to 128 after suspend. what can I do about that? is there any callback, or hal trick to set it back on resume?

answering myself:

$ cat /etc/pm/sleep.d/50-hdparm_pm
#!/bin/sh

if [ "$1" = "resume" ] || [ "$1" = "thaw" ]; then
        hdparm -B 254 /dev/sda
fi

make sure it's executable (chmod +x), and keep the hdparm -B 254 line in /etc/rc.local for boot.

see http://wiki.archlinux.org/index.php/Pm-utils for details.

Last edited by lloeki (2007-11-26 08:51:38)

phrakture · 2007-11-26 17:01:41

shining wrote:

What's wrong with informing the user? Couldn't it be arch's job?

I'm on your side here, but I do need to say - this value of 128 that's supposedly "bad" is the value that is set if no one touches anything.

I guess I just don't understand what you'd like announced? Could you explain, or possibly throw some text at me?

barebones · 2007-11-26 17:38:54

I'm a bit confused about what this magic value is actually doing. Does setting it to 255 turn off the "head parking" (or whatever you might call it) altogether? If so, this sounds much less safe than any wear that leaving it on might cause. Unless you park your laptop at a desk (at which point it becomes a desktop, right?) I would think you're much more likely to break a hard drive from impact between the head and the platters than from fatigue of the head arms.

Before I run off and play with my power management, does anyone have an explanation (or a link to one) that really explains what's going on here? I looked through the Ubuntu bug report, but there are few actual details there either.

MrWeatherbee · 2007-11-26 17:39:08

phrakture wrote:

shining wrote:
What's wrong with informing the user? Couldn't it be arch's job?
I'm on your side here, but I do need to say - this value of 128 that's supposedly "bad" is the value that is set if no one touches anything.
I guess I just don't understand what you'd like announced? Could you explain, or possibly throw some text at me?

I tried to address that with my first post in response to you:

http://bbs.archlinux.org/viewtopic.php? … 07#p294507

So, I believe the announcement would be something like:

The default power management parameters for certain hard disks may be too aggressively set by the manufacturer. These default settings may result in premature HDD failure.
Some operating systems or distributions of operating systems adjust these settings to mitigate the issue. Arch Linux, however, leaves them set at the default level. We therefore advise that you check your settings and modify them if they appear to meet the 'aggressive' criteria set forth in the following documentation of the issue:
<links provided>

Last edited by MrWeatherbee (2007-11-26 17:40:58)

jacko · 2007-11-26 18:00:49

barebones wrote:

I'm a bit confused about what this magic value is actually doing. Does setting it to 255 turn off the "head parking" (or whatever you might call it) altogether? If so, this sounds much less safe than any wear that leaving it on might cause. Unless you park your laptop at a desk (at which point it becomes a desktop, right?) I would think you're much more likely to break a hard drive from impact between the head and the platters than from fatigue of the head arms.
Before I run off and play with my power management, does anyone have an explanation (or a link to one) that really explains what's going on here? I looked through the Ubuntu bug report, but there are few actual details there either.

you got it about right. The default number of 128 is surprisingly right in the middle of 0 - 255; there is a reason for that, it's the happy medium. By setting it to a higher number you keep the heads out over the platters and have a higher chance of causing damage to the HD, but it wears less on the HD and battery by doing so, and vice versa the other way around. 128 is the happy medium for a reason, set as default by manufacturers.
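For reference, the hdparm man page describes the -B scale in terms of spin-down rather than a single happy medium: values 1-127 permit spin-down, 128-254 do not, 1 is the most aggressive power saving, 254 gives the best I/O performance, and 255 disables APM on drives that allow it. The man page says nothing about head unloading, which is left to the drive firmware (Laertes' Samsung further down, for instance, clicks just as often at 255 as at 128). A small sketch that encodes only those documented ranges; describe_apm_level is a hypothetical helper name.

```shell
#!/bin/sh
# describe_apm_level (hypothetical helper): classify an hdparm -B value
# using the ranges from the hdparm man page. Lower = more power saving.
describe_apm_level() {
    case "$1" in
        ''|*[!0-9]*) echo "invalid: not a number" ;;
        *)
            if [ "$1" -ge 1 ] && [ "$1" -le 127 ]; then
                echo "$1: power saving, spin-down permitted"
            elif [ "$1" -ge 128 ] && [ "$1" -le 254 ]; then
                echo "$1: no spin-down"
            elif [ "$1" -eq 255 ]; then
                echo "$1: APM disabled (if the drive supports that)"
            else
                echo "invalid: outside 1-255"
            fi
            ;;
    esac
}
```

Usage: describe_apm_level 128 prints "128: no spin-down".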

With a 5 year warranty and default settings I doubt any manufacturer can refuse a replacement if the drive fails due to this natural occurrence. That is, if the drive fails in the first place; you would actually have to wait and see before you start whining too much!

I am more interested in how many drives will fail due to neglect from keeping the heads 'parked' over the platters more. This requires more user input than one might first expect.

Last edited by jacko (2007-11-26 18:03:39)

MrWeatherbee · 2007-11-26 18:15:40

jacko wrote:

With a 5 year warranty and default settings I doubt any manufacturer can refuse a replacement if the drive fails due to this natural occurrence. That is, if the drive fails in the first place; you would actually have to wait and see before you start whining too much!
I am more interested in how many drives will fail due to neglect from keeping the heads 'parked' over the platters more. This requires more user input than one might first expect.

I think the whole point is to direct users to the information and let them decide how to proceed.

If one decides to leave the settings untouched after informing himself of the issue, then fine. Or, one may still have some questions after reading over the documentation and come here for help or debate. None of that is possible if the issue isn't raised to an appropriate level.

shining · 2007-11-26 19:13:10

MrWeatherbee wrote:

jacko wrote:
With a 5 year warranty and default settings I doubt any manufacturer can refuse a replacement if the drive fails due to this natural occurrence. That is, if the drive fails in the first place; you would actually have to wait and see before you start whining too much!
I am more interested in how many drives will fail due to neglect from keeping the heads 'parked' over the platters more. This requires more user input than one might first expect.
I think the whole point is to direct users to the information and let them decide how to proceed.
If one decides to leave the settings untouched after informing himself of the issue, then fine. Or, one may still have some questions after reading over the documentation and come here for help or debate. None of that is possible if the issue isn't raised to an appropriate level.

Yes, that's exactly my opinion.

As for the source link for the announcement, I first thought the link from the first post to the Ubuntu wiki would be enough.
But it probably doesn't highlight enough that it might be a dangerous workaround that causes bigger problems than it solves.
As barebones pointed out, it probably depends mostly on the usage.
If it's really used as a laptop, it might be safer to keep the default 128 value.
If it's rather used as a desktop, it might be safer to increase that value to reduce unnecessary head parking.

But it also seems that, depending on the hardware and usage (not sure if both are equally important), the number of load cycles can be very different.
That's why it should be left to the user to deal with the situation once he's aware of the problem.

gernonimo · 2008-03-06 15:52:48

i also have the hdd problem with the "western digital caviar GP" (3.5") on my home server: load_cycle_count increases by about 100/hour :-(
the problem is that this hdd does not support "hdparm -B XYZ". Is there any other way to stop the hdd head parking?

broch · 2008-03-07 02:30:34

pretty easy:
disable APM.
assuming modern hardware is using ACPI, you don't need APM at all

gernonimo · 2008-03-07 08:32:59

thanks for your answer. i tried this already: i disabled apm in the bios and i also added apm=off to the kernel options in grub, but it didn't help

broch · 2008-03-08 02:18:22

interesting, because after two yrs without APM my laptop shows:
sudo smartctl -a /dev/sda |egrep '(Cycle|Hours)'
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 1253
193 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 177693
240 Head_Flying_Hours 0x003e 200 200 000 Old_age Always - 0

without changing hdparm settings.

gernonimo · 2008-03-17 17:35:23

there seems to be a problem with "western digital caviar GP" hard disks on linux. i tried different distros and kernels... but i could not resolve the problem. I replaced the disks with Samsung SP ones and there is no more "HDD clicking". Strange.

ludmiloff · 2008-04-04 10:29:00

lloeki wrote:

it comes back to 128 after suspend. what can I do about that? is there any callback, or hal trick to set it back on resume?
answering myself:
$ cat /etc/pm/sleep.d/50-hdparm_pm
#!/bin/sh

if [ "$1" = "resume" ] || [ "$1" = "thaw" ]; then
        hdparm -B 254 /dev/sda
fi
make sure it's executable (chmod +x), and keep the hdparm -B 254 line in /etc/rc.local for boot.
see http://wiki.archlinux.org/index.php/Pm-utils for details.

Here is my personal solution (tnx to lloeki) for setting default -B values after suspend, depending on battery state:

#!/bin/sh

if [ "$1" = "resume" ] || [ "$1" = "thaw" ]; then
    if grep -q "charged" /proc/acpi/battery/BAT1/state; then
        hdparm -B 254 /dev/sda
    else
        hdparm -B 128 /dev/sda
    fi
fi

254 and 128 are my choices; you could change them to whatever you have in your laptop-mode-tools config.
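A variant of the same hook could key off the AC adapter instead of the battery's "charged" state, so the drive gets 254 whenever the laptop is plugged in, even while it is still charging. This is a minimal sketch, assuming a kernel that exposes /sys/class/power_supply; the adapter name AC in that path is an assumption (it may be AC0, ADP1, etc. on your machine), and pick_apm_level is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical pm-utils hook: pick the APM level from the AC adapter state
# instead of the battery's "charged" state. The sysfs path is an assumption;
# the adapter may be named AC0, ADP1, ... on your machine.
AC_ONLINE_FILE=${AC_ONLINE_FILE:-/sys/class/power_supply/AC/online}

# pick_apm_level FILE: print 254 if FILE contains "1" (on mains power),
# otherwise 128 (also when FILE is missing or unreadable).
pick_apm_level() {
    if [ "$(cat "$1" 2>/dev/null)" = "1" ]; then
        echo 254
    else
        echo 128
    fi
}

# pm-utils passes the sleep action as the first argument.
case "$1" in
    resume|thaw)
        hdparm -B "$(pick_apm_level "$AC_ONLINE_FILE")" /dev/sda
        ;;
esac
```

Like the other hooks, it would go in /etc/pm/sleep.d/ and needs chmod +x.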

Last edited by ludmiloff (2008-04-04 12:28:46)

Laertes · 2008-05-18 08:49:22

I read this thread and the related one in the Ubuntu forums, and two weeks ago I started checking the status of my laptop HD, a Samsung HM121HC, 120GB IDE, bought in November last year. The click sound could be heard every few seconds, so I was not surprised when I saw that the Load_Cycle_Count was above 200000. This seemed very high to me for a HD bought only a few months ago, so I checked the power management values. I discovered that this HD has only 4 values:

- 1: The most aggressive, the HD spins up and down every few seconds
- 128: default value, but also very aggressive, the click is heard every few seconds
- 254: the clicks stop completely
- 255: apparently no power management, but the clicks can be heard at the same rate as with 128

So I decided to include this line in my rc.local:

hdparm -B254 /dev/sda > /dev/null

Everything seemed fine, but the temperature of the HD was now 3 or 4 degrees higher and the laptop started to freeze for a few seconds when the HD was being heavily used. I checked the logs and I discovered errors like this:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: soft resetting link
ata1.00: configured for UDMA/100
ata1: EH complete
sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Furthermore,

hdparm -v /dev/sda

didn't show the values of multcount, IO_support or unmaskirq, and complained about an invalid IO. I looked for information about these problems and somewhere (I don't really remember where) I found that they may be caused by libata and could be solved by using the legacy IDE drivers. I tried it.

The problem with hdparm was solved:

# hdparm -v /dev/hda
/dev/hda:
 multcount     = 16 (on)
 IO_support    =  3 (32-bit w/sync)
 unmaskirq     =  0 (off)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 16383/255/63, sectors = 234441648, start = 0

For a few days everything was fine: no clicks, no freezes of a few seconds, no errors from either hdparm or smartctl. But yesterday the freezes started again, this time with this error:

hda: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
ide0: reset: success

or this

hda: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: 0xb0

It seems that if I go back to a value of 128 for power management no freezes occur, but I cannot say for sure.

As I write this the smart values are these:

 # smartctl -A /dev/hda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0007   252   252   025    Pre-fail  Always       -       2187
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       365
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       11
  7 Seek_Error_Rate         0x000e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       2319
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       200
191 G-Sense_Error_Rate      0x0032   099   099   000    Old_age   Always       -       12848
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       44
194 Temperature_Celsius     0x0022   094   088   000    Old_age   Always       -       48 (Lifetime Min/Max 12/50)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       9798
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       12
201 Soft_Read_Error_Rate    0x0032   252   252   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   098   098   000    Old_age   Always       -       2567
225 Load_Cycle_Count        0x0032   074   074   000    Old_age   Always       -       264646

So I don't know what to do. If I don't change the power management value, the clicks are heard every 15 or 20 seconds and the Load_Cycle_Count grows accordingly, at a rate of more than 100 per hour; if I change it, the HD may suffer random freezes.

Does anyone have any new ideas?

Sorry for the long post.

Laertes · 2008-05-22 09:04:05

Further to my previous post, it seems that if I disable smartd no freezes occur. Could smartd be accessing the HD so often that it hinders normal read/write operations?

_Marco_ · 2008-05-22 09:46:14

hi
I just checked my /etc/rc.local and I have

hdparm -B 254 /dev/sda
hdparm -S 0 /dev/sda

I know the first is the USEFUL one; is the second useless?
hdparm --help reports " -S set standby (spindown) timeout" so I think it may help.
thanks for clarifying

alexertech · 2008-06-16 17:23:01

Hi everybody.

Sorry to bring back this thread, but I found something that could be useful.

I set hdparm -B to 254 and the famous tick noise goes away, but when I monitor the hard drive temperature with that value it rises a lot. Even the laptop surface gets hot.

So, I was wondering why this never happens on M$.

In M$ my top HD temperature was 37 ºC, and on Arch it sits at 45 ºC all the time, so why is this?

Here is the answer:

I found that the LOAD_CYCLE_COUNT on M$ was 199, and there are a lot of values that make the HD work even more silently (I always felt that on _any linux_ my HD was noisier).

Well, I hope this helps. If I'm on the wrong track please let me know, or tell me what would be a better configuration for the HD's life.

See ya.

lloeki · 2008-06-17 20:58:57

alexertech,

in fact your load_cycle_count is not 199, but 0x1083 (hex). it's the raw value you should read, otherwise your temperature would be 110°C, when you can read it's 0x25 (again hex, which is 37 in decimal).
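The hex-to-decimal arithmetic can be checked directly in the shell. A minimal sketch; hex_to_dec is a hypothetical helper name, and it relies only on printf accepting C-style hex constants.

```shell
#!/bin/sh
# hex_to_dec (hypothetical helper): convert a hex raw value, as some tools
# display it, to decimal. POSIX printf accepts C-style constants like 0x25.
hex_to_dec() {
    printf '%d\n' "$1"
}

hex_to_dec 0x1083   # prints 4227 (the Load_Cycle_Count raw value)
hex_to_dec 0x25     # prints 37 (the temperature in degrees C)
```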

as for noise, there's an hdparm setting for acoustic management, which balances performance against noise by adjusting head speed. it seems it's 254 by default; it can be between 0 and 254, but has discrete steps (see the hdparm man page). look at what hdparm -iI /dev/sda says, and try setting the value to 128.

Last edited by lloeki (2008-06-17 21:03:05)

alexertech · 2008-06-17 21:28:55

Hi lloeki.

Well, I thought the value was the one in the "value" column, since that's where the actual value of the parameter is.

And I didn't know whether 110 was in Celsius, because if you convert it from Fahrenheit it gives 43º, which I thought was correct since at that moment the machine was under heavy load.

But, i reeaaallyyy don't know about this, so,....

Thanks anyway, I'm looking at hdparm to see what else I can put right.

One thing is true: since I set 199, the HD on Arch doesn't get that HOT; it tops out at 38º, but if I set 254, it tops out at 45º, so I don't know.

lloeki · 2008-06-17 22:12:56

if you set -M (acoustic management) to 254, it will allow the head to move as fast as possible, resulting in clicks and frequent noise as the head jumps from point to point. setting it to 128 will lessen the noise and reduce performance since the head moves slower, but I suppose it'll improve head lifetime, since it doesn't suffer the hard accelerations which cause the clicks.

if you set -B (advanced power management) to 254 the hd will never spin down, so the motor will constantly generate heat. if you set it to 199, it will spin down, allowing it to cool down as the motor is off.

as for the value, use smartmontools and read the values with smartctl --all /dev/sda (you may want to widen the terminal beyond 80 chars). you'll see them in the RAW_VALUE column, in readable (decimal) format.
