After switching to systemd I noticed a massive increase in power consumption on my ThinkPad X220 (i5 Sandy Bridge).
The strange thing is that after a reboot it sometimes seems to pick up the i915 power-saving mode at random, and idle consumption drops from ~24W to ~7W. But that only happens after several reboots.
Adding the good old i915.i915_enable_rc6=1 to the boot parameters doesn't make any difference, but judging from the detailed powertop output it seems to be an i915 issue.
I'm using the latest Linux kernel from core (3.6.4) and the latest Intel drivers (2.20.12).
This is most likely a kernel issue, not a systemd one.
There have been massive Intel power regressions over the last few kernel updates.
Makes sense. The problems only occurred after switching to systemd, so I thought it might be a service that sometimes didn't start correctly.
Maybe I installed a new kernel version together with systemd.
There is already a massive thread about power regressions on Sandy Bridge machines. It does not seem to affect the Ivy Bridge generation, though; my computer actually consumes less power now.
Also, the kernel parameter i915_enable_rc6=1 actually does nothing, as you are simply enabling what is already the default. In fact, if you look at powertop, I believe that the default (-1) is actually equivalent to i915.i915_enable_rc6=3, so using 1 will stop it from enabling rc6p and will theoretically end up using more power, as it can no longer enter the deeper rc6p state. If you want to try 7, it will enable rc6pp in addition to the other two, but at the apparent cost of instability. For me it worked fine, but it actually ended up reducing my overall battery time for some reason. So I stuck with the defaults, because I figure they are defaults for a reason.
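The values being discussed form a bitmask. As far as I know, the i915 module description for i915_enable_rc6 in kernels of this era says 0 disables RC6, 1 enables RC6, 2 enables deep RC6 (rc6p), 4 enables deepest RC6 (rc6pp), and -1 uses the per-chip default (check `modinfo i915` on your own kernel). A minimal Python sketch that decodes a value under that assumption; the helper name decode_rc6 is hypothetical, and it deliberately does not guess what -1 resolves to, since that depends on the chipset and kernel version.

```python
# Bit meanings as described for the i915_enable_rc6 parameter in 3.x-era
# kernels (assumption: verify with `modinfo i915` on your own kernel).
RC6_BITS = {
    1: "rc6",
    2: "rc6p (deep)",
    4: "rc6pp (deepest)",
}


def decode_rc6(value: int) -> list[str]:
    """Return the RC6 states requested by an i915_enable_rc6 value."""
    if value < 0:
        # -1 means "use the per-chip default", which the driver resolves
        # at load time; it cannot be decoded from the value alone.
        return ["per-chipset default"]
    return [name for bit, name in RC6_BITS.items() if value & bit]


if __name__ == "__main__":
    for v in (-1, 0, 1, 3, 7):
        print(v, decode_rc6(v) or ["RC6 disabled"])
```

Read this way, 1 requests plain RC6 only, 3 adds rc6p, and 7 adds rc6pp on top of both.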
The advice I was given in the other thread was to use 1 or -1.
I'm not sure that the 1 does nothing in practice with the current kernel. I could be wrong. However, adding a bunch of stuff explicitly back to my command line which I had largely dropped because it was meant to be unnecessary does seem to have helped (a bit). I'm still worried and power consumption is still too high but it does seem to help. That's anecdotal and not worth much as this bug seems somewhat arbitrary and random. But I would not be too quick to dismiss options for i915 which are meant to now make no difference but... er... at least seem to.
I'm currently using:
pcie_aspm=force i915.i915_enable_rc6=1 i915.i915_enable_fbc=1 i915.lvds_downclock=1 i915.semaphores=1
which is what I was using previously. I've also been told that a "7" can actually increase power usage though I don't know the explanation for this.
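To see which parts of a line like the one above are i915 module options (which could live elsewhere) and which are core kernel parameters such as pcie_aspm, a small Python sketch can split the command line. The function name split_cmdline is hypothetical, and the sketch ignores quoted values containing spaces.

```python
def split_cmdline(cmdline: str) -> tuple[dict[str, dict[str, str]], dict[str, str]]:
    """Split a kernel command line into module-scoped and global parameters.

    'module.param=value' entries are grouped by module; everything else
    (e.g. 'pcie_aspm=force') is treated as a global parameter.
    Flags without '=' get an empty-string value.
    """
    modules: dict[str, dict[str, str]] = {}
    global_params: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        module, dot, param = key.partition(".")
        if dot and module and param:
            modules.setdefault(module, {})[param] = value
        else:
            global_params[key] = value
    return modules, global_params


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as f:
        modules, global_params = split_cmdline(f.read())
    for module, params in sorted(modules.items()):
        print(module, params)
    print("global:", global_params)
```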
I believe that what the defaults are equivalent to depends on your hardware (default for the chipset) so it might give different results in different cases...?
@cfr, I think I was the one who told you that battery times decreased because of the 7. I am not sure why this was: it may have been my mind thinking I was experiencing something I was not, or possibly doing different things on my computer during those times made the difference. I just know that the defaults are sane, so I will stick with them. If you take a look at the defaults, though (powertop indicates the percentage of time spent in each power state), when you put nothing and let it default to -1, it actually disables rc6p, leaving only rc6 enabled. So it is probably a good idea to let it do what it wants. I just checked the Intel graphics site, and rc6 is now enabled by default on Sandy Bridge and Ivy Bridge machines.
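Residency percentages like the ones powertop shows can be derived from cumulative counters. Assuming the i915 driver in use exposes rc6_residency_ms, rc6p_residency_ms and rc6pp_residency_ms under /sys/class/drm/card0/power (present in some 3.x kernels; the card index and availability are assumptions), a rough Python sketch can sample them twice and compute the share of the interval spent in each state. This is an independent cross-check, not powertop's own code, and the function names are hypothetical.

```python
import time
from pathlib import Path

# Assumed location of the i915 residency counters (cumulative milliseconds);
# names and availability vary by kernel version.
POWER_DIR = Path("/sys/class/drm/card0/power")
STATES = ("rc6", "rc6p", "rc6pp")


def residency_percent(before_ms: int, after_ms: int, interval_ms: int) -> float:
    """Percentage of an interval spent in a state, from two counter samples."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * (after_ms - before_ms) / interval_ms


def read_counters(power_dir: Path = POWER_DIR) -> dict[str, int]:
    """Read whichever residency counters exist; missing ones are skipped."""
    counters: dict[str, int] = {}
    for state in STATES:
        path = power_dir / f"{state}_residency_ms"
        if path.exists():
            counters[state] = int(path.read_text().strip())
    return counters


if __name__ == "__main__":
    start = time.monotonic()
    before = read_counters()
    time.sleep(10)
    after = read_counters()
    elapsed_ms = int((time.monotonic() - start) * 1000)
    for state in STATES:
        if state in before and state in after:
            pct = residency_percent(before[state], after[state], elapsed_ms)
            print(f"{state}: {pct:.1f}%")
    if not before:
        print("no residency counters found at the assumed sysfs path")
```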
Also, framebuffer compression has apparently been enabled by default since the fourth quarter of 2011, so you can probably drop that one.
It does say that semaphores are still not enabled by default. Actually, it just hasn't said anything about them since then, when it listed them as not enabled by default.
I think lvds_downclock is obviously not going to be enabled by default, since it probably diminishes the overall performance of the GPU, though likely with some power savings.
All in all, I have removed all those i915 settings. I still have pcie_aspm=force, though, since my logs apparently still indicate that the kernel is letting the BIOS tell it not to turn ASPM on. I have heard that this was fixed, but my machine still seems to be doing it. Even so, I don't notice any adverse effects from the setting.
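To check what the kernel log actually says about ASPM, and which policy is active, a rough Python sketch can filter dmesg output for lines mentioning ASPM and read the bracketed entry in /sys/module/pcie_aspm/parameters/policy. The helper names are hypothetical, no particular log message wording is assumed, and the policy file is assumed to exist only when the kernel was built with PCIe ASPM support.

```python
import subprocess
from pathlib import Path
from typing import Optional

POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def aspm_lines(log_text: str) -> list[str]:
    """Kernel log lines mentioning ASPM (case-insensitive); wording varies by kernel."""
    return [line for line in log_text.splitlines() if "aspm" in line.lower()]


def active_policy(policy_text: str) -> Optional[str]:
    """The policy file lists every policy and brackets the active one."""
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    # dmesg may require root on systems that restrict access to the kernel log.
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    for line in aspm_lines(result.stdout):
        print(line)
    if POLICY_FILE.exists():
        print("active ASPM policy:", active_policy(POLICY_FILE.read_text()))
```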
Oh yeah, one more thing. You know that, if you want to clean up your kernel command line, you can still have those settings loaded, even with early KMS, by putting them in /etc/modprobe.d/i915.conf and having mkinitcpio.conf include that file? That is exactly what the FILES= line is for.
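A file in /etc/modprobe.d uses one `options <module> <param>=<value> ...` line per module, with the module prefix dropped from each parameter name. A tiny Python sketch that renders such a line from a dict of parameters; the function name is hypothetical and the example values are simply the i915 options from the command line quoted earlier in the thread. mkinitcpio.conf would then list the file, e.g. FILES="/etc/modprobe.d/i915.conf" (I believe this quoted-string form is what mkinitcpio used at the time; newer versions use an array), and the initramfs needs regenerating afterwards, e.g. with `mkinitcpio -p linux`.

```python
def modprobe_options(module: str, params: dict[str, str]) -> str:
    """Render a single modprobe.d 'options' line for one module."""
    if not params:
        raise ValueError("no parameters to write")
    joined = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {joined}\n"


if __name__ == "__main__":
    # Illustrative values: the i915 options from the earlier command line.
    i915 = {
        "i915_enable_rc6": "1",
        "i915_enable_fbc": "1",
        "lvds_downclock": "1",
        "semaphores": "1",
    }
    # Printed rather than written: review it before saving it as
    # /etc/modprobe.d/i915.conf.
    print(modprobe_options("i915", i915), end="")
```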
You mentioned the times regarding rc6=7 but somebody else (I think) actually specified that it could increase power usage (or I read it somewhere). At least, so I think.
I've actually changed rc6 to -1. I realise I'm largely specifying defaults but all I can say is that when I didn't, things seemed worse. At least until the current regression gets fixed, I'm inclined to play it safe. If something could make things worse, I'll change it, but as long as I'm just whistling in the wind, I plan to continue inflicting my tunelessness on the elements. I don't claim this is entirely rational but I hope it is at least not irrational.
I can't say the aesthetics of my kernel command line really bother me, but I do know I could tidy it up, yes.
Whatever works for you is great! I've been curious about your overheating computer. Is it doing better? Or do you still have to compute in an icebox to get work done?
It hasn't overheated since last Friday. I don't trust it and I'm worried but I haven't seen the problem again so it is a little difficult to diagnose what's wrong... (I assume it shouldn't do that pretty much regardless though I'm not sure about this.)
No, it shouldn't do that. It makes me wonder if something physical is wrong with your computer, like degraded thermal paste or a badly seated heatsink. Your problems seem so consistent that I have to wonder if something awry is being totally overlooked.
Exactly. That's why I'm worried.
Well, it overheated and emergency shut down twice again this afternoon.
I thought the fan was OK because sensors showed it running and I could feel the air being blown out. Now I think it may be the fan after all. I ran some extremely scientific tests which consisted of repeatedly running sensors with my hand by the fan outlet. *Sometimes* hot air is blown out when sensors sees it running. But at other times, the hot air just seems to gently drift out (rather than being blown out) even though sensors reports the fan running. So I suspect my fan is working intermittently and that cannot be good...
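Rather than spot-checking sensors by hand, one could log temperatures and fan speeds continuously and read the last entries after an unexpected shutdown. A minimal Python sketch, assuming the standard hwmon sysfs layout (temperatures in millidegrees Celsius in temp*_input, fan speeds in RPM in fan*_input); on a ThinkPad the fan reading is assumed to come from thinkpad_acpi, and the log file name is hypothetical.

```python
import glob
import os
import time
from pathlib import Path

HWMON = "/sys/class/hwmon"
# Attributes sit directly in hwmonN/ on newer kernels and in hwmonN/device/
# for some older drivers, so both layouts are globbed.
TEMP_PATTERNS = (f"{HWMON}/hwmon*/temp*_input", f"{HWMON}/hwmon*/device/temp*_input")
FAN_PATTERNS = (f"{HWMON}/hwmon*/fan*_input", f"{HWMON}/hwmon*/device/fan*_input")


def millideg_to_c(raw: str) -> float:
    """hwmon temperature inputs are reported in millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def _read_all(patterns, convert) -> dict[str, float]:
    values: dict[str, float] = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                values[os.path.relpath(path, HWMON)] = convert(Path(path).read_text())
            except (OSError, ValueError):
                pass  # sensor unreadable or reporting a fault; skip it
    return values


def snapshot() -> dict[str, float]:
    """Temperatures (deg C) and fan speeds (RPM), keyed by sysfs path."""
    readings = _read_all(TEMP_PATTERNS, millideg_to_c)
    readings.update(_read_all(FAN_PATTERNS, lambda raw: float(int(raw.strip()))))
    return readings


def log_forever(logfile: str, interval_s: float = 5.0) -> None:
    """Append one line per sample and fsync it, to reduce loss on a power cut."""
    with open(logfile, "a") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            fields = " ".join(f"{name}={value:g}" for name, value in snapshot().items())
            log.write(f"{stamp} {fields}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)


if __name__ == "__main__":
    log_forever("thermal-log.txt")  # hypothetical log file name
```

If the last logged line before a shutdown shows moderate temperatures and a spinning fan, that would point away from overheating.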
Lenovo claim that they cannot reproduce the problem... I think this is because they are trying to trigger emergency shutdown and the passive cooling is actually quite good even when the fan is on the blink. I tried to suggest they concentrate on the fan not working right regardless of shutdown but don't know how much of this message will get through to whoever is actually working on it... They said they planned to leave it running overnight. I'm not sure what this might prove but still...
Basically, they are just suggesting I must have been using it on my lap or something even though I showed them the cooling pad it was on at the time.
Grrrrr.... somewhat frustrated!
This does sound frustrating. I am sorry you are having these issues.
I don't understand either how leaving the computer idle overnight will prove or disprove anything. It sounds like these things occur when you are actually working on the system, whereas they seem to be just turning it on and watching it.
It is frustrating. However, telling them (again) that I was using it on a cooling pad on top of a desk and not, say, cocooned in a duvet seems to have persuaded them it was worth looking a bit more closely. They reckon the fan is fine but the thermal paste looked a bit dry so they've scraped that off and replaced it. They say if it happens again to pull the hard drive and send it to Lenovo itself so they can run more extensive tests. But the paste thing sounds plausible to me - especially since the engineer said he'd seen problems with the heat sinks on this model in use by a local council and had replaced those. So I'm hoping that will fix it...
Hopefully I'll get it back tomorrow and can try it out. (Although the real test will probably be later since I think it has to be a warmer room than my house or it doesn't trigger the issue dramatically enough!)
I think they also said something about watching the temperatures of the air coming out of the fan. They said if that air was cool, the heat sink wasn't working. So maybe it isn't that since the air wasn't cool. On the other hand, maybe it should have been hotter again.
Too many unknowns!
Last edited by cfr (2012-11-06 23:04:07)
Well, I certainly hope things work out for you here. How much longer is your warranty on the machine? If the warranty is about up, is there some period of time for which this service is guaranteed? I remember once I had my MacBook serviced right before the warranty was up (the optical drive stopped working), and when I got it back, the warranty was up in a matter of days, but they told me that the work was guaranteed for another 90 days, so if things got f*cked up again, I could bring it back and it would still be covered. Fortunately this was in place, because the optical drive stopped working again within those 90 days... in fact, I am not even sure that my machine has a working optical drive at the moment. Apart from that, the machine was functioning pretty well.
They didn't say and the warranty is up in about 10 seconds. However, I've had enough issues that I think it is probably worth extending it anyway. Apart from this issue, I've already had a new keyboard this year. I haven't yet done this but probably will within the next week. (My invoice is dated 15th November last year so I figure I pretty much have to do it this week!)
I've got it back now and we'll see. I don't yet have my RAM back, but one thing at a time, I suppose... They were actually incredibly nice and brought it back to me at home which is not a service covered by the standard warranty so I am not really complaining about the RAM mix-up. (I think nobody told the engineer it was mine so he saw it but didn't make the connection...) I need the RAM in case it goes back to Lenovo at some point as they'll want the original back in.
Make sure you do get that RAM back. I certainly hope that things are better for you now. Did the service actually provide any kind of fix? Or did they just run it and see nothing wrong, so brought it back?
Usually I do not advocate for the extended warranty, but I think that in your case it might be worth getting, since these problems seem to be consistent and getting worse over time. I am actually thinking about getting the upgraded warranty for my machine for a year. It simply provides on-site/in-home repair for $29. I have a blotch on my screen; it is not big, but it is definitely there and definitely noticeable, and I am not too keen on sending my computer in. I think I would also end up paying for shipping to the repair facility, so at that point I might as well just pay that bit to get them to come to me instead. Ridiculously, the next warranty step up for my machine in the US is $279 for 3 years total, which is over half the price of the machine itself.
They told me that they removed the dried thermal paste in the heat sink and replaced it. (They also cleaned out the fan but I don't think they think that was the problem.) They aren't sure it will solve it because they couldn't reproduce the problem.
I'll bug them if I don't get the RAM back shortly. Otherwise they'll forget about it and it will disappear... but they do seem to know what they are doing so I have some hope. The fact that they are not a shop probably helps a bit just in terms of how much the people who work for them know.
Also, when their entire business is servicing computers, their reputation for it is probably much more important, which makes them more trustworthy.
Got RAM back but laptop shut itself off again. It didn't actually feel that hot. I restarted it almost immediately and sensors on login was giving me 60ishC. So either it managed to cool by almost 40C in about 2 minutes, despite rebooting, or it is shutting down for some other reason or the sensors aren't accurate or... (Critical temperature is 100C.)
On the other hand, I can't think what else might cause this, and it only ever happens in my office. I assumed that's because my office is warmer than the other places I work, and I can't think what else it could be...
You do realize that neither the stock nor the mainline kernel is designed for power conservation, don't you? Performance is their first priority, and the mainline maintainers are quite stubborn about doing anything for the laptop market.
Speaking of actual wattage, has anyone done any real tests? Here's mine on battery right now with a mixed workload (multiple workspaces, gimp, emacs, cmus with woofers, filezilla, too many terminals and lots of Firefox tabs). I am on Ivy Bridge, which could partly explain such low average usage, but again, I am not using the Arch stock kernel; instead I have a custom one with preemption enabled, no *_CGROUPS or NUMA (*IDLE* options enabled, of course) and BFS! If you want to save power, move to anything but the mainline kernel. Look at the -pf kernel.
$ sudo powerstat | sed -n '1,6p;$p'
Running for 300 seconds (30 samples at 10 second intervals).
Time      User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
03:25:05   2.5   0.0   0.1  97.5   0.0    2   1119    308    0    0    0   8.42
03:25:15   2.7   0.0   0.1  97.2   0.0    1   1111    306    0    0    1   8.41
8.34 Watts on Average with Standard Deviation 0.23

$ acpi -V
Battery 0: Discharging, 77%, 07:55:34 remaining
Battery 0: design capacity 7709 mAh, last full capacity 7624 mAh = 98%
Adapter 0: off-line
Thermal 0: ok, 47.0 degrees C
Thermal 0: trip point 0 switches to mode critical at temperature 103.0 degrees C
Cooling 0: Processor 0 of 10
Cooling 1: Processor 0 of 10
Cooling 2: Processor 0 of 10
Cooling 3: Processor 0 of 10
Cooling 4: LCD 15 of 15
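As a quick check on the acpi figures: 7624 / 7709 is about 98.9%, so the 98% shown is consistent with acpi truncating rather than rounding (an assumption about acpi's formatting). A small Python sketch of that calculation, plus the mean and standard deviation that a summary like powerstat's reports. Whether powerstat uses the population or the sample standard deviation is not confirmed here, so pstdev is a choice for illustration, not powerstat's implementation; the function names are hypothetical.

```python
import statistics


def capacity_health(design_mah: int, last_full_mah: int) -> int:
    """Last-full capacity as a whole percentage of design capacity (truncated)."""
    return 100 * last_full_mah // design_mah


def summarize_watts(samples: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of power samples in watts."""
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    # Capacity figures from the acpi -V output above.
    print(capacity_health(7709, 7624))
    # Only the two samples shown above; the reported 8.34 W average covers
    # all 30 samples, so these two alone will not reproduce it.
    mean, sd = summarize_watts([8.42, 8.41])
    print(f"{mean:.3f} W, sd {sd:.3f}")
```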
For the record, it now seems extremely unlikely that my laptop is overheating. It is also looking rather unlikely that there is anything wrong with it.
The current theory is that it has a safety feature which shuts the machine down to protect the hardware if the electricity spikes and that the electricity in my office is obliging. This is, apparently, a known problem. (Not known to me or the local IT people, but known to the institution.)
All the time and energy and they *know* there's something wrong and they don't tell people even though the problem could, presumably, damage machines which do not shut themselves off to protect themselves.
I think it would be time to invest in a nice surge protector. I am glad to hear that it is starting to look like things are okay with your machine, though I find it curious that this has only been occurring for the past month to month and a half.
When (and how) did you find this information out? And how is it that the IT people were kept in the dark about inconsistent power levels in certain areas of the university? I would think that this would be a key factor in debugging and fixing some of the problems around the school. I am not saying that they should have known, but it would have been the responsible thing for the school to have informed at least IT about such a situation. I would imagine that if their failure to inform led to a damaged personal machine, it would not be seen as the school's liability.
This upsets me, and it is not even my machine... nor a machine in the same country (or continent) as me.
Last edited by WonderWoofy (2012-11-15 04:07:15)
I started to get suspicious because I have only ever had the issue in my office. Last week, when I encrypted my drive, my machine wrote to disk for more than 14 hours solid. I then did all the set up and all the restoring from backup etc. with no issue whatsoever. Lenovo ran the thing for 48 hours straight with no problem. But once in my office...
So I asked the local IT people if they could think of anything other than overheating and explained the issue. They took my laptop yesterday, booted it from a hardware testing CD and ran it for a couple of hours sitting on top of a computer in the server room, which is definitely hot. (It has a whole bunch of computers, I guess.) No issue. They then ran stress tests for a while in the same place. No problem. Highest recorded temp: 65C.
The head of IT then took my power adapter to a more general IT service to be tested and inspected. They looked at it, they opened the plug to check the fuse, they tested it. No problem. Of course, the guy also explained why he was asking, and the other IT person said, "Oh, is that on level 1?" "Yes..." So apparently there was another machine (a Mac, I think) doing just the same thing in an office in my part of the building. Testing showed that the power spikes, and when it spikes, the laptop shuts down as a safety measure. Solution: they installed a UPS for that one laptop. They didn't tell anybody else, including the local IT people.
What the local IT people were going to do was to install equipment to monitor the power in my office and see if anything weird was causing my laptop to react. However, that was when this was an extremely-unlikely-but-we-are-getting-desperate-for-theories scenario. Clearly, that theory no longer seems wildly implausible at all. So the current hypothesis is that it is most likely that my machine is also reacting to the spikes in power by shutting itself off. (I'm not sure why it should have just started doing this but who knows what the state of the electricals is and how that might vary?)
I have been told that a surge protector will do no good. (They've given me one anyway but apparently it will not deal with spikes in phase 3 power or something - I didn't understand this bit but the head IT person said he didn't understand it either but the electricals IT person showed him with graphs on the whiteboard. So it must be true.)
The current plan is to try to get the UPS from central IT which was provided for the other laptop since that person's discipline has since moved to another floor of the building so the UPS probably went back to central IT. They are going to ask during a meeting tomorrow about this possibility. Otherwise, they are planning to order a UPS for me on Monday.
I'm somewhat surprised that the abrupt shutdowns haven't screwed my data. I've lost work but not seen fs corruption. I didn't take my laptop today. I'll need it next week but I do not plan to plug it in in my office until I have a UPS.
I'm pretty annoyed, to be honest. I've wasted hours on this and got incredibly stressed about it. The IT people have also wasted a (smaller) amount of time. I hoped to have a new draft of an article I'm working on written by the end of reading week but that didn't happen due to all of this. And they *knew*. It would be different if this was an unknown problem just discovered - of course, one could understand that. But there's a known problem which can cause this type of problem and presumably could well damage equipment without these sorts of safety shutdown features and they don't tell anybody.
And, no, I doubt very much indeed that a damaged personal machine would be considered the institution's liability. (Maybe if it was a student's machine and the student was required to use it or something but even then...)
I can't believe it does much good to their equipment either.
The local IT people did know there had been an issue with some burnt-out devices, but that was about eight years ago, and it only got mentioned as an outside possibility when other diagnostics turned up nothing.
What gets me is that the proposed solution will only solve the issue for me - not for anybody else in my part of the building on level 1.