#1 2013-06-27 23:00:48

eduardosm
Member
Registered: 2011-02-26
Posts: 18

Suspend instead of power off when CPU overheats

As I have experienced many times, when my laptop's CPU reaches 85°C, the kernel triggers a shutdown to avoid damage to the hardware.
I would like to know whether it is possible to configure it to suspend instead of shutting down. Suspending would also let the CPU cool down, with the advantages of being faster and avoiding the loss of unsaved data (the user can then wake the computer up in the state it was in).

#2 2013-06-29 08:55:17

Strike0
Member
From: Germany
Registered: 2011-09-05
Posts: 1,429

Re: Suspend instead of power off when CPU overheats

Which CPU is that? Do you know why it is overheating, and have you tried to keep it from happening in the first place?

#3 2013-06-29 10:23:43

eduardosm
Member
Registered: 2011-02-26
Posts: 18

Re: Suspend instead of power off when CPU overheats

Strike0 wrote:

Which CPU is that? Do you know why it is overheating, and have you tried to keep it from happening in the first place?

It is an Intel Core2 Duo T5800, which is configured with the ondemand governor and a frequency range of 1.2GHz-2.0GHz.
I know that the overheating is due to dust in the fans, and I can clean them. However, they will collect dust again, so when that happens I would like the computer to suspend instead of shutting down.

#4 2013-06-29 17:48:11

Strike0
Member
From: Germany
Registered: 2011-09-05
Posts: 1,429

Re: Suspend instead of power off when CPU overheats

Good that you know dust is the cause. Other users have had problems with ondemand recently, as it is not supported by the new Intel drivers. Maybe check for your CPU and try other CPU governors (but I think it does not use the intel_pstate module).
Your idea to let the machine suspend in such cases is understandable. But in fact I am not sure those emergency shutdowns are controlled by the kernel alone. Often the BIOS controls them too; some BIOSes even have settings that let you change the maximum temperature. Just imagine your system overheats because of a runaway process eating up the CPU. It might happen when you are not at the machine, and the suspend might hang due to the rogue process. In that case the CPU (and the notebook) would be garbage.

An alternative for you might be to install lm_sensors and set up heat thresholds below 85C (e.g. at 78C). There are tools for raising alarms, and I am sure you could also let the machine suspend automatically at e.g. 78C. Done this way, the hardware protection is still there in case anything fails. Plus, you know earlier when it is time to get the dust out. My suggestion would be to have a look at that first. A start: https://wiki.archlinux.org/index.php/Lm_sensors
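
As a rough illustration of the "suspend below the critical limit" idea, here is a minimal Python sketch. It is only an assumption of how such a watchdog could look: it reads the kernel's sysfs thermal zone file instead of going through lm_sensors, the zone index (thermal_zone0), the 78 C threshold, the polling interval and all function names are made up for the example, and it relies on `systemctl suspend` being available.

```python
import subprocess
import time
from pathlib import Path

# Assumed values for illustration; adjust them for the actual machine.
ZONE_TEMP = Path("/sys/class/thermal/thermal_zone0/temp")  # zone index is a guess
SUSPEND_AT_C = 78.0   # soft limit, below the 85 C critical trip point
POLL_SECONDS = 10


def read_temp_c(path=ZONE_TEMP):
    """Return the zone temperature in degrees C (sysfs reports millidegrees)."""
    return int(Path(path).read_text().strip()) / 1000.0


def should_suspend(temp_c, threshold_c=SUSPEND_AT_C):
    """True once the temperature has reached the soft threshold."""
    return temp_c >= threshold_c


def watch(path=ZONE_TEMP, threshold_c=SUSPEND_AT_C, poll_seconds=POLL_SECONDS):
    """Poll forever and ask systemd to suspend when the threshold is reached."""
    while True:
        if should_suspend(read_temp_c(path), threshold_c):
            subprocess.run(["systemctl", "suspend"], check=False)
            # Pause before polling again so one hot reading does not
            # request several suspends in a row.
            time.sleep(60)
        time.sleep(poll_seconds)
```

Calling `watch()` from a root-owned service would start the loop. Because the threshold sits below the critical trip point, the kernel's own 85 C shutdown stays in place as a fallback if the script fails.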

#5 2013-06-29 18:27:47

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: Suspend instead of power off when CPU overheats

The intel_pstate driver is only for Sandy Bridge processors at the moment, though Ivy Bridge support can be had with a simple one-line patch. But the older Core2Duo processors may never be supported by the intel_pstate driver.

Strike0, when you say that others have had problems with ondemand recently, are you really talking about ondemand, or are you talking about the issues people have been complaining about with the intel_pstate driver automatically taking over control of their processor?

#6 2013-06-29 22:28:08

eduardosm
Member
Registered: 2011-02-26
Posts: 18

Re: Suspend instead of power off when CPU overheats

Strike0 wrote:

Good that you know dust is the cause. Other users have had problems with ondemand recently, as it is not supported by the new Intel drivers. Maybe check for your CPU and try other CPU governors (but I think it does not use the intel_pstate module).
Your idea to let the machine suspend in such cases is understandable. But in fact I am not sure those emergency shutdowns are controlled by the kernel alone. Often the BIOS controls them too; some BIOSes even have settings that let you change the maximum temperature. Just imagine your system overheats because of a runaway process eating up the CPU. It might happen when you are not at the machine, and the suspend might hang due to the rogue process. In that case the CPU (and the notebook) would be garbage.

An alternative for you might be to install lm_sensors and set up heat thresholds below 85C (e.g. at 78C). There are tools for raising alarms, and I am sure you could also let the machine suspend automatically at e.g. 78C. Done this way, the hardware protection is still there in case anything fails. Plus, you know earlier when it is time to get the dust out. My suggestion would be to have a look at that first. A start: https://wiki.archlinux.org/index.php/Lm_sensors

I don't think that ondemand is the problem. I have a widget which shows the current CPU speed, and I can see how it changes depending on the load. I think that if the shutdowns were not triggered by the kernel, the computer would just power off instantly, but I can see Xorg closing its windows, then I get the console with the steps of killing processes, unmounting filesystems... I also found this message in the system log:

critical temperature reached(85 C),shutting down

So, the kernel is configured to shut down at 85°C. I suppose that there is also a higher hardware limit which would shut down the computer without OS intervention (I cannot confirm it because the kernel has never failed to do that job). And I don't think that suspending would hang because of a process eating CPU, because I can suspend the computer perfectly well while doing a CPU-intensive operation, like compiling or compressing. I already know about lm_sensors and that I could set up an alarm, but it would still be nice if the computer suspended instead of shutting down. I don't always have an airgun available to remove the dust, and my laptop collects dust quite often.

#7 2013-06-29 23:31:24

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: Suspend instead of power off when CPU overheats

Is 85 what sensors lists as critical, too? I assume this must be hardware-specific (or firmware) even if it is something the kernel is picking up on.  That is, I certainly don't think that the kernel is set up to do this *generally* because people certainly report higher temperatures at times.
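
One way to check which critical value the kernel itself has been given is to read the trip points it exposes under sysfs. The sketch below is an assumption-laden example, not a definitive tool: it relies on the generic thermal sysfs layout (`trip_point_N_type` paired with `trip_point_N_temp` inside each `thermal_zone*` directory), the function name is invented, and not every machine exposes these files.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def critical_trips(root=THERMAL_ROOT):
    """Map each thermal zone name to its critical trip temperature in degrees C."""
    result = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        for type_file in sorted(zone.glob("trip_point_*_type")):
            if type_file.read_text().strip() != "critical":
                continue
            # trip_point_N_type pairs with trip_point_N_temp (millidegrees C).
            temp_name = type_file.name.replace("_type", "_temp")
            temp_file = type_file.with_name(temp_name)
            result[zone.name] = int(temp_file.read_text().strip()) / 1000.0
    return result
```

Running `print(critical_trips())` on the affected laptop would show whether a zone reports 85 C as critical, which can then be compared with what sensors prints.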


#8 2013-06-30 00:28:53

fledermann
Member
From: Bielefeld, Germany
Registered: 2013-06-24
Posts: 49

Re: Suspend instead of power off when CPU overheats

Yes, the critical temperatures vary and can be looked up in the processor's data sheet.
Correct me if I'm wrong, but the kernel shuts down because it receives a message from the hardware telling it that a critical temperature has been reached. You can turn this safety shutdown off by adding

thermal.nocrt=1

to the boot options. Then you can use your own script to trigger a suspend. Be aware that if your script does not work properly, you run a higher risk of hardware damage. Still, the CPU should be fine because it simply freezes and halts all execution as a last-resort measure.
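
Before relying on a custom suspend script, it may be worth confirming that the running kernel was really booted with the parameter. A small sketch, assuming the standard `/proc/cmdline` file; the function names are made up for the example:

```python
from pathlib import Path


def nocrt_enabled(cmdline):
    """Return True if the kernel command line contains thermal.nocrt=1."""
    return "thermal.nocrt=1" in cmdline.split()


def running_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text()
```

For example, `nocrt_enabled(running_kernel_cmdline())` returning False would mean the kernel's own critical shutdown is still active.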

#9 2013-06-30 15:01:52

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Suspend instead of power off when CPU overheats

I would say that it is the firmware/BIOS that is doing a hard shutdown in order to protect the hardware, not the kernel, so I suppose suspending is not really an option unless you can configure some daemon to suspend a few degrees before the BIOS/firmware triggers a hard shutdown.

On the other hand, cleaning the fan and heatsink is a better solution, since the machine will most probably run cooler and quieter and hopefully will not trigger the hardware safeguard.


#10 2013-06-30 19:38:52

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: Suspend instead of power off when CPU overheats

@R00KIE, this is what I thought at first too.  But the OP seems to indicate that it is not a hard shutdown, but an actual initiation of a regular shutdown, as though he had pushed the power button.  I think this is actually not the firmware, but the kernel reacting to the machine reaching a temperature that the firmware specifies as critical.

#11 2013-06-30 21:11:59

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Suspend instead of power off when CPU overheats

@WonderWoofy, I didn't know the kernel would do that; this is the first time I've heard of it. In the past I have only seen reports of machines just turning off without warning.

If the shutdown is related to some ACPI event, then a script or systemd itself must be initiating the shutdown, so it should not be too hard to change the behavior (override the script or configure systemd to suspend instead of shutting down).

Changing the default behavior should not be the solution to the problem, but it sure is an interesting change to make just in case a thermal critical point is reached.


#12 2013-06-30 22:22:35

eduardosm
Member
Registered: 2011-02-26
Posts: 18

Re: Suspend instead of power off when CPU overheats

Thanks, everyone, for your suggestions. I will look into "thermal.nocrt=1" (and clean my computer).
I can confirm that it is the kernel that triggers the shutdown.
In the file drivers/thermal/thermal_core.c from the kernel source you can find the following function:

static void handle_critical_trips(struct thermal_zone_device *tz,
				int trip, enum thermal_trip_type trip_type)
{
	long trip_temp;

	tz->ops->get_trip_temp(tz, trip, &trip_temp);

	/* If we have not crossed the trip_temp, we do not care. */
	if (tz->temperature < trip_temp)
		return;

	if (tz->ops->notify)
		tz->ops->notify(tz, trip, trip_type);

	if (trip_type == THERMAL_TRIP_CRITICAL) {
		dev_emerg(&tz->device,
			  "critical temperature reached(%d C),shutting down\n",
			  tz->temperature / 1000);
		orderly_poweroff(true);
	}
}

I found it by searching for the "critical temperature reached" message.
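
For anyone who wants to check how often this has happened, the format string in the function above can be turned into a pattern for scanning saved log text. This is only a sketch: the pattern is derived from the quoted dev_emerg() string, the function name is invented, and the text in front of the message varies between systems, so the code only searches for the message itself.

```python
import re

# Pattern derived from the dev_emerg() format string quoted above.
CRITICAL_RE = re.compile(r"critical temperature reached\((-?\d+) C\),shutting down")


def critical_events(lines):
    """Yield the temperature (whole degrees C) of each critical-trip message."""
    for line in lines:
        match = CRITICAL_RE.search(line)
        if match:
            yield int(match.group(1))
```

Feeding it the lines of a saved kernel log, for example `list(critical_events(open("kernel.log")))` with a hypothetical file name, gives one entry per shutdown.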

#13 2013-06-30 23:17:21

Strike0
Member
From: Germany
Registered: 2011-09-05
Posts: 1,429

Re: Suspend instead of power off when CPU overheats

Good find, eduardosm. This seems to confirm where the shutdown is being called.
So now you can
(1) try to patch the "orderly_poweroff" call into a suspend (not really), or
(2) disable it with "thermal.nocrt=1" and use your own script to suspend, or
(3) just use a lower threshold in your own script (post #6) and leave the rest to the kernel

Note that, according to my google-fu, the parameter quoted by fledermann in #8 disables all critical thermal thresholds. It is not solely for the CPU. Any script you deploy in conjunction with that parameter had better take care of other heat-vulnerable hardware too.
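
To give an idea of what "taking care of other hardware" could mean, here is a hedged Python sketch that looks at every thermal zone the kernel exposes rather than only the CPU. The function names and the single shared limit are assumptions for the example, and it only covers devices that show up as thermal zones under /sys/class/thermal; sensors that are exposed only through hwmon would need separate handling.

```python
from pathlib import Path


def zone_temps(root="/sys/class/thermal"):
    """Return {"zone:type": temperature in degrees C} for every readable zone."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            label = (zone / "type").read_text().strip()
            milli = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones may be unreadable; skip them
        temps[f"{zone.name}:{label}"] = milli / 1000.0
    return temps


def zones_over(temps, limit_c):
    """Names of zones at or above limit_c, hottest first."""
    hot = [(t, name) for name, t in temps.items() if t >= limit_c]
    return [name for t, name in sorted(hot, reverse=True)]
```

A script could then suspend whenever `zones_over(zone_temps(), 78.0)` is non-empty, where 78.0 is again just an example value; a real setup would likely want a separate limit per zone.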

edit for wording

Last edited by Strike0 (2013-06-30 23:38:13)
