
#1 2020-06-26 06:21:43

tsdh
Member
From: Germany
Registered: 2014-01-07
Posts: 47

ThinkPad L580 randomly power-offs

I have a problem with my ThinkPad L580 laptop.  Sometimes it randomly powers off, as if the power had been cut, even though a fully charged battery is installed and the AC adapter is connected.  This doesn't happen very often; maybe once a day or even less.

When I turn it on again, I can see fsck replaying the ext4 journal, but I cannot find any information about the issue in the "journalctl" output of the previous boot.  The minutes and seconds before the unclean shutdown are free of any errors or warnings.

I've read that overheating may cause such emergency shutdowns.  But how would I know if that's really the case, given that nothing is logged?  And I wouldn't say that the power-offs always happen when the computer is under heavy load.  The last few times it happened, I came back from a short coffee break and found it switched off.  It's my work laptop, on which I compile our large codebase dozens of times a day, and the temperatures usually don't rise above 80°C while doing that.  The fan also works fine, AFAICT.
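Since nothing ends up in the journal, one way to test the overheating theory is to record temperatures independently and force every line to disk, so the last reading before a power-off survives. A minimal Python sketch, assuming the standard Linux hwmon sysfs interface (/sys/class/hwmon/hwmonN with tempN_input files in millidegrees Celsius and optional tempN_label files); the function names, log path and 5-second interval are illustrative choices, not part of any existing tool:

```python
import os
import time
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"chip/label": degrees Celsius} for every tempN_input below root."""
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for inp in sorted(chip.glob("temp*_input")):
            sensor = inp.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / f"{sensor}_label"
            label = label_file.read_text().strip() if label_file.exists() else sensor
            try:
                millidegrees = int(inp.read_text().strip())
            except (OSError, ValueError):
                continue  # skip sensors that cannot be read right now
            readings[f"{chip_name}/{label}"] = millidegrees / 1000.0
    return readings


def log_temperatures(path, interval=5.0, root="/sys/class/hwmon"):
    """Append one line of readings every `interval` seconds, forcing each line to disk."""
    with open(path, "a", encoding="utf-8") as log:
        while True:
            temps = read_hwmon_temps(root)
            fields = " ".join(f"{k}={v:.1f}" for k, v in sorted(temps.items()))
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {fields}\n")
            log.flush()
            os.fsync(log.fileno())  # the last line should survive a sudden power loss
            time.sleep(interval)
```

For example, call `log_temperatures("/var/tmp/temps.log")` from a terminal or a small service before leaving the machine alone; after the next unexpected power-off, the tail of that file shows whether temperatures were climbing beforehand. Calling fsync every few seconds costs a small write each time, a deliberate trade-off for a temporary diagnostic run.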

This is the output of "sensors" from the lm_sensors package, which I have installed (lm_sensors.service is enabled):

❯ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +34.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +34.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +32.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +33.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +33.0°C  (high = +100.0°C, crit = +100.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1:           0 RPM
temp1:        +36.0°C  
temp2:         +0.0°C  
temp3:         +0.0°C  
temp4:         +0.0°C  
temp5:         +0.0°C  
temp6:         +0.0°C  
temp7:         +0.0°C  
temp8:         +0.0°C  

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +36.0°C  (crit = +127.0°C)

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +30.0°C  

BAT0-acpi-0
Adapter: ACPI interface
in0:          11.95 V  

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +31.0°C  

nvme-pci-0300
Adapter: PCI adapter
Composite:    +32.9°C  (low  = +109.8°C, high = +109.8°C)
                       (crit = +79.8°C)
Sensor 1:     +40.9°C  (low  = +109.8°C, high = +109.8°C)
Sensor 2:     +32.9°C  (low  = +109.8°C, high = +109.8°C)

I have no real experience with lm_sensors because I never felt the need to configure anything, but looking at that output I wonder whether it is alright that "high" and "crit" are always the same, and that for "Composite" of "nvme-pci-0300", "crit" is actually lower than both "low" and "high".
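The oddities described above can be checked mechanically. The sketch below parses pasted "sensors" text (starting at the first chip name, without the shell prompt) and flags limits that contradict each other: "crit" below "high", or "low" not below "high". It only compares the numbers as printed; it cannot say whether the CPU, the NVMe firmware or ACPI actually act on them. The regular expressions are written against the output format shown in this post and are an assumption about how other chips print; equal "high" and "crit" values are deliberately not flagged, since they are not contradictory in themselves.

```python
import re

# "label:  +12.3°C  (high = ..., crit = ...)"; fan and voltage lines do not match.
READING = re.compile(r"^(?P<label>[^:(]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)°C(?P<rest>.*)$")
LIMIT = re.compile(r"(low|high|crit)\s*=\s*([+-]?\d+(?:\.\d+)?)°C")


def parse_limits(text):
    """Map "chip/label" to {"value": ..., "low"/"high"/"crit": ...} from sensors output."""
    results, chip, current = {}, None, None
    for line in text.splitlines():
        if not line.strip():
            chip, current = None, None  # a blank line ends a chip block
        elif chip is None:
            chip = line.strip()  # the first line of a block is the chip name
        elif m := READING.match(line):
            current = {"value": float(m["value"])}
            current.update((k, float(v)) for k, v in LIMIT.findall(m["rest"]))
            results[f"{chip}/{m['label'].strip()}"] = current
        elif current is not None and line.lstrip().startswith("("):
            # a continuation line such as "(crit = +79.8°C)" belongs to the previous reading
            current.update((k, float(v)) for k, v in LIMIT.findall(line))
    return results


def suspicious(limits):
    """List readings whose limits contradict each other."""
    problems = []
    for key, d in limits.items():
        if "crit" in d and "high" in d and d["crit"] < d["high"]:
            problems.append(f"{key}: crit ({d['crit']}) below high ({d['high']})")
        if "low" in d and "high" in d and d["low"] >= d["high"]:
            problems.append(f"{key}: low ({d['low']}) not below high ({d['high']})")
    return problems
```

Fed with the output above, `suspicious(parse_limits(text))` reports the NVMe "Composite" reading (crit below high, and low not below high) and the two NVMe "Sensor" readings (low equal to high), and nothing for coretemp.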

So maybe I need to adjust those values?  If so, how would I know which values are sensible?

Are there other possible causes of the problem besides overheating?  And if so, how would I debug them?

The system is a Lenovo ThinkPad L580 with an up-to-date Arch install.

Thanks a lot for any pointers!
Tassilo


#2 2020-07-02 15:17:31

tsdh


Today an Xwayland process ran wild and constantly used 100% of one core.  Screen, mouse and keyboard were completely unresponsive.

I was able to SSH into the machine from another computer.  The Xwayland process could not be killed, not even after killing parent processes like gnome-session.

I initiated a reboot, and even after the Reboot target had been reached, the Xwayland process was still alive and kept delaying the shutdown.  See https://ibin.co/5S37lE6Vxy8d.jpg

After waiting for a few minutes, I did a hard power-off.  So maybe the same thing happened the other times, heating up the system until an emergency shutdown kicked in.
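A process that survives SIGKILL is typically stuck inside the kernel (for example in uninterruptible "D" sleep, where signals are not acted on until the task wakes up), so from the SSH session it can help to check where it is waiting. This sketch reads the process state from /proc/<pid>/stat and the kernel wait channel from /proc/<pid>/wchan, following the field layout described in proc(5); the function and table names are made up for illustration. As root, /proc/<pid>/stack (if the kernel provides it) shows the full kernel stack.

```python
from pathlib import Path

# State letters as documented in proc(5); only the common ones are listed.
STATES = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible sleep (signals wait until it wakes)",
    "Z": "zombie",
    "T": "stopped",
    "t": "tracing stop",
    "I": "idle kernel thread",
}


def describe(pid, proc_root="/proc"):
    """Return (state letter, description, wait channel) for a process."""
    base = Path(proc_root) / str(pid)
    stat = (base / "stat").read_text()
    # Field 2 (comm) is in parentheses and may contain spaces or ')', so the
    # state (field 3) is the character two positions after the LAST ')'.
    state = stat[stat.rindex(")") + 2]
    wchan_file = base / "wchan"
    wchan = wchan_file.read_text().strip() if wchan_file.exists() else ""
    return state, STATES.get(state, "other"), wchan or "-"
```

Pass it the PID reported by `pgrep Xwayland`, e.g. `print(describe(1234))`, where 1234 is a placeholder.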


#3 2020-07-02 19:48:01

Bevan
Member
Registered: 2009-09-08
Posts: 57


Since nobody else has answered so far, I'll give it a try, although I don't really feel qualified.

I have an L430, which is much older than your machine, and last summer I had issues with sudden power-offs. It turned out to be thermal: every time a CPU core went above roughly 84°C, the machine would just power off. There are a couple of reasons why I hesitate to compare that situation to yours:
* On my machine, that threshold of roughly 84°C was shown as the critical temperature by `sensors`.
* The problem later went away, either through a BIOS update or a newer kernel. Now the CPU just throttles down severely when it reaches that temperature, and there are no sudden shutdowns anymore.

For me, the problem showed up when loading the CPU (e.g., compiling something) while also using the iGPU (e.g., decoding a video with VAAPI).

Since the severe thermal throttling is still a pain, I now run thermald, which throttles the CPU more gracefully when it reaches high temperatures.

In your situation, I would first try to figure out whether this really is a thermal issue. Try to provoke it by loading the CPU and the iGPU at the same time while watching the core temperatures. If thermals are the issue, thermald may be the way to deal with it. But maybe your problem is an entirely different one, in which case thermals are just a red herring.


#4 2020-07-05 08:00:27

tsdh


Ok, I've now run the stress tool inside s-tui for an hour straight with the Intel p-state "performance" governor enabled.  All 8 CPU threads were at 100% utilization and the fan ran at full speed the whole time.  Sometimes the temperature sensors reported a "red" temperature for a second before throttling kicked in and brought them back into the normal range.  The system stayed responsive and no shutdown happened.
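To see whether, and how often, throttling actually kicked in during such a run, the kernel's thermal throttle counters can be compared before and after it. This sketch assumes an Intel CPU on which the kernel exposes /sys/devices/system/cpu/cpuN/thermal_throttle/core_throttle_count and package_throttle_count; the helper names are hypothetical.

```python
from pathlib import Path


def throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Read every core/package thermal throttle counter into {"cpuN/name": count}."""
    counts = {}
    for f in sorted(Path(cpu_root).glob("cpu[0-9]*/thermal_throttle/*_throttle_count")):
        counts[f"{f.parent.parent.name}/{f.name}"] = int(f.read_text().strip())
    return counts


def throttle_events_since(before, after):
    """Return only the counters that increased between two snapshots."""
    return {k: v - before.get(k, 0) for k, v in after.items() if v > before.get(k, 0)}
```

Take `before = throttle_counts()` before starting s-tui and pass a second snapshot to `throttle_events_since(before, throttle_counts())` afterwards; a non-empty result means at least one core or the package was thermally throttled in between.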

This test put far more stress on my system than my normal daily usage, so I guess at least CPU overheating is off the table as the culprit for my sporadic issue.

As far as I know, the stress tool doesn't load the GPU, but honestly, neither does my daily usage.  Still, I'd like to test that, too.  Is there a similar stress tool for the (Intel) GPU?


#5 2020-07-05 08:18:00

Bevan


Sounds like the thermal management works as it should.

To test additional GPU load, you may use one of these benchmark applications: https://wiki.archlinux.org/index.php/Be … g#Graphics
I only know glxgears, glmark2 and vkmark firsthand and can say that glxgears is a bad candidate as it causes barely any load on the GPU.


#6 2020-07-05 09:02:19

tsdh


I'm running all the tests in the intel-gpu-tools package (around 30 minutes so far, maybe 20% still to go), plus a WebGL benchmark in the browser.  So far the issue hasn't been triggered.


#7 2020-07-05 18:20:55

tsdh


Oh, it turned out that all the intel-gpu-tools tests had failed because they need superuser privileges.  I've now let them run for 10 hours (around 15% of the tests done), and the system stayed stable.

So I guess overheating is off the table.

