Hi!
I'm (or, better, "was") running Arch on a ThinkPad X1 3rd gen. Everything seemed to work properly until the system froze a couple of times, and after running several diagnostic tests and checks I'm out of ideas about what could be causing these freezes.
In chronological order: the first freeze occurred in June. The system just suddenly stopped responding (and the music that was playing got stuck looping its last 1 or 2 seconds). Ctrl+Alt+F{1-8} didn't work, REISUB didn't work, nothing really worked. I had to resort to a hard reset, holding the power button for 5 seconds. Naturally, after that I couldn't boot because fsck complained that the root filesystem hadn't been unmounted properly (in fact, both '/' and '/home' turned into garbage). I ran fsck from a live USB (MX Linux) and it just put *everything* into lost+found, on both '/' and '/home'.
Anyway, I tried figuring out what caused the crash, and the only diagnostic test that showed signs of faulty behavior was a CPU stress test. Since I was dual-booting with Windows, I ran Aida and it showed that the CPU was throttling all the time. With a bit of help from a friend, I disassembled the laptop, removed the old thermal paste, applied a fresh layer, reassembled it and ran Aida again. It showed no throttling, so I was pleased with that and thought my bad luck had ended.
I wish that were true. Around three weeks after installing Arch for the second time (weirdly enough, the first crash also happened after roughly three weeks), it froze again. Same pattern, same results.
As you can guess, I can't run dmesg, so I have no idea what's causing this. I thought it was the throttling issue again, but I ran all the tests I could possibly imagine and none showed anything worrying. memtest86, SMART status, CPU (using both Lenovo's UEFI diagnostic tool and Aida on Windows): everything looks good. I'm now seriously doubting that throttling was the issue the first time.
Lights weren't flashing, so it didn't seem like a kernel panic. Also, funny thing: Fn+Esc switched the light on Fn key back and forth in the frozen state (FnLk functionality, when you can press, say, F1 and it will behave as if you pressed Fn+F1, and vice versa). Fn+Space dimmed and switched on the keyboard back-light. But neither Ctrl+Alt+F{1-8} nor REISUB worked. Mouse cursor also didn't move, of course.
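One thing worth ruling out when REISUB does nothing: the kernel only honours the SysRq functions allowed by the kernel.sysrq setting. The sketch below decodes that value using the bit meanings from the kernel's SysRq documentation. The helper names are made up for illustration, and the idea that a distribution might ship a restrictive default (for example 16, sync only) is an assumption to check on your own system, not a diagnosis of this freeze.

```python
# Bit values for /proc/sys/kernel/sysrq, per the kernel's SysRq documentation.
REISUB_BITS = {
    "R": 4,    # keyboard control (unraw)
    "E": 64,   # signalling of processes (SIGTERM to all)
    "I": 64,   # signalling of processes (SIGKILL to all)
    "S": 16,   # sync
    "U": 32,   # remount read-only
    "B": 128,  # reboot / poweroff
}


def allowed_reisub_keys(value):
    """Return the REISUB letters permitted by a given kernel.sysrq value."""
    if value == 1:  # 1 means "all SysRq functions enabled"
        return "REISUB"
    return "".join(key for key, bit in REISUB_BITS.items() if value & bit)


def current_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


# Usage on the affected machine:
#   print(allowed_reisub_keys(current_sysrq()))
```

If only some letters come back, the rest of the sequence is ignored even on a healthy system. A hard hang can of course still defeat SysRq entirely.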
I wasn't running anything heavy at the time of the crashes; just coding and Firefox in i3.
Any ideas on how to find what's causing this? By the way, after the first crash MX Linux (running from a USB drive) also froze 3 or 4 times. However, REISUB worked one or two times.
Thanks, any help would be much appreciated.
PS: the only two things I can think of are: one, Intel microcode (I set it up as described on the Arch Wiki, but most likely I didn't need it -- I never bothered to check); and two, I also didn't bother to tweak fan settings. So, it could've been overheating. Quite unlikely, because, again, I wasn't running anything resource-heavy at those times, but who knows.
Also, I was booting via EFISTUB: just handing the Linux kernel to UEFI as if it were a UEFI executable.
Last edited by gsarret (2018-10-22 05:54:56)
The most general approach:
Mount your Arch partition (if you have several: the one on which “/var/log/journal/” is present).
Use journalctl to see the logs. The very same Arch Wiki page contains an explanation of how to use journalctl.
A laptop failing a stress test is neither a good nor a bad sign. It’s better if it doesn’t fail, and it’s bad if it fails immediately, but if it just fails after some time… you know, those machines are not really built to handle high load for extended periods of time.
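A minimal sketch of those two steps, assuming the Arch root partition is already mounted at /mnt from the live system (the mount point and the helper name are placeholders, not something from this thread). It builds a journalctl command that reads the journal from the mounted partition instead of from the live system. With --directory, a boot offset of 0 should select the most recent boot recorded in that journal, i.e. the session that froze, and -1 the one before it.

```python
import subprocess


def journal_command(mount_point="/mnt", boot=0, priority="warning"):
    """Build a journalctl invocation for a journal stored on a mounted partition.

    mount_point: where the installed system's root is mounted (assumed).
    boot:        boot offset relative to the boots recorded in that journal.
    priority:    show messages of this severity and worse.
    """
    journal_dir = mount_point.rstrip("/") + "/var/log/journal"
    return [
        "journalctl",
        f"--directory={journal_dir}",
        f"--boot={boot}",
        f"--priority={priority}",
        "--no-pager",
    ]


# Usage from the live USB (needs journalctl, so a systemd-based live system):
#   subprocess.run(journal_command("/mnt"), check=False)
```

Keep in mind that the last messages before a hard freeze may never have reached the disk, so an empty result does not prove that nothing was logged.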
If the system is unable to flush logs to disk, use netconsole (you need another running Linux machine on your LAN):
https://wiki.archlinux.org/index.php/Netconsole
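For illustration, here is a rough sketch of both ends of such a setup. netconsole sends kernel messages as plain UDP datagrams, so the receiving side only needs something that listens on a UDP port. The first helper formats the netconsole= kernel parameter in the syntax given in the kernel documentation, and the second prints whatever arrives. The IP addresses, the interface name and the function names are made-up examples; the ports are the documented defaults (6665 source, 6666 target).

```python
import socket


def netconsole_param(src_ip, dev, tgt_ip, tgt_mac="", src_port=6665, tgt_port=6666):
    """Format the netconsole= kernel parameter for the machine that freezes.

    Syntax: netconsole=src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
    An empty tgt_mac leaves the MAC field blank (the kernel then broadcasts).
    """
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def listen(port=6666, bind_addr="0.0.0.0"):
    """Print every UDP datagram received on `port`; run this on the other machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, sender = sock.recvfrom(65535)
            print(f"{sender[0]}: {data.decode(errors='replace')}", end="")


# Example (addresses are placeholders):
#   print(netconsole_param("192.168.1.10", "eth0", "192.168.1.20"))
#   listen()   # blocks until interrupted with Ctrl+C
```

The receiver has to be up before the freeze happens, and the firewall on that machine must let the chosen UDP port through.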
Thanks for the quick replies!
I do have a USB stick with Arch live, but it's no use since I cannot mount any of the partitions: they are all borked after the hard reset (holding the power button for 4-5 secs) because they weren't unmounted. So, each crash essentially means reinstalling the whole OS from scratch for me (I'm backing up my data, but it's still a pain!). I would've run dmesg, journalctl and everything else otherwise...
When I was talking about stress tests, I really meant just 3-5 minutes, not a real stress test. And failing during those 5 minutes *is* a bad sign I think.
As for logging over the LAN, I know it would've been super helpful, but I don't have another Linux machine, sorry.
Elsewhere people suggest that the most frequent cause of such crashes is graphics drivers, but the X1 doesn't have a discrete GPU. It uses Intel HD graphics.