#1 2017-05-15 21:00:31

Horgix
Member
Registered: 2017-05-15
Posts: 3

CPU Hardware Error - Kernel 4.10.13-1

Hello everyone,

tl;dr: CPU hardware errors and freezes appeared after upgrading from kernel 4.10.11-1 to 4.10.13-1 on an i7-7700HQ. How can I get this fixed?

This morning I upgraded my workstation, which took the kernel from 4.10.11-1 to 4.10.13-1:

[2017-05-15 09:24] [ALPM] upgraded linux (4.10.11-1 -> 4.10.13-1)

After rebooting, I had random freezes that forced me to hard reboot, and I was unable to get X11 to start. Normal shutdown also failed because a lot of things could not be stopped. I can provide details if necessary, but only as rough photos of my screen, since I couldn't do anything on the system at that point.

Looking at dmesg, I found a few hardware errors that weren't there before:

May 15 10:49:17 myhostname kernel: smpboot: CPU0: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (family: 0x6, model: 0x9e, stepping: 0x9)
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee2000000040110a
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: TSC 0 ADDR fef1ffc0 MISC 788000c086
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1494838150 SOCKET 0 APIC 0 microcode 42
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 7: ee2000000040110a
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: TSC 0 ADDR fef200c0 MISC 388000c086
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1494838150 SOCKET 0 APIC 0 microcode 42
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ee2000000040110a
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: TSC 0 ADDR fef1ff40 MISC 788000c086
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1494838150 SOCKET 0 APIC 0 microcode 42
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ee2000000040110a
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: TSC 0 ADDR fef1cec0 MISC 4388000c086
May 15 10:49:17 myhostname kernel: mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1494838150 SOCKET 0 APIC 0 microcode 42

Note: this paste is from after getting back to a working system, from journald.
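
For readers who want to see what the status value in these lines encodes, here is a minimal sketch that splits an IA32_MCi_STATUS value into its architectural flag bits and error code fields. The bit positions are taken from the machine-check architecture chapter of the Intel SDM; the function name decode_mci_status and the FLAG_BITS table are made up for this illustration, and the sketch does not interpret the model-specific bits.

```python
# Bit positions of the architectural flags in IA32_MCi_STATUS (Intel SDM).
FLAG_BITS = {
    "VAL": 63,    # the register holds a valid error record
    "OVER": 62,   # an earlier record was overwritten
    "UC": 61,     # the error was not corrected by hardware
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # the MISC register holds valid data
    "ADDRV": 58,  # the ADDR register holds valid data
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error codes."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    decoded["mca_error_code"] = status & 0xFFFF              # bits 15:0
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return decoded


if __name__ == "__main__":
    # Status value reported for banks 6 to 9 in the log above.
    decoded = decode_mci_status(0xEE2000000040110A)
    flags = [name for name in FLAG_BITS if decoded[name]]
    print("flags set:", ", ".join(flags))
    print("MCA error code:", hex(decoded["mca_error_code"]))
    print("model-specific code:", hex(decoded["model_specific_code"]))
```

For the value in the log, every flag listed in the table is set except EN, and the MCA error code field is 0x110a.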

A friend of mine suggested that I install intel-ucode, so I followed https://wiki.archlinux.org/index.php/Microcode and set it up:

[2017-05-15 10:29] [ALPM] installed intel-ucode (20170511-1)
$ cat /boot/loader/entries/arch.conf
title	 Arch Linux
linux	/vmlinuz-linux
initrd     /intel-ucode.img
initrd	/initramfs-linux.img
options [...]

However, that didn't solve anything, and dmesg showed no microcode update, only this:

[    1.364556] microcode: sig=0x906e9, pf=0x20, revision=0x42
[    1.364607] microcode: Microcode Update Driver: v2.2.
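
One way to double-check that the revision really did not change is to compare the revision reported by the microcode driver with the one attached to the MCE records. The sketch below does that for lines copied from this post. The helper names are hypothetical, and it assumes the number after "microcode" in the MCE lines is hexadecimal without a 0x prefix.

```python
import re

# "microcode: sig=0x906e9, pf=0x20, revision=0x42"
DRIVER_RE = re.compile(r"microcode: .*revision=(0x[0-9a-fA-F]+)")
# "mce: [Hardware Error]: PROCESSOR 0:906e9 ... microcode 42"
MCE_RE = re.compile(r"mce: \[Hardware Error\]: PROCESSOR .* microcode ([0-9a-fA-F]+)")


def driver_revision(lines):
    """Return the revision printed by the microcode driver, or None."""
    for line in lines:
        match = DRIVER_RE.search(line)
        if match:
            return int(match.group(1), 16)
    return None


def mce_revisions(lines):
    """Return the set of revisions attached to MCE records (assumed hex)."""
    found = set()
    for line in lines:
        match = MCE_RE.search(line)
        if match:
            found.add(int(match.group(1), 16))
    return found


SAMPLE = [
    "mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1494838150 SOCKET 0 APIC 0 microcode 42",
    "[    1.364556] microcode: sig=0x906e9, pf=0x20, revision=0x42",
    "[    1.364607] microcode: Microcode Update Driver: v2.2.",
]

if __name__ == "__main__":
    print("driver revision:", hex(driver_revision(SAMPLE)))
    print("MCE revisions:", sorted(hex(r) for r in mce_revisions(SAMPLE)))
```

Both sources report the same revision here, which matches the observation that installing intel-ucode did not change the running microcode.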

I ended up downgrading my kernel by reinstalling 4.10.11-1 from the pacman cache:

[2017-05-15 10:48] [ALPM] downgraded linux (4.10.13-1 -> 4.10.11-1)

And since then it works as it did before.
Note: I never had kernel 4.10.12-1 on this machine, so I can't tell whether the problem was introduced by 4.10.12-1 or 4.10.13-1, but I can test that if needed.

So, now that I have a working system again, I'm no longer in a hurry. My question, and the reason for this post, is:
how do I get this fixed for good in upcoming kernels?

If I should report this somewhere else, feel free to point me to it and I will!
Also, feel free to correct any mistakes I may have made in my analysis; I'm by no means an expert.

Last edited by Horgix (2017-05-15 21:01:18)

#2 2017-05-15 21:49:51

loqs
Member
Registered: 2014-03-06
Posts: 17,327

Re: CPU Hardware Error - Kernel 4.10.13-1

See https://bbs.archlinux.org/viewtopic.php … 1#p1698801. The patch has been applied to 4.11.1 (https://git.kernel.org/pub/scm/linux/ke … a16297e6f1), but I am not seeing it in 4.10.16.
You could bisect between 4.10.11 and 4.10.13 to find which commit causes the issue, or just build 4.11.1 and see if it has been resolved there (if you have kernel modules not provided by the linux package, you would need to rebuild those as well).
Edit:
Oh, and welcome to the Arch Linux forums, Horgix.
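
To give an idea of why bisecting between 4.10.11 and 4.10.13 is cheaper than it sounds, here is a small simulation of the search that git bisect automates. It is only a sketch: the commit names and the is_bad test are hypothetical stand-ins for "build this kernel, boot it, and look for the hardware errors", and it assumes that every commit after the culprit is also bad.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    Returns (index of the first bad commit, number of commits tested).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid    # the culprit is at mid or earlier
        else:
            good = mid   # the culprit is after mid
    return bad, tests


if __name__ == "__main__":
    # Hypothetical history: 101 commits, with the regression at index 37.
    commits = [f"c{i}" for i in range(101)]
    culprit_index = 37

    def is_bad(commit):
        # Stand-in for building and booting the kernel at this commit.
        return int(commit[1:]) >= culprit_index

    index, tests = first_bad(commits, is_bad)
    print("first bad commit:", commits[index], "after", tests, "tests")
```

Each test halves the remaining range, so the number of kernels to build and boot grows with the logarithm of the number of commits between the two versions.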

Last edited by loqs (2017-05-15 21:50:37)

#3 2017-06-18 02:54:05

Horgix
Member
Registered: 2017-05-15
Posts: 3

Re: CPU Hardware Error - Kernel 4.10.13-1

Thanks for the welcome loqs, this was indeed my first post on Arch Linux forums.

I just tried kernel 4.11.5-1 without success, sadly.
I guess I'll stick to 4.10.11-1 for now and watch the laptop thread that you linked (Dell XPS 15 9560 (Early 2017)) for news.

However, I would like to see this fixed and am willing to help where I can, since I really want and need an up-to-date kernel.
I'll try to find the time to bisect and identify the commit that caused this, but I don't get many opportunities to recompile and test my kernel, since this is my work laptop.
