Hi, I've been trying to install Arch on my shiny new Skylake PC and among other problems I'm seeing the following in my journal on startup:
I'm running an Intel i5-6600K on an MSI Z170A M7 motherboard with the latest BIOS (MS-7976 1.60). I have overclocked the processor to 4.3GHz. I'm not sure if this could be affecting things, but it is a rock solid overclock and shouldn't be causing any instability.
I'm also seeing the following MCE errors when at the bash prompt:
This seems to be causing programs to be killed every now and then, especially any kind of graphical interface. I've seen the suggestion that I should recompile glibc with "--enable-lock-elision", and I will give that a try, but I hadn't seen any of these MCE errors mentioned anywhere else so I thought I'd report it anyway.
Has anyone got any idea about the Firmware Bug?
Moderator edit [ewaller] Changed img tags for over sized images to url links
Last edited by ewaller (2015-09-27 17:42:08)
Offline
I have a 6700K on a Z170-HD3P with no problems. I tried booting from a USB stick plugged into both USB 2.0 and USB 3.0 ports. BTW, the 100-series chipsets from Intel do not support legacy USB (aka 2.0), so not all motherboards can boot systems from USB.
Offline
Try kernel 4.2.1 from testing. MCE usually means an unstable overclock. Reset to stock settings while troubleshooting. Also install the intel-ucode package and enable it per the wiki.
Offline
I have installed the intel-ucode package. It contains no microcode update for Skylake.
Offline
I also have intel-ucode installed and enabled, you can see it in the second screenshot I posted. I haven't tried with a newer kernel, I'll give it a go. What about the "Firmware Bug"?
Offline
Dunno. Have you tried googling it?
Offline
1. You are only wasting time until you boot at least once on stock voltage and frequency and confirm that the issues don't go away. Especially this "invalid frequency" thing.
2. See the official Intel fan club thread. Your symptoms are different, but maybe some of the workarounds would work on your machine.
Last edited by mich41 (2015-09-27 12:47:49)
Offline
Removing the overclock did stop the Firmware Bug message from showing up, however the MCE errors are still popping up just as frequently (and causing programs to shut down). I remember trying to disable C-states without any effect on my MCE error rate. I can give that another go, but at the moment I can't even recompile glibc without the compilation being terminated by MCE errors.
Offline
In fact... as I was rebuilding glibc (makepkg -ci), the "xz" process was unceremoniously killed and now I don't have a system to boot into anymore. Seems like I may have to do a clean install. At least I know where the Firmware Bug errors were coming from. The MCE errors are what is currently killing my system though, and they seem unrelated to the overclock.
Offline
Now you reminded me that back in the times of crashing Haswells I developed a simple hack which disables TSX in libc without recompilation. It's here, just make sure that you are modifying the right bytes or your system may not boot anymore.
If disabling C-states doesn't work then maybe disabling SpeedStep or this "fixed OC mode" would?
EDIT: Wait, are you saying that installation CD works without MCEs?
Last edited by mich41 (2015-09-27 15:13:52)
Offline
No, the installation USB is also getting MCEs at the same frequency as the installed system.
Offline
Apparently I do have stepping "3", whatever that means?
Offline
Thanks everyone for the help, I'm going to give Arch a rest for a bit, though I'm not giving up! I'll keep playing around with stuff. Consider this thread closed.
Offline
It's not an Arch issue. I suspect any distro will show you the same stuff provided you're running the same kernel version.
Offline
Has anyone found any solution/workaround yet?
The people with Broadwells are reporting stable machines now, after Intel released microcode updates for a few CPUs. I'm mentioning it because the issues with Broadwell sound very similar to, but not exactly like, those with the 6600K.
What I've found is if I boot with nomodeset the system is no longer crashing. Lots of segfaults continue to be logged.
What I can't get is the error message when the machine locks up. Without KMS I can't crash the system - with KMS and graphical environment, there's no console output when everything freezes. mcelog is not logging any MCEs in the journal. Booting with mce=3 has no effect.
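For anyone else trying to catch the message from a lockup: a minimal sketch, assuming the journal is persistent and the kernel log of the crashed boot has been saved to a text file first (for example with "journalctl -k -b -1 > kernel.log"). The function name and the match pattern are my own guesses at what is worth filtering for, not anything official, so widen the pattern if it misses lines on your machine.

```python
import re
import sys

# Assumed pattern: kernel machine-check messages normally mention one of these.
MCE_PATTERN = re.compile(r"machine check|\[Hardware Error\]|\bmce\b", re.IGNORECASE)


def mce_lines(lines):
    """Return the lines that look like machine-check reports, newline stripped."""
    return [line.rstrip("\n") for line in lines if MCE_PATTERN.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python mce_filter.py kernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in mce_lines(log):
            print(hit)
```

If this prints nothing for the boot that froze, the machine most likely hung before anything reached the disk, which would match the lack of console output.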
The two ways that reliably crash the system immediately are a glibc compilation and a video from YouTube played with mpv. Playing the same video with vlc or totem doesn't freeze the system.
On the other hand, I can play Steam games for days and the machine never freezes.
Offline
Did you try to use the updated Skylake microcode you can find in the kernel bugzilla bug report?
https://bugzilla.kernel.org/show_bug.cgi?id=103351
If you do, and it fixes things, please report back.
Offline
I apologise. The Skylake microcode update is not linked in that bug report, only Broadwell and Broadwell-H. It is probably best to ask your motherboard vendor for a BIOS update with new microcode.
Offline
Yes, this is most probably a CPU issue and it will most probably be
solved (more like worked around, actually) in microcode. I think there
aren't any Skylake microcode updates released by Intel for the public.
If they released any to the MB vendors in order to include them in their
BIOS updates, I would really like to know of the microcode version for
Skylake that brings the fixes. With the latest BIOS update from
Gigabyte, if I remember correctly, /proc/cpuinfo reported microcode
version 0x39. I will have to check when I get home.
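To compare revisions between machines, here is a minimal sketch that pulls the "microcode" field out of /proc/cpuinfo for each logical CPU. It assumes the usual x86 layout with "processor" and "microcode" lines; the function name is just something made up for this example.

```python
import os


def microcode_revisions(cpuinfo_text):
    """Map logical CPU number -> microcode revision (as int) from /proc/cpuinfo text."""
    revisions = {}
    current_cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            current_cpu = int(value)
        elif key == "microcode" and current_cpu is not None:
            # The field is printed in hex, e.g. "0x39".
            revisions[current_cpu] = int(value, 16)
    return revisions


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo", encoding="utf-8") as f:
        for cpu, rev in sorted(microcode_revisions(f.read()).items()):
            print(f"cpu {cpu}: microcode {rev:#x}")
```

All cores should normally report the same revision; a mismatch would be worth mentioning when reporting.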
Offline
I have seen a Skylake 0x506e3 (model 94), rev. 0x3a in the field.
However, a commit in coreboot (http://review.coreboot.org/#/c/11056/) leads me to believe this microcode update could be reported as either rev 0x39 or rev 0x3a depending on the processor operating mode...
So it could well be that you already have the latest Skylake microcode seen in the field.
And that crap with the microcode revision number is going to be a major pain. It is detected as a failure to apply a microcode update by a lot of stuff out there (except by patched coreboot, it seems), *and* it looks like a "microcode downgrade attack" too, so it is going to false-positive some security stuff.
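For reference, the signature 0x506e3 quoted above breaks down into family, model and stepping using the standard CPUID leaf 1 EAX layout. This is only a small sketch of that decoding (the function name is made up here), but it shows where "model 94" and the stepping "3" asked about earlier in the thread come from.

```python
def decode_signature(sig):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


print(decode_signature(0x506E3))  # (6, 94, 3)
```

So 0x506e3 is family 6, model 0x5e (94 decimal), stepping 3.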
Last edited by hmh (2015-10-09 14:09:49)
Offline