I posted on the bug I'm guessing you made earlier in the thread, but I don't think any other bug tracker entries have been created.
No, there are no comments by other people on my bug. This is my bug, for the record: https://bugzilla.kernel.org/show_bug.cgi?id=103361
Offline
https://bugzilla.kernel.org/show_bug.cgi?id=103351 Here's the other bug, then.
Avatar by Ditey: https://twitter.com/phrobitey
Offline
It seems I'm also affected by the same issue (MSI GE62 2QL Apache with i7-5700HQ):
random kernel panics or hangs with MCE when running a VM on a Windows host.
I didn't disable any CPU features in the BIOS (aside from SecureBoot).
I haven't tried Arch Linux so far, but Ubuntu 14.04.X crashes at random, regardless of kernel version (tried 3.16, 3.19 and 4.2-unstable).
Ubuntu 15.04 crashes very early when trying to install it.
I also have OpenSUSE 13.2 in dual boot, which seems to be rock solid (but the latest Tumbleweed x86_64-Snapshot20150909 crashes just like Ubuntu 15.04, early in the install).
The last distro I tried was Fedora 22, which seems stable with its original kernel (I think 4.0.4?) and also after updating to 4.1.6.
I really don't know what to make of this, but maybe someone can see what OpenSUSE 13.2 and Fedora 22 have in common that makes them run well.
Offline
It seems I'm also affected by the same issue (MSI GE62 2QL Apache with i7-5700HQ):
random kernel panics or hangs with MCE when running a VM on a Windows host.
I didn't disable any CPU features in the BIOS (aside from SecureBoot).
Well, something is not right here. I'm on a GS60 2QE with the exact same CPU and it's stable. Might it somehow be motherboard related?
Creeds matter very little… The optimist proclaims that we live in the best of all possible worlds; and the pessimist fears this is true. So I elect for neither label. - James Branch Cabell
Offline
What is your CPU's stepping?
Offline
How can I check that accurately? Is this enough?
Socket [0] - [physical cores=4, logical cores=8, max online cores ever=4]
TURBO ENABLED on 4 Cores, Hyper Threading ON
Max Frequency without considering Turbo 2792.74 MHz (99.74 x [28])
Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is 35x/35x/35x/35x
Real Current Frequency 3493.07 MHz [99.74 x 35.02] (Max of below)
Core [core-id] :Actual Freq (Mult.) C0% Halt(C1)% C3 % C6 % Temp VCore
Core 1 [0]: 3490.96 (35.00x) 9.66 12.8 3.12 71.6 51 1.1357
Core 2 [1]: 3493.07 (35.02x) 1 47 3.3 48.5 50 1.1359
Core 3 [2]: 3490.77 (35.00x) 1.01 0.883 1 96.8 50 1.1343
Core 4 [3]: 3490.47 (35.00x) 22.9 2.33 20.6 47.3 50 1.1349
Offline
Look at /proc/cpuinfo. For me, it is:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 71
model name : Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
stepping : 1
microcode : 0xd
cpu MHz : 2700.843
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt
bugs :
bogomips : 5389.77
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
i.e. stepping 1 with microcode 0xd.
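For comparing machines it may be easier to pull out just those fields. A minimal sketch, assuming awk is available and the usual x86 /proc/cpuinfo layout; cpu_revision is a made-up helper name, and the optional file argument is only there so it can be run on a saved copy:

```shell
# Print model name, stepping and microcode revision of the first CPU.
# cpu_revision is a hypothetical helper, not an existing tool.
# Usage: cpu_revision            (reads /proc/cpuinfo)
#        cpu_revision saved.txt  (reads a saved copy)
cpu_revision() {
    awk -F': *' '
        /^model name/ { name = $2 }
        /^stepping/   { step = $2 }
        /^microcode/  { ucode = $2; exit }   # stop after the first CPU entry
        END { printf "%s, stepping %s, microcode %s\n", name, step, ucode }
    ' "${1:-/proc/cpuinfo}"
}
```

If the kernel does not report a microcode line, the last field simply stays empty.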
Offline
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 71
model name : Intel(R) Core(TM) i7-5700HQ CPU @ 2.70GHz
stepping : 1
microcode : 0xd
cpu MHz : 3409.171
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt
bugs :
bogomips : 5389.90
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
Offline
That's certainly very interesting. As far as I know, the motherboards of both the GE62 and the GS60 are based on the Intel HM87, so there should not be a significant difference...
Offline
Great findings, guys! I just got an MSI GS60 2QE Ghost Pro, which comes with an Intel Core i7-5700HQ, and the system wasn't stable and froze randomly.
After booting with: processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll
the system is stable.
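To confirm that all three parameters actually reached the running kernel, something like the following could be used. It is only a sketch: check_idle_params is a made-up name, and the optional file argument exists purely so the function can be tried on a saved command line.

```shell
# Report which of the three idle-related parameters are present on the
# kernel command line. check_idle_params is a hypothetical helper.
# Usage: check_idle_params   (reads /proc/cmdline)
check_idle_params() {
    cmdline=$(cat "${1:-/proc/cmdline}")
    for p in processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll; do
        # Pad with spaces so only whole, space-separated parameters match.
        case " $cmdline " in
            *" $p "*) echo "$p: set" ;;
            *)        echo "$p: missing" ;;
        esac
    done
}
```

A parameter reported as missing usually means the bootloader configuration was edited but not regenerated, or a different boot entry was selected.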
Deluxe wrote:
It seems I'm also affected by the same issue (MSI GE62 2QL Apache with i7-5700HQ):
random kernel panics or hangs with MCE when running a VM on a Windows host.
I didn't disable any CPU features in the BIOS (aside from SecureBoot).
Well, something is not right here. I'm on a GS60 2QE with the exact same CPU and it's stable. Might it somehow be motherboard related?
Something's not right indeed
Offline
The thing is, both CPU details posted by seqizz show a current clock speed of about 3.4GHz (which should be the max for this CPU, I think),
so he may be running without SpeedStep or with c-states disabled, like mich41 said. I was running with SpeedStep enabled...
Now I tried running Ubuntu 14.04 with kernel 4.2 with c-states disabled and idle=poll, as mich41 posted...
So far it seems to be stable like that, but of course at the price of running the CPU at full speed at all times, which is not that great for a notebook.
But it probably isn't caused just by changing clock speeds (my OpenSUSE 13.2 is stable even with such features enabled)...
I really wonder if it's actually caused by the BDD83 erratum (http://www.intel.com/content/dam/www/pu … update.pdf),
as the guys on the VMware forums pointed out (https://communities.vmware.com/thread/516189).
EDIT: It just crashed anyway... but it took more time than usual.
Last edited by Deluxe (2015-09-18 23:33:21)
Offline
Nice find, it didn't occur to me that Intel may have already published Broadwell errata specs. However, there are still no details on these bugs, nor an explanation of how Windows and several distributions manage to avoid them.
And I'm still not sure whether to bet on erratum BDD83 or BDD86.
Last edited by mich41 (2015-09-19 10:43:25)
Offline
Unfortunately, both BDD83 and BDD86 are marked as "No Fix" (although that only means there is no plan to fix the errata in a future stepping). The errata document also says "Workaround: It is possible for the BIOS to contain a workaround for this problem.", so I guess it is not entirely impossible that a firmware- or ucode-based workaround might be created.
I am betting that other distributions avoid the bug through sheer luck from nondeterministic compiler optimization, but I'm not sure about Windows...
Last edited by kris7t (2015-09-19 16:15:22)
Offline
The thing is, both CPU details posted by seqizz show a current clock speed of about 3.4GHz (which should be the max for this CPU, I think),
so he may be running without SpeedStep or with c-states disabled, like mich41 said. I was running with SpeedStep enabled...
A little note on that: I haven't disabled anything (at least not on purpose). I couldn't use this laptop if the fans went crazy or the battery drained. It's at max speed because TLP sets the "performance" governor on AC (and "powersave" on battery). Here are the kernel and parameters:
> uname -a
Linux rocket 4.1.6-1-ARCH #1 SMP PREEMPT Mon Aug 17 08:52:28 CEST 2015 x86_64 GNU/Linux
> cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux root=UUID=10054c6a-90aa-4932-b442-779984ab48eb rw rootflags=subvol=subvol-root quiet video=eDP-1:1920x1080@60 cryptdevice=/dev/md126p7:vgshit resume=/dev/vgshit/swap
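The governor that is actually in effect can be read from sysfs. A rough sketch, assuming the standard cpufreq layout under /sys/devices/system/cpu; list_governors is a made-up name, and the optional directory argument is only there for trying it against a copy of the tree:

```shell
# Count how many CPUs use each cpufreq governor.
# list_governors is a hypothetical helper, not part of TLP or the kernel.
# Usage: list_governors   (reads /sys/devices/system/cpu)
list_governors() {
    base="${1:-/sys/devices/system/cpu}"
    # cpu[0-9]* skips non-CPU entries such as cpuidle and cpufreq.
    cat "$base"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c
}
```

On a setup like the one described, this would be expected to list "performance" on AC and "powersave" on battery, one line per governor with a CPU count in front.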
Offline
I have a 5675C and have this issue. My BIOS (GA-Z97-HD3) does not have a SpeedStep toggle, so I disabled Turbo Boost, C1E, C3 and C6/C7, and it still hangs when compiling stuff.
processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll
also hangs, I can't even configure glibc.
I see people saying things about exotic instructions, but it's strange that nobody mentioned that these CPUs are affected by the HLE bug (BDD50, also BDD51).
It is very easy to notice, as a lot of software crashes and gdb points to the elision lock in libpthread, so this may be related.
I have recompiled glibc without elision lock and with -march=ivybridge, and the glibc-related crashes are gone, so I have this crazy idea about trying an ivybridge-compiled kernel (e.g. linux-ck-ivybridge).
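Whether a given CPU advertises HLE at all can be checked from the flags line in /proc/cpuinfo, where the kernel lists the TSX features as "hle" and "rtm". A small sketch; has_cpu_flag is a made-up helper name, and the second argument is only there so it can be run on a saved copy:

```shell
# Succeed if the first "flags" line lists the given flag as a whole word.
# has_cpu_flag is a hypothetical helper.
# Usage: has_cpu_flag hle && echo "hle advertised"
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" \
        | tr ' \t' '\n\n' \
        | grep -qxF "$1"
}
```

This only shows what the CPU reports with the currently loaded microcode; it says nothing about whether a given erratum applies.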
Last edited by Goresome (2015-09-21 09:19:26)
Offline
I see people saying things about exotic instructions, but it's strange that nobody mentioned that these CPUs are affected by the HLE bug (BDD50, also BDD51).
It was mentioned: https://bbs.archlinux.org/viewtopic.php … 1#p1561901
It is very easy to notice, as a lot of software crashes and gdb points to the elision lock in libpthread, so this may be related.
What are you talking about? On my system, I had only one application (a game) crash, and that was due to something in the drm driver. I haven't read about anyone's programs crashing; it's always the full system that freezes, so there's no way to debug it this easily.
I have recompiled glibc without elision lock and with -march=ivybridge, and the glibc-related crashes are gone, so I have this crazy idea about trying an ivybridge-compiled kernel (e.g. linux-ck-ivybridge).
Well, let us know how that goes!
Offline
I have disabled elision lock in glibc now, but as I remember,
/usr/bin/gtk-query-immodules-3.0 --update-cache
reliably triggers the HLE crash. I spotted this when installing fcitx-im and pacman shouted because install() failed.
As for the kernel panic, I updated my BIOS to F9 (it was released on 18 September), disabled Turbo Boost, C1E, C3 and C6/C7, disabled HLE in glibc, installed linux-ck-ivybridge, and it still hangs.
Offline
AFAIK GCC broadwell is not compatible with ivybridge.
From the specs (and GCC documentation) it looks like ivybridge is a subset of the broadwell instruction set. But compiling a package with ivybridge flags and using it on a broadwell system causes problems such as crashes. At least that's my experience when compiling with GCC 4.9.2; I haven't tested 5.x yet.
Offline
AFAIK GCC broadwell is not compatible with ivybridge.
From the specs (and GCC documentation) it looks like ivybridge is a subset of the broadwell instruction set. But compiling a package with ivybridge flags and using it on a broadwell system causes problems such as crashes. At least that's my experience when compiling with GCC 4.9.2; I haven't tested 5.x yet.
I tried that precisely because ivybridge is a subset of haswell (and, by extension, broadwell). However, it changes nothing; just disabling HLE in glibc is enough to get rid of the glibc crashes.
Offline
AFAIK GCC broadwell is not compatible with ivybridge.
From the specs (and GCC documentation) it looks like ivybridge is a subset of the broadwell instruction set. But compiling a package with ivybridge flags and using it on a broadwell system causes problems such as crashes.
That's weird; besides AMD's removal of 3DNow!, I'm not aware of any x86 CPU not supporting instructions supported by its predecessors. Maybe you hit some hardware bug? Did you run it under gdb?
BTW, Goresome, read this Phoronix article linked on page 1. They found some funny-named option which also has a side effect of disabling SpeedStep.
Last edited by mich41 (2015-09-22 15:31:00)
Offline
I am also affected by this problem (MSI GS60 2QE).
So the suspect is the HLE in glibc as far as I understand. Can anybody confirm that?
It is strange that disabling HLE using an environment variable does not solve this.
EDIT: According to this post: http://journal.siddhesh.in/categories/glibc.html
"Update: Disabling lock elision in glibc doesn’t seem to be sufficient. Either way, the Fedora kernel folks will have an update in place to update the microcode early by default so that both the kernel and the first instantiation of pthreads will see HLE disabled. So read the story as something interesting that we did but didn’t quite work. It was fun though…"
This explains why disabling HLE using env variables is not sufficient. Did anybody recompile glibc with disabled HLE? Did that solve the problem?
Last edited by xazax (2015-09-23 12:28:54)
Offline
I am also affected by this problem (MSI GS60 2QE).
So the suspect is the HLE in glibc as far as I understand. Can anybody confirm that?
It is strange that disabling HLE using an environment variable does not solve this.
EDIT: According to this post: http://journal.siddhesh.in/categories/glibc.html
"Update: Disabling lock elision in glibc doesn’t seem to be sufficient. Either way, the Fedora kernel folks will have an update in place to update the microcode early by default so that both the kernel and the first instantiation of pthreads will see HLE disabled. So read the story as something interesting that we did but didn’t quite work. It was fun though…"
This explains why disabling HLE using env variables is not sufficient. Did anybody recompile glibc with disabled HLE? Did that solve the problem?
I mentioned that several posts above. Disabling elision locks eliminates the glibc crashes, but does not solve the kernel panic issue.
As for microcode, Intel has to release a new version containing updates for Broadwell first.
Offline
I think I found something.
https://bugs.debian.org/cgi-bin/bugrepo … bug=762195
TLDR: there is a bug in glibc 2.19 and 2.20 which causes TSX to still be used even if --enable-lock-elision isn't specified (even with --disable-lock-elision, in fact). This bug may still be present in glibc 2.22, it's hard to tell at first glance.
Curiously, Fedora is said to have fixed this bug and works. Ubuntu 14.04 has it unfixed and doesn't work, Ubuntu 15.04 enables lock elision explicitly like Arch and both don't work. Go figure.
You may want to try building glibc with this patch. Just put it in the glibc build directory alongside glibc-2.22-roundup.patch and modify the PKGBUILD accordingly before running makepkg.
@xazax: This Fedora issue you linked was different; it involved all multithreaded processes crashing when a microcode update disabled TSX support in the CPU just while these processes were using TSX.
Last edited by mich41 (2015-09-23 16:22:10)
Offline
The patch linked in the Debian bug report returns 500 Internal Server Error for me, so I could not download it. Searching for the patch's name on Google returns this mailing list thread for the bug, which contains the patch in plain text, so it can be copied from there: https://lists.debian.org/debian-glibc/2 … 00076.html
Note that the patch needs to be modified slightly, because sysdeps/unix/sysv/linux/x86/elision-conf.c has been moved out of the nptl folder in the glibc tree.
I could not reproduce the user space crash with the command
/usr/bin/gtk-query-immodules-3.0 --update-cache
that Goresome has experienced with SpeedStep disabled (I did not try with SpeedStep enabled). I am now compiling glibc and will try it out with HLE disabled for real. I am very interested whether it will fix the issue.
Last edited by kris7t (2015-09-23 18:34:24)
Offline
No, patching libc to disable HLE does not help. At first it all looked hopeful and it did not crash even after multiple restarts of emacs, which used to be a sure trigger. However, when I attempted to compile glibc again as a sort of stress test, the system MCE'd again a few seconds into the compilation.
While disabling HLE seems to have some effect, it is possible that the bad instruction hides inside binaries other than libc. Since there is no BIOS option to disable individual processor features, I guess we have to wait until Intel releases a ucode patch (if ever) to disable them. Or figure out how Windows avoids the issue.
Last edited by kris7t (2015-09-23 20:08:24)
Offline