#1 2014-10-24 17:24:17

MK13
Member
From: Germany
Registered: 2014-04-12
Posts: 80

Kernel traps & core dumps

Hi,

Since yesterday's update I get strange errors when starting or exiting applications. The update included kernel 3.17 and intel-ucode.

sudo journalctl -f _TRANSPORT=kernel _TRANSPORT=journal wrote:

Oct 24 18:59:52 brutebox kernel: traps: VirtualBox[31137] trap invalid opcode ip:7faaaecbcf02 sp:7ffffac2d7c8 error:0 in libpthread-2.20.so[7faaaecb1000+17000]
Oct 24 18:59:57 brutebox kernel: traps: VirtualBox[31181] trap invalid opcode ip:7ff9461c3aea sp:7fffd673de58 error:0 in libpthread-2.20.so[7ff9461b8000+17000]
Oct 24 19:00:02 brutebox kernel: traps: VirtualBox[31225] trap invalid opcode ip:7f92ba2f0f02 sp:7fff1f1e0118 error:0 in libpthread-2.20.so[7f92ba2e5000+17000]
Oct 24 19:00:26 brutebox kernel: traps: chromium[31392] trap invalid opcode ip:7f006aea5b63 sp:7f005ed2bca8 error:0 in libpthread-2.20.so[7f006ae94000+17000]
Oct 24 19:00:29 brutebox systemd-coredump[31393]: Process 31390 (chromium) of user 1000 dumped core.
Oct 24 19:01:01 brutebox kernel: traps: steam[31784] trap invalid opcode ip:f71ff18b sp:ffadc080 error:0 in libpthread-2.20.so[f71ee000+18000]
Oct 24 19:01:01 brutebox systemd-coredump[31785]: Process 31784 (steam) of user 1000 dumped core.
Oct 24 19:04:27 brutebox kernel: traps: coredumpctl[29427] trap invalid opcode ip:7fad4105bb63 sp:7fff96409408 error:0 in libpthread-2.20.so[7fad4104a000+17000]
Oct 24 19:04:28 brutebox systemd-coredump[885]: Process 29427 (coredumpctl) of user 1000 dumped core.
Oct 24 19:16:45 brutebox kernel: traps: pacman[5914] trap invalid opcode ip:7fab73665aea sp:7fff5a5f2d18 error:0 in libpthread-2.20.so[7fab7365a000+17000]
Oct 24 19:16:45 brutebox systemd-coredump[5915]: Process 5914 (pacman) of user 1000 dumped core.
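
Each trap line above gives the faulting instruction pointer (ip) plus the start address and size of the libpthread mapping in brackets, so subtracting the two yields an offset that can be compared across processes despite address randomization. Below is a minimal Python sketch of that arithmetic. The names TRAP_RE and fault_offset are made up for illustration, and the regular expression assumes exactly the message format shown in this log.

```python
import re
from collections import Counter

# Assumed format: the kernel "traps:" lines exactly as they appear in this thread.
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap invalid opcode "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (program, library, offset into the mapping), or None if no match."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    offset = int(match["ip"], 16) - int(match["base"], 16)
    return match["comm"], match["lib"], offset


# Three lines copied from the journal excerpt above.
SAMPLE = """\
Oct 24 18:59:52 brutebox kernel: traps: VirtualBox[31137] trap invalid opcode ip:7faaaecbcf02 sp:7ffffac2d7c8 error:0 in libpthread-2.20.so[7faaaecb1000+17000]
Oct 24 19:00:26 brutebox kernel: traps: chromium[31392] trap invalid opcode ip:7f006aea5b63 sp:7f005ed2bca8 error:0 in libpthread-2.20.so[7f006ae94000+17000]
Oct 24 19:16:45 brutebox kernel: traps: pacman[5914] trap invalid opcode ip:7fab73665aea sp:7fff5a5f2d18 error:0 in libpthread-2.20.so[7fab7365a000+17000]
"""

# Count how often each (library, offset) pair occurs.
offsets = Counter()
for line in SAMPLE.splitlines():
    parsed = fault_offset(line)
    if parsed:
        comm, lib, offset = parsed
        offsets[(lib, hex(offset))] += 1
        print(f"{comm}: {lib}+{offset:#x}")
```

For the 64-bit processes in the log above, this arithmetic gives only three distinct offsets (0xbf02, 0xbaea and 0x11b63). That would fit a small number of specific instructions being rejected rather than random corruption.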

When I start an application a second time after the error, it works. Besides the applications in the log, eclipse and obmenu are also affected.
The installed version of glibc is 2.20-2.

If you need further information, I will gladly add it.

Thanks in advance,

MK13

Edit: While starting vim in a terminal I got the following error:

vim /foo/bar.txt wrote:

Vim: Caught deadly signal ILL
Illegal instruction (core dumped)

http://www.computerhope.com/unix/signals.htm wrote:

SIGILL: The ILL signal is sent to a process when it attempts to execute a malformed, unknown, or privileged instruction.

Last edited by MK13 (2014-10-24 17:45:34)

#2 2014-10-26 14:07:42

MK13
Member
From: Germany
Registered: 2014-04-12
Posts: 80

Re: Kernel traps & core dumps

I deactivated the intel-ucode update in the initrd and the errors stopped. Hmm.

#3 2014-10-27 19:37:01

GourdCaptain
Member
Registered: 2009-04-18
Posts: 121

Re: Kernel traps & core dumps

I have this problem too and am trying to disable the intel-ucode update as well. Unfortunately, my system wasn't in much of a condition to check, but the problem only seemed to pop up after a suspend. Maybe the microcode updates aren't being handled properly across suspend and resume?

#4 2014-10-31 17:53:55

toki
Member
From: Germany
Registered: 2007-12-27
Posts: 9

Re: Kernel traps & core dumps

Same here...
I also started seeing this behaviour, and only after a suspend/resume cycle.

Last edited by toki (2014-10-31 17:55:06)

#5 2014-11-01 17:48:36

InvalidInterrupt
Member
Registered: 2014-11-01
Posts: 4

Re: Kernel traps & core dumps

I'm experiencing what appears to be the same issue.

On an Intel i5-4690K system, applications start dumping core in libpthread after a suspend/resume cycle.

Limited logs follow; more can be provided on request:

journalctl -b-2 | grep -e trap -e microcode -e resume wrote:

Oct 28 20:10:19 {HOSTNAME} kernel: CPU0 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 28 20:10:19 {HOSTNAME} kernel: CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 28 20:10:19 {HOSTNAME} kernel: CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 28 20:10:19 {HOSTNAME} kernel: CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 28 20:10:19 {HOSTNAME} kernel: microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x1c
Oct 28 20:10:19 {HOSTNAME} kernel: microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x1c
Oct 28 20:10:19 {HOSTNAME} kernel: microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x1c
Oct 28 20:10:19 {HOSTNAME} kernel: microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x1c
Oct 28 20:10:19 {HOSTNAME} kernel: microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
Oct 29 08:27:23 {HOSTNAME} kernel: ACPI: Low-level resume complete
Oct 29 08:27:23 {HOSTNAME} kernel: CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 08:27:23 {HOSTNAME} kernel: CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 08:27:23 {HOSTNAME} kernel: CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 08:27:23 {HOSTNAME} kernel: PM: noirq resume of devices complete after 12.752 msecs
Oct 29 08:27:23 {HOSTNAME} kernel: PM: early resume of devices complete after 0.183 msecs
Oct 29 08:27:23 {HOSTNAME} kernel: PM: resume of devices complete after 247.076 msecs
Oct 29 08:27:23 {HOSTNAME} systemd-sleep[3001]: System resumed.
Oct 29 20:22:43 {HOSTNAME} kernel: ACPI: Low-level resume complete
Oct 29 20:22:43 {HOSTNAME} kernel: CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 20:22:43 {HOSTNAME} kernel: CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 20:22:43 {HOSTNAME} kernel: CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 20:22:43 {HOSTNAME} kernel: PM: noirq resume of devices complete after 13.895 msecs
Oct 29 20:22:43 {HOSTNAME} kernel: PM: early resume of devices complete after 0.200 msecs
Oct 29 20:22:43 {HOSTNAME} kernel: PM: resume of devices complete after 247.581 msecs
Oct 29 20:22:43 {HOSTNAME} systemd-sleep[3344]: System resumed.
Oct 29 20:22:49 {HOSTNAME} kernel: traps: systemctl[3532] trap invalid opcode ip:7f2c89b89b63 sp:7fffedbc4e08 error:0 in libpthread-2.20.so[7f2c89b78000+17000]
Oct 29 20:39:40 {HOSTNAME} kernel: traps: mpv[3664] trap invalid opcode ip:7f5cd7880b63 sp:7f5cc5437bc8 error:0 in libpthread-2.20.so[7f5cd786f000+17000]
Oct 29 21:40:34 {HOSTNAME} kernel: traps: steamwebhelper[5659] trap invalid opcode ip:f325918b sp:f0e0b0e0 error:0 in libpthread-2.20.so[f3248000+18000]
Oct 29 23:15:11 {HOSTNAME} kernel: traps: systemctl[6022] trap invalid opcode ip:7f28314c2b63 sp:7fff3a143d28 error:0 in libpthread-2.20.so[7f28314b1000+17000]
Oct 30 08:41:11 {HOSTNAME} kernel: ACPI: Low-level resume complete
Oct 30 08:41:11 {HOSTNAME} kernel: CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 30 08:41:11 {HOSTNAME} kernel: CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 30 08:41:11 {HOSTNAME} kernel: CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 30 08:41:11 {HOSTNAME} kernel: PM: noirq resume of devices complete after 13.971 msecs
Oct 30 08:41:11 {HOSTNAME} kernel: PM: early resume of devices complete after 0.185 msecs
Oct 30 08:41:11 {HOSTNAME} kernel: PM: resume of devices complete after 248.720 msecs
Oct 30 08:41:10 {HOSTNAME} systemd-sleep[6026]: System resumed.
Oct 30 08:49:13 {HOSTNAME} kernel: traps: systemctl[6310] trap invalid opcode ip:7f86620d0b63 sp:7ffff6303018 error:0 in libpthread-2.20.so[7f86620bf000+17000]
Oct 30 19:41:57 {HOSTNAME} kernel: ACPI: Low-level resume complete
Oct 30 19:41:57 {HOSTNAME} kernel: CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 30 19:41:57 {HOSTNAME} kernel: CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 30 19:41:57 {HOSTNAME} kernel: CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 30 19:41:57 {HOSTNAME} kernel: PM: noirq resume of devices complete after 12.815 msecs
Oct 30 19:41:57 {HOSTNAME} kernel: PM: early resume of devices complete after 0.174 msecs
Oct 30 19:41:57 {HOSTNAME} kernel: PM: resume of devices complete after 247.306 msecs
Oct 30 19:41:56 {HOSTNAME} systemd-sleep[6314]: System resumed.
Oct 30 19:43:04 {HOSTNAME} kernel: traps: alsamixer[6541] trap invalid opcode ip:7f44754f8aea sp:7fffbc448b58 error:0 in libpthread-2.20.so[7f44754ed000+17000]
Oct 30 20:12:44 {HOSTNAME} kernel: traps: xlock[6720] trap invalid opcode ip:7f7aafeafb63 sp:7fff6f0f0698 error:0 in libpthread-2.20.so[7f7aafe9e000+17000]
Oct 30 20:44:13 {HOSTNAME} kernel: traps: xlock[6899] trap invalid opcode ip:7f9e9bfefb63 sp:7fff697103d8 error:0 in libpthread-2.20.so[7f9e9bfde000+17000]
Oct 30 22:45:19 {HOSTNAME} kernel: traps: curl[7437] trap invalid opcode ip:7f5b39687b63 sp:7f5b37663628 error:0 in libpthread-2.20.so[7f5b39676000+17000]
Oct 30 23:07:37 {HOSTNAME} kernel: traps: chromium[9217] trap invalid opcode ip:7fc88d005b63 sp:7fffe72bfa98 error:0 in libpthread-2.20.so[7fc88cff4000+17000]
Oct 30 23:07:37 {HOSTNAME} kernel: traps: chromium[9238] trap invalid opcode ip:7fc88d005b63 sp:7fffe72bfa98 error:0 in libpthread-2.20.so[7fc88cff4000+17000]
Oct 30 23:07:41 {HOSTNAME} kernel: traps: chromium[9244] trap invalid opcode ip:7fc88d005b63 sp:7fffe72bfa98 error:0 in libpthread-2.20.so[7fc88cff4000+17000]
Oct 30 23:07:42 {HOSTNAME} kernel: traps: chromium[9247] trap invalid opcode ip:7fc88d005b63 sp:7fffe72bfa98 error:0 in libpthread-2.20.so[7fc88cff4000+17000]
Oct 30 23:07:42 {HOSTNAME} kernel: traps: chromium[9250] trap invalid opcode ip:7fc88d005b63 sp:7fffe72bfb08 error:0 in libpthread-2.20.so[7fc88cff4000+17000]
Oct 30 23:27:19 {HOSTNAME} kernel: traps: xlock[9647] trap invalid opcode ip:7fd1e85fcb63 sp:7fff04e63008 error:0 in libpthread-2.20.so[7fd1e85eb000+17000]
Oct 30 23:51:04 {HOSTNAME} kernel: traps: gtk-update-icon[10038] trap invalid opcode ip:7fcee3ee9aea sp:7fff9c5caab8 error:0 in libpthread-2.20.so[7fcee3ede000+17000]
Oct 30 23:53:13 {HOSTNAME} kernel: traps: gtk-update-icon[10127] trap invalid opcode ip:7fe394175aea sp:7fffe83f0598 error:0 in libpthread-2.20.so[7fe39416a000+17000]
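
One detail in the log above is that the boot-time messages cover CPU0 to CPU3, while after each "Low-level resume complete" line only CPU1 to CPU3 report an early update. Here is a rough Python sketch for pulling that out of a journal dump. The names early_updates_by_phase, RESUME_MARK and EARLY_RE are hypothetical, and a missing message alone does not prove that CPU0 was skipped.

```python
import re

# Assumed markers, taken from the message texts visible in the log above.
RESUME_MARK = "ACPI: Low-level resume complete"
EARLY_RE = re.compile(r"CPU(\d+) microcode updated early to revision (0x[0-9a-f]+)")


def early_updates_by_phase(lines):
    """Group CPUs with an 'updated early' message by boot (index 0) and each resume."""
    phases = [set()]  # phases[0] collects messages seen before the first resume
    for line in lines:
        if RESUME_MARK in line:
            phases.append(set())
        else:
            match = EARLY_RE.search(line)
            if match:
                phases[-1].add(int(match.group(1)))
    return phases


# Lines copied from the journal excerpt above (boot, then the first resume).
SAMPLE = """\
Oct 28 20:10:19 {HOSTNAME} kernel: CPU0 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 28 20:10:19 {HOSTNAME} kernel: CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 28 20:10:19 {HOSTNAME} kernel: CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 28 20:10:19 {HOSTNAME} kernel: CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 08:27:23 {HOSTNAME} kernel: ACPI: Low-level resume complete
Oct 29 08:27:23 {HOSTNAME} kernel: CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 08:27:23 {HOSTNAME} kernel: CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
Oct 29 08:27:23 {HOSTNAME} kernel: CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
"""

phases = early_updates_by_phase(SAMPLE.splitlines())
boot_cpus = phases[0]
for number, cpus in enumerate(phases[1:], start=1):
    missing = sorted(boot_cpus - cpus)
    print(f"resume {number}: updated {sorted(cpus)}, no message for {missing}")
# prints: resume 1: updated [1, 2, 3], no message for [0]
```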

I suspect this has to do with the microcode update disabling TSX on Intel Haswell processors. If so, disabling microcode updates leaves the faulty TSX instructions enabled, which may itself lead to buggy behavior. I have no idea why a suspend would cause libpthread to begin using the instructions again, though.
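
One way to look into the TSX suspicion is to compare which logical CPUs advertise the "hle" and "rtm" flags in /proc/cpuinfo. The following is only a sketch: the function name tsx_flags and the sample text are hypothetical, and the flags show the kernel's recorded view, which I am not certain is refreshed after a microcode change or a resume.

```python
def tsx_flags(cpuinfo_text):
    """Map each logical CPU number to the TSX flags ('hle', 'rtm') it lists."""
    flags_by_cpu = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        if key == "processor":
            current = int(value)
            flags_by_cpu[current] = set()
        elif key == "flags" and current is not None:
            flags_by_cpu[current] = {"hle", "rtm"} & set(value.split())
    return flags_by_cpu


# Hypothetical excerpt for illustration only, not taken from any machine in this thread.
SAMPLE = """\
processor\t: 0
flags\t\t: fpu vme sse2 avx2 hle rtm

processor\t: 1
flags\t\t: fpu vme sse2 avx2
"""

for cpu, flags in sorted(tsx_flags(SAMPLE).items()):
    print(cpu, sorted(flags) if flags else "no TSX flags listed")

# On a real system the input would come from the file itself:
#   with open("/proc/cpuinfo") as handle:
#       print(tsx_flags(handle.read()))
```

If the logical CPUs disagreed about these flags, that would at least be consistent with the crashes only showing up after a resume.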

#6 2014-11-01 17:56:54

GourdCaptain
Member
Registered: 2009-04-18
Posts: 121

Re: Kernel traps & core dumps

This wasn't a problem before the switch to loading /boot/intel-ucode.img as an initrd. Microcode updates seemed to be handled fine on this system before that.

#7 2014-11-03 00:10:27

InvalidInterrupt
Member
Registered: 2014-11-01
Posts: 4

Re: Kernel traps & core dumps

I'm by no means an expert on this. Feel free to correct me if I'm wrong.

Intel discovered the TSX bug in August. According to the changelog for the intel-ucode package, the most recent update is the first one released since Intel would have disabled those instructions.
Further, based on this it seems the switch to early microcode loading was made to support that microcode update.

Is anyone experiencing this issue on a machine without a Haswell processor?

Last edited by InvalidInterrupt (2014-11-03 01:10:18)

#8 2014-11-03 00:22:49

GourdCaptain
Member
Registered: 2009-04-18
Posts: 121

Re: Kernel traps & core dumps

I've got a Haswell.

#9 2014-11-03 10:47:21

MK13
Member
From: Germany
Registered: 2014-04-12
Posts: 80

Re: Kernel traps & core dumps

I can confirm that the errors only occur after a suspend, and I have a Haswell as well (i5 4570).

#10 2014-11-04 01:46:33

quietraccoon
Member
Registered: 2014-05-27
Posts: 4

Re: Kernel traps & core dumps

I'm having the same issue on a Haswell Refresh CPU (i5-4690), and it appears to happen only after a suspend and resume for me as well. This thread and the link to the bug report are the only places I've found so far with other people having this issue.

glibc 2.20-2
linux 3.17.1-1
intel-ucode 20140913-1

Output from journalctl -xn:

Nov 03 17:42:12 JordBox kernel: traps: tumblerd[8407] trap invalid opcode ip:7f1b11303aea sp:7fff0e8ca318 error:0 in libpthread-2.20.so[7f1b112f8000+17000]
Nov 03 17:42:28 JordBox systemd-coredump[8409]: Process 8407 (tumblerd) of user 1000 dumped core.
-- Subject: Process 8407 (tumblerd) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 8407 (tumblerd) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Nov 03 17:43:21 JordBox kernel: traps: journalctl[8610] trap invalid opcode ip:7f5bfdd73b63 sp:7fffd925d608 error:0 in libpthread-2.20.so[7f5bfdd62000+17000]
Nov 03 17:43:21 JordBox systemd-coredump[8660]: Process 8610 (journalctl) of user 1000 dumped core.
-- Subject: Process 8610 (journalctl) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 8610 (journalctl) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Nov 03 17:44:04 JordBox kernel: traps: xfce4-terminal[8807] trap invalid opcode ip:7f76d79e2f02 sp:7fff7d9f2ea8 error:0 in libpthread-2.20.so[7f76d79d7000+17000]
Nov 03 17:44:08 JordBox kernel: traps: xfce4-terminal[8897] trap invalid opcode ip:7f6e3e387f02 sp:7fff825ee188 error:0 in libpthread-2.20.so[7f6e3e37c000+17000]
Nov 03 17:44:10 JordBox systemd-coredump[8810]: Process 8807 (xfce4-terminal) of user 1000 dumped core.
-- Subject: Process 8807 (xfce4-terminal) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 8807 (xfce4-terminal) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Nov 03 17:44:13 JordBox systemd-coredump[8900]: Process 8897 (xfce4-terminal) of user 1000 dumped core.
-- Subject: Process 8897 (xfce4-terminal) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 8897 (xfce4-terminal) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Nov 03 17:44:19 JordBox kernel: traps: xfce4-terminal[8986] trap invalid opcode ip:7f1742085f02 sp:7fffcd74c0f8 error:0 in libpthread-2.20.so[7f174207a000+17000]
Nov 03 17:44:24 JordBox systemd-coredump[8989]: Process 8986 (xfce4-terminal) of user 1000 dumped core.
-- Subject: Process 8986 (xfce4-terminal) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 8986 (xfce4-terminal) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.

#11 2014-11-04 01:50:52

GourdCaptain
Member
Registered: 2009-04-18
Posts: 121

Re: Kernel traps & core dumps

Should I file a bug against the intel-ucode package on the Arch bug tracker for this?

#12 2014-11-04 02:07:11

InvalidInterrupt
Member
Registered: 2014-11-01
Posts: 4

Re: Kernel traps & core dumps

My intuition tells me glibc might be the best package to file a bug report with.
Just a hunch. Regardless, we should file a report somewhere.

#13 2014-11-04 08:49:05

MK13
Member
From: Germany
Registered: 2014-04-12
Posts: 80

Re: Kernel traps & core dumps

I already wrote to the glibc mailing list explaining the issue, but they told me to file a bug report with my distro.
I think we should file a report for the ucode package as GourdCaptain said.

Last edited by MK13 (2014-11-05 10:14:25)

#14 2014-11-05 05:53:35

quietraccoon
Member
Registered: 2014-05-27
Posts: 4

Re: Kernel traps & core dumps

After a little quick searching the other night, I found a patch for Debian's glibc package that was submitted and accepted into their testing repo. I modified it slightly for 2.20. I built+installed glibc 2.20-2 from [core], with the patch applied, and am now no longer getting the kernel traps, nor have I gotten any new issues AFAIK so far. Before doing that, I was also having a problem with Arch not booting when upgrading from Linux 3.17.1 to 3.17.2; Arch was instead going into a recovery shell, forcing me to use the fallback and then downgrade Linux. After installing patched glibc, I'm able to boot into Arch with 3.17.2; not sure about the connection, though.

glibc-2.20-local-blacklist-on-TSX-Haswell.patch:

Intel TSX is broken on Haswell based processors (erratum HSD136/HSW136)
and a microcode update is available to simply disable the corresponding
instructions.

While the responsability to continue or not using TSX should be left to
the users, a live microcode update will disable the TSX instructions
causing already started binaries to segfault. This patch simply disable 
Intel TSX (HLE and RTM) on processors which might receive a microcode
update, so that it doesn't happen. We might expect newer steppings to
fix the issue, and if it is not the case the corresponding processors 
will be shipped with TSX already disabled.

Author: Henrique de Moraes Holschuh <hmh@debian.org>

diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86_64/multiarch/init-arch.c
index db74d97..6f61ae6 100644
--- a/sysdeps/x86_64/multiarch/init-arch.c
+++ b/sysdeps/x86_64/multiarch/init-arch.c
@@ -26,7 +26,7 @@ struct cpu_features __cpu_features attribute_hidden;
 
 
 static void
-get_common_indeces (unsigned int *family, unsigned int *model)
+get_common_indeces (unsigned int *family, unsigned int *model, unsigned int *stepping)
 {
   __cpuid (1, __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax,
 	   __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ebx,
@@ -36,6 +36,7 @@ get_common_indeces (unsigned int *family, unsigned int *model)
   unsigned int eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
   *family = (eax >> 8) & 0x0f;
   *model = (eax >> 4) & 0x0f;
+  *stepping = eax & 0x0f;
 }
 
 
@@ -47,6 +48,7 @@ __init_cpu_features (void)
   unsigned int edx;
   unsigned int family = 0;
   unsigned int model = 0;
+  unsigned int stepping = 0;
   enum cpu_features_kind kind;
 
   __cpuid (0, __cpu_features.max_cpuid, ebx, ecx, edx);
@@ -56,7 +58,7 @@ __init_cpu_features (void)
     {
       kind = arch_kind_intel;
 
-      get_common_indeces (&family, &model);
+      get_common_indeces (&family, &model, &stepping);
 
       unsigned int eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
       unsigned int extended_family = (eax >> 20) & 0xff;
@@ -131,7 +133,7 @@ __init_cpu_features (void)
     {
       kind = arch_kind_amd;
 
-      get_common_indeces (&family, &model);
+      get_common_indeces (&family, &model, &stepping);
 
       ecx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ecx;
 
@@ -179,6 +181,14 @@ __init_cpu_features (void)
 	}
     }
 
+  /* Disable Intel TSX (HLE and RTM) due to erratum HSD136/HSW136
+     on Haswell processors, to work around outdated microcode that
+     doesn't disable the broken feature by default */
+  if (kind == arch_kind_intel && family == 6 &&
+      ((model == 63 && stepping <= 2) || (model == 60 && stepping <= 3) ||
+       (model == 69 && stepping <= 1) || (model == 70 && stepping <= 1)))
+    __cpu_features.cpuid[COMMON_CPUID_INDEX_7].ebx &= ~(bit_RTM | bit_HLE);
+
   __cpu_features.family = family;
   __cpu_features.model = model;
   atomic_write_barrier ();
diff --git a/sysdeps/x86_64/multiarch/init-arch.h b/sysdeps/x86_64/multiarch/init-arch.h
index 793707a..e2745cb 100644
--- a/sysdeps/x86_64/multiarch/init-arch.h
+++ b/sysdeps/x86_64/multiarch/init-arch.h
@@ -41,6 +41,7 @@ #define bit_FMA4	(1 << 16)

 /* COMMON_CPUID_INDEX_7.  */
 #define bit_RTM		(1 << 11)
+#define bit_HLE		(1 << 4)
 #define bit_AVX2	(1 << 5)
 
 /* XCR0 Feature flags.  */
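
The blacklist condition in the patch above can be checked against the CPU signature that the microcode driver prints in the logs earlier in this thread (sig=0x306c3). Below is a small Python sketch under a few assumptions: the names decode_signature, BLACKLIST and tsx_blacklisted are invented here, the vendor check (arch_kind_intel) is left out, and the extended-model adjustment follows the usual CPUID leaf 1 decoding rule, which is not part of the hunks shown.

```python
def decode_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    family = (eax >> 8) & 0x0F
    model = (eax >> 4) & 0x0F
    stepping = eax & 0x0F
    # Usual CPUID rule: families 6 and 15 add the extended model bits.
    if family in (0x06, 0x0F):
        model += (eax >> 12) & 0xF0
    # Family 15 additionally adds the extended family bits.
    if family == 0x0F:
        family += (eax >> 20) & 0xFF
    return family, model, stepping


# model -> highest blacklisted stepping, copied from the patch condition.
BLACKLIST = {63: 2, 60: 3, 69: 1, 70: 1}


def tsx_blacklisted(eax):
    """True if the patch would clear the RTM and HLE bits for this signature."""
    family, model, stepping = decode_signature(eax)
    return family == 6 and model in BLACKLIST and stepping <= BLACKLIST[model]


print(decode_signature(0x306C3))  # (6, 60, 3)
print(tsx_blacklisted(0x306C3))   # True
```

So a CPU reporting sig=0x306c3 decodes to family 6, model 60, stepping 3, which falls under the "model == 60 && stepping <= 3" clause of the patch.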

Reading the discussion on the bug report page, it appears the patch is necessary and that compiling glibc with "--enable-lock-elision=no" isn't enough, because that alone doesn't fully disable the use of TSX.

Patch originally found here:
https://bugs.debian.org/cgi-bin/bugrepo … =762195#20


If anyone's interested, here's the slightly modified glibc PKGBUILD from [core] I used:

# $Id: PKGBUILD 221792 2014-09-16 00:03:06Z allan $
# Maintainer: Allan McRae <allan@archlinux.org>

# toolchain build order: linux-api-headers->glibc->binutils->gcc->binutils->glibc
# NOTE: valgrind requires rebuilt with each major glibc version

# NOTE: adjust version in install script when locale files are updated

# Modified by quietraccoon; added patch for TSX blacklist

pkgname=glibc
pkgver=2.20
pkgrel=2
pkgdesc="GNU C Library"
arch=('i686' 'x86_64')
url="http://www.gnu.org/software/libc"
license=('GPL' 'LGPL')
groups=('base')
depends=('linux-api-headers>=3.16' 'tzdata' 'filesystem>=2013.01')
makedepends=('gcc>=4.9')
backup=(etc/gai.conf
        etc/locale.gen
        etc/nscd.conf)
options=('!strip' 'staticlibs')
install=glibc.install
source=(http://ftp.gnu.org/gnu/libc/${pkgname}-${pkgver}.tar.xz{,.sig}
    glibc-2.20-local-blacklist-on-TSX-Haswell.patch
	glibc-2.20-getifaddrs_internal-segfault.patch
	glibc-2.20-linux-3.16-additions.patch
	glibc-2.20-do_ftell_wide-memleak.patch
        locale.gen.txt
        locale-gen)
md5sums=('948a6e06419a01bd51e97206861595b0'
         'SKIP'
         'd7fd951fd0b7d891eefb6b14e5dd2d28'
         '1c5d5c2017445c75dbc5c6d0c1e45ddb'
         '8f1059f431b842e54b12bde689620df8'
         'b50feeab78fa6ce0a8cfb41ee8dc1fd8'
         '07ac979b6ab5eeb778d55f041529d623'
         '476e9113489f93b348b21e144b6a8fcf')
#validpgpkeys=('F37CDAB708E65EA183FD1AF625EF0A436C2A4AFF')  # Carlos O'Donell

prepare() {
  cd ${srcdir}/glibc-${pkgver}

  # fix segfault in getifaddrs_internal
  # https://sourceware.org/ml/libc-alpha/2014-09/msg00312.html
  patch -p1 -i $srcdir/glibc-2.20-getifaddrs_internal-segfault.patch
  
  # linux 3.16 additions - commit 0bd72468
  patch -p1 -i $srcdir/glibc-2.20-linux-3.16-additions.patch
  
  # plug memory leak - commit 984c0ea9
  patch -p1 -i $srcdir/glibc-2.20-do_ftell_wide-memleak.patch

  # TSX blacklist
  patch -p1 -i $srcdir/glibc-2.20-local-blacklist-on-TSX-Haswell.patch

  mkdir ${srcdir}/glibc-build
}

build() {
  LD_LIBRARY_PATH=/usr/lib

  cd ${srcdir}/glibc-build

  if [[ ${CARCH} = "i686" ]]; then
    # Hack to fix NPTL issues with Xen, only required on 32bit platforms
    # TODO: make separate glibc-xen package for i686
    export CFLAGS="${CFLAGS} -mno-tls-direct-seg-refs"
  fi

  echo "slibdir=/usr/lib" >> configparms
  echo "rtlddir=/usr/lib" >> configparms
  echo "sbindir=/usr/bin" >> configparms
  echo "rootsbindir=/usr/bin" >> configparms

  # remove hardening options for building libraries
  CFLAGS=${CFLAGS/-fstack-protector-strong/}
  CPPFLAGS=${CPPFLAGS/-D_FORTIFY_SOURCE=2/}

  ${srcdir}/${pkgname}-${pkgver}/configure --prefix=/usr \
      --libdir=/usr/lib --libexecdir=/usr/lib \
      --with-headers=/usr/include \
      --with-bugurl=https://bugs.archlinux.org/ \
      --enable-add-ons \
      --enable-obsolete-rpc \
      --enable-kernel=2.6.32 \
      --enable-bind-now --disable-profile \
      --enable-stackguard-randomization \
      --enable-lock-elision=no \
      --enable-multi-arch

  # build libraries with hardening disabled
  echo "build-programs=no" >> configparms
  make

  # re-enable hardening for programs
  sed -i "/build-programs=/s#no#yes#" configparms
  echo "CC += -fstack-protector-strong -D_FORTIFY_SOURCE=2" >> configparms
  echo "CXX += -fstack-protector-strong -D_FORTIFY_SOURCE=2" >> configparms
  make

  # remove hardening in preparation to run test-suite
  sed -i '5,7d' configparms
}

check() {
  # the linker commands need to be reordered - fixed in 2.19
  LDFLAGS=${LDFLAGS/--as-needed,/}

  cd ${srcdir}/glibc-build

  # tst-cleanupx4 failure on i686 is "expected"
  make check || true
}

package() {
  cd ${srcdir}/glibc-build

  install -dm755 ${pkgdir}/etc
  touch ${pkgdir}/etc/ld.so.conf

  make install_root=${pkgdir} install

  rm -f ${pkgdir}/etc/ld.so.{cache,conf}

  install -dm755 ${pkgdir}/usr/lib/{locale,systemd/system,tmpfiles.d}

  install -m644 ${srcdir}/${pkgname}-${pkgver}/nscd/nscd.conf ${pkgdir}/etc/nscd.conf
  install -m644 ${srcdir}/${pkgname}-${pkgver}/nscd/nscd.service ${pkgdir}/usr/lib/systemd/system
  install -m644 ${srcdir}/${pkgname}-${pkgver}/nscd/nscd.tmpfiles ${pkgdir}/usr/lib/tmpfiles.d/nscd.conf

  install -m644 ${srcdir}/${pkgname}-${pkgver}/posix/gai.conf ${pkgdir}/etc/gai.conf

  install -m755 ${srcdir}/locale-gen ${pkgdir}/usr/bin

  # create /etc/locale.gen
  install -m644 ${srcdir}/locale.gen.txt ${pkgdir}/etc/locale.gen
  sed -e '1,3d' -e 's|/| |g' -e 's|\\| |g' -e 's|^|#|g' \
    ${srcdir}/glibc-${pkgver}/localedata/SUPPORTED >> ${pkgdir}/etc/locale.gen

  # remove the static libraries that have a shared counterpart
  # libc, libdl, libm and libpthread are required for toolchain testsuites
  # in addition libcrypt appears widely required
  rm $pkgdir/usr/lib/lib{anl,BrokenLocale,nsl,resolv,rt,util}.a

  # Do not strip the following files for improved debugging support
  # ("improved" as in not breaking gdb and valgrind...):
  #   ld-${pkgver}.so
  #   libc-${pkgver}.so
  #   libpthread-${pkgver}.so
  #   libthread_db-1.0.so

  cd $pkgdir
  strip $STRIP_BINARIES usr/bin/{gencat,getconf,getent,iconv,iconvconfig} \
                        usr/bin/{ldconfig,locale,localedef,nscd,makedb} \
                        usr/bin/{pcprofiledump,pldd,rpcgen,sln,sprof} \
                        usr/lib/getconf/*
  [[ $CARCH = "i686" ]] && strip $STRIP_BINARIES usr/bin/lddlibc4

  strip $STRIP_STATIC usr/lib/*.a

  strip $STRIP_SHARED usr/lib/{libanl,libBrokenLocale,libcidn,libcrypt}-*.so \
                      usr/lib/libnss_{compat,db,dns,files,hesiod,nis,nisplus}-*.so \
                      usr/lib/{libdl,libm,libnsl,libresolv,librt,libutil}-*.so \
                      usr/lib/{libmemusage,libpcprofile,libSegFault}.so \
                      usr/lib/{audit,gconv}/*.so
}

Currently installed...

Linux 3.17.2
glibc 2.20-2 with patch
grub 1:2.02.beta2-4
intel-ucode 20140913-1



EDIT:
The patch was originally posted on the link I provided, but the patch I actually used was taken from Debian's tarball-package-thing (or whatever it's called). They are the same though.
http://ftp.de.debian.org/debian/pool/ma … ian.tar.xz

Last edited by quietraccoon (2014-11-05 07:12:28)

#15 2014-11-05 10:14:42

MK13
Member
From: Germany
Registered: 2014-04-12
Posts: 80

Re: Kernel traps & core dumps

MK13 wrote:

I already wrote to the glibc mailing list explaining the issue, but they told me to file a bug report with my distro.
I think we should file a report for the ucode package as GourdCaptain said.

glibc wrote:

What bugs should be reported?

Most users do not compile the GNU C Library from the sources released by the GNU developers. Most people are using glibc binaries supplied with a complete operating system distribution. Distributions may include their own modifications to glibc in the binaries and sources you get with the operating system. If the glibc you are using comes from a complete operating system distribution, you should report bugs to that distribution project first. Your distribution's own documentation and web pages should refer you to their bug-reporting system. Your distribution's maintainers will determine whether the problem is specific to their modifications or other details of that particular system. If the problem does exist in the standard GNU C Library code, they will report it to the GNU maintainers or direct you how to do so.

Maybe we should file a report for glibc in [core] with a link to the patch from quietraccoon's post and encourage the maintainer to report it upstream? Or, now that we know the issue isn't specific to Arch, should we report directly to glibc?

http://www.gnu.org/software/libc/bugs.html

Last edited by MK13 (2014-11-05 10:20:47)

#16 2014-11-05 13:46:35

t.ask
Member
Registered: 2013-01-14
Posts: 11

Re: Kernel traps & core dumps

Please file an Arch bug report on this, possibly referring to the fixed version of glibc. Thanks.

Btw. I experience the same issue with Haswell, microcode and suspend.

#17 2014-11-05 14:31:16

MK13
Member
From: Germany
Registered: 2014-04-12
Posts: 80

Re: Kernel traps & core dumps

I filed a bug report in the Arch bug tracker for the glibc package:

https://bugs.archlinux.org/task/42689

#18 2014-11-05 21:34:24

EscapedNull
Member
Registered: 2013-12-04
Posts: 129

Re: Kernel traps & core dumps

I'm afraid to upgrade my kernel, honestly.

#19 2014-11-10 00:44:00

t.ask
Member
Registered: 2013-01-14
Posts: 11

Re: Kernel traps & core dumps

@EscapedNull: You can, as long as you don't have a Haswell system, AFAIK. It's also not affecting many systems. Not sure whether this might be related to certain BIOS settings.

Just for the record: removing the microcode settings from the syslinux configuration helps with the "applications not starting" issue after suspend.

Btw., I am also experiencing another weird issue with applications or Wine games running in fullscreen: mouse clicks act on everything underneath the fullscreen window. It's as if fullscreen mode is ghosting the application/game (GNOME). Is this related somehow? It also happens with the microcode update disabled. I'm just asking to find out whether it's only happening here.

Last edited by t.ask (2014-11-10 00:45:28)

#20 2014-11-12 11:00:10

t.ask
Member
Registered: 2013-01-14
Posts: 11

Re: Kernel traps & core dumps

Can someone else confirm that this error only affects Haswell CPUs without Hyper-Threading? On my systems it looks like only non-HT Haswell machines are affected.

#21 2014-11-12 14:13:53

mrunion
Member
From: Jonesborough, TN
Registered: 2007-01-26
Posts: 1,938

Re: Kernel traps & core dumps

I must be honest and say that I didn't study this thread -- I only hit the highlights. With that said....

Two weeks ago I bought a laptop while it was on sale (http://www.tigerdirect.com/applications … No=8939384). It has an i7-4810MQ, which unless I am mistaken is a Haswell processor with Hyper-Threading support. I am not experiencing any of the issues you guys are describing, and I am running the microcode updates. Is there anything specific I can answer that would help track down the issue? Anything specific you want me to look for in my logs?

I am running the latest updates from the standard repositories (I don't use [testing]). I can suspend and resume without problems. I have a UEFI system (I don't dual boot with anything, I just have Arch only on the machine).

Again, I may be just misunderstanding the issue here, and not be any help at all. But I want to offer help if I can.


Matt

"It is very difficult to educate the educated."

#22 2014-11-12 14:33:18

MK13
Member
From: Germany
Registered: 2014-04-12
Posts: 80

Re: Kernel traps & core dumps

You could post the output of

dmesg | grep microcode

after resume from suspend.

Edit: According to a comment by Namarrgon, Hyper-Threading CPUs are not affected. See the bug report at the kernel bugtracker for more information and a detailed explanation.

Last edited by MK13 (2014-11-12 15:20:28)

#23 2014-11-12 20:35:41

quietraccoon
Member
Registered: 2014-05-27
Posts: 4

Re: Kernel traps & core dumps

t.ask wrote:

Can someone else confirm that this error only affects Haswell CPUs without Hyper-Threading? On my systems it looks like only non-HT Haswell machines are affected.

Both InvalidInterrupt and I have Haswell Refresh i5's and are experiencing the issue. Unfortunately, I don't have access to any HT CPUs that are Haswell or newer to test on.

#24 2014-11-12 20:37:56

GourdCaptain
Member
Registered: 2009-04-18
Posts: 121

Re: Kernel traps & core dumps

I'm having the issue on my system with no Hyper Threading - Core i5 4590. I have no Hyper Threading-using Haswell to check it on.

#25 2014-11-13 01:50:14

mrunion
Member
From: Jonesborough, TN
Registered: 2007-01-26
Posts: 1,938

Re: Kernel traps & core dumps

As per @MK13, here is my output:

[mrunion@rustyhump ~]$ sudo dmesg | grep microcode
[    0.000000] CPU0 microcode updated early to revision 0x1c, date = 2014-07-03
[    0.091465] CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
[    0.112458] CPU2 microcode updated early to revision 0x1c, date = 2014-07-03
[    0.133422] CPU3 microcode updated early to revision 0x1c, date = 2014-07-03
[    0.446644] microcode: CPU0 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446648] microcode: CPU1 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446653] microcode: CPU2 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446660] microcode: CPU3 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446667] microcode: CPU4 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446672] microcode: CPU5 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446678] microcode: CPU6 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446684] microcode: CPU7 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446718] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
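
To compare outputs like the one above between machines, the per-CPU revision lines can be collected and checked for agreement. This is a minimal Python sketch: the names REV_RE and revisions_by_cpu are hypothetical, and the pattern assumes the "microcode: CPUn sig=..., pf=..., revision=..." format shown here. Going by the timestamps, these lines come from boot, so on their own they say nothing about the state after a resume.

```python
import re

# Assumed format: the microcode driver lines as printed in the dmesg output above.
REV_RE = re.compile(
    r"microcode: CPU(\d+) sig=(0x[0-9a-f]+), pf=(0x[0-9a-f]+), revision=(0x[0-9a-f]+)"
)


def revisions_by_cpu(lines):
    """Map logical CPU number to the microcode revision reported for it."""
    revisions = {}
    for line in lines:
        match = REV_RE.search(line)
        if match:
            revisions[int(match.group(1))] = int(match.group(4), 16)
    return revisions


# Three lines copied from the output above.
SAMPLE = """\
[    0.446644] microcode: CPU0 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446648] microcode: CPU1 sig=0x306c3, pf=0x10, revision=0x1c
[    0.446684] microcode: CPU7 sig=0x306c3, pf=0x10, revision=0x1c
"""

revisions = revisions_by_cpu(SAMPLE.splitlines())
consistent = len(set(revisions.values())) == 1
print(revisions, "consistent" if consistent else "MISMATCH")
```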

Matt

"It is very difficult to educate the educated."
