#1 2017-05-03 20:20:48

tonnz
Member
Registered: 2015-08-02
Posts: 38

[Solved]Why is lock-elision still enabled in glibc?

Regarding this closed bug report:
https://bugs.archlinux.org/task/39631

I apologize in advance if this comes across as a rant, but it has been a stressful day.
I spent hours researching the problem until I found the solution (which is to compile glibc without lock-elision).
And yes, I have tried everything else (NVIDIA drivers, BIOS updates, microcode updates, etc.).
Disabling lock elision is the only solution at the moment.

So here is my question: Why is lock-elision still enabled in glibc as compiled for Arch?
I did read through the bug report.
"I see no reason to disable it. glibc is not the issue." Whether or not glibc is the issue, disabling lock elision is currently the only solution.
Any newer bug reports I found were just pointed at the original bug report and closed.
This problem still exists on Broadwell and Skylake platforms; it is not exclusive to Haswell.

Yes, this looks like a hardware bug that Intel should fix. Yes, software should work around that problem. Yes, the glibc devs should take care of it. But I can't do any of that, and neither can most Arch users.
But if it really is a problem in the applications, why does recompiling glibc, and leaving the "faulty" application as-is, fix the problem?
Lock elision makes glibc unreliable. Why not abandon it then? Are there benefits of lock elision so great that it is worth having such a big hazard in Arch?

Thank you for reading this,
tonnz


If anyone is interested in my specific problem: libpthread (in glibc) crashes with a general protection fault when I try to run some university software.
Further reading:
A lengthy Debian bug report of severity "grave", where people were working on a solution/workaround:
https://bugs.debian.org/cgi-bin/bugrepo … bug=800574

Last edited by tonnz (2017-05-03 22:50:10)

#2 2017-05-03 21:20:15

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: [Solved]Why is lock-elision still enabled in glibc?

Why is applying the microcode update at boot not working for you? Does Intel believe your CPU model has HLE working correctly, so they don't work around the problem in the microcode for this model? Did you find discussion about this somewhere?

#3 2017-05-03 22:21:27

tonnz
Member
Registered: 2015-08-02
Posts: 38

Re: [Solved]Why is lock-elision still enabled in glibc?

Ropid wrote:

Why is applying the microcode update at boot not working for you? Does Intel believe your CPU model has HLE working correctly, so they don't work around the problem in the microcode for this model? Did you find discussion about this somewhere?

I did not find anything about it. It's the i7-6700HQ (mobile), and according to Intel it supports TSX (I believe that encompasses HLE?). Here's a /proc/cpuinfo of an identical model:
http://www.linux-hardware-guide.com/201 … pu-2-60ghz
HLE is on the list of extensions. (However, that listing is from an older microcode version. I am using 0x9e, I believe; I'd have to boot into Linux again to check whether HLE is on the list with my processor/microcode.)
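
For anyone wanting to run the same check on their own machine, here is a minimal Python sketch. It looks for the `hle` and `rtm` flags (the two parts of TSX) and reads the `microcode` field from /proc/cpuinfo text. The function names `cpu_features` and `tsx_report` are made up for illustration; only the /proc/cpuinfo field and flag names come from the kernel.

```python
def cpu_features(cpuinfo_text):
    """Return (set of flags, microcode string or None) for the first CPU listed."""
    flags, microcode = set(), None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPUs
        key = key.strip()
        if key == "flags" and not flags:
            flags = set(value.split())
        elif key == "microcode" and microcode is None:
            microcode = value.strip()
    return flags, microcode


def tsx_report(cpuinfo_text):
    """Summarise whether the kernel reports HLE and RTM, plus the microcode revision."""
    flags, microcode = cpu_features(cpuinfo_text)
    return {"hle": "hle" in flags, "rtm": "rtm" in flags, "microcode": microcode}
```

On a Linux box it would be called as `tsx_report(open("/proc/cpuinfo").read())`. It only reports what the kernel sees after any microcode update has been applied, so it says nothing about whether the silicon itself is affected.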

So it's either a problem in glibc or in my specific processor. But I've also seen people using Broadwell Xeons report the same problems with glibc and lock elision.

#4 2017-05-03 22:22:35

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,389

Re: [Solved]Why is lock-elision still enabled in glibc?

If you have the microcode update, then glibc is not the problem. It is the software you are running. Every single case of software crashing due to lock elision (after the microcode update) has been fixed by correcting the buggy software.

#5 2017-05-03 22:27:04

tonnz
Member
Registered: 2015-08-02
Posts: 38

Re: [Solved]Why is lock-elision still enabled in glibc?

Allan wrote:

If you have the microcode update, then glibc is not the problem. It is the software you are running.

But in that case I don't understand why recompiling glibc with different options solves it. I thought changes like that were completely hidden from the applications using glibc?
And the fact that only people with newer Skylake notebooks are having trouble running the software doesn't help me either.

Help me understand.

Last edited by tonnz (2017-05-03 23:09:12)

#6 2017-05-03 22:39:53

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,389

Re: [Solved]Why is lock-elision still enabled in glibc?

Compiling glibc with lock elision has exposed many bugs in the memory management of software. Of course, if you disable lock elision, you go back to hiding the issue. Newer Skylake processors have the issue because their microcode does not disable TSX, so the bugs remain exposed.

There is not a single example where software crashing due to lock elision was a glibc issue.
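
To illustrate how an application bug can stay hidden without elision and crash with it, here is a toy Python model. It is a sketch under assumptions, not glibc's code: the posts above do not name a specific bug, and the one modelled here (unlocking a mutex that is not held) is just one class commonly reported in such cases. All class and function names are hypothetical.

```python
class GeneralProtectionFault(Exception):
    """Stand-in for the CPU fault raised when a transaction is ended but none is running."""


class PlainMutex:
    """Toy model of a default mutex built without lock elision."""

    def __init__(self):
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        # Unlocking an unlocked mutex is undefined behaviour, but in this
        # model it just clears the flag again, so the bug goes unnoticed.
        self.locked = False


class ElidedMutex:
    """Toy model of a mutex whose critical section runs as a hardware transaction."""

    def __init__(self):
        self.locked = False  # stays False while the lock is elided
        self.in_transaction = False

    def lock(self):
        # Elision: start a transaction instead of writing to the lock word.
        self.in_transaction = True

    def unlock(self):
        if self.locked:
            self.locked = False
            return
        # Lock word looks free, so assume a transaction is running and end it.
        if not self.in_transaction:
            raise GeneralProtectionFault("transaction end outside a transaction")
        self.in_transaction = False


def buggy_application(mutex):
    """Hypothetical application code with an unbalanced unlock."""
    mutex.lock()
    mutex.unlock()
    mutex.unlock()  # bug: the mutex is no longer held
    return "finished"


def run(mutex):
    """Run the buggy code and report whether it survived."""
    try:
        return buggy_application(mutex)
    except GeneralProtectionFault as exc:
        return f"crashed: {exc}"
```

In this model `run(PlainMutex())` finishes while `run(ElidedMutex())` reports a crash, even though the application code is identical. That is the sense in which rebuilding glibc can make a crash go away without the application being correct.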

#7 2017-05-03 22:49:24

tonnz
Member
Registered: 2015-08-02
Posts: 38

Re: [Solved]Why is lock-elision still enabled in glibc?

Allan wrote:

Compiling glibc with lock elision has exposed many bugs in the memory management of software. Of course, if you disable lock elision, you go back to hiding the issue. Newer Skylake processors have the issue because their microcode does not disable TSX, so the bugs remain exposed.

There is not a single example where software crashing due to lock elision was a glibc issue.

That makes sense, thanks. I'll write to the creators of that software; maybe something can be done about it. Until then I will have to use glibc without lock-elision.

#8 2017-05-08 04:29:03

severach
Member
Registered: 2015-05-23
Posts: 192

Re: [Solved]Why is lock-elision still enabled in glibc?

The i7-6700HQ does not support TSX-NI.

 Intel® TSX-NI: No

The problem I have with lock elision is that on rare occasions both the BIOS and the microcode fail to update the CPU (Lenovo TS140, Haswell E3-1245) and the system boots with TSX turned on. Rebooting, when even possible, does not help; I have to power-cycle the machine to fix it.
