#1 2018-01-22 12:51:09

cyayon
Member
Registered: 2016-09-05
Posts: 35

mce : hardware error CPU

Hi all,

For a few weeks now I have been getting random crashes with no log output and a black screen.

I ran memtest86 (only 3 passes) and it reported no errors...

I then installed the 'stress' tool (pacman -S stress) and ran "stress --cpu 8". After about 10 seconds, I get these error messages on the console:

mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 0: 9000004000010005             
mce: [Hardware Error]: TSC 139be719174                                               
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1516624544 SOCKET 0 APIC 4 microcode 1c
mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 0: 9000004000010005             
mce: [Hardware Error]: TSC 13b4643c2a7                                               
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1516624546 SOCKET 0 APIC 6 microcode 1c
mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 0: 9000004000010005             
mce: [Hardware Error]: TSC 13ce936121b                                               
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1516624548 SOCKET 0 APIC 6 microcode 1c
mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 0: 9000004000010005             
mce: [Hardware Error]: TSC 13d452276d6                                               
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1516624549 SOCKET 0 APIC 4 microcode 1c

and then the system crashes!
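
For readers wondering what the repeated status value 9000004000010005 encodes, here is a minimal Python sketch that splits it into the architectural fields of an Intel IA32_MCi_STATUS register. It assumes the bit layout described in the Intel SDM, Vol. 3B (machine-check architecture chapter). The model-specific bits 31:16 are extracted but not interpreted, the corrected-error count is only meaningful on CPUs that support CMCI, and a dedicated decoder such as mcelog should be trusted over this sketch.

```python
# Minimal sketch (assumption: Intel SDM Vol. 3B layout of IA32_MCi_STATUS).
# Decodes the architectural fields only; bits 31:16 are model-specific and
# are returned raw, not interpreted.

# SDM "simple error codes" for bits 15:0 (partial list).
SIMPLE_ERROR_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
    0x0006: "SMM handler code access violation",
}


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_mci_status(status: int) -> dict:
    """Split an IA32_MCi_STATUS value into its architectural fields."""
    return {
        "VAL": bit(status, 63),    # register holds a valid error
        "OVER": bit(status, 62),   # an earlier error was overwritten
        "UC": bit(status, 61),     # error was not corrected
        "EN": bit(status, 60),     # error reporting was enabled
        "MISCV": bit(status, 59),  # IA32_MCi_MISC holds extra info
        "ADDRV": bit(status, 58),  # IA32_MCi_ADDR holds an address
        "PCC": bit(status, 57),    # processor context may be corrupt
        # bits 52:38, meaningful only if the CPU supports CMCI
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_error_code": status & 0xFFFF,               # bits 15:0
    }


if __name__ == "__main__":
    fields = decode_mci_status(0x9000004000010005)  # "Bank 0" value from the log
    for name, value in fields.items():
        print(f"{name}: {value}")
    code = fields["mca_error_code"]
    print(SIMPLE_ERROR_CODES.get(code, "compound or unlisted code"))
```

Under that layout, the value from the log has VAL and EN set and OVER, UC and PCC clear, with MCA error code 0x0005, which the SDM's simple-error-code table lists as an internal parity error. That reading only says what the CPU recorded; it does not by itself identify which component, or which operating condition, is at fault.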

Does my CPU need to be replaced? Or the motherboard?

Thanks in advance.

#2 2018-01-22 13:15:57

seth
Member
Registered: 2012-09-03
Posts: 52,199

Temperature issue?

#3 2018-01-22 13:35:40

drcouzelis
Member
From: Connecticut, USA
Registered: 2009-11-09
Posts: 4,092

What CPU are you using?

If it is a Ryzen CPU (like mine), I might have some bad news for you... :(

#4 2018-01-22 13:53:34

cyayon
Member
Registered: 2016-09-05
Posts: 35

Hi,

I don't think it's a thermal issue: in normal operation I monitor the temperature every 15 s and haven't detected anything wrong.

It's an "old" Intel Core i5-3570K on an Asus p8Z77MPRO motherboard.

I had no problems from 2012 (when I bought it) until a few weeks ago; since then I get random crashes after some I/O-intensive batch jobs. A few kernel updates haven't changed anything...

I ran memtest86 (only 3 passes, no errors), and today, after a crash, I tried the "stress" command from the stress pacman package.

About 5-10 seconds after starting "stress --cpu 8", the system crashes with an MCE hardware error.

Thanks.

#5 2018-01-22 13:58:59

seth
Member
Registered: 2012-09-03
Posts: 52,199

Don't think if you can measure :P
Your 15 s poll interval is longer than the time stress needs to crash the machine, and the crash literally happens when you heat up all cores at once.
It's an "old" board so this can easily be a D.U.S.T. error.

#6 2018-01-22 14:02:42

cyayon
Member
Registered: 2016-09-05
Posts: 35

Thanks, seth.

But when the crash happens during my batch job, it's after about 15-30 minutes, and I check the temperature every 15 s; it stays stable.
Of course, when I use stress I can't monitor the temperature; it's too short.
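
Polling at a much shorter interval addresses that. Below is a minimal Python sketch of a faster temperature logger that could run in a second terminal alongside `stress --cpu 8`. It assumes the sensors are exposed through the kernel's hwmon sysfs interface (`/sys/class/hwmon/hwmon*/temp*_input`, values in millidegrees Celsius, with optional `temp*_label` files such as the ones coretemp provides); the 0.5 s default interval and any log path are arbitrary choices, not recommendations from this thread. Each logged line is fsync'ed so the last readings have a better chance of surviving a hard crash.

```python
# Minimal sketch (assumptions: sensors exposed through the kernel hwmon sysfs
# interface as /sys/class/hwmon/hwmon*/temp*_input in millidegrees Celsius,
# with optional temp*_label files; interval and log path are arbitrary).
import glob
import os
import time


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_temps(base="/sys/class/hwmon"):
    """Return {"<chip>/<label>": degrees Celsius} for every readable sensor."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        sensor = os.path.basename(path)[: -len("_input")]  # e.g. "temp2"
        chip = _read(os.path.join(chip_dir, "name")) or os.path.basename(chip_dir)
        label = _read(os.path.join(chip_dir, sensor + "_label")) or sensor
        raw = _read(path)
        if raw is not None and raw.lstrip("-").isdigit():
            temps[f"{chip}/{label}"] = int(raw) / 1000.0  # millidegrees -> C
    return temps


def poll(interval=0.5, samples=None, log_path=None, base="/sys/class/hwmon"):
    """Report the hottest sensor every `interval` seconds.

    Stops after `samples` readings (runs forever if None). If `log_path` is
    given, each line is also appended there and fsync'ed, so the last readings
    have a better chance of surviving a hard crash.
    """
    log = open(log_path, "a") if log_path else None
    taken = 0
    try:
        while samples is None or taken < samples:
            temps = read_temps(base)
            if temps:
                name, value = max(temps.items(), key=lambda kv: kv[1])
                line = f"{time.strftime('%H:%M:%S')} max {value:.1f} C ({name})"
                print(line, flush=True)
                if log:
                    log.write(line + "\n")
                    log.flush()
                    os.fsync(log.fileno())  # push to disk before a possible crash
            taken += 1
            time.sleep(interval)
    finally:
        if log:
            log.close()
```

For example, `poll(log_path="/var/tmp/temps.log")` in one terminal while `stress --cpu 8` runs in another; the log path is only an illustration. Reporting only the hottest sensor keeps each line short enough to read at a 0.5 s rate.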

What is a D.U.S.T. error, please?

#7 2018-01-22 14:07:42

seth
Member
Registered: 2012-09-03
Posts: 52,199

#8 2018-01-22 14:16:21

cyayon
Member
Registered: 2016-09-05
Posts: 35

Ah... OK. No, it's not a dust problem :-) it's clean.

I can't see any errors while the batch is running because there is no screen, and the system doesn't have time to write anything to syslog...
Sometimes the same batch runs without problems; sometimes it crashes the system.
But I suppose the crash under "stress" is abnormal...

Yes, I have installed and enabled the latest available intel-ucode (20180108). It is applied at boot:

kernel: microcode: microcode updated early to revision 0x1c, date = 2015-02-26
kernel: microcode: sig=0x306a9, pf=0x2, revision=0x1c
kernel: microcode: Microcode Update Driver: v2.2.
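
The `sig=0x306a9` in these boot lines (and the `PROCESSOR 0:306a9` in the MCE messages) is the CPUID signature. As a minimal Python sketch, assuming Intel's documented rules for combining the base and extended family/model fields, it can be decoded and cross-checked against the `microcode` field of `/proc/cpuinfo`, which on x86 reports the revision currently loaded on each logical CPU.

```python
# Minimal sketch (assumption: Intel's rules for combining the base and
# extended family/model fields of CPUID leaf 1 EAX).


def decode_cpuid_signature(sig: int) -> tuple:
    """Return (family, model, stepping) for a CPUID signature such as 0x306a9."""
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # Extended family is only added when the base family is 0xF.
    family = base_family + (ext_family if base_family == 0xF else 0)
    # Extended model applies when the base family is 6 or 0xF.
    if base_family in (0x6, 0xF):
        model += ext_model << 4
    return family, model, stepping


def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct 'microcode' values from /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # e.g. "0x1c" -> 28
    return revisions


if __name__ == "__main__":
    family, model, stepping = decode_cpuid_signature(0x306A9)
    print(f"family {family}, model {model:#x}, stepping {stepping}")
```

For 0x306a9 this gives family 6, model 0x3A (58), stepping 9, i.e. an Ivy Bridge part, consistent with the i5-3570K. On the machine itself, `microcode_revisions(open("/proc/cpuinfo").read())` should return only the revision shown in the boot log (0x1c) if every core received the update.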

And of course, no overclocking or anything like that.

#9 2018-01-22 15:11:37

drcouzelis
Member
From: Connecticut, USA
Registered: 2009-11-09
Posts: 4,092

cyayon wrote:

It's an "old" Intel core i5-3570K with an motherboad Asus p8Z77MPRO.

Ok. There were some known issues with early batches of Ryzen processors, but of course that's not an issue here. :)

From what I understand (and I might be waaaaay wrong), an MCE error is pretty much just the computer saying "ACK! SOMETHING WENT WRONG!". It could be the CPU, the motherboard, the RAM, or even the PSU... :(

My advice would be to take things slow, don't start blowing money replacing parts if you don't know if they're actually broken, test your computer with other known working parts as much as you can, and lastly, take advice from someone much smarter than me. :D

Good luck!

#10 2018-01-22 15:23:07

cyayon
Member
Registered: 2016-09-05
Posts: 35

OK, thanks.

But could you please confirm that running "stress --cpu 8" should not generate these MCE logs and crash the system?

Thanks

#11 2018-01-22 19:22:46

drcouzelis
Member
From: Connecticut, USA
Registered: 2009-11-09
Posts: 4,092

Sure! I ran the command "stress --cpu 12" for a little over two hours just now. This was the only output to my terminal:

stress: info: [2544] dispatching hogs: 12 cpu, 0 io, 0 vm, 0 hdd

In the output of "journalctl" (as root), the last MCE error I have was from August 2016.
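
To check a saved kernel log for machine-check events the same way, a minimal Python sketch can count them per (CPU, bank, status). It assumes the message format quoted in the first post; other kernel versions may word these lines differently, and messages from a boot that ended in a hard crash may never have reached the journal. The text could come from `journalctl` run as root (all retained boots) or `journalctl -k` (kernel messages of the current boot only).

```python
# Minimal sketch (assumption: the mce message format quoted in the first
# post; other kernel versions may word these lines differently).
import re
from collections import Counter

# e.g. "mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 0: 9000004000010005"
MCE_LINE = re.compile(r"CPU (\d+): Machine Check: \S+ Bank (\d+): ([0-9a-fA-F]+)")


def tally_mce(log_text: str) -> Counter:
    """Count machine-check events per (cpu, bank, status) in kernel log text."""
    counts = Counter()
    for match in MCE_LINE.finditer(log_text):
        cpu, bank, status = match.groups()
        counts[(int(cpu), int(bank), int(status, 16))] += 1
    return counts
```

Applied to the eight log lines in the first post, this yields two events on CPU 2 and two on CPU 3, all in bank 0 with the same status value; the TSC and PROCESSOR lines are ignored because they do not match the pattern.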

#12 2018-01-22 19:39:33

seth
Member
Registered: 2012-09-03
Posts: 52,199

In 88% of all cases this is a thermal issue, and afaiu we have not ruled that out for your (unexplained) batch halt.
Please watch the temps closely and check whether they spike. If the cooler is loosely attached, the temperature under full load can rise extremely fast.

#13 2018-01-23 06:24:34

cyayon
Member
Registered: 2016-09-05
Posts: 35

Hi,

After simply clearing the CMOS, everything is fine now!
No more crash.

I think it was a bad default overclock by Asus in turbo mode...

Thanks :-)
