#1 2016-03-09 13:43:30

Magbed
Member
Registered: 2013-06-02
Posts: 30

MCE Error Computer hangs

I'm getting random hangs on my computer, and this is what mcelog says; it's a repetitive pattern.

I could use some help figuring out whether this is a CPU hardware issue or a memory issue. I have run several torture tests such as mprime, and it passed them all for a few hours, so I'm out of ideas.

mcelog wrote:

MCE 0
CPU 0 BANK 0
TIME 1457528941 Wed Mar  9 14:09:01 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: 0 0 Level-0 Local-CPU-originated-request Generic Memory-access Request-did-not-timeout
Running trigger `bus-error-trigger'
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
failure that caused IERR
STATUS f200084000000800 MCGSTATUS 0
MCGCAP 6 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 5
TIME 1457528941 Wed Mar  9 14:09:01 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: 0 0 Level-3 Generic Generic Other-transaction Request-did-not-timeout
Running trigger `bus-error-trigger'
BQ_DCU_READ_TYPE BQ_ERR_AERR2_TYPE BQ_ERR_AERR2_TYPE
received parity error on response transaction
MCE driven
STATUS f200001014000e0f MCGSTATUS 0
MCGCAP 6 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15
Hardware event. This is not a software error.
MCE 2
CPU 1 BANK 5
TIME 1457528941 Wed Mar  9 14:09:01 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: 0 1 Level-3 Generic Generic Other-transaction Request-did-not-timeout
Running trigger `bus-error-trigger'
mcelog: Too many trigger children running already
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
received parity error on response transaction
MCE driven MCE is observed
STATUS f200001030000e0f MCGSTATUS 0
MCGCAP 6 APICID 1 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15
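
For reference, the flag lines in output like the above ("Error overflow", "Uncorrected error", and so on) correspond to the top bits of the STATUS value, and the "BUS error" line to its low 16 bits. Below is a minimal sketch of that decoding in Python, assuming the architectural IA32_MCi_STATUS layout and the bus/interconnect error-code pattern from Intel's Software Developer's Manual. The function name, table names, and description strings are made up for this example and are not part of mcelog; model-specific bits are ignored.

```python
# Sketch only: decode the architectural flag bits and the bus/interconnect
# error code of an MCi_STATUS value (the hex number on mcelog's STATUS line).
# All names below are hypothetical helpers, not mcelog APIs.

STATUS_FLAGS = {
    63: "valid",
    62: "Error overflow",
    61: "Uncorrected error",
    60: "Error enabled",
    59: "MISC register valid",
    58: "ADDR register valid",
    57: "Processor context corrupt",
}

# Fields of the 16-bit bus/interconnect code, pattern 0000 1PPT RRRR IILL.
PARTICIPATION = ["local CPU originated request", "local CPU responded to request",
                 "local CPU observed error as third party", "generic"]
REQUEST = {0: "generic error", 1: "generic read", 2: "generic write",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "eviction", 8: "snoop"}
TRANSACTION = ["memory access", "reserved", "I/O", "other transaction"]
LEVEL = ["level 0", "level 1", "level 2", "generic"]


def decode_status(status):
    """Return the set flags and, if applicable, the decoded bus error fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # architectural MCA error code
    result = {"flags": flags, "mca_code": code}
    if (code & 0xF800) == 0x0800:  # bus/interconnect error pattern
        result["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
            "transaction": TRANSACTION[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return result
```

Calling `decode_status(0xf200084000000800)` on the first record's STATUS value should report the same four conditions mcelog printed, plus a bus error at level 0 for a memory access originated by the local CPU with no timeout.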


#2 2016-03-09 18:02:48

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: MCE Error Computer hangs

"BUS error", "parity error" - maybe something with the FSB? That would point to a bad CPU, a bad northbridge, or some wet electrolytic capacitors no longer being that wet anymore ;)

I don't think it's RAM, but if you have multiple sticks you can remove one and wait until a new MCE confirms it wasn't that stick's fault, then remove the next one, and so on...

Similarly, you can install microcode updates to confirm that this isn't the reason either :)

If the CPU didn't run insanely hot or have other legitimate reasons to suddenly fail, I'd suspect the motherboard.


#3 2016-03-15 07:53:39

Magbed
Member
Registered: 2013-06-02
Posts: 30

Re: MCE Error Computer hangs

Thanks for the reply, I will try that advice.


#4 2016-08-24 18:37:06

halimzhz
Member
Registered: 2016-08-24
Posts: 4

Re: MCE Error Computer hangs

Dear All,

Today my server suddenly rebooted, and when I checked mcelog, it showed the following:

Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 0
TIME 1471977806 Wed Aug 24 02:43:26 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: 0 0 Level-0 Local-CPU-originated-request Generic Memory-access Request-did-not-timeout
Running trigger `bus-error-trigger'
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE failure that caused IERR
MCE driven PIC or FSB data parity error
STATUS f200080410000800 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 5
TIME 1471977806 Wed Aug 24 02:43:26 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: 0 0 Level-3 Generic Generic Other-transaction Request-did-not-timeout
Running trigger `bus-error-trigger'
BQ_DCU_READ_TYPE BQ_ERR_AERR2_TYPE BQ_ERR_AERR2_TYPE internal BINIT
received parity error on response transaction
STATUS f200001044000e0f MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
Hardware event. This is not a software error.
MCE 2
CPU 1 BANK 5
TIME 1471977806 Wed Aug 24 02:43:26 2016
MCG status:
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: 0 1 Level-3 Generic Generic Other-transaction Request-did-not-timeout
Running trigger `bus-error-trigger'
mcelog: Too many trigger children running already
BQ_DCU_READ_TYPE BQ_ERR_AERR2_TYPE BQ_ERR_AERR2_TYPE
BINIT observed
STATUS b200000084000e0f MCGSTATUS 0
MCGCAP 806 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
Hardware event. This is not a software error.
MCE 3
CPU 4 BANK 5
TIME 1471977806 Wed Aug 24 02:43:26 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: 0 4 Level-3 Generic Generic Other-transaction Request-did-not-timeout
Running trigger `bus-error-trigger'
mcelog: Too many trigger children running already
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE internal BINIT
received parity error on response transaction
STATUS f200001040000e0f MCGSTATUS 0
MCGCAP 806 APICID 1 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
Hardware event. This is not a software error.
MCE 4
CPU 5 BANK 5
TIME 1471977806 Wed Aug 24 02:43:26 2016
MCG status:
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: 0 5 Level-3 Generic Generic Other-transaction Request-did-not-timeout
Running trigger `bus-error-trigger'
mcelog: Too many trigger children running already
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
BINIT observed
STATUS b200000080000e0f MCGSTATUS 0
MCGCAP 806 APICID 3 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
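
With several records in one dump it is easy to lose track of which CPU and which machine-check bank each one came from. Below is a minimal sketch in Python that pulls the CPU, BANK, TIME and STATUS lines out of pasted text like the above and groups the reporting CPUs by bank. It assumes the plain line layout shown in this thread; the function names are made up for this example and are not part of mcelog.

```python
import re
from collections import defaultdict

# Line patterns as they appear in the pasted output (assumed layout).
RECORD_START = re.compile(r"^CPU (\d+) BANK (\d+)$")
TIME_LINE = re.compile(r"^TIME (\d+)")
STATUS_LINE = re.compile(r"^STATUS ([0-9a-fA-F]+)")


def parse_mcelog(text):
    """Return one dict per record with cpu, bank, and (if present) time and status."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        start = RECORD_START.match(line)
        if start:
            # "CPU n BANK m" opens a new record; BANK is the machine-check
            # register bank number.
            records.append({"cpu": int(start.group(1)), "bank": int(start.group(2))})
            continue
        if not records:
            continue  # ignore anything before the first record
        time_match = TIME_LINE.match(line)
        status_match = STATUS_LINE.match(line)
        if time_match:
            records[-1]["time"] = int(time_match.group(1))  # seconds since the epoch
        elif status_match:
            records[-1]["status"] = int(status_match.group(1), 16)
    return records


def cpus_by_bank(records):
    """Map each bank number to the sorted list of CPUs that reported on it."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["bank"]].add(record["cpu"])
    return {bank: sorted(cpus) for bank, cpus in sorted(grouped.items())}
```

Feeding the five records above through `parse_mcelog` and then `cpus_by_bank` gives a quick overview of which banks fired on which CPUs, and comparing the `time` values shows whether the records belong to a single event.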

Is it because of the RAM or the CPU? Or maybe the motherboard?

Please help. Thank you.

Last edited by halimzhz (2016-08-25 15:41:45)


#5 2016-08-24 23:10:17

mrunion
Member
From: Jonesborough, TN
Registered: 2007-01-26
Posts: 1,938

Re: MCE Error Computer hangs

Edit your post and use code tags, please.


Matt

"It is very difficult to educate the educated."


#6 2016-08-25 10:05:06

halimzhz
Member
Registered: 2016-08-24
Posts: 4

Re: MCE Error Computer hangs

Dear All,

Does anybody have experience with this mcelog error message?

Thank you.

Last edited by halimzhz (2016-08-25 10:05:18)


#7 2016-08-25 14:39:31

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,911

Re: MCE Error Computer hangs

received parity error on response transaction

Does the server use ECC memory?

The messages appear to mention only 2 banks, bank 0 and bank 5.
I'd run memory diagnostics.

If your mobo is server/workstation-class hardware, it should have extensive diagnostics accessible from firmware.
The Arch Linux ISO has memtest and is an alternative.


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)


#8 2016-08-25 14:55:33

halimzhz
Member
Registered: 2016-08-24
Posts: 4

Re: MCE Error Computer hangs

Dear Lone_Wolf,

Thank you so much for the response. Yes, the motherboard is an S5000VSA and it uses ECC RAM. The server rebooted abruptly after about 7 days of uptime. Before that I ran memtest86+ and did not find any RAM errors. I searched around and found that somebody changed the CPU; I don't know whether that is the best thing for me to do.

Please advise. Thank you so much.


#9 2016-08-25 15:17:43

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,774

Re: MCE Error Computer hangs

Here is a shot in the dark, but is your processor an Intel processor?  Have you installed and configured the microcode updates?
https://wiki.archlinux.org/index.php/Microcode
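
One way to check whether a microcode update actually took effect is to compare the revision the kernel reports before and after installing it. Below is a minimal sketch in Python, assuming an x86 Linux kernel that exposes a `microcode` field in /proc/cpuinfo; the function names are made up for this example.

```python
def microcode_revisions(cpuinfo_text):
    """Return the distinct 'microcode' values found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


def local_microcode_revisions(path="/proc/cpuinfo"):
    """Read the running system's cpuinfo (path is an assumption for Linux)."""
    with open(path) as cpuinfo:
        return microcode_revisions(cpuinfo.read())
```

Running `local_microcode_revisions()` once before and once after enabling the update (and rebooting) shows whether the reported revision changed; an empty set means the field was not found.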

Edit:  BTW, could I bother you to edit your posts and use code tags rather than quote tags ?

Last edited by ewaller (2016-08-25 15:18:34)


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way


#10 2016-08-25 15:23:54

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,911

Re: MCE Error Computer hangs

For clarity, did the reboot happen after the CPU was changed?

If this is happening in a corporate/medium-sized business environment: cover your ass.
Somebody WILL be blamed for this, so better make sure it's the responsible party.

Report the issues to your superior ASAP.
Gather/request detailed info about the CPU change, the support level, and the support contract of the hardware.
If you're lucky the support contract will include on-site assistance from expert technicians/engineers.


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)


#11 2016-08-25 15:40:34

halimzhz
Member
Registered: 2016-08-24
Posts: 4

Re: MCE Error Computer hangs

Dear Lone_Wolf,

Sorry for the misunderstanding in my earlier post. What I meant is that I found on another forum that somebody fixed the problem by changing the CPU.


#12 2016-08-25 16:45:52

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: MCE Error Computer hangs

In all the cases I've read here on the forum about MCE errors, the problem was always broken hardware. If you are not the one in charge of hardware support just follow Lone_Wolf's advice, pass the hot potato to whoever is responsible for hardware maintenance.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K
