
#1 2017-04-06 22:58:50

titaniumbones
Member
Registered: 2013-12-20
Posts: 52

Understanding hardware errors in dmesg

Can anyone tell me what's up with these hardware errors, and what I should be doing to prevent them?

[106457.698282] IPv6: ADDRCONF(NETDEV_CHANGE): wlp4s0: link becomes ready
[106759.201082] wlp4s0: AP d8:6c:e9:23:0e:65 changed bandwidth, new config is 2412 MHz, width 1 (2412/0 MHz)
[111093.253024] thinkpad_acpi: EC reports that Thermal Table has changed
[111890.093627] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[111890.093628] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[111890.093630] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[111890.093631] CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[111890.093633] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
[111890.093635] mce: [Hardware Error]: Machine check events logged
[111890.093637] CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[111890.093639] mce: [Hardware Error]: Machine check events logged
[111890.093642] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 128: 000000008819280b
[111890.093644] mce: [Hardware Error]: TSC de4b05aa535 
[111890.093647] mce: [Hardware Error]: PROCESSOR 0:406e3 TIME 1491506681 SOCKET 0 APIC 2 microcode 8a
[111890.093649] mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 128: 000000008819280b
[111890.093650] mce: [Hardware Error]: TSC de4b05af8d8 
[111890.093653] mce: [Hardware Error]: PROCESSOR 0:406e3 TIME 1491506681 SOCKET 0 APIC 3 microcode 8a
[111890.094612] CPU1: Core temperature/speed normal
[111890.094612] CPU3: Core temperature/speed normal
[111890.094614] CPU0: Package temperature/speed normal
[111890.094614] CPU2: Package temperature/speed normal
[111890.094615] CPU3: Package temperature/speed normal
[111890.094616] CPU1: Package temperature/speed normal
[111890.094669] mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 128: 00000000881a280a
[111890.094671] mce: [Hardware Error]: TSC de4b084d1f9 
[111890.094674] mce: [Hardware Error]: PROCESSOR 0:406e3 TIME 1491506681 SOCKET 0 APIC 3 microcode 8a
[111890.094676] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 128: 00000000881a280a
[111890.094677] mce: [Hardware Error]: TSC de4b084f047 
[111890.094679] mce: [Hardware Error]: PROCESSOR 0:406e3 TIME 1491506681 SOCKET 0 APIC 2 microcode 8a
[112455.807623] CPU1: Core temperature above threshold, cpu clock throttled (total events = 23)
[112455.807624] CPU3: Core temperature above threshold, cpu clock throttled (total events = 23)
[112455.807626] CPU0: Package temperature above threshold, cpu clock throttled (total events = 23)
[112455.807627] CPU2: Package temperature above threshold, cpu clock throttled (total events = 23)
[112455.807629] CPU3: Package temperature above threshold, cpu clock throttled (total events = 23)
[112455.807631] mce_notify_irq: 1 callbacks suppressed
[112455.807632] mce: [Hardware Error]: Machine check events logged
[112455.807634] CPU1: Package temperature above threshold, cpu clock throttled (total events = 23)
[112455.807636] mce: [Hardware Error]: Machine check events logged
[112455.807641] mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 128: 000000008819280b
[112455.807642] mce: [Hardware Error]: TSC f568bd108f8 
[112455.807645] mce: [Hardware Error]: PROCESSOR 0:406e3 TIME 1491507247 SOCKET 0 APIC 3 microcode 8a
[112455.807646] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 128: 000000008819280b
[112455.807647] mce: [Hardware Error]: TSC f568bd16225 
[112455.807649] mce: [Hardware Error]: PROCESSOR 0:406e3 TIME 1491507247 SOCKET 0 APIC 2 microcode 8a
[112455.808653] CPU3: Core temperature/speed normal
[112455.808654] CPU0: Package temperature/speed normal
[112455.808655] CPU1: Core temperature/speed normal
[112455.808656] CPU2: Package temperature/speed normal
[112455.808662] CPU3: Package temperature/speed normal
[112455.808663] CPU1: Package temperature/speed normal
[112455.808694] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 128: 00000000881a280a
[112455.808700] mce: [Hardware Error]: TSC f568bfd3d1e 
[112455.808704] mce: [Hardware Error]: PROCESSOR 0:406e3 TIME 1491507247 SOCKET 0 APIC 2 microcode 8a
[112455.808715] mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 128: 00000000881a280a
[112455.808715] mce: [Hardware Error]: TSC f568bfd55c6 
[112455.808717] mce: [Hardware Error]: PROCESSOR 0:406e3 TIME 1491507247 SOCKET 0 APIC 3 microcode 8a

Offline

#2 2017-04-07 00:07:34

jonno2002
Member
Registered: 2016-11-21
Posts: 684

Re: Understanding hardware errors in dmesg

According to those messages, your CPU is overheating and thermal throttling. Check your heatsink and fan for dust buildup, etc.
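
For anyone who wants to read the numbers and not just the message text: "Bank 128" does not look like a real hardware bank. As far as I know, kernels from this period log thermal throttle events under a software-defined bank number of 128 and put the raw IA32_THERM_STATUS register value in the status field. Assuming that holds here, the two status words from the log can be decoded using the bit layout in the Intel SDM. The function name is made up for illustration; this is a minimal Python sketch, not an official tool:

```python
# Bit layout of IA32_THERM_STATUS as documented in the Intel SDM.
# ASSUMPTION: the "Bank 128" status word in dmesg is a raw copy of that register.
FLAG_BITS = {
    0: "thermal status",
    1: "thermal status log",
    2: "PROCHOT#/FORCEPR# event",
    3: "PROCHOT#/FORCEPR# log",
    4: "critical temperature status",
    5: "critical temperature log",
    6: "thermal threshold #1 status",
    7: "thermal threshold #1 log",
    8: "thermal threshold #2 status",
    9: "thermal threshold #2 log",
    10: "power limit status",
    11: "power limit log",
    12: "current limit status",
    13: "current limit log",
    14: "cross-domain limit status",
    15: "cross-domain limit log",
}


def decode_therm_status(status):
    """Split a thermal status word into named flags and the digital readout."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    return {
        "throttling_now": bool(status & 1),       # bit 0: currently above threshold
        "flags": flags,
        "below_tjmax_c": (status >> 16) & 0x7F,   # bits 22:16: degrees C below TjMax
        "reading_valid": bool((status >> 31) & 1),
    }


# The two status words that appear in the log above.
for word in (0x8819280B, 0x881A280A):
    print(hex(word), decode_therm_status(word))
```

Under that assumption, 000000008819280b decodes as throttling active with a readout of 25, and 00000000881a280a as throttling cleared with a readout of 26. The readout is degrees C below the CPU's maximum junction temperature, not an absolute temperature.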

Offline

#3 2017-04-16 22:19:32

hipmanbro
Member
Registered: 2016-09-26
Posts: 3

Re: Understanding hardware errors in dmesg

I'm encountering the exact same issue... I noticed you're using a thinkpad. I have a Lenovo T470, completely brand new. It shouldn't have built up any dust.

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 142
Model name:            Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz
Stepping:              9
CPU MHz:               499.987
CPU max MHz:           3500.0000
CPU min MHz:           400.0000
BogoMIPS:              5426.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

Last edited by hipmanbro (2017-04-16 22:23:39)

Offline

#4 2017-04-17 00:59:15

titaniumbones
Member
Registered: 2013-12-20
Posts: 52

Re: Understanding hardware errors in dmesg

Yeah, mine is a T460 so pretty similar:

sudo lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 78
Model name:            Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
Stepping:              3
CPU MHz:               1599.438
CPU max MHz:           3400.0000
CPU min MHz:           400.0000
BogoMIPS:              5618.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

Don't see much dust accumulation myself. Maybe we need to look at these flags or something; but this problem is a little beyond me.
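
One thing the lscpu output does confirm: the "PROCESSOR 0:406e3" field in the dmesg lines is, as far as I understand the kernel's MCE log format, the vendor number followed by the CPUID signature. Decoding that signature with the standard CPUID family/model/stepping rules gives the same family, model and stepping that lscpu prints. A small Python sketch (the function name is hypothetical):

```python
def decode_cpuid_signature(eax):
    """Return (family, model, stepping) from a CPUID leaf 1 EAX signature."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


print(decode_cpuid_signature(0x406E3))  # signature taken from the dmesg lines
```

For 406e3 this gives family 6, model 78, stepping 3, which matches the lscpu listing above.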

Offline

#5 2017-04-19 05:54:56

SmallAndSimple
Member
Registered: 2015-11-25
Posts: 50

Re: Understanding hardware errors in dmesg

Whatever it is, it's not the flags: those are the x86 extensions that your CPU supports.

Does your CPU only overheat when there is load, or does it also overheat when the machine is idle?

I have a very similar CPU in my laptop, and when I load it properly (Prime95, for example) it will overheat and no longer be able to use its highest turbo (it drops to 3.2 GHz instead of 3.4 GHz). I didn't consider this a bug before; is this similar to your issue?
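
To put a number on how often the throttling happens, the "total events" counter in the dmesg lines can be turned into a rate. The counter in the first post's log goes from 1 to 23 between two printed messages, so it evidently keeps advancing between the lines that get printed. A rough Python sketch; the function names are made up, and the regex only assumes the message format shown in that log:

```python
import re

# Matches the throttle lines exactly as they appear in the posted dmesg output.
LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] CPU(?P<cpu>\d+): (?P<level>Core|Package) "
    r"temperature above threshold, cpu clock throttled "
    r"\(total events = (?P<count>\d+)\)"
)


def throttle_samples(dmesg_text, cpu, level):
    """Collect (timestamp_seconds, total_events) pairs for one CPU and level."""
    samples = []
    for m in LINE.finditer(dmesg_text):
        if int(m["cpu"]) == cpu and m["level"] == level:
            samples.append((float(m["ts"]), int(m["count"])))
    return samples


def events_per_minute(samples):
    """Average throttle events per minute between the first and last sample."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    return 60.0 * (n1 - n0) / (t1 - t0)


# Two lines copied from the log in the first post.
log = """\
[111890.093627] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[112455.807624] CPU3: Core temperature above threshold, cpu clock throttled (total events = 23)
"""
samples = throttle_samples(log, cpu=3, level="Core")
print(samples, events_per_minute(samples))
```

For those two lines that works out to 22 events in about 566 seconds, a little over two per minute. Comparing that figure between an idle machine and one under load would answer the question above.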

Offline

#6 2017-04-22 04:06:13

ghostInTheSSH
Member
Registered: 2017-04-21
Posts: 4

Re: Understanding hardware errors in dmesg

SmallAndSimple wrote:

I have a very similar CPU in my laptop, and when I load it properly (Prime95, for example) it will overheat and no longer be able to use its highest turbo (it drops to 3.2 GHz instead of 3.4 GHz). I didn't consider this a bug before; is this similar to your issue?

From what I understand, this is the intended behavior of turbo/boost speeds. It is meant to provide extra resources to the system for a short period of time in order to make certain workloads smoother. I do not know if these errors are thrown by a CPU throttling from turbo to its advertised non-turbo clock.

Offline

#7 2017-04-22 11:46:45

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Understanding hardware errors in dmesg

ghostInTheSSH wrote:

From what I understand, this is the intended behavior of turbo/boost speeds. It is meant to provide extra resources to the system for a short period of time in order to make certain workloads smoother. I do not know if these errors are thrown by a CPU throttling from turbo to its advertised non-turbo clock.

Normal system operation with turbo boost should not issue any MCEs.

If the machine is overheating, check that the heatsink is not full of lint/dust, that it is correctly seated, and that the thermal paste is properly applied. Also make sure you don't obstruct any air vents, both intake and exhaust.
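
Before opening the machine up, it may be worth checking what temperatures the kernel actually reports. A minimal Python sketch, assuming the usual Linux sysfs layout under /sys/class/thermal, where each thermal_zone* directory has a "type" file and a "temp" file in millidegrees C. The function name is hypothetical, and the zones you get depend on the machine:

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone label: temperature in degrees C} for every readable zone."""
    readings = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or return non-numeric data
        readings[f"{zone.name} ({kind})"] = millideg / 1000.0
    return readings


for name, temp_c in read_thermal_zones().items():
    print(f"{name}: {temp_c:.1f} C")
```

Running this once at idle and once under load shows whether the reported temperatures are really near the limit when the throttle messages appear.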


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline

#8 2017-04-23 04:20:07

hipmanbro
Member
Registered: 2016-09-26
Posts: 3

Re: Understanding hardware errors in dmesg

Hmm... this seems to be happening on some other new Lenovo laptops.

Found two other people online in a similar situation to ours:
https://abridge2devnull.com/
https://np.reddit.com/r/thinkpad/commen … ure_above/

I'll look into redoing the thermal paste, but I'm not sure how feasible it is.

Offline

#9 2017-04-23 19:23:33

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Understanding hardware errors in dmesg

If it is a new machine then I would assume this is some other problem; I would not expect a new machine to need new thermal paste.

It might be a model-specific problem though. With my ThinkPad E560 I can leave it compiling the kernel with 4 jobs at the same time and I do not get any MCE errors, but obviously the CPU speed will not be the same as the turbo speed.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline
