#1 2013-01-27 10:34:34

bucaneer
Member
Registered: 2011-06-08
Posts: 21

[SOLVED] Troubleshooting kernel panics since 3.4

All kernel versions since 3.4 have been absolutely unstable on my machine. A panic might happen right during bootup, or if it boots up, there will be random segfaults all over the place, followed by a panic no later than 20 minutes or so from bootup. I have been forced to use kernel 3.3.7-1-ck, which runs perfectly fine. I haven't done any particularly rigorous diagnostics of the problem because watching the system fail repeatedly is not very fun when I can just revert to a working setup. However, these are the observations I've come up with over the months:

- a typical panic dump looks like this (although the "spurious ACK" line is new - in the few other screenshots I've got there is nothing prior to the hardware errors, or the first line is no longer on screen);

- it happens with the same symptoms on LiveCDs - I've tried Arch 2012.10.06, Ubuntu 12.10 and Mint 14.1 disks, all panicking during bootup or shortly afterwards (Arch 2010.05 loads fine);

- I've run Memtest overnight since the problems started and it came up clean;

- as mentioned, kernel 3.3.7 works fine, as does Windows 7, which I dual-boot with;

- it is not GPU related - I previously had other problems with Radeon HD4850 and suspected it might be involved here, but the problem persists after a recent upgrade to GTX 650Ti;

- going by my pacman.log, I've definitely tried kernels 3.4.2-1, 3.4.3-1, 3.4.4-2, 3.5.2-1, 3.6.3-1, 3.7.1-3 and 3.7.4-1 from repo-ck (reverting to 3.3.7 each time), as well as other versions from the core repo that I can't reliably keep track of;

- system info:
Motherboard: Asus P5Q-E
CPU: Intel Core2Duo E8400
RAM: 5GB in 3 sticks (2+2+1)
GPU: GTX 650Ti (Radeon HD4850 until a few weeks ago)
Storage: 2 SATA HDDs, 1 SATA SSD
IDE optical drive
Sound card: Creative Audigy LS
External devices: PS/2 keyboard, USB mouse, USB Wacom Bamboo tablet, USB external HDD

Any suggestions on how to further investigate and/or solve this problem are welcome. I'll be happy to provide any other information that might be useful.

EDIT: blacklisting the acpi-cpufreq module did the trick.

Last edited by bucaneer (2013-01-28 13:48:16)

Offline

#2 2013-01-27 11:39:39

the sad clown
Member
From: 192.168.0.X
Registered: 2011-03-20
Posts: 837

Re: [SOLVED] Troubleshooting kernel panics since 3.4

Have you tried running the error message through mcelog with the --ascii option?

Here is the man page for mcelog: http://www.mcelog.org/manpage.html

You can install mcelog from the community repository.

Last edited by the sad clown (2013-01-27 11:42:09)


I laugh, yet the joke is on me

Offline

#3 2013-01-27 11:54:16

bucaneer
Member
Registered: 2011-06-08
Posts: 21

Re: [SOLVED] Troubleshooting kernel panics since 3.4

Running mcelog for one of the error messages in the screenshot gives me this:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 
TIME 1359226311 Sat Jan 26 20:51:51 2013
STATUS b200001024000e0f MCGSTATUS 0
PROCESSOR 0:1067a TIME 1359226311 SOCKET 0 APIC microcode a07

It would be entirely reasonable to believe it's faulty hardware, if not for the fact that the same hardware functions normally with different software...
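
For anyone who wants to read the STATUS value by hand, here is a minimal Python sketch that pulls out the architectural flag bits of a machine-check status word. It assumes the IA32_MCi_STATUS layout from the Intel manuals (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16); the function name and the returned dictionary are made up for illustration, and this is not how mcelog itself does it.

```python
# Flag bits of an IA32_MCi_STATUS value, assuming the layout in the Intel SDM.
FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # the error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupted
}


def decode_mci_status(status):
    """Split a 64-bit status word into flag names and the two code fields."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# STATUS value copied from the mcelog output above.
decoded = decode_mci_status(0xB200001024000E0F)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

For the value above this reports VAL, UC, EN and PCC as set, i.e. a valid, uncorrected error with possibly corrupted processor context, which fits a panic rather than a logged warning. What the two code fields mean beyond that is CPU-specific and not decoded here.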

Offline

#4 2013-01-27 14:06:08

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: [SOLVED] Troubleshooting kernel panics since 3.4

Random segfaults followed by MCE and all of this only on certain kernel versions?

No idea what could be causing this. I'd try to run git bisect, but it takes a few hours and requires some care since IIRC there was a bug around v3.4 which could damage ext4 filesystems during frequent reboots.
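
To show why bisecting is worth the effort, here is a rough Python sketch of the search that git bisect performs, applied to the kernel versions from the first post rather than to individual commits. The `panics` function is a hypothetical stand-in for "build it, boot it and see", and the sketch assumes that once a version is bad, every later one is bad as well.

```python
def first_bad(versions, is_bad):
    """Return the earliest bad entry.

    Assumes versions[0] is known good, versions[-1] is known bad,
    and every entry after the first bad one is bad too.
    """
    lo, hi = 0, len(versions) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Versions named in the first post, oldest first.
versions = ["3.3.7", "3.4.2", "3.4.3", "3.4.4", "3.5.2", "3.6.3", "3.7.1", "3.7.4"]

tested = []


def panics(version):
    """Hypothetical test: in reality this means booting the kernel."""
    tested.append(version)
    return version != "3.3.7"


culprit = first_bad(versions, panics)
print(culprit, "found after testing", tested)
```

Each test halves the remaining range, so even the thousands of commits between two releases need only a dozen or so boots. git bisect applies the same idea to commits, with you marking each build as good or bad.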

Offline

#5 2013-01-28 00:54:16

the sad clown
Member
From: 192.168.0.X
Registered: 2011-03-20
Posts: 837

Re: [SOLVED] Troubleshooting kernel panics since 3.4

What sort of peripheral hardware do you have attached? There might be something that newer software isn't playing nicely with. This might be something to investigate.

The only other thing I can think of is to check /proc/cpuinfo on the off chance that your error message has a reason (missing flag?) for mentioning cpu0.

Last edited by the sad clown (2013-01-28 00:55:10)


I laugh, yet the joke is on me

Offline

#6 2013-01-28 10:28:50

bucaneer
Member
Registered: 2011-06-08
Posts: 21

Re: [SOLVED] Troubleshooting kernel panics since 3.4

@the sad clown: all hardware that is normally connected to the PC is listed in the first post. Keyboard and mouse are both very generic and unremarkable, and disconnecting the tablet and external HDD has no effect.

/proc/cpuinfo when running 3.3.7-1-ck:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping	: 10
microcode	: 0xa07
cpu MHz		: 2999.564
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips	: 6001.22
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping	: 10
microcode	: 0xa07
cpu MHz		: 2999.564
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips	: 6001.22
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

/proc/cpuinfo when running 3.7.4-1:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping	: 10
microcode	: 0xa07
cpu MHz		: 670.000
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips	: 6002.82
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping	: 10
microcode	: 0xa07
cpu MHz		: 670.000
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips	: 6002.82
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

First of all, it appears that 3.7.4 doesn't recognize CPU frequency right, showing 670 MHz instead of 3 GHz. Also, 3.7.4 has a "dtherm" flag instead of "dts". No idea what to make of it.
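
Spotting those differences by eye is tedious, so here is a small Python sketch that compares two /proc/cpuinfo dumps field by field. The function names are made up, and the two sample strings are heavily abbreviated excerpts of the dumps above, not the full output.

```python
from collections import Counter


def parse_cpuinfo(text):
    """Parse /proc/cpuinfo text into one dict per processor block."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def diff_cpu(old, new):
    """Return {field: (old, new)} for fields that differ.

    For "flags" the pair is (removed flags, added flags). A Counter is
    used because the same flag name can appear twice on one line.
    """
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        a, b = old.get(key, ""), new.get(key, "")
        if a == b:
            continue
        if key == "flags":
            ca, cb = Counter(a.split()), Counter(b.split())
            changes[key] = (sorted((ca - cb).elements()),
                            sorted((cb - ca).elements()))
        else:
            changes[key] = (a, b)
    return changes


# Abbreviated excerpts of the two dumps above (first processor only).
OLD = """processor : 0
cpu MHz : 2999.564
flags : fpu dts acpi lahf_lm dts tpr_shadow
bogomips : 6001.22
"""
NEW = """processor : 0
cpu MHz : 670.000
flags : fpu dts acpi lahf_lm dtherm tpr_shadow
bogomips : 6002.82
"""

changes = diff_cpu(parse_cpuinfo(OLD)[0], parse_cpuinfo(NEW)[0])
for field, (before, after) in changes.items():
    print(field, before, "->", after)
```

On the full dumps you would read each file and compare the processor blocks pairwise. Note that "dts" appears twice in the 3.3.7 flags line and only once on 3.7.4, which is why the comparison counts flags instead of using a plain set.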

@mich41: thanks for the suggestion. I may try that later, when I have a bigger block of time to spare.

Offline

#7 2013-01-28 11:34:36

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: [SOLVED] Troubleshooting kernel panics since 3.4

cpufreq?

Maybe try a kernel without cpufreq or disable frequency scaling in BIOS (if possible).


BTW, dts vs dtherm: http://lkml.indiana.edu/hypermail/linux … 00109.html

Offline

#8 2013-01-28 13:47:26

bucaneer
Member
Registered: 2011-06-08
Posts: 21

Re: [SOLVED] Troubleshooting kernel panics since 3.4

I wanted to say I never used cpufreq (after all, it's a desktop PC and I'm pretty sure there aren't even options regarding this in BIOS), but checking out the Arch wiki proved me wrong: "Note: As of kernel 3.4; the native cpu module is loaded automatically". Sure enough, trying to load the acpi-cpufreq module on 3.3.7 messed up the CPU frequency as reported by /proc/cpuinfo (bumped all the way to 5 GHz) and led to a panic shortly afterwards. Blacklisting acpi-cpufreq through modprobe has let me upgrade to 3.7.4-1-ck and run it stably for a while. Marking the thread as solved, although I will need to remember the blacklisting trick if I ever need to use a LiveCD or install a different OS...
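
For anyone landing here with the same problem, here is a rough Python sketch of the two steps: checking whether the module is loaded, and producing the line that goes into a modprobe configuration file. The helper names and the sample /proc/modules line are made up for illustration, and the file name mentioned afterwards is only an example.

```python
def module_loaded(proc_modules_text, name):
    """Check /proc/modules content for a module.

    The kernel lists module names with underscores, so hyphens in the
    requested name are converted before comparing.
    """
    wanted = name.replace("-", "_")
    return any(line.split()[0] == wanted
               for line in proc_modules_text.splitlines()
               if line.strip())


def blacklist_line(name):
    """Return a modprobe.d directive that stops automatic loading."""
    return f"blacklist {name}\n"


# Made-up sample of what one line of /proc/modules looks like.
SAMPLE = "acpi_cpufreq 10985 1 - Live 0xffffffffa0123000\n"

print(module_loaded(SAMPLE, "acpi-cpufreq"))
print(blacklist_line("acpi-cpufreq"), end="")
```

On a real system you would pass in the contents of /proc/modules and write the resulting line, as root, to a file ending in .conf under /etc/modprobe.d/ (for example blacklist-cpufreq.conf). The blacklist directive only prevents automatic loading, so loading the module by hand with modprobe still works.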

Thanks a lot for your help!

Offline
