#1 2010-05-05 20:52:18

NightOwl
Member
Registered: 2010-05-05
Posts: 6

[SOLVED] Kernel 2.6.33 Unexplained Crashing

Hey,

First off, I'm new to the forums, although I've been a longtime reader to solve my other problems. However, this past weekend I encountered a problem that I haven't been able to solve with the questions already posted and answered.

I seem to be having kernel panics of sorts, although no messages appear in single user mode. No "aieee" or anything of the like. I believe it's a module or hardware problem, and I'll post my lsmod and lspci output below. I have verified that it only crashes with kernel 2.6.33; if I downgrade packages (all the way down to 2.6.26 or so) it doesn't freeze up. To keep it as simple as possible, here are my symptoms:

Boot computer. All is well.
No extra processes open (ps aux output is below; it is actually from a boot that crashed within minutes)
Modules Autoloaded (never caused a problem before...)
SSH working fine (This is a headless box, no GUI, etc)
EDIT: No wifi on this box, just nice classic 100Mbps Ethernet

This lasts anywhere from 8 minutes to three and a half hours or so. But for the last 20 boots, it has frozen at some point every single time. It freezes as follows:

No SSH Response
All open SSH sessions lost
Physically Connected Keyboard (PS/2) doesn't respond. NumLock won't turn on or off. Just sits there.
CPU fan goes high, although my "top" command on a crashing system showed CPU idle at 98.9 percent right before crash
If I plug in VGA monitor to box directly, and wait for crash, no warning messages appear. Just la di da all is well and... BOOM crash no keyboard or monitor response.

I believe this must be a kernel issue of some sort. Whether my hardware conflicts or some module is conflicting, I'm not sure. That's why I'm here! See all my output below. If there are any log files you'd like to see because you suspect there might be something there, just ask. I was unable to find any errors in any log; if anything, they suggested that everything was running fine. But I'm not nearly as educated as all of you, so maybe I'm looking in the wrong place.

Thanks in advance, hopefully we can solve this!

lsmod output

Module                  Size  Used by
ext2                   55924  1 
b44                    26761  0 
ssb                    39339  1 b44
snd_intel8x0           22200  0 
snd_seq_dummy           1067  0 
mmc_core               45599  1 ssb
snd_ac97_codec         87943  1 snd_intel8x0
i915                  258219  0 
pcmcia                 26354  1 ssb
drm_kms_helper         21732  1 i915
snd_seq_oss            24984  0 
ac97_bus                 750  1 snd_ac97_codec
pcmcia_core            25699  1 pcmcia
drm                   130130  2 i915,drm_kms_helper
mii                     3186  1 b44
snd_seq_midi_event      4484  1 snd_seq_oss
i2c_algo_bit            4283  1 i915
ppdev                   4850  0 
snd_seq                41656  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
snd_seq_device          4349  3 snd_seq_dummy,snd_seq_oss,snd_seq
uhci_hcd               19244  0 
video                  15449  1 i915
parport_pc             27735  1 
ehci_hcd               31167  0 
snd_pcm_oss            33442  0 
snd_mixer_oss          14356  1 snd_pcm_oss
snd_pcm                57767  3 snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer              15629  2 snd_seq,snd_pcm
snd                    42562  9 snd_intel8x0,snd_ac97_codec,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
iTCO_wdt                8537  0 
i2c_i801                7058  0 
shpchp                 26528  0 
intel_agp              22681  1 
soundcore               5017  1 snd
iTCO_vendor_support     1453  1 iTCO_wdt
pci_hotplug            23303  1 shpchp
snd_page_alloc          5873  2 snd_intel8x0,snd_pcm
agpgart                23119  2 drm,intel_agp
output                  1436  1 video
i2c_core               14791  5 i915,drm_kms_helper,drm,i2c_algo_bit,i2c_i801
usbcore               119636  3 uhci_hcd,ehci_hcd
button                  3702  1 i915
lp                      6652  0 
thermal                 9614  0 
processor              25806  0 
dcdbas                  4376  0 
parport                25371  3 ppdev,parport_pc,lp
sg                     20820  0 
evdev                   6716  0 
pcspkr                  1347  0 
rtc_cmos                7546  0 
rtc_core               11851  1 rtc_cmos
rtc_lib                 1482  1 rtc_core
ext4                  302677  2 
mbcache                 4278  2 ext2,ext4
jbd2                   63651  1 ext4
crc16                   1041  1 ext4
sd_mod                 25183  5 
ata_generic             2171  0 
pata_acpi               2296  0 
ata_piix               17884  4 
libata                138071  3 ata_generic,pata_acpi,ata_piix
scsi_mod               79404  3 sg,sd_mod,libata

lspci output

00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)
00:06.0 System peripheral: Intel Corporation 82865G/PE/P Processor to I/O Memory Interface (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:01.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)

ps aux output

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.5  0.2   1708   572 ?        Ss   12:37   0:00 init [3]  
root         2  0.0  0.0      0     0 ?        S    12:37   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    12:37   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    12:37   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    12:37   0:00 [watchdog/0]
root         6  0.0  0.0      0     0 ?        S    12:37   0:00 [events/0]
root         7  0.0  0.0      0     0 ?        S    12:37   0:00 [cpuset]
root         8  0.0  0.0      0     0 ?        S    12:37   0:00 [khelper]
root         9  0.0  0.0      0     0 ?        S    12:37   0:00 [netns]
root        10  0.0  0.0      0     0 ?        S    12:37   0:00 [async/mgr]
root        11  0.0  0.0      0     0 ?        S    12:37   0:00 [pm]
root        12  0.0  0.0      0     0 ?        S    12:37   0:00 [sync_supers]
root        13  0.0  0.0      0     0 ?        S    12:37   0:00 [bdi-default]
root        14  0.0  0.0      0     0 ?        S    12:37   0:00 [kblockd/0]
root        15  0.0  0.0      0     0 ?        S    12:37   0:00 [kacpid]
root        16  0.0  0.0      0     0 ?        S    12:37   0:00 [kacpi_notify]
root        17  0.0  0.0      0     0 ?        S    12:37   0:00 [kacpi_hotplug]
root        18  0.0  0.0      0     0 ?        S    12:37   0:00 [kseriod]
root        20  0.0  0.0      0     0 ?        S    12:37   0:00 [khungtaskd]
root        21  0.0  0.0      0     0 ?        S    12:37   0:00 [kswapd0]
root        22  0.0  0.0      0     0 ?        SN   12:37   0:00 [ksmd]
root        23  0.0  0.0      0     0 ?        S    12:37   0:00 [aio/0]
root        24  0.0  0.0      0     0 ?        S    12:37   0:00 [crypto/0]
root       340  0.0  0.0      0     0 ?        S    12:37   0:00 [ata/0]
root       341  0.0  0.0      0     0 ?        S    12:37   0:00 [ata_aux]
root       342  0.0  0.0      0     0 ?        S    12:37   0:00 [scsi_eh_0]
root       343  0.0  0.0      0     0 ?        S    12:37   0:00 [scsi_eh_1]
root       344  0.0  0.0      0     0 ?        S    12:37   0:00 [scsi_eh_2]
root       347  0.0  0.0      0     0 ?        S    12:37   0:00 [scsi_eh_3]
root       378  0.0  0.0      0     0 ?        S    12:37   0:00 [jbd2/sda3-8]
root       379  0.0  0.0      0     0 ?        S    12:37   0:00 [ext4-dio-unwrit]
root       410  0.0  0.2   1972   708 ?        S<s  12:37   0:00 /sbin/udevd --daemon
root       607  0.0  0.0      0     0 ?        S    12:37   0:00 [ksuspend_usbd]
root       608  0.0  0.0      0     0 ?        S    12:37   0:00 [khubd]
root       680  0.0  0.0      0     0 ?        S    12:37   0:00 [i915]
root       687  0.0  0.0      0     0 ?        S    12:37   0:00 [kmmcd]
root       801  0.0  0.2   1968   728 ?        S<   12:37   0:00 /sbin/udevd --daemon
root       808  0.0  0.2   1968   728 ?        S<   12:37   0:00 /sbin/udevd --daemon
root       814  0.0  0.0      0     0 ?        S    12:37   0:00 [flush-8:0]
root       821  0.0  0.0      0     0 ?        S    12:37   0:00 [jbd2/sda5-8]
root       822  0.0  0.0      0     0 ?        S    12:37   0:00 [ext4-dio-unwrit]
root       932  0.0  0.1   5024   492 ?        S    12:37   0:00 supervising syslog-ng
root       933  0.0  0.7   5204  1852 ?        Ss   12:37   0:00 /usr/sbin/syslog-ng
root       964  0.0  0.1   1892   348 ?        Ss   12:38   0:00 /sbin/dhcpcd -q eth0
root       982  0.0  0.4   6520  1060 ?        Ss   12:38   0:00 /usr/sbin/sshd
root       991  0.0  0.2   1756   588 ?        Ss   12:38   0:00 /usr/sbin/crond -S -l info
root       995  0.0  0.2   1708   516 tty1     Ss+  12:38   0:00 /sbin/agetty -8 38400 tty1 linux
root       996  0.0  0.2   1708   520 tty2     Ss+  12:38   0:00 /sbin/agetty -8 38400 tty2 linux
root       997  0.0  0.2   1708   516 tty3     Ss+  12:38   0:00 /sbin/agetty -8 38400 tty3 linux
root       998  0.0  0.2   1708   512 tty4     Ss+  12:38   0:00 /sbin/agetty -8 38400 tty4 linux
root       999  0.0  0.2   1708   520 tty5     Ss+  12:38   0:00 /sbin/agetty -8 38400 tty5 linux
root      1000  0.0  0.2   1708   516 tty6     Ss+  12:38   0:00 /sbin/agetty -8 38400 tty6 linux
root      1001  0.0  1.0   9248  2712 ?        Ss   12:39   0:00 sshd: genuser [priv]
root      1003  0.0  1.0   9248  2732 ?        Ss   12:39   0:00 sshd: genuser [priv]
genuser   1005  0.0  0.5   9248  1476 ?        S    12:39   0:00 sshd: genuser@notty
genuser   1006  0.0  0.2   1876   748 ?        Ss   12:39   0:00 /usr/lib/ssh/sftp-server
genuser   1007  0.0  0.5   9248  1444 ?        S    12:39   0:00 sshd: genuser@pts/0
genuser   1008  0.0  0.7   4744  1856 pts/0    Ss   12:39   0:00 -bash
genuser   1015  0.0  0.4   4052  1028 pts/0    R+   12:40   0:00 ps aux

Recent Update: I ran memtest86+ and it didn't find a single issue, so it's doubtful my memory is the problem.

Solution: The problem turned out to be the way the kernel handles power management. I'm not sure exactly what goes wrong; my guess is that it fails to tell the CPU fan to speed up, so the CPU overheats to the point of no return and the system crashes. I solved this by editing /boot/grub/menu.lst and simply adding "acpi=off" to the end of my kernel line. It's a catch-all for power management problems, I admit, but I've been running for the past 48 hours (with test reboots every 3 hours to make sure it wasn't luck) without a single crash. If you're experiencing random crashes that give no indication of their cause and leave no trace other than a completely frozen system and perhaps a CPU fan on HIGH, try adding "acpi=off" to the end of your kernel line in /boot/grub/menu.lst. Hopefully this works for some of you too!
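
For anyone who would rather script that edit than do it by hand, here is a minimal sketch. It assumes a GRUB legacy menu.lst in which each boot entry has a line beginning with "kernel". The function name add_kernel_param and the sample boot entry are made up for illustration and are not taken from my actual config. The demo works on a throwaway temporary file, so nothing under /boot is touched unless you point the function there yourself (as root, and after making a backup).

```shell
#!/bin/sh
# Sketch only: append a parameter to every "kernel" line of a
# GRUB-legacy-style menu.lst, unless the line already carries it.
# add_kernel_param is a hypothetical helper name.
add_kernel_param() {
    param=$1    # e.g. acpi=off
    file=$2     # path to a menu.lst-style file
    tmp=$(mktemp) || return 1
    awk -v p="$param" '
        $1 == "kernel" {
            found = 0
            for (i = 2; i <= NF; i++) if ($i == p) found = 1
            if (!found) $0 = $0 " " p
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a temporary file with an invented boot entry.
demo=$(mktemp)
printf '%s\n' \
    'title  Arch Linux' \
    'root   (hd0,0)' \
    'kernel /vmlinuz26 root=/dev/sda3 ro' \
    'initrd /kernel26.img' > "$demo"
add_kernel_param acpi=off "$demo"
cat "$demo"
rm -f "$demo"
```

Running the function twice on the same file should leave a single "acpi=off" on the kernel line, since it checks for the parameter before appending.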

Last edited by NightOwl (2010-05-09 13:41:10)

#2 2010-05-05 21:13:40

xstaticxgpx
Member
Registered: 2008-10-22
Posts: 48

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

I've experienced essentially the same thing, except I didn't notice it until one of the 2.6.34-rc's (rc3, I think; I don't remember exactly, but it still happens in rc6).

#3 2010-05-06 07:36:50

cngn
Member
Registered: 2010-03-20
Posts: 65

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

Maybe there is a problem with how the kernel handles RAM. Next time you get multiple freezes in a row, power off the machine, unplug it, clean it out (it may be an overheating problem caused by dust) and swap the RAM between slots.

That's what I did after several freezes in a row. The freezes were random, many times I didn't even get to see my desktop, and now I've gone 4 days without a freeze.

#4 2010-05-06 10:47:47

Isola
Member
Registered: 2010-02-02
Posts: 99

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

Same problem here. I thought it could be because of the nvidia driver not getting along so well with xorg 1.8, but since you got panics too, on a headless box, I guess not.

Another reason I could think of is that I took out the 2nd disk in my raid-1 array, which gives me some messages at boot. "1 out of 2 mirrors active..."
That shouldn't be the problem. I'll know once I put the 2nd disk back today.

Edit: Disk is back. No panics yet.

Last edited by Isola (2010-05-06 16:54:01)

#5 2010-05-06 11:19:45

NightOwl
Member
Registered: 2010-05-05
Posts: 6

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

cngn wrote:

Maybe there is a problem with how the kernel handles RAM. Next time you get multiple freezes in a row, power off the machine, unplug it, clean it out (it may be an overheating problem caused by dust) and swap the RAM between slots.

I'll make sure it's not some memory addressing issue later today by swapping out the RAM, but I've done that in the past and nothing changed. Also, you mentioned it froze while loading or in X, which might suggest heat problems with whatever video device you've got. I (hopefully) wouldn't expect overheating from core processes and SSH alone. But of course, thank you, I'll test your solution.

Update: Still a problem. It seems it's definitely not the RAM. I also swapped my PSU with another one, just for fun, and it's not that either. Unless the kernel does something bad to power management.

Last edited by NightOwl (2010-05-06 18:04:03)

#6 2010-05-06 15:11:20

numasan
Member
Registered: 2009-11-13
Posts: 26

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

I have freezing issues too with kernel 2.6.33, with same issues (no messages or any signs of why).

My problem is not random, though. For the last ~8 reboots the machine has hung every time, precisely 10 minutes after boot. I thought maybe cron started something the kernel didn't like, but this is pretty much a vanilla install (I looked in the crontab and nothing is set to run) and very basic, with no X and only pureFTPd and SSH running. I tried stopping the crond and pureFTPd services, leaving SSH since this is a headless machine.

I haven't done extensive testing since this is a test-machine and because of other priorities, so no swapping of RAM modules (think I will run a memtest though). It's just "funny" the machine freezes precisely every 10 min. It has run for days before the upgrade to 2.6.33.

Sorry for not helping with something constructive...

#7 2010-05-06 18:07:11

NightOwl
Member
Registered: 2010-05-05
Posts: 6

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

Numasan, I'm actually sort of in the same boat here. Although I said 8 minutes to 3 hours, in 47 of my past 50 test boots (yes, too much time on my hands) it froze between 9:30 and 11 minutes after boot. That's a 94% failure rate around the 10-minute mark. I'd wager the other 6% is pure luck, but the idea of a job running 10 minutes in is interesting, although I've checked all scheduled tasks and nothing runs anywhere between 5 and 30 minutes after boot on my system.

#8 2010-05-09 13:41:24

NightOwl
Member
Registered: 2010-05-05
Posts: 6

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

Solution appears at the bottom of the first post.

#9 2010-05-11 14:25:18

numasan
Member
Registered: 2009-11-13
Posts: 26

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

I turned off ACPI and now the machine continues to run (~40 min so far). Before that I tried 'hpet=disable' suggested in another thread about 2.6.33 freezing, but that did not help, so turning off ACPI it is.
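
A quick way to confirm after a reboot that a parameter really reached the kernel is to look at /proc/cmdline. The sketch below wraps that in a small helper. has_kernel_param is a name chosen here for illustration, and the optional second argument exists only so the function can be tried against a sample file instead of the live system.

```shell
#!/bin/sh
# Sketch only: succeed if the given parameter appears as a whole word on
# the kernel command line. has_kernel_param is a hypothetical helper name.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    [ -r "$cmdline_file" ] || return 2
    # Put one parameter per line, then do an exact fixed-string line match.
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Report on the two parameters mentioned in this thread.
for p in acpi=off hpet=disable; do
    if has_kernel_param "$p"; then
        echo "$p: active"
    else
        echo "$p: not set"
    fi
done
```

The match is on whole words, so asking for "acpi" alone will not report a hit just because "acpi=off" is present.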

Thank you NightOwl!

#10 2010-05-11 18:48:38

nawitus
Member
Registered: 2009-05-11
Posts: 112

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

I have the exact same problem, and Ubuntu (my backup OS) seems to work perfectly. I know it's a heating issue because a) after a crash, if I start the machine without letting it cool, it won't start again, and b) the fans are unusually quiet and the temperature reading is still low. I usually see 60 °C, but now it starts at about 40 °C and rises up to 60 °C, at which point it crashes.
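
If heat is the suspect, one way to collect evidence is to log the temperature to disk until the freeze, so the last reading survives the crash. A rough sketch follows. It assumes the kernel exposes thermal zones under /sys/class/thermal with readings in millidegrees Celsius; on most machines those zones come from ACPI, so they will probably disappear once "acpi=off" is set. The names log_temps and millideg_to_c, and the log path in the final comment, are illustrative only.

```shell
#!/bin/sh
# Sketch only: print one timestamped line per thermal zone.
# millideg_to_c and log_temps are hypothetical helper names.
millideg_to_c() {
    # Convert an integer millidegree reading (e.g. 60000) to whole degrees.
    echo $(( $1 / 1000 ))
}

log_temps() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temp file (also covers "no zones").
        [ -r "$zone/temp" ] || continue
        printf '%s %s %sC\n' \
            "$(date +%H:%M:%S)" \
            "${zone##*/}" \
            "$(millideg_to_c "$(cat "$zone/temp")")"
    done
}

# Print the current readings once.
log_temps

# To keep a record across a freeze, something like this could be left running:
#   while sleep 10; do log_temps >> /var/log/temps.log; sync; done
```

The sync after each write is there so that the most recent lines are more likely to be on disk when the machine locks up.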

So this issue seems widespread and extremely critical. I hope someone is working on it somewhere. I have an HP Pavilion dv6544eo laptop. I'll now try the "acpi=off" solution, which I dislike (I don't remember the exact reason, but I think ACPI should be on if possible).

Update: tried it, and I get the error: 'fatal error inserting powernow_k8 into ... not found', and after that loads of errors. X doesn't start with error "Id "x" spawning too fast, wait 5 min" etc.

Also, my "backup" OS is broken too. sudo or su root doesn't work (sudo: /etc/sudoers is mode 0446, should be 0440). Maybe I have to hack my own system..

Last edited by nawitus (2010-05-11 19:11:55)

#11 2010-05-11 20:05:28

NightOwl
Member
Registered: 2010-05-05
Posts: 6

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

Numasan: Happy to be of assistance in solving your problem, which was mine exactly!
Nawitus: Sorry to hear that quick fix didn't do it for you, mate. I'm fairly sure powernow_k8 is a module that relies on ACPI to run, as it controls the "Cool'n'Quiet" function on some AMD processors. If I might suggest one thing, though: try blacklisting powernow_k8 by booting a live CD and changing MODULES=() to MODULES=(!powernow_k8) in /etc/rc.conf. That'll blacklist it and maybe prevent the error. Again, I'm speculating, and sorry I couldn't fix your problem...

#12 2010-05-11 20:21:20

nawitus
Member
Registered: 2009-05-11
Posts: 112

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

Well, I tried booting without "cpufreq" and it seemed to work. Then I put cpufreq back and rebooted. Now I'm just waiting to see whether I get a crash. Seems like a heisenbug.

#13 2010-05-12 19:23:17

nawitus
Member
Registered: 2009-05-11
Posts: 112

Re: [SOLVED] Kernel 2.6.33 Unexplained Crashing

Sorry for bumping, but something is seriously wrong. My laptop worked fine for 6 hours, pretty much idle apart from Transmission. Then I had to reboot, and the system froze about 10 seconds after my desktop appeared, with the caps lock light blinking. After that I could not start the laptop at all: all the lights came on but the screen stayed completely black. After waiting 10 minutes I could boot again, but the fans are unusually quiet. I'm inches away from abandoning Arch.

Currently Ubuntu shows no problems whatsoever.

EDIT: I enabled the testing repository and installed the .4 kernel, and the problem seems to be fixed so far.

Last edited by nawitus (2010-05-13 20:05:01)
