Hey,
First off, I'm new to the forums, although I've been a longtime reader to solve my other problems. However, this past weekend I encountered a problem that I haven't been able to solve with the questions already posted and answered.
I seem to be having kernel panics of some sort, although no messages appear in single-user mode. No "aieee" or anything of the like. I believe it's a module or hardware problem, and I'll post my lsmod and lspci below. I have verified that it only crashes with kernel 2.6.33; if I downgrade packages (all the way down to 2.6.26 or so) it doesn't freeze up. To keep it as simple as possible, here are my symptoms:
Boot computer. All is well.
No extra processes open (ps aux is below. This is actually the PS AUX of a boot that crashed in minutes)
Modules Autoloaded (never caused a problem before...)
SSH working fine (This is a headless box, no GUI, etc)
EDIT: No wifi on this box, just nice classic 100Mbps Ethernet
This lasts for anywhere between 8 minutes and 3 and a half hours or so. But for the last 20 boots, every single time it's frozen at some point. It freezes as follows:
No SSH Response
All open SSH sessions lost
Physically Connected Keyboard (PS/2) doesn't respond. NumLock won't turn on or off. Just sits there.
CPU fan goes high, although my "top" command on a crashing system showed CPU idle at 98.9 percent right before crash
If I plug in VGA monitor to box directly, and wait for crash, no warning messages appear. Just la di da all is well and... BOOM crash no keyboard or monitor response.
I believe this must be a kernel issue of some sort; whether my hardware conflicts or some module is conflicting... I'm not sure. That's why I'm here! See all my output below. And if there are any log files you'd like to see, if you suspect there might be something there, ask. I was unable to find any errors in any log; actually they seemed to suggest that everything was running fine. But I'm not nearly as educated as all of you, so maybe I'm looking in the wrong place.
Thanks in advance, hopefully we can solve this!
lsmod output
Module Size Used by
ext2 55924 1
b44 26761 0
ssb 39339 1 b44
snd_intel8x0 22200 0
snd_seq_dummy 1067 0
mmc_core 45599 1 ssb
snd_ac97_codec 87943 1 snd_intel8x0
i915 258219 0
pcmcia 26354 1 ssb
drm_kms_helper 21732 1 i915
snd_seq_oss 24984 0
ac97_bus 750 1 snd_ac97_codec
pcmcia_core 25699 1 pcmcia
drm 130130 2 i915,drm_kms_helper
mii 3186 1 b44
snd_seq_midi_event 4484 1 snd_seq_oss
i2c_algo_bit 4283 1 i915
ppdev 4850 0
snd_seq 41656 5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
snd_seq_device 4349 3 snd_seq_dummy,snd_seq_oss,snd_seq
uhci_hcd 19244 0
video 15449 1 i915
parport_pc 27735 1
ehci_hcd 31167 0
snd_pcm_oss 33442 0
snd_mixer_oss 14356 1 snd_pcm_oss
snd_pcm 57767 3 snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer 15629 2 snd_seq,snd_pcm
snd 42562 9 snd_intel8x0,snd_ac97_codec,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
iTCO_wdt 8537 0
i2c_i801 7058 0
shpchp 26528 0
intel_agp 22681 1
soundcore 5017 1 snd
iTCO_vendor_support 1453 1 iTCO_wdt
pci_hotplug 23303 1 shpchp
snd_page_alloc 5873 2 snd_intel8x0,snd_pcm
agpgart 23119 2 drm,intel_agp
output 1436 1 video
i2c_core 14791 5 i915,drm_kms_helper,drm,i2c_algo_bit,i2c_i801
usbcore 119636 3 uhci_hcd,ehci_hcd
button 3702 1 i915
lp 6652 0
thermal 9614 0
processor 25806 0
dcdbas 4376 0
parport 25371 3 ppdev,parport_pc,lp
sg 20820 0
evdev 6716 0
pcspkr 1347 0
rtc_cmos 7546 0
rtc_core 11851 1 rtc_cmos
rtc_lib 1482 1 rtc_core
ext4 302677 2
mbcache 4278 2 ext2,ext4
jbd2 63651 1 ext4
crc16 1041 1 ext4
sd_mod 25183 5
ata_generic 2171 0
pata_acpi 2296 0
ata_piix 17884 4
libata 138071 3 ata_generic,pata_acpi,ata_piix
scsi_mod 79404 3 sg,sd_mod,libata
lspci output
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)
00:06.0 System peripheral: Intel Corporation 82865G/PE/P Processor to I/O Memory Interface (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:01.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
ps aux output
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.5 0.2 1708 572 ? Ss 12:37 0:00 init [3]
root 2 0.0 0.0 0 0 ? S 12:37 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 12:37 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S 12:37 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S 12:37 0:00 [watchdog/0]
root 6 0.0 0.0 0 0 ? S 12:37 0:00 [events/0]
root 7 0.0 0.0 0 0 ? S 12:37 0:00 [cpuset]
root 8 0.0 0.0 0 0 ? S 12:37 0:00 [khelper]
root 9 0.0 0.0 0 0 ? S 12:37 0:00 [netns]
root 10 0.0 0.0 0 0 ? S 12:37 0:00 [async/mgr]
root 11 0.0 0.0 0 0 ? S 12:37 0:00 [pm]
root 12 0.0 0.0 0 0 ? S 12:37 0:00 [sync_supers]
root 13 0.0 0.0 0 0 ? S 12:37 0:00 [bdi-default]
root 14 0.0 0.0 0 0 ? S 12:37 0:00 [kblockd/0]
root 15 0.0 0.0 0 0 ? S 12:37 0:00 [kacpid]
root 16 0.0 0.0 0 0 ? S 12:37 0:00 [kacpi_notify]
root 17 0.0 0.0 0 0 ? S 12:37 0:00 [kacpi_hotplug]
root 18 0.0 0.0 0 0 ? S 12:37 0:00 [kseriod]
root 20 0.0 0.0 0 0 ? S 12:37 0:00 [khungtaskd]
root 21 0.0 0.0 0 0 ? S 12:37 0:00 [kswapd0]
root 22 0.0 0.0 0 0 ? SN 12:37 0:00 [ksmd]
root 23 0.0 0.0 0 0 ? S 12:37 0:00 [aio/0]
root 24 0.0 0.0 0 0 ? S 12:37 0:00 [crypto/0]
root 340 0.0 0.0 0 0 ? S 12:37 0:00 [ata/0]
root 341 0.0 0.0 0 0 ? S 12:37 0:00 [ata_aux]
root 342 0.0 0.0 0 0 ? S 12:37 0:00 [scsi_eh_0]
root 343 0.0 0.0 0 0 ? S 12:37 0:00 [scsi_eh_1]
root 344 0.0 0.0 0 0 ? S 12:37 0:00 [scsi_eh_2]
root 347 0.0 0.0 0 0 ? S 12:37 0:00 [scsi_eh_3]
root 378 0.0 0.0 0 0 ? S 12:37 0:00 [jbd2/sda3-8]
root 379 0.0 0.0 0 0 ? S 12:37 0:00 [ext4-dio-unwrit]
root 410 0.0 0.2 1972 708 ? S<s 12:37 0:00 /sbin/udevd --daemon
root 607 0.0 0.0 0 0 ? S 12:37 0:00 [ksuspend_usbd]
root 608 0.0 0.0 0 0 ? S 12:37 0:00 [khubd]
root 680 0.0 0.0 0 0 ? S 12:37 0:00 [i915]
root 687 0.0 0.0 0 0 ? S 12:37 0:00 [kmmcd]
root 801 0.0 0.2 1968 728 ? S< 12:37 0:00 /sbin/udevd --daemon
root 808 0.0 0.2 1968 728 ? S< 12:37 0:00 /sbin/udevd --daemon
root 814 0.0 0.0 0 0 ? S 12:37 0:00 [flush-8:0]
root 821 0.0 0.0 0 0 ? S 12:37 0:00 [jbd2/sda5-8]
root 822 0.0 0.0 0 0 ? S 12:37 0:00 [ext4-dio-unwrit]
root 932 0.0 0.1 5024 492 ? S 12:37 0:00 supervising syslog-ng
root 933 0.0 0.7 5204 1852 ? Ss 12:37 0:00 /usr/sbin/syslog-ng
root 964 0.0 0.1 1892 348 ? Ss 12:38 0:00 /sbin/dhcpcd -q eth0
root 982 0.0 0.4 6520 1060 ? Ss 12:38 0:00 /usr/sbin/sshd
root 991 0.0 0.2 1756 588 ? Ss 12:38 0:00 /usr/sbin/crond -S -l info
root 995 0.0 0.2 1708 516 tty1 Ss+ 12:38 0:00 /sbin/agetty -8 38400 tty1 linux
root 996 0.0 0.2 1708 520 tty2 Ss+ 12:38 0:00 /sbin/agetty -8 38400 tty2 linux
root 997 0.0 0.2 1708 516 tty3 Ss+ 12:38 0:00 /sbin/agetty -8 38400 tty3 linux
root 998 0.0 0.2 1708 512 tty4 Ss+ 12:38 0:00 /sbin/agetty -8 38400 tty4 linux
root 999 0.0 0.2 1708 520 tty5 Ss+ 12:38 0:00 /sbin/agetty -8 38400 tty5 linux
root 1000 0.0 0.2 1708 516 tty6 Ss+ 12:38 0:00 /sbin/agetty -8 38400 tty6 linux
root 1001 0.0 1.0 9248 2712 ? Ss 12:39 0:00 sshd: genuser [priv]
root 1003 0.0 1.0 9248 2732 ? Ss 12:39 0:00 sshd: genuser [priv]
genuser 1005 0.0 0.5 9248 1476 ? S 12:39 0:00 sshd: genuser@notty
genuser 1006 0.0 0.2 1876 748 ? Ss 12:39 0:00 /usr/lib/ssh/sftp-server
genuser 1007 0.0 0.5 9248 1444 ? S 12:39 0:00 sshd: genuser@pts/0
genuser 1008 0.0 0.7 4744 1856 pts/0 Ss 12:39 0:00 -bash
genuser 1015 0.0 0.4 4052 1028 pts/0 R+ 12:40 0:00 ps aux
Recent Update: Ran memtest86+, not a single issue there. So it's doubtful my memory is the problem.
Solution: What I found to be the problem is the way the kernel does power management. I'm not sure what exactly it does wrong; my guess is that it forgets to tell the CPU fan to rev up or something, and then the CPU heats to a point of no return and crashes. I solved this by going into /boot/grub/menu.lst and simply adding "acpi=off" to the end of my kernel line. It's a catch-all for power management problems, I admit, but I've been running for the past 48 hours (with test reboots every 3 hours to make sure it wasn't luck) without a single crash. If you're experiencing random crashes that give no indication of their cause and leave no trace other than a completely frozen system and perhaps a CPU fan on HIGH, try adding "acpi=off" to the end of your kernel line in /boot/grub/menu.lst. Hopefully this works for some of you too!
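For anyone unsure what that looks like in practice, here is a rough sketch. The menu.lst entry in the comments is a made-up example (the title, root device, kernel image and initrd names are assumptions, not copied from this box), and has_kernel_param is a hypothetical helper for checking after the reboot that the parameter actually reached the kernel, by reading /proc/cmdline.

```shell
# Hypothetical GRUB legacy entry in /boot/grub/menu.lst. The device names and
# image paths are placeholders; only the trailing "acpi=off" is the actual change.
#
#   title  Arch Linux
#   root   (hd0,0)
#   kernel /vmlinuz26 root=/dev/sda3 ro acpi=off
#   initrd /kernel26.img

# has_kernel_param PARAM [FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# FILE defaults to /proc/cmdline; passing another file is only useful for testing.
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Run this after rebooting with the edited entry.
if has_kernel_param acpi=off; then
    echo "booted with acpi=off"
else
    echo "acpi=off is not on the kernel command line"
fi
```

The same check works for any other boot parameter you try on the kernel line.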
Last edited by NightOwl (2010-05-09 13:41:10)
Offline
I've experienced essentially the same thing, except I didn't notice it until one of the 2.6.34-rc's (rc3 I think, I don't remember, but it still happens in rc6).
Offline
Maybe there is a problem with how the kernel handles RAM. Next time you get multiple freezes in a row, power off the machine, unplug it, clean it out (dust may be causing an overheating problem) and swap the RAM slots.
That's what I did after some x freezes in a row. The freezes were random, and many times I didn't even get to see my desktop; now I'm 4 days without a freeze.
Offline
Same problem here. I thought it could be because of the nvidia driver not getting along so well with xorg 1.8, but since you got panics too, on a headless box, I guess not.
Another reason I could think of is that I took out the 2nd disk in my raid-1 array, which gives me some messages at boot. "1 out of 2 mirrors active..."
That shouldn't be the problem. I'll know once I put the 2nd disk back today.
Edit: Disk is back. No panics yet.
Last edited by Isola (2010-05-06 16:54:01)
Offline
Maybe there is a problem with how the kernel handles RAM. Next time you get multiple freezes in a row, power off the machine, unplug it, clean it out (dust may be causing an overheating problem) and swap the RAM slots.
Invalid: I'll make sure it's not some memory addressing issue later today and swap out the RAM, but I've done that in the past and nothing changed. Also, you mentioned it froze while loading or in X windows, which might suggest heat problems with whatever video device you've got. I (hopefully) wouldn't expect overheating due to core processes and SSH. But of course, thank you, I'll test your solution.
Update: Still a problem. It's definitely not RAM it seems. I also swapped my PSU with another, just for fun, it's not that either. Unless the kernel does something bad to power management.
Last edited by NightOwl (2010-05-06 18:04:03)
Offline
I have freezing issues too with kernel 2.6.33, with same issues (no messages or any signs of why).
My problem is not random though. The machine has hung every time precisely 10 min. after boot, for the last ~8 reboots. I thought maybe 'cron' started something the kernel didn't like, but this is pretty much a vanilla install (looked in crontab but nothing is set to run) and very basic, with no X and only pureFTPd and SSH running. I tried stopping the crond and pureFTPd services, leaving SSH since this is a headless machine.
I haven't done extensive testing since this is a test-machine and because of other priorities, so no swapping of RAM modules (think I will run a memtest though). It's just "funny" the machine freezes precisely every 10 min. It has run for days before the upgrade to 2.6.33.
Sorry for not helping with something constructive...
Offline
Numasan, I'm actually sort of in your boat here, because although I said 8 minutes to 3 hours, of the past 50 test boots (yes, too much time on my hands) it's frozen between 9:30 and 11 minutes 47 times. That's a 94% failure rate around 10 minutes. I'd wager the other 6% is pure luck, but the idea of a job running 10 minutes in is interesting. Although I've checked all tasks and nothing runs anywhere between 5 minutes and 30 minutes on my system.
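One way to pin down the time-to-freeze without sitting in front of the box is to write the uptime to disk every few seconds and read the last entry after the hard reset. A minimal sketch, assuming a Linux /proc/uptime; log_uptime, the log path and the interval are made-up names and values for illustration, not something that was run in this thread.

```shell
# log_uptime LOGFILE INTERVAL COUNT
# Appends "<wall clock time> <seconds since boot>" to LOGFILE, COUNT times,
# sleeping INTERVAL seconds between samples.
log_uptime() {
    logfile=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(cut -d' ' -f1 /proc/uptime)" >> "$logfile"
        # Ask the kernel to flush to disk so the latest sample has a good
        # chance of surviving a hard freeze.
        sync
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: one sample every 5 seconds for an hour (720 samples), in the background.
# log_uptime /var/log/uptime-trace.log 5 720 &
```

After a freeze, the last line of the log shows roughly how far the uptime got before the hang.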
Offline
Solution appears at the bottom of the first post.
Offline
I turned off ACPI and now the machine continues to run (~40 min so far). Before that I tried 'hpet=disable' suggested in another thread about 2.6.33 freezing, but that did not help, so turning off ACPI it is.
Thank you NightOwl!
Offline
I have the exact same problem, and Ubuntu (my backup OS) seems to work perfectly. I know it's a heating issue because a) after a crash, if I start without letting it cool, it won't start again and b) the fans are unusually quiet, and the temperature reading is still so low. I usually have 60°C, but now it seems to be 40°C, and it rises up to 60°C, at which point it crashes.
So this issue seems widespread and extremely critical. Hope someone is working on it somewhere. I have HP Pavilion dv6544eo laptop. I'll now try the "acpi=off" solution, which I dislike (don't remember the exact reason but I think acpi should be on if possible).
Update: tried it, and I get the error: 'fatal error inserting powernow_k8 into ... not found', and after that loads of errors. X doesn't start with error "Id "x" spawning too fast, wait 5 min" etc.
Also, my "backup" OS is broken too. sudo or su root doesn't work (sudo: /etc/sudoers is mode 0446, should be 0440). Maybe I have to hack my own system...
Last edited by nawitus (2010-05-11 19:11:55)
Offline
Numasan: Happy to be of assistance in solving your problem, which was mine exactly!
Nawitus: Sorry to hear that quick fix didn't do it for you mate. I'm quite sure powernow_k8 is a module that relies on ACPI to run, as it controls the "Cool 'N Quiet" function on some processors. However, if I might suggest one thing, try to blacklist powernow_k8 by booting a live CD and changing MODULES=() to MODULES=(!powernow_k8) in /etc/rc.conf. That'll blacklist it and maybe prevent the error. Again, I'm speculating, and sorry to not fix your problem...
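To make that concrete: /etc/rc.conf is a bash file, and a leading "!" in its MODULES array blacklists a module. The line in the comment below shows the edit; module_loaded is a hypothetical helper (not an existing tool) that reads /proc/modules to confirm after the reboot that the module really stayed out. Whether this clears the error with acpi=off is untested.

```shell
# Edit in /etc/rc.conf (no spaces around "="), keeping any modules already listed:
#
#   MODULES=(!powernow_k8)

# module_loaded NAME [FILE]
# Succeeds if NAME is the first field of some line in FILE (default /proc/modules).
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

if module_loaded powernow_k8; then
    echo "powernow_k8 is still loaded"
else
    echo "powernow_k8 is not loaded"
fi
```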
Offline
Well, I tried to boot without "cpufreq" and it seems to work. Then I put cpufreq back and rebooted. Now I'm just waiting to see if I'll get a crash. Seems like a heisenbug.
Offline
Sorry for bumping, but something is seriously wrong. My laptop worked fine for 6 hours of pretty much idle & transmission. Then I had to reboot, after which the system froze 10 sec after my desktop appeared, with the caps lock blinking. After that the laptop would not start at all: all the lights came on but the screen stayed completely black. After I waited for 10 min I could boot again, but the fans are unusually quiet. I'm inches away from abandoning Arch.
Currently Ubuntu shows no problems whatsoever.
EDIT: I enabled testing and installed the .4 kernel, and the problem seems to be fixed so far.
Last edited by nawitus (2010-05-13 20:05:01)
Offline