I have been experiencing a lot of hard locks and kernel crashes/oops(es?) lately. This computer has always had random freezes in both WinXP and Linux (various distributions -- including live cds).
Here is my latest crash (from dmesg):
PCI: setting IRQ 12 as level-triggered
PCI: Found IRQ 12 for device 0000:02:05.0
Bad pte = dfbdf272, process = ???, vm_flags = 3f3f, vaddr = b7e6d0ac
Pid: 1470, comm: load-modules.sh Not tainted 2.6.25-ARCH #1
[<c016e775>] handle_mm_fault+0x2c5/0x850
[<c011d0c9>] do_page_fault+0x2a9/0x790
[<c014169d>] hrtimer_start+0xdd/0x1e0
[<c0125a16>] hrtick_set+0xc6/0x140
[<c03053ef>] schedule+0x3af/0x850
[<c01437e5>] getnstimeofday+0x35/0xe0
[<c01419e8>] ktime_get+0x18/0x40
[<c0305a76>] preempt_schedule+0x56/0x70
[<c012386f>] wake_up_new_task+0x8f/0xd0
[<c0128643>] do_fork+0xd3/0x2b0
[<c011ce20>] do_page_fault+0x0/0x790
[<c0307d52>] error_code+0x72/0x78
[<c0300000>] cpu_init+0x220/0x29a
=======================
VM: killing process load-modules.sh
lshwd output:
00:00.0 Class 0600: Intel Corp.|82865G [Springdale-G] Chipset Host Bridge (intel-agp)
00:01.0 Class 0604: Intel Corp.|82865G/PE/P Processor to AGP Controller (unknown)
00:1d.0 Class 0c03: Intel Corp.|USB Controller (uhci_hcd)
00:1d.1 Class 0c03: Intel Corp.|USB Controller (uhci_hcd)
00:1d.2 Class 0c03: Intel Corp.|USB Controller (uhci_hcd)
00:1d.3 Class 0c03: Intel Corp.|82801EB USB EHCI Controller #2 (uhci_hcd)
00:1d.7 Class 0c03: Intel Corp.|USB Enhanced Controller (ehci-hcd)
00:1e.0 Class 0604: Intel Corp.|82820 815e (Camino 2) Chipset PCI (hw_random)
00:1f.0 Class 0601: Intel Corp.|82801EB ISA Bridge (LPC) (i810-tco)
00:1f.1 Class 0101: Intel Corp.|82801EB ICH5 IDE (ata_piix)
00:1f.2 Class 0101: Intel Corp.|82801EB ICH5 IDE (SATA) (ata_piix)
00:1f.3 Class 0c05: Intel Corp.|82801EB SMBus (i2c-i801)
01:00.0 Class 0300: nVidia Corp.|NV18 GeForce4 MX440 AGP 8x (nv)
02:05.0 Class 0401: Avance Logic Inc.|ALS4000 Audio Chipset (snd-als4000)
02:07.0 Class 0200: D-Link System Inc.|DFE 530 TX+ Fast Ethernet Adapter (8139too)
02:0a.0 Class 0c03: VIA Technologies Inc.|VT82C586B USB (uhci_hcd)
02:0a.1 Class 0c03: VIA Technologies Inc.|VT82C586B USB (uhci_hcd)
02:0a.2 Class 0c03: VIA Technologies Inc.|VT8235 USB Enhanced Controller (ehci-hcd)
relevant entry in grub's menu.lst:
# (0) Arch Linux-ARCH
title Arch Linux
root (hd0,0)
kernel /boot/vmlinuz26 root=/dev/sda1 ro nosmp noapic acpi=off pci=routeirq ide-legacy
initrd /boot/kernel26.img
I added nosmp and the other parameters at the install/rescue cd's suggestion. I tried using the microcode package (I have a P4 HT) as suggested by someone in another thread.
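To see at a glance which boot parameters are in play, the kernel line from the menu.lst entry above can be split into its flags and key=value options. This is only a minimal sketch: `parse_kernel_line` and `MENU_LST_KERNEL_LINE` are hypothetical names introduced for illustration, and the code only splits the text without checking whether the kernel recognises each parameter.

```python
# Sketch: split a GRUB-legacy "kernel" line into image path and parameters.
# The line below is copied from the menu.lst entry in this post.
MENU_LST_KERNEL_LINE = (
    "kernel /boot/vmlinuz26 root=/dev/sda1 ro nosmp noapic "
    "acpi=off pci=routeirq ide-legacy"
)


def parse_kernel_line(line):
    """Return (image, options); bare flags such as "nosmp" map to None."""
    words = line.split()
    if len(words) < 2 or words[0] != "kernel":
        raise ValueError("expected a line of the form 'kernel <image> [params]'")
    image = words[1]
    options = {}
    for word in words[2:]:
        key, sep, value = word.partition("=")
        options[key] = value if sep else None
    return image, options


image, options = parse_kernel_line(MENU_LST_KERNEL_LINE)
print("image:", image)
for key, value in options.items():
    print(f"{key:12} {'(flag)' if value is None else value}")
```

Comparing this list with the contents of /proc/cmdline on the running system shows which parameters the kernel was actually booted with.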
I've found that the freezes happen at the beginning of heavy CPU or disk usage (such as compilation or pacman db parsing). The hd is not the original one from this system (it was a hand-me-down), so I don't think that's the problem. The video card is different and the RAM is new. The only original components still in use are:
P4 HT processor (2.60 GHz)
Albatron PX865PE II
Enermax power supply
Any ideas? It also says "bad eip value" when the system has kernel crashes (especially on boot).
Last edited by mrbug (2008-06-21 03:08:35)
dvdtube - download all uploads from a YouTube user and then optionally create a DVD.
(Regular version AUR link / SVN version AUR link)
Offline
If it's happening in both Windows and Linux, I'd say it is faulty memory and/or a faulty motherboard. If you have two memory sticks, I would try running memtest on each of them individually. If that doesn't show any faults, it's possibly a faulty motherboard; unfortunately, the only way to test that is to replace it with a motherboard of the same model and see if the faults still happen.
Another, less likely possibility is the BIOS doing something screwy. You could try updating it (though not before testing your memory, since bad memory could corrupt the BIOS during the transfer).
These kinds of problems really suck because the cause can be anything: hardware incompatibility (though that's virtually unheard of nowadays) or a bad PSU (Enermax units are usually good, but most products from any manufacturer have around a 5-10% DOA/fault rate). Check the 12 V rails in the BIOS if you can and make sure they're sitting close to 12 V; the other rails rarely fall off even when the 12 V ones are really bad.
I see you said the RAM is new, which would suggest it's the motherboard, but test the RAM first for sanity's sake. I can't count the amount of RAM I've tested before installing that turned out to be DOA or faulty.
Last comment: Albatron motherboards have notoriously weak capacitors, so that's another finger pointing at the motherboard. Check the capacitors for any obvious "popping" (white/brown stuff leaking out of them, or even the tops being slightly raised).
Offline
Looking at dmesg, there is advice to add 'pci=routeirq'. According to dmesg, `02:05.0 Class 0401: Avance Logic Inc.|ALS4000 Audio Chipset (snd-als4000)` is the device that is failing. How about disabling it?
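The lookup described here (take the PCI address that dmesg mentions and find the matching line in the lshwd listing) can be sketched as follows. The function and variable names are hypothetical, and only a few lines from the first post are reproduced. Note that this only identifies the device named just before the fault; it does not prove that the device caused it.

```python
import re

# Lines copied from the dmesg and lshwd output in the first post.
DMESG_LINES = [
    "PCI: setting IRQ 12 as level-triggered",
    "PCI: Found IRQ 12 for device 0000:02:05.0",
]
LSHWD_LINES = [
    "02:05.0 Class 0401: Avance Logic Inc.|ALS4000 Audio Chipset (snd-als4000)",
    "02:07.0 Class 0200: D-Link System Inc.|DFE 530 TX+ Fast Ethernet Adapter (8139too)",
]

# Matches domain:bus:slot.function and captures the bus:slot.function part,
# which is the form lshwd prints at the start of each line.
PCI_ADDRESS = re.compile(r"\b[0-9a-f]{4}:([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\b")


def devices_mentioned(dmesg_lines, lshwd_lines):
    """Return the lshwd lines whose slot appears in any of the dmesg lines."""
    by_slot = {line.split()[0]: line for line in lshwd_lines if line.strip()}
    found = []
    for line in dmesg_lines:
        for match in PCI_ADDRESS.finditer(line):
            entry = by_slot.get(match.group(1))
            if entry is not None and entry not in found:
                found.append(entry)
    return found


for entry in devices_mentioned(DMESG_LINES, LSHWD_LINES):
    print(entry)
```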
Offline
I'm one step ahead of you, Shazeal: I've already memtested the memory and it's clean. I'll need to take another look to check for bad capacitors. I didn't see any while checking the usual suspects, but there could be a bad one lurking under a cable.
Output from sensors:
+3.3V: +3.30 V (min = +2.82 V, max = +3.79 V)
+5V: +4.92 V (min = +0.00 V, max = +3.87 V) ALARM
+12V: +12.04 V (min = +0.06 V, max = +7.84 V) ALARM
-12V: -12.28 V (min = -11.95 V, max = -13.27 V) ALARM
-5V: -5.70 V (min = -7.66 V, max = -6.10 V) ALARM
I'm assuming that the 12v voltages are within the safe range, even though it says ALARM. Am I wrong?
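One way to look at the question is to separate two checks: whether a reading falls between the min/max limits that sensors is configured with (which is what appears to drive the ALARM flag here), and whether it is close to the rail's nominal voltage. The sketch below does both. The tolerance table is an assumption (the commonly quoted ATX figures of 5% for the positive rails and 10% for the negative ones) and does not come from this thread; the names are hypothetical.

```python
import re

# Readings copied from the sensors output above.
SENSORS_OUTPUT = """\
+3.3V: +3.30 V (min = +2.82 V, max = +3.79 V)
+5V: +4.92 V (min = +0.00 V, max = +3.87 V) ALARM
+12V: +12.04 V (min = +0.06 V, max = +7.84 V) ALARM
-12V: -12.28 V (min = -11.95 V, max = -13.27 V) ALARM
-5V: -5.70 V (min = -7.66 V, max = -6.10 V) ALARM
"""

# ASSUMPTION: allowed deviation as a fraction of nominal, not from the thread.
TOLERANCE = {"+3.3V": 0.05, "+5V": 0.05, "+12V": 0.05, "-12V": 0.10, "-5V": 0.10}

LINE = re.compile(
    r"^([+-][\d.]+V):\s*([+-][\d.]+) V "
    r"\(min = ([+-][\d.]+) V, max = ([+-][\d.]+) V\)"
)


def check_rails(text):
    """Return, per rail, whether it passes the configured and nominal checks."""
    report = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        label = match.group(1)
        value, low, high = (float(match.group(i)) for i in (2, 3, 4))
        nominal = float(label[:-1])  # "+12V" -> 12.0
        deviation = abs(value - nominal) / abs(nominal)
        report[label] = {
            # Limits are compared exactly as listed, without reordering.
            "within_configured_limits": low <= value <= high,
            "within_nominal_tolerance": deviation <= TOLERANCE[label],
        }
    return report


for rail, result in check_rails(SENSORS_OUTPUT).items():
    print(rail, result)
```

Under these assumed tolerances both 12 V readings are close to nominal, and they fail only the configured-limits check, whose limits (for example a maximum of +7.84 V on the +12V line) do not bracket the nominal voltage. The -5V reading is the only one outside the assumed band; whether that matters depends on whether the board uses that rail, which the thread does not say.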
Metal:
I had to read the output of dmesg a few times, but now I see what you're saying. I'd hate to lose the ability to have sound, but if it means that my computer will stop freezing, I'll remove the card. Of course, I've never been able to get the onboard sound (envy24 chipset) to work!
I'll pull out the card as soon as I get the chance and report back.
Offline
Yeah, that power supply is very healthy. That leaves a hardware conflict or just the motherboard itself; I'm guessing the board is at least 3 years old, being an 865.
I'd try yanking all your add-on cards and seeing if it still happens. If it does, it might be upgrade time.
Offline
I've removed the card. I had one weird kernel crash, but that was during a coreutils upgrade. Maybe that was the cause...
I'm going to keep pushing until I figure out what the problem is, but I definitely still want to upgrade.
Offline
Latest message while mkinitcpio was running:
Linux agpgart interface v0.103
nvidia: module license 'NVIDIA' taints kernel.
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 10 (level, low) -> IRQ 10
NVRM: loading NVIDIA Linux x86 Kernel Module 96.43.05 Tue Jan 22 19:36:58 PST 2008
BUG: unable to handle kernel paging request at 756e696c
IP: [<c018a137>] flush_old_exec+0x577/0x750
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: nvidia(P) agpgart nfnetlink_queue nfnetlink nf_conntrack_ipv4 iptable_filter ip_tables xt_state nf_conntrack xt_NFQUEUE x_tables w83627hf hwmon_vid 8139too mii i2c_i801 i2c_core pcspkr shpchp pci_hotplug sg evdev thermal processor fan button battery ac snd_als4000 gameport snd_sb_common snd_opl3_lib snd_hwdep snd_ice1724 snd_ice17xx_ak4xxx snd_ac97_codec ac97_bus snd_ak4114 snd_pt2258 snd_i2c snd_ak4xxx_adda snd_mpu401_uart snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore e1000 rtc_cmos rtc_core rtc_lib ext3 jbd mbcache usbhid hid ff_memless usb_storage sr_mod cdrom sd_mod ehci_hcd uhci_hcd pata_acpi usbcore ata_piix ata_generic libata scsi_mod dock
Pid: 9173, comm: dirname Tainted: P (2.6.25-ARCH #1)
EIP: 0060:[<c018a137>] EFLAGS: 00210202 CPU: 0
EIP is at flush_old_exec+0x577/0x750
EAX: 756e696c EBX: 00000000 ECX: 00000500 EDX: df269004
ESI: de546c00 EDI: 00000000 EBP: df269080 ESP: de121da4
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process dirname (pid: 9173, ti=de120000 task=de546c00 task.ti=de120000)
Stack: 00000001 de694b00 00000000 df269000 df269004 de710504 de710000 de547134
6e726964 00656d61 c018a39d de121dd4 00000003 de18f878 c035ccb5 de18f840
c01b73be 00000080 00000001 c13d8ec0 dec77000 de694b00 c016421f 00000001
Call Trace:
[<c018a39d>] kernel_read+0x3d/0x60
[<c01b73be>] load_elf_binary+0x36e/0x1aa0
[<c016421f>] get_page_from_freelist+0x2cf/0x4d0
[<c01646ce>] __alloc_pages+0x5e/0x360
[<c0175705>] anon_vma_prepare+0x85/0xe0
[<c016eabd>] handle_mm_fault+0x60d/0x850
[<c016eacb>] handle_mm_fault+0x61b/0x850
[<c0184b45>] do_sync_read+0xd5/0x120
[<c016c60d>] vm_normal_page+0x1d/0x70
[<c016c60d>] vm_normal_page+0x1d/0x70
[<c016d479>] follow_page+0x119/0x1c0
[<c016edd1>] get_user_pages+0xd1/0x2f0
[<c01b7050>] load_elf_binary+0x0/0x1aa0
[<c01898ec>] search_binary_handler+0x15c/0x290
[<c018ab9c>] do_execve+0x21c/0x250
[<c01033c6>] sys_execve+0x46/0x80
[<c01050d8>] sysenter_past_esp+0x6d/0xa5
[<c0300000>] cpu_init+0x220/0x29a
=======================
Code: c3 04 89 e8 e8 eb d6 17 00 89 5c 24 10 bb ff ff ff ff 8b 54 24 10 83 c3 01 89 df c1 e7 05 8b 02 39 38 0f 86 11 01 00 00 8b 40 08 <8b> 34 98 85 f6 74 e0 c7 04 98 00 00 00 00 89 e8 e8 a4 d9 17 00
EIP: [<c018a137>] flush_old_exec+0x577/0x750 SS:ESP 0068:de121da4
---[ end trace 255f477a12072e9b ]---
note: dirname[9173] exited with preempt_count 1
BUG: unable to handle kernel paging request at 6f732e78
IP: [<c012b737>] put_files_struct+0x37/0xb0
*pde = 00000000
Oops: 0000 [#2] PREEMPT SMP
Modules linked in: nvidia(P) agpgart nfnetlink_queue nfnetlink nf_conntrack_ipv4 iptable_filter ip_tables xt_state nf_conntrack xt_NFQUEUE x_tables w83627hf hwmon_vid 8139too mii i2c_i801 i2c_core pcspkr shpchp pci_hotplug sg evdev thermal processor fan button battery ac snd_als4000 gameport snd_sb_common snd_opl3_lib snd_hwdep snd_ice1724 snd_ice17xx_ak4xxx snd_ac97_codec ac97_bus snd_ak4114 snd_pt2258 snd_i2c snd_ak4xxx_adda snd_mpu401_uart snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore e1000 rtc_cmos rtc_core rtc_lib ext3 jbd mbcache usbhid hid ff_memless usb_storage sr_mod cdrom sd_mod ehci_hcd uhci_hcd pata_acpi usbcore ata_piix ata_generic libata scsi_mod dock
Pid: 9173, comm: dirname Tainted: P D (2.6.25-ARCH #1)
EIP: 0060:[<c012b737>] EFLAGS: 00210202 CPU: 0
EIP is at put_files_struct+0x37/0xb0
EAX: 6f732e78 EBX: 00000000 ECX: c1407460 EDX: 00000303
ESI: de546c00 EDI: df399840 EBP: 00200206 ESP: de121bb8
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process dirname (pid: 9173, ti=de120000 task=de546c00 task.ti=de120000)
Stack: df269000 0000000b de546c00 c035c5ec 00200206 c012ce7a c036075c de546e11
000023d5 00000001 00000000 00000001 00200206 c0308c6c c0360510 de121c00
de121d6c 00000000 c035c5ec 00200206 c0106e28 de5d5e34 00000000 1f901000
Call Trace:
[<c012ce7a>] do_exit+0x16a/0x6a0
[<c0106e28>] die+0x1b8/0x1c0
[<c011cfd1>] do_page_fault+0x1b1/0x790
[<c018cc23>] do_lookup+0xf3/0x1b0
[<c0197ff3>] __d_lookup+0x143/0x160
[<c011e6cf>] kunmap_atomic+0x3f/0xe0
[<c015e059>] file_read_actor+0xd9/0xf0
[<c0160e3b>] generic_file_aio_read+0x59b/0x630
[<c019cd83>] mntput_no_expire+0x13/0x70
[<c017ebf9>] add_partial+0x19/0x70
[<c0171309>] remove_vma+0x39/0x50
[<c017fd43>] __slab_free+0xd3/0x360
[<c017ebf9>] add_partial+0x19/0x70
[<c0189d8d>] flush_old_exec+0x1cd/0x750
[<c017fd43>] __slab_free+0xd3/0x360
[<c0171309>] remove_vma+0x39/0x50
[<c011ce20>] do_page_fault+0x0/0x790
[<c0307d52>] error_code+0x72/0x78
[<c018007b>] alloc_loc_track+0xab/0xc0
[<c018a137>] flush_old_exec+0x577/0x750
[<c018a39d>] kernel_read+0x3d/0x60
[<c01b73be>] load_elf_binary+0x36e/0x1aa0
[<c016421f>] get_page_from_freelist+0x2cf/0x4d0
[<c01646ce>] __alloc_pages+0x5e/0x360
[<c0175705>] anon_vma_prepare+0x85/0xe0
[<c016eabd>] handle_mm_fault+0x60d/0x850
[<c016eacb>] handle_mm_fault+0x61b/0x850
[<c0184b45>] do_sync_read+0xd5/0x120
[<c016c60d>] vm_normal_page+0x1d/0x70
[<c016c60d>] vm_normal_page+0x1d/0x70
[<c016d479>] follow_page+0x119/0x1c0
[<c016edd1>] get_user_pages+0xd1/0x2f0
[<c01b7050>] load_elf_binary+0x0/0x1aa0
[<c01898ec>] search_binary_handler+0x15c/0x290
[<c018ab9c>] do_execve+0x21c/0x250
[<c01033c6>] sys_execve+0x46/0x80
[<c01050d8>] sysenter_past_esp+0x6d/0xa5
[<c0300000>] cpu_init+0x220/0x29a
=======================
Code: 08 0f 94 c0 84 c0 0f 84 88 00 00 00 8b 04 24 31 db 8b 78 04 eb 09 8d b6 00 00 00 00 83 c3 01 89 d8 c1 e0 05 39 07 76 3c 8b 47 0c <8b> 34 98 85 f6 74 ea 89 dd c1 e5 07 f7 c6 01 00 00 00 74 17 89
EIP: [<c012b737>] put_files_struct+0x37/0xb0 SS:ESP 0068:de121bb8
---[ end trace 255f477a12072e9b ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: dirname/9173/0x00000002
Pid: 9173, comm: dirname Tainted: P D 2.6.25-ARCH #1
[<c030542d>] schedule+0x3ed/0x850
[<c0129763>] release_console_sem+0x1b3/0x1d0
[<c012d373>] do_exit+0x663/0x6a0
[<c0106e28>] die+0x1b8/0x1c0
[<c011cfd1>] do_page_fault+0x1b1/0x790
[<c01291d7>] __call_console_drivers+0x57/0x70
[<c01fad04>] number+0x2b4/0x2c0
[<c01fb76f>] vsnprintf+0x43f/0x790
[<c024ec7f>] vt_console_print+0x23f/0x330
[<c016fe66>] free_pgd_range+0x166/0x1e0
[<c011ce20>] do_page_fault+0x0/0x790
[<c0307d52>] error_code+0x72/0x78
[<c017007b>] sys_mincore+0xeb/0x3e0
[<c012b737>] put_files_struct+0x37/0xb0
[<c012ce7a>] do_exit+0x16a/0x6a0
[<c0106e28>] die+0x1b8/0x1c0
[<c011cfd1>] do_page_fault+0x1b1/0x790
[<c018cc23>] do_lookup+0xf3/0x1b0
[<c0197ff3>] __d_lookup+0x143/0x160
[<c011e6cf>] kunmap_atomic+0x3f/0xe0
[<c015e059>] file_read_actor+0xd9/0xf0
[<c0160e3b>] generic_file_aio_read+0x59b/0x630
[<c019cd83>] mntput_no_expire+0x13/0x70
[<c017ebf9>] add_partial+0x19/0x70
[<c0171309>] remove_vma+0x39/0x50
[<c017fd43>] __slab_free+0xd3/0x360
[<c017ebf9>] add_partial+0x19/0x70
[<c0189d8d>] flush_old_exec+0x1cd/0x750
[<c017fd43>] __slab_free+0xd3/0x360
[<c0171309>] remove_vma+0x39/0x50
[<c011ce20>] do_page_fault+0x0/0x790
[<c0307d52>] error_code+0x72/0x78
[<c018007b>] alloc_loc_track+0xab/0xc0
[<c018a137>] flush_old_exec+0x577/0x750
[<c018a39d>] kernel_read+0x3d/0x60
[<c01b73be>] load_elf_binary+0x36e/0x1aa0
[<c016421f>] get_page_from_freelist+0x2cf/0x4d0
[<c01646ce>] __alloc_pages+0x5e/0x360
[<c0175705>] anon_vma_prepare+0x85/0xe0
[<c016eabd>] handle_mm_fault+0x60d/0x850
[<c016eacb>] handle_mm_fault+0x61b/0x850
[<c0184b45>] do_sync_read+0xd5/0x120
[<c016c60d>] vm_normal_page+0x1d/0x70
[<c016c60d>] vm_normal_page+0x1d/0x70
[<c016d479>] follow_page+0x119/0x1c0
[<c016edd1>] get_user_pages+0xd1/0x2f0
[<c01b7050>] load_elf_binary+0x0/0x1aa0
[<c01898ec>] search_binary_handler+0x15c/0x290
[<c018ab9c>] do_execve+0x21c/0x250
[<c01033c6>] sys_execve+0x46/0x80
[<c01050d8>] sysenter_past_esp+0x6d/0xa5
[<c0300000>] cpu_init+0x220/0x29a
=======================
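One detail in the two oopses above may be worth a closer look: the addresses the kernel could not page in (756e696c and 6f732e78, also the EAX values) are far below the c0xxxxxx range that every code address in the traces falls in. x86 stores 32-bit values in little-endian byte order, so a quick check is to read each value as four bytes and see whether they are printable ASCII. This is a minimal sketch; `as_ascii` is a hypothetical helper name.

```python
def as_ascii(word_hex):
    """Read a 32-bit hex value as 4 little-endian bytes.

    Returns the text if every byte is printable ASCII, otherwise None.
    """
    raw = int(word_hex, 16).to_bytes(4, "little")
    if all(0x20 <= byte < 0x7F for byte in raw):
        return raw.decode("ascii")
    return None


# Faulting addresses from the two "unable to handle kernel paging request"
# lines, plus one code address from the trace for comparison.
for address in ["756e696c", "6f732e78", "c018a137"]:
    print(address, "->", repr(as_ascii(address)))
```

Both faulting addresses decode to printable text while the code address does not. That would be consistent with a pointer having been overwritten by string data, but on its own it cannot tell a hardware fault from a software bug.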
Offline
Try disabling ACPI. If that doesn't help and the memory is OK, I think the motherboard is at fault. How about upgrading the BIOS? Do you have good cooling?
Offline
I am concerned that maybe my cooling is not good enough, but sensors generally tells me that it's fine. I'm generally getting about 45 °C on all of the temperature sensors.
I will try disabling acpi, but it seems like my machine runs hot without it.
EDIT: Oh, one more thing... Memtest will run all night (and then some) without locking. It's only when the processor and/or hd are hit hard.
I'm leaning mostly toward the processor being faulty, due to the fact that it can lock when the hd is not being accessed at all.
Sometimes it seems like it locks with mouse movement, but it will also lock when it's not being used.
EDIT2: I just rebooted with the kernel option "noacpi" ... We'll see how it goes. However, the init (or whatever) screen said "loading standard acpi modules" at some point around the normal "loading udev events" line (near the beginning). Do I need to do something like add !acpi to the modules array?
Well, that was great. My computer froze while writing this post... with acpi supposedly off.
Last edited by mrbug (2008-06-19 20:12:55)
Offline
I would lean towards CPU issues... possibly heat related.
Run a CPU-only stress test and see if it locks up. stress is a good tool from the community repo. There's also mprime (Windows/Linux).
Keep an eye on the temps...
If you're desperate, re-seat the RAM and CPU. I've seen it work before!
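For a rough idea of what a self-checking CPU load looks like, here is a minimal sketch. It is not a substitute for stress or mprime: it keeps a single core busy with a deterministic hashing job and counts passes whose result differs from the first one. The function names and default sizes are made up for illustration.

```python
import hashlib


def checksum_pass(size=1 << 20, rounds=20):
    """Hash a fixed buffer repeatedly; the same inputs should always give
    the same digest."""
    data = bytes(range(256)) * (size // 256)
    digest = b""
    for _ in range(rounds):
        digest = hashlib.sha256(data + digest).digest()
    return digest.hex()


def stress(passes=10, **kwargs):
    """Run identical passes and count how many disagree with the first."""
    reference = checksum_pass(**kwargs)
    mismatches = 0
    for _ in range(passes - 1):
        if checksum_pass(**kwargs) != reference:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatching passes:", stress(passes=10))
```

A non-zero count, or a lock-up while it runs, would point at the CPU, memory, or power delivery rather than at any particular driver. Starting one copy per core loads the whole processor, and watching sensors in another terminal covers the temperature side.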
Offline
I'll try mprime tonight...
Having disabled basically everything except for the things that I absolutely need (both modules and daemons), I think that I have to agree. My guess is that the processor was damaged by heat at some point and now it's really sensitive.
I've reseated the ram numerous times, but I was unable to get the processor to come off of the motherboard. I could have just been doing it wrong, though. I have a socket 478 with stock heatsink/fan... I'll look into that now.
<joking on the square> On a completely unrelated topic, anyone want to donate to the new computer fund? =-) </joking on the square>
EDIT: Could it be possible that the problem in thread http://bbs.archlinux.org/viewtopic.php?id=49135 is the same? My machine has always locked up, but not as often as this. The increase happened sometime after upgrading to 2.6.25. However, that still doesn't explain why it will freeze/segfault/panic when I'm using a live cd.
Last edited by mrbug (2008-06-20 11:37:23)
Offline
You are kinda SOL with this whole problem. Unless it's the RAM, the blame almost always falls on the motherboard or power supply.
Either way, it's a real bitch to track the problem down without some spare parts to test all the components.
That thread is related to ASUS laptops, and as you said, your problem happened before 2.6.25 and on Windows, so that kind of problem is irrelevant anyway.
If it were me, I would just bite the bullet and upgrade the system. I've seen people put up with intermittent faults for years; it's just not worth the hair loss. You have a hardware fault; it could be the motherboard, the CPU or even the PSU. Prime tests are great for overclocking, but they don't test for real hardware faults.
If the CPU fails, was it the CPU itself, or the motherboard faulting and causing a CPU error, or the PSU voltage dipping due to overheating and causing a CPU fault?
Offline
Yeah, that's what I was thinking/worried about... I believe that the power supply is healthy (at least sensors makes it seem that way). The crashes always mention different IRQs, so it's most likely a faulty motherboard. The P4 processor isn't worth keeping, especially now that the Core line is available in power-friendly 45 nm construction.
Offline