#1 2011-07-05 23:01:22

exuvo
Member
Registered: 2010-12-07
Posts: 19

New installation - Kernel Issues

Well, I'm out of options for my server: it keeps freezing and spitting out kernel errors after a while. I have tried kernel26-2.6.37.5, .38.8, lts-2.6.32.40, .41, kernel26-ck, kernel26-lts-ck and kernel-ck-k8; all give similar results. memtest (14 hours) and mprime (4 hours) show no errors. The problem appears quickly (1-20 min) when I put some load on the system (make, torrenting, mkinitcpio). The fact that I can't find a single action that immediately triggers the error has made my own troubleshooting difficult. I have disabled most services to try to narrow down the issue, but to no avail.
The hardware has been running fine for 2 years with "Mandriva Linux 2010 x86". I run Arch Linux on my laptop, and last week I decided to install Arch Linux on the server. The installer went through and everything seemed fine, then I suddenly got a freeze (same day). During boot I get an error because udev tries to use mdadm but the binary is not there; the 6-disk RAID still mounts correctly.

The system is fully up to date. Mirror = Server = http://mirror.archlinux.no/$repo/os/x86_64

# pacman -Syyu
:: Synchronizing package databases...
 core                                                  36.0K  633.0K/s 00:00:00 [#############################################] 100%
 extra                                                465.6K    3.5M/s 00:00:00 [#############################################] 100%
 community                                            445.7K    3.4M/s 00:00:00 [#############################################] 100%
 kernel26-ck                                            3.9K   29.8K/s 00:00:00 [#############################################] 100%
:: Starting full system upgrade...
 there is nothing to do

DAEMONS=(hwclock syslog-ng !iptables network !openvpn !sshguard sshd mdadm !netfs !crond samba !sensors !hddtemp !mysqld !eagledns !tomcat6 !svnserve)

System info:
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ 3GHz
RAM: MemTotal: 6050888 kB
Motherboard: Asus M4A785-M

A few of the stack traces. If anyone could make sense of these it would be helpful, as I am unable to:

Not a complete lockup; I could still use ssh and dmesg:
[  387.777824] general protection fault: 0000 [#1] SMP 
[  387.786327] last sysfs file: /sys/devices/pci0000:00/0000:00:12.2/usb1/1-6/1-6:1.0/host6/target6:0:0/6:0:0:0/block/sdg/uevent
[  387.786327] CPU 0 
[  387.786327] Modules linked in: ipv6 usb_storage ohci_hcd pata_acpi ata_generic ide_pci_generic pata_atiixp evdev i2c_piix4 asus_atk0110 ehci_hcd atiixp i2c_core shpchp edac_core ide_core usbcore edac_mce_amd k8temp pci_hotplug wmi r8169 thermal button mii processor sg ext4 mbcache jbd2 crc16 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 md_mod sd_mod ahci libata scsi_mod
[  387.868834] Pid: 1301, comm: java Not tainted 2.6.32-lts #1 System Product Name
[  387.868834] RIP: 0010:[<ffffffff8120627f>]  [<ffffffff8120627f>] rb_erase+0x20f/0x310
[  387.868834] RSP: 0018:ffff880194a35e98  EFLAGS: 00010206
[  387.868834] RAX: 0000000000000000 RBX: 00ff8800cf924000 RCX: ffff8801986798e8
[  387.868834] RDX: 00ff8800cf924000 RSI: ffff880192cfb078 RDI: ffff88019784f700
[  387.868834] RBP: ffff880194a35ea8 R08: 0000000000000000 R09: 2222222222222222
[  387.868834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880192cfb078
[  387.868834] R13: ffff880198679840 R14: 000000000000003e R15: ffff880192cfb000
[  387.868834] FS:  00007f1b70449700(0000) GS:ffff880006e00000(0000) knlGS:0000000000000000
[  387.868834] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  387.868834] CR2: 00007f1b5d5d2718 CR3: 000000019518d000 CR4: 00000000000006f0
[  387.868834] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  387.868834] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  388.117928] Process java (pid: 1301, threadinfo ffff880194a34000, task ffff8801994f5d70)
[  388.117928] Stack:
[  388.117928]  ffff88019784f700 ffff880192cfb000 ffff880194a35ed8 ffffffff811827de
[  388.117928] <0> ffff880194a35ed8 ffff880198679840 ffff880192cfbb40 0000000000000002
[  388.117928] <0> ffff880194a35f78 ffffffff81183432 ffff880194a35f78 ffff880000000000
[  388.117928] Call Trace:
[  388.117928]  [<ffffffff811827de>] ep_remove+0x5e/0xc0
[  388.117928]  [<ffffffff81183432>] sys_epoll_ctl+0x3b2/0x590
[  388.117928]  [<ffffffff81182630>] ? ep_ptable_queue_proc+0x0/0xc0
[  388.117928]  [<ffffffff81012072>] system_call_fastpath+0x16/0x1b
[  388.117928] Code: 8b 47 08 4c 89 c2 41 83 e0 01 48 83 e2 fc 48 85 c0 48 89 d3 74 0c 48 8b 08 83 e1 03 48 09 d1 48 89 08 48 85 d2 0f 84 ba 00 00 00 <48> 39 7a 10 0f 84 b9 00 00 00 48 89 42 08 e9 8f fe ff ff 4c 8b 
[  388.117928] RIP  [<ffffffff8120627f>] rb_erase+0x20f/0x310
[  388.117928]  RSP <ffff880194a35e98>
[  388.435560] ---[ end trace e9bef56bd87d805e ]---
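One detail in the register dump above can be checked mechanically. On x86-64 with 48-bit virtual addresses, a pointer is canonical only if bits 63 through 47 are all equal, and dereferencing a non-canonical pointer raises a general protection fault. The following is a minimal sketch in plain Python, assuming 48-bit addressing; the helper name `is_canonical` and the choice of registers to test are illustrative and not taken from any kernel tool.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Canonical means bits 63..(va_bits - 1) are all zeros or all ones.
    """
    top = addr >> (va_bits - 1)               # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones


# Register values copied from the first trace above.
registers = {
    "RBX": 0x00ff8800cf924000,
    "RDX": 0x00ff8800cf924000,
    "RCX": 0xffff8801986798e8,
    "RSI": 0xffff880192cfb078,
    "RDI": 0xffff88019784f700,
}

for name, value in registers.items():
    status = "canonical" if is_canonical(value) else "NON-CANONICAL"
    print(f"{name} = {value:#018x}  {status}")
```

RBX and RDX are flagged as non-canonical, while the other pointers look like ordinary kernel addresses. The marked bytes `<48> 39 7a 10` in the Code: line decode to a compare against memory at RDX+0x10, which is consistent with a general protection fault on that pointer. The trace does not show why the pointer was corrupted.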
#Had to copy this by hand so I skipped some parts.

general protection fault: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:00.00/firmware/0000:02:00.0/loading
CPU 1
Modules linked in: ...
Pid: 1224, comm: java Not tainted 2.6.39-ARCH #1 System manufacturer System Product Name/M4A785-M
RIP 0010:[<ffffffff8113d669>]  [<ffffffff8113d669>] kmem_cache_alloc+0x49/0x160
RSP: 0018:ffff88019523dd38 EFLAGS: 00010202
Process java (pid: 1224, threadinfo ffff88019523c000, task ffff880194c846a0)
Stack ffff88019523dda8 0000000000001000 0000000000000000 000080d000000000
ffff88019523df28 0000000000000001 ffff880194dc6180 ffff880037ff3000
00000000ffffff9c 0000000000000041 ffff88019523dda8 ffffffff811532db
Call Trace:
[] get_empty_filp+0x5b/0x170
[] path_openat+0x3d/0x3c0
[] ? putname+0x35/0x50
[] ? user_path_at+0x64/0xa0
[] do_filp_open+0x42/0xa0
[] ? alloc_fd+0xec/0x140
[] do_sys_open+0xf7/0x1d0
[] sys_open+0x20/0x30
[] system_call_fastpath+0x16/0x1b
Code: 75 cc 49 8b 45 00 65 48 03 04 25 10 dc 00 00 48 8b 50 08 4c 8b 20 4d 85 e4 0f 84 fb 00 00 00 49 63 45 20 49 8b 75 00
#Had to copy this by hand so I skipped some parts.
kernel26-lts seems to use a lower console resolution, so the top of the trace is not visible. About 5 min this time before the error, using Vuze (torrenting).

cs:..
process md2_raid5 (pid: 390, threadinfo ffff880195b3a000, task ffff880197248730)
Stack:
ffff880195b3bc40 0000000000000086 ffff880199454de0 0000000000000003
<0> ffff880195f08db0 ffff880195b3bd78 ffff880195d14e00 ffff880195d14f88
<0> ffff880195b3bc70 ffffffffa00791d7 ffff880195b3bbcd0 ffffffffa00cda52
Call Trace:
md_write_end+0x47/0x60 [md_mod]
handle_stripe_clean_event+0x102/0x1d0 [raid456]
handle_stripe+0xf1e/0x1c70 [raid456]
? __wake_up+0x64/0x70
raid5d+0x363/0x4a0 [raid456]
? process_timeout+0x0/0x10
md_thread+0x4b/0x120 [md_mod]
? autoremove_wake_function+0x0/0x40
? md_thread+0x0/0x120 [md_mod]
kthread+0x88/0x90
? finish_task_switch+0x48/0xd0
child_rip+0xa/0x20
? kthread+0x0/0x90
? child_rip+0x0/0x20
Code: 8b 03 48 85 c0 74 3e 44 8b 0d d4 23 58 00 45 85 c9 0f 85 7a 01 00 00 48 8b 53 08 49 b8 00 02 20 00 00 00 ad de 41 be 01 00 00 00 <48> 89 50 08 48 89 02 4c 89 43 08 49 8b 44 24 18 48 39 43 10 0f
RIP: [<ffffffff810720d1>] mod_timer+0x91/0x230
RSP <ffff880195b3bc20>
CR2: ffff88ff81788458
---[ end trace 079c6e93730f5131 ]---

Any help is appreciated!

Last edited by exuvo (2011-07-05 23:03:09)

#2 2011-07-06 01:03:39

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,111

Re: New installation - Kernel Issues

Your mdadm error on boot is unrelated, and a non-issue. We package mdadm's udev rule in the initcpio, which does autodetection and autoassembly of mdadm arrays via the mdadm binary. However, our mdadm initcpio hook uses /etc/mdadm.conf based assembly with mdassemble instead. You can safely ignore this "error".

All the activities you mention are disk related, and all of the crashes you post are from the VFS or block layer. You've got a disk problem somewhere. I'm sorry I can't be more specific than that.
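To compare several traces side by side, it can help to pull out just the faulting function from each RIP line. The following is a rough sketch in Python, not an existing tool: the names `SYMBOL_RE` and `faulting_symbol` are made up for illustration, and the regular expression only assumes the `symbol+0xoffset/0xsize` form seen in the traces above.

```python
import re

# Matches e.g. "rb_erase+0x20f/0x310" and captures the symbol name.
SYMBOL_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def faulting_symbol(oops_text: str):
    """Return the symbol on the first line containing 'RIP', or None."""
    for line in oops_text.splitlines():
        if "RIP" in line:
            match = SYMBOL_RE.search(line)
            if match:
                return match.group(1)
    return None


# RIP lines copied from the three traces in the first post.
samples = [
    "RIP: 0010:[<ffffffff8120627f>]  [<ffffffff8120627f>] rb_erase+0x20f/0x310",
    "RIP 0010:[<ffffffff8113d669>]  [<ffffffff8113d669>] kmem_cache_alloc+0x49/0x160",
    "RIP: [<ffffffff810720d1>] mod_timer+0x91/0x230",
]

for sample in samples:
    print(faulting_symbol(sample))
```

For the three traces posted, this yields `rb_erase` (called from `ep_remove`), `kmem_cache_alloc` (called from `get_empty_filp`) and `mod_timer` (called from `md_write_end`). Passing a whole saved dmesg capture to `faulting_symbol` works the same way, since only the first RIP line is inspected.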

Last edited by falconindy (2011-07-06 01:04:16)

#3 2011-07-08 14:14:24

exuvo
Member
Registered: 2010-12-07
Posts: 19

Re: New installation - Kernel Issues

I still don't really understand what the problem was, but after reinstalling Arch as i686 I can say that the problems went away.
