Arch on Xen Lockups

fukawi2 · 2009-06-30 05:49:10

I've been having some problems recently with one of my installations of Arch on a VPS (Linode, which uses Xen virtualization).

I'll be doing something simple and boring and the whole thing will die. In the most recent example, I had just started extracting a tarball when it died.

This is all I can gather from the VM's serial console (via the Xen host):

 [<c016141d>] mempool_alloc+0x2d/0xe0
 [<c016141d>] mempool_alloc+0x2d/0xe0
 [<c01a72ab>] bvec_alloc_bs+0x7b/0x140
 [<c01a7571>] bio_alloc_bioset+0x51/0xe0
 [<c0425852>] clone_bio+0x42/0x90
 [<c0426a60>] __split_bio+0x370/0x3a0
 [<c0426e3f>] dm_request+0xff/0x170
 [<c03a6566>] generic_make_request+0xe6/0x230
 [<c0105c53>] xen_restore_fl_direct_end+0x0/0x1
 [<c01825f7>] kmem_cache_alloc+0x57/0xb0
 [<c016141d>] mempool_alloc+0x2d/0xe0
 [<c03a78d3>] submit_bio+0x63/0xf0
 [<c01a72bd>] bvec_alloc_bs+0x8d/0x140
 [<c01a758b>] bio_alloc_bioset+0x6b/0xe0
 [<c01a389a>] submit_bh+0xba/0xf0
 [<c01a5639>] __block_write_full_page+0x1a9/0x310
 [<c0105407>] xen_force_evtchn_callback+0x17/0x30
 [<c0212880>] ext3_get_block+0x0/0x100
 [<c01a588a>] block_write_full_page+0xea/0x100
 [<c0212880>] ext3_get_block+0x0/0x100
 [<c02141b3>] ext3_ordered_writepage+0xa3/0x170
 [<c0210f70>] bget_one+0x0/0x10
 [<c0164c78>] __writepage+0x8/0x30
 [<c016521f>] write_cache_pages

I know these things are nearly impossible to diagnose, but does anyone have any suggestions? It's quite annoying.

Kernel is a custom Linode one - kernel 2.6.28-linode15

EDIT 2: Here are the logs from the time it died:

Jun 25 17:12:34 platypus kernel: [IPT ISC] : IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:fe:fd:40:16:47:15:08:00 SRC=192.168.139.100 DST=192.168.255.255 LEN=243 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=138 DPT=138 LEN=223 
Jun 25 17:12:34 platypus kernel: [IPT ISC] : IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:fe:fd:40:16:47:15:08:00 SRC=192.168.139.100 DST=192.168.255.255 LEN=235 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=138 DPT=138 LEN=215 
Jun 25 17:24:16 dingo syslog-ng[3743]: syslog-ng starting up; version='3.0.1'
Jun 25 17:24:16 dingo kernel: Reserving virtual address space above 0xf5800000
Jun 25 17:24:16 dingo kernel: Linux version 2.6.28-linode15 (root@db1.linode.com) (gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)) #2 SMP Wed Jan 14 09:18:53 EST 2009

fukawi2 · 2009-09-01 23:22:59

Well, it happened again, and I managed to get a proper kernel trace this time.

Anyone got any ideas?

------------[ cut here ]------------
kernel BUG at drivers/block/xen-blkfront.c:243!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/block/dm-4/removable
Modules linked in:

Pid: 21028, comm: perl Not tainted (2.6.28-linode15 #2)
EIP: 0061:[<c03ee830>] EFLAGS: 00010046 CPU: 0
EIP is at do_blkif_request+0x2e0/0x360
EAX: 00000001 EBX: 00000000 ECX: d43a5bc0 EDX: c343edb0
ESI: d5952288 EDI: d59522c8 EBP: 000001c3 ESP: c151fe98
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process perl (pid: 21028, ti=c151e000 task=d49d2040 task.ti=c151e000)
Stack:
 00000005 d5952288 00000288 d5988028 d5956000 c420864c 00000007 0000000d
 d5956000 00000002 00000006 d5952000 00000000 d43a5bc0 d2c7de0c ffffffff
 d5988028 d5956000 0000000b 00000014 c03a6ca5 d5956000 c03ee8c6 00000000
Call Trace:
 [<c03a6ca5>] blk_invoke_request_fn+0x95/0x100
 [<c03ee8c6>] kick_pending_request_queues+0x16/0x30
 [<c03eea6d>] blkif_interrupt+0x18d/0x1d0
 [<c0159510>] handle_IRQ_event+0x30/0x60
 [<c015b428>] handle_level_irq+0x78/0xf0
 [<c010aae7>] do_IRQ+0x77/0x90
 [<c03c8968>] xen_evtchn_do_upcall+0xe8/0x150
 [<c0109197>] xen_do_upcall+0x7/0xc
Code: 2c 8d 54 03 40 8d 44 0e 54 b9 6c 00 00 00 e8 98 a5 fc ff 8b 44 24 3c e8 ff 92 fd ff 83 44 24 18 01 e9 40 fd ff ff 0f 0b eb fe 90 <0f> 0b eb fe 8b 44 24 20 ba 40 e5 3e c0 8b 4c 24 20 c7 04 24 0b
EIP: [<c03ee830>] do_blkif_request+0x2e0/0x360 SS:ESP 0069:c151fe98
Kernel panic - not syncing: Fatal exception in interrupt
------------[ cut here ]------------
WARNING: at kernel/smp.c:333 smp_call_function_mask+0x1cb/0x1d0()
Modules linked in:
Pid: 21028, comm: perl Tainted: G      D    2.6.28-linode15 #2
Call Trace:
 [<c0128adf>] warn_on_slowpath+0x5f/0x90
 [<c03b8e26>] memmove+0x36/0x40
 [<c03dcc5a>] scrup+0x7a/0xe0
 [<c0140987>] atomic_notifier_call_chain+0x17/0x20
 [<c03dccdf>] notify_update+0x1f/0x30
 [<c03dcf6a>] vt_console_print+0x20a/0x2d0
 [<c0105407>] xen_force_evtchn_callback+0x17/0x30
 [<c0105cea>] check_events+0x8/0xe
 [<c0105c53>] xen_restore_fl_direct_end+0x0/0x1
 [<c0105407>] xen_force_evtchn_callback+0x17/0x30
 [<c0105cea>] check_events+0x8/0xe
 [<c0105c53>] xen_restore_fl_direct_end+0x0/0x1
 [<c01295e0>] vprintk+0x170/0x350
 [<c014a46b>] smp_call_function_mask+0x1cb/0x1d0
 [<c0105fd0>] stop_self+0x0/0x30
 [<c0105407>] xen_force_evtchn_callback+0x17/0x30
 [<c0105cea>] check_events+0x8/0xe
 [<c0105c53>] xen_restore_fl_direct_end+0x0/0x1
 [<c0561ca3>] _spin_unlock_irqrestore+0x13/0x20
 [<c03dec96>] do_unblank_screen+0x16/0x130
 [<c014a484>] smp_call_function+0x14/0x20
 [<c0128b6e>] panic+0x4e/0x100
 [<c010ac3c>] oops_end+0x8c/0xa0
 [<c0109b50>] do_invalid_op+0x0/0xa0
 [<c0109bcf>] do_invalid_op+0x7f/0xa0
 [<c03ee830>] do_blkif_request+0x2e0/0x360
 [<c0105407>] xen_force_evtchn_callback+0x17/0x30
 [<c0105cea>] check_events+0x8/0xe
 [<c0105407>] xen_force_evtchn_callback+0x17/0x30
 [<c0105407>] xen_force_evtchn_callback+0x17/0x30
 [<c0105cea>] check_events+0x8/0xe
 [<c0105407>] xen_force_evtchn_callback+0x17/0x30
 [<c0105cea>] check_events+0x8/0xe
 [<c0105c53>] xen_restore_fl_direct_end+0x0/0x1
 [<c0561ca3>] _spin_unlock_irqrestore+0x13/0x20
 [<c0561f4a>] error_code+0x72/0x78
 [<c03ee830>] do_blkif_request+0x2e0/0x360
 [<c03a6ca5>] blk_invoke_request_fn+0x95/0x100
 [<c03ee8c6>] kick_pending_request_queues+0x16/0x30
 [<c03eea6d>] blkif_interrupt+0x18d/0x1d0
 [<c0159510>] handle_IRQ_event+0x30/0x60
 [<c015b428>] handle_level_irq+0x78/0xf0
 [<c010aae7>] do_IRQ+0x77/0x90
 [<c03c8968>] xen_evtchn_do_upcall+0xe8/0x150
 [<c0109197>] xen_do_upcall+0x7/0xc
---[ end trace c449499288c87a80 ]---

fukawi2 · 2009-09-01 23:46:49

Linode support says it's a known bug in 2.6.28 so I've updated all my kernels to 2.6.30 now. Hopefully that should solve it!
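For anyone checking several machines after the same advice, here is a minimal sketch in Python that compares the running kernel release against 2.6.30. It assumes, based only on what Linode support said above, that releases from 2.6.30 onward are unaffected; the helper names `kernel_version` and `is_at_least` are hypothetical and made up for this illustration.

```python
import platform
import re


def kernel_version(release):
    """Return the leading numeric part of a kernel release string as a tuple of ints.

    A suffix such as "-linode15" is ignored; a missing third component counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def is_at_least(release, minimum=(2, 6, 30)):
    """True if the release is at or above `minimum` (assumed here to be the first safe version)."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    if is_at_least(running):
        print(f"{running}: at or above 2.6.30")
    else:
        print(f"{running}: older than 2.6.30, may still be affected")
```

Run it on each guest; a release such as the 2.6.28-linode15 kernel from the traces above is reported as older than 2.6.30.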
