You are not logged in.

#1 2013-06-06 15:30:33

_djm_
Member
Registered: 2009-11-30
Posts: 33
Website

"Watchdog detected hard LOCKUP on cpu 1"

Hi,

Last night, I was connected to my (remote) computer via SSH (the machine was idle) when my connection dropped out and didn't reappear, which is unusual. Looking at the logs this morning, there was a strange sequence of entries (commencing at the time of the dropout) that I haven't seen before:

Jun 05 18:41:36 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:37 djm kernel: ------------[ cut here ]------------
Jun 05 18:41:37 djm kernel: WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xd0()
Jun 05 18:41:37 djm kernel: Hardware name: SYX-P55-CD53
Jun 05 18:41:37 djm kernel: Watchdog detected hard LOCKUP on cpu 1
Jun 05 18:41:37 djm kernel: Modules linked in:
Jun 05 18:41:37 djm kernel:  bsd_comp ppp_async crc_ccitt ppp_generic slhc nls_cp437 vfat fat fuse ses enclosure joydev snd_hda_codec_hdmi nouveau m
Jun 05 18:41:37 djm kernel: Pid: 0, comm: swapper/1 Not tainted 3.9.4-1-ARCH #1
Jun 05 18:41:37 djm kernel: Call Trace:
Jun 05 18:41:37 djm kernel:  <NMI>  [<ffffffff810dc700>] ? watchdog_overflow_callback+0x90/0xd0
Jun 05 18:41:37 djm kernel:  [<ffffffff810580b0>] warn_slowpath_common+0x70/0xa0
Jun 05 18:41:37 djm kernel:  [<ffffffff8105812c>] warn_slowpath_fmt+0x4c/0x50
Jun 05 18:41:37 djm kernel:  [<ffffffff8101ce15>] ? native_sched_clock+0x15/0x80
Jun 05 18:41:37 djm kernel:  [<ffffffff8101ce89>] ? sched_clock+0x9/0x10
Jun 05 18:41:37 djm kernel:  [<ffffffff810dc670>] ? watchdog_enable+0x1e0/0x1e0
Jun 05 18:41:37 djm kernel:  [<ffffffff810dc70c>] watchdog_overflow_callback+0x9c/0xd0
Jun 05 18:41:37 djm kernel:  [<ffffffff81118a72>] __perf_event_overflow+0x92/0x2b0
Jun 05 18:41:37 djm kernel:  [<ffffffff811179c4>] ? perf_event_update_userpage+0xe4/0x160
Jun 05 18:41:37 djm kernel:  [<ffffffff81119784>] perf_event_overflow+0x14/0x20
Jun 05 18:41:37 djm kernel:  [<ffffffff8102b8f2>] intel_pmu_handle_irq+0x1b2/0x370
Jun 05 18:41:37 djm kernel:  [<ffffffff814d434d>] perf_event_nmi_handler+0x1d/0x20
Jun 05 18:41:37 djm kernel:  [<ffffffff814d3a59>] nmi_handle.isra.0+0x59/0x90
Jun 05 18:41:37 djm kernel:  [<ffffffff814d3bf9>] do_nmi+0x169/0x340
Jun 05 18:41:37 djm kernel:  [<ffffffff814d30f1>] end_repeat_nmi+0x1e/0x2e
Jun 05 18:41:37 djm kernel:  [<ffffffff814d2082>] ? _raw_spin_lock_irqsave+0x42/0x50
Jun 05 18:41:37 djm kernel:  [<ffffffff814d2082>] ? _raw_spin_lock_irqsave+0x42/0x50
Jun 05 18:41:37 djm kernel:  [<ffffffff814d2082>] ? _raw_spin_lock_irqsave+0x42/0x50
Jun 05 18:41:37 djm kernel:  <<EOE>>  <IRQ>  [<ffffffffa0a8a882>] nouveau_fan_update+0x52/0x1c0 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8aa49>] nouveau_therm_fan_set+0x19/0x20 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8a39a>] nouveau_therm_update+0x19a/0x2b0 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8a4ca>] nouveau_therm_alarm+0x1a/0x20 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8d666>] nv04_timer_alarm_trigger+0x126/0x170 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8d7cd>] nv04_timer_alarm+0x8d/0xa0 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8ba0a>] alarm_timer_callback+0xda/0xe0 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8d666>] nv04_timer_alarm_trigger+0x126/0x170 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8d7cd>] nv04_timer_alarm+0x8d/0xa0 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8a9e4>] nouveau_fan_update+0x1b4/0x1c0 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8aa0c>] nouveau_fan_alarm+0x1c/0x20 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8d666>] nv04_timer_alarm_trigger+0x126/0x170 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a8d723>] nv04_timer_intr+0x73/0x90 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffffa0a88bf5>] nouveau_mc_intr+0xb5/0x110 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffff81394840>] ? intel_pstate_set_policy+0x150/0x150
Jun 05 18:41:37 djm kernel:  [<ffffffffa0ae628b>] nouveau_irq_handler+0x6b/0x80 [nouveau]
Jun 05 18:41:37 djm kernel:  [<ffffffff810dd226>] handle_irq_event_percpu+0x56/0x250
Jun 05 18:41:37 djm kernel:  [<ffffffff81394840>] ? intel_pstate_set_policy+0x150/0x150
Jun 05 18:41:37 djm kernel:  [<ffffffff810dd45d>] handle_irq_event+0x3d/0x60
Jun 05 18:41:37 djm kernel:  [<ffffffff810e03ea>] handle_fasteoi_irq+0x5a/0x100
Jun 05 18:41:37 djm kernel:  [<ffffffff8101763e>] handle_irq+0x1e/0x30
Jun 05 18:41:37 djm kernel:  [<ffffffff814dbc0d>] do_IRQ+0x4d/0xc0
Jun 05 18:41:37 djm kernel:  [<ffffffff814d2a6d>] common_interrupt+0x6d/0x6d
Jun 05 18:41:37 djm kernel:  <EOI>  [<ffffffff810b009f>] ? clockevents_notify+0x19f/0x1b0
Jun 05 18:41:37 djm kernel:  [<ffffffff813952d1>] ? cpuidle_wrap_enter+0x41/0x80
Jun 05 18:41:37 djm kernel:  [<ffffffff81394870>] cpuidle_enter_tk+0x10/0x20
Jun 05 18:41:37 djm kernel:  [<ffffffff81395002>] cpuidle_idle_call+0xb2/0x260
Jun 05 18:41:37 djm kernel:  [<ffffffff8101e86c>] cpu_idle+0xec/0x150
Jun 05 18:41:37 djm kernel:  [<ffffffff814bfb5a>] start_secondary+0x270/0x272
Jun 05 18:41:37 djm kernel: ---[ end trace 6384d506fa0c6806 ]---
Jun 05 18:41:37 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:38 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:39 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:40 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:40 djm kernel: hub 1-1:1.0: Cannot enable port 5.  Maybe the USB cable is bad?
Jun 05 18:41:41 djm kernel: hub 1-1:1.0: cannot disable port 5 (err = -110)
Jun 05 18:41:42 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:43 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:44 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:45 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:46 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:46 djm kernel: hub 1-1:1.0: Cannot enable port 5.  Maybe the USB cable is bad?
Jun 05 18:41:47 djm kernel: hub 1-1:1.0: cannot disable port 5 (err = -110)
Jun 05 18:41:48 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:49 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:50 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:51 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:52 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:52 djm kernel: hub 1-1:1.0: Cannot enable port 5.  Maybe the USB cable is bad?
Jun 05 18:41:54 djm kernel: hub 1-1:1.0: cannot disable port 5 (err = -110)
Jun 05 18:41:55 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:56 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:57 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:58 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:59 djm kernel: hub 1-1:1.0: cannot reset port 5 (err = -110)
Jun 05 18:41:59 djm kernel: hub 1-1:1.0: Cannot enable port 5.  Maybe the USB cable is bad?
Jun 05 18:42:00 djm kernel: hub 1-1:1.0: cannot disable port 5 (err = -110)
Jun 05 18:42:01 djm kernel: hub 1-1:1.0: cannot disable port 5 (err = -110)
Jun 05 18:42:01 djm kernel: sd 12:0:0:0: Device offlined - not ready after error recovery
Jun 05 18:42:04 djm kernel: INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=18002 jiffies, g=948951, c=948950, q=1432)
Jun 05 18:42:04 djm kernel: sending NMI to all CPUs:
Jun 05 18:42:04 djm kernel: NMI backtrace for cpu 1
Jun 05 18:42:04 djm kernel: CPU 1 
Jun 05 18:42:04 djm kernel: Pid: 0, comm: swapper/1 Tainted: G        W    3.9.4-1-ARCH #1 Si SYX-P55-CD53/P55-CD53 (MS-7586)  
Jun 05 18:42:04 djm kernel: RIP: 0010:[<ffffffff814d2080>]  [<ffffffff814d2080>] _raw_spin_lock_irqsave+0x40/0x50
Jun 05 18:42:04 djm kernel: RSP: 0018:ffff88023fc43b78  EFLAGS: 00000097
Jun 05 18:42:04 djm kernel: RAX: ffff88023413bfd8 RBX: ffff880233222800 RCX: 00000000000000c8
Jun 05 18:42:04 djm kernel: RDX: 00000000000000c7 RSI: 0000000000000086 RDI: ffff8802331115b8
Jun 05 18:42:04 djm kernel: RBP: ffff88023fc43b78 R08: 0000000000000019 R09: 0000aef3a149ae40
Jun 05 18:42:04 djm kernel: R10: 0000000000000061 R11: 0000000000003e7d R12: 0000000000000019
Jun 05 18:42:04 djm kernel: R13: ffff8802331115b8 R14: ffff880233222800 R15: ffff880233111500
Jun 05 18:42:04 djm kernel: FS:  0000000000000000(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
Jun 05 18:42:04 djm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 05 18:42:04 djm kernel: CR2: 00007f87c6167090 CR3: 000000000180d000 CR4: 00000000000007e0
Jun 05 18:42:04 djm kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 05 18:42:04 djm kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 05 18:42:04 djm kernel: Process swapper/1 (pid: 0, threadinfo ffff88023413a000, task ffff88023409e480)
Jun 05 18:42:04 djm kernel: Stack:
Jun 05 18:42:04 djm kernel:  ffff88023fc43bc8 ffffffffa0a8a882 ffff8802333e7b40 00000000813c9b9f
Jun 05 18:42:04 djm kernel:  ffff8800b33eb800 0000000000000002 ffff8802332228e0 0000000000000086
Jun 05 18:42:04 djm kernel:  ffff880233222800 0000000000000019 ffff88023fc43bd8 ffffffffa0a8aa49
Jun 05 18:42:04 djm kernel: Call Trace:
Jun 05 18:42:04 djm kernel:  <IRQ> 
Jun 05 18:42:04 djm kernel: 
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8a882>] nouveau_fan_update+0x52/0x1c0 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8aa49>] nouveau_therm_fan_set+0x19/0x20 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8a39a>] nouveau_therm_update+0x19a/0x2b0 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8a4ca>] nouveau_therm_alarm+0x1a/0x20 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8d666>] nv04_timer_alarm_trigger+0x126/0x170 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8d7cd>] nv04_timer_alarm+0x8d/0xa0 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8ba0a>] alarm_timer_callback+0xda/0xe0 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8d666>] nv04_timer_alarm_trigger+0x126/0x170 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8d7cd>] nv04_timer_alarm+0x8d/0xa0 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8a9e4>] nouveau_fan_update+0x1b4/0x1c0 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8aa0c>] nouveau_fan_alarm+0x1c/0x20 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8d666>] nv04_timer_alarm_trigger+0x126/0x170 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a8d723>] nv04_timer_intr+0x73/0x90 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffffa0a88bf5>] nouveau_mc_intr+0xb5/0x110 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffff81394840>] ? intel_pstate_set_policy+0x150/0x150
Jun 05 18:42:04 djm kernel:  [<ffffffffa0ae628b>] nouveau_irq_handler+0x6b/0x80 [nouveau]
Jun 05 18:42:04 djm kernel:  [<ffffffff810dd226>] handle_irq_event_percpu+0x56/0x250
Jun 05 18:42:04 djm kernel:  [<ffffffff81394840>] ? intel_pstate_set_policy+0x150/0x150
Jun 05 18:42:04 djm kernel:  [<ffffffff810dd45d>] handle_irq_event+0x3d/0x60
Jun 05 18:42:04 djm kernel:  [<ffffffff810e03ea>] handle_fasteoi_irq+0x5a/0x100
Jun 05 18:42:04 djm kernel:  [<ffffffff8101763e>] handle_irq+0x1e/0x30
Jun 05 18:42:04 djm kernel:  [<ffffffff814dbc0d>] do_IRQ+0x4d/0xc0
Jun 05 18:42:04 djm kernel:  [<ffffffff814d2a6d>] common_interrupt+0x6d/0x6d
Jun 05 18:42:04 djm kernel:  <EOI> 
Jun 05 18:42:04 djm kernel: 
Jun 05 18:42:04 djm kernel:  [<ffffffff810b009f>] ? clockevents_notify+0x19f/0x1b0
Jun 05 18:42:04 djm kernel:  [<ffffffff813952d1>] ? cpuidle_wrap_enter+0x41/0x80
Jun 05 18:42:04 djm kernel:  [<ffffffff81394870>] cpuidle_enter_tk+0x10/0x20
Jun 05 18:42:04 djm kernel:  [<ffffffff81395002>] cpuidle_idle_call+0xb2/0x260
Jun 05 18:42:04 djm kernel:  [<ffffffff8101e86c>] cpu_idle+0xec/0x150
Jun 05 18:42:04 djm kernel:  [<ffffffff814bfb5a>] start_secondary+0x270/0x272
Jun 05 18:42:04 djm kernel: Code: 66 90 66 66 90 65 48 8b 04 25 f0 c6 00 00 83 80 44 e0 ff ff 01 ba 00 01 00 00 f0 66 0f c1 17 0f b6 ce 38 d1 74 0e 0f 1f 44 00
Jun 05 18:42:04 djm kernel: NMI backtrace for cpu 0
Jun 05 18:42:04 djm kernel: CPU 0 
Jun 05 18:42:04 djm kernel: Pid: 0, comm: swapper/0 Tainted: G        W    3.9.4-1-ARCH #1 Si SYX-P55-CD53/P55-CD53 (MS-7586)  
Jun 05 18:42:04 djm kernel: RIP: 0010:[<ffffffff8127a235>]  [<ffffffff8127a235>] __const_udelay+0x25/0x30
Jun 05 18:42:04 djm kernel: RSP: 0018:ffff88023fc03cf0  EFLAGS: 00000807
Jun 05 18:42:04 djm kernel: RAX: 000000005abd8800 RBX: 0000000000002710 RCX: 0000000002a7cf40
Jun 05 18:42:04 djm kernel: RDX: 000000000028c9f4 RSI: 0000000000000002 RDI: 0000000000418958
Jun 05 18:42:04 djm kernel: RBP: ffff88023fc03cf0 R08: 000000000000000a R09: 00000000000003bb
Jun 05 18:42:04 djm kernel: R10: 0000000000000000 R11: ffff88023fc03a36 R12: ffffffff8184bbc0
Jun 05 18:42:04 djm kernel: R13: ffff88023fc0e720 R14: ffffffff8184bbc0 R15: 0000000000000001
Jun 05 18:42:04 djm kernel: FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
Jun 05 18:42:04 djm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 05 18:42:04 djm kernel: CR2: 00007ffaf8c69090 CR3: 000000000180d000 CR4: 00000000000007f0
Jun 05 18:42:04 djm kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 05 18:42:04 djm kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 05 18:42:04 djm kernel: Process swapper/0 (pid: 0, threadinfo ffffffff81800000, task ffffffff81814440)
Jun 05 18:42:04 djm kernel: Stack:
Jun 05 18:42:04 djm kernel:  ffff88023fc03d08 ffffffff8103acf2 ffffffff818b4228 ffff88023fc03d68
Jun 05 18:42:04 djm kernel:  ffffffff810e4c37 0000000101067298 0000000000000046 0000000000000598
Jun 05 18:42:04 djm kernel:  ffffffff81800000 0000000000000000 ffffffff81814440 0000000000000000
Jun 05 18:42:04 djm kernel: Call Trace:
Jun 05 18:42:04 djm kernel:  <IRQ> 
Jun 05 18:42:04 djm kernel: 
Jun 05 18:42:04 djm kernel:  [<ffffffff8103acf2>] arch_trigger_all_cpu_backtrace+0x62/0x80
Jun 05 18:42:04 djm kernel:  [<ffffffff810e4c37>] rcu_pending+0x4c7/0x510
Jun 05 18:42:04 djm kernel:  [<ffffffff810e72e6>] rcu_check_callbacks+0x1f6/0x210
Jun 05 18:42:04 djm kernel:  [<ffffffff81069347>] update_process_times+0x47/0x80
Jun 05 18:42:04 djm kernel:  [<ffffffff810b2355>] tick_sched_handle.isra.13+0x25/0x60
Jun 05 18:42:04 djm kernel:  [<ffffffff810b23d1>] tick_sched_timer+0x41/0x60
Jun 05 18:42:04 djm kernel:  [<ffffffff8107ecb7>] __run_hrtimer+0x77/0x280
Jun 05 18:42:04 djm kernel:  [<ffffffff810b2390>] ? tick_sched_handle.isra.13+0x60/0x60
Jun 05 18:42:04 djm kernel:  [<ffffffff8107f7d7>] hrtimer_interrupt+0xe7/0x230
Jun 05 18:42:04 djm kernel:  [<ffffffff81394840>] ? intel_pstate_set_policy+0x150/0x150
Jun 05 18:42:04 djm kernel:  [<ffffffff810428a6>] hpet_interrupt_handler+0x16/0x40
Jun 05 18:42:04 djm kernel:  [<ffffffff810dd226>] handle_irq_event_percpu+0x56/0x250
Jun 05 18:42:04 djm kernel:  [<ffffffff81394840>] ? intel_pstate_set_policy+0x150/0x150
Jun 05 18:42:04 djm kernel:  [<ffffffff810dd45d>] handle_irq_event+0x3d/0x60
Jun 05 18:42:04 djm kernel:  [<ffffffff810dfea7>] handle_edge_irq+0x77/0x140
Jun 05 18:42:04 djm kernel:  [<ffffffff8101763e>] handle_irq+0x1e/0x30
Jun 05 18:42:04 djm kernel:  [<ffffffff814dbc0d>] do_IRQ+0x4d/0xc0
Jun 05 18:42:04 djm kernel:  [<ffffffff814d2a6d>] common_interrupt+0x6d/0x6d
Jun 05 18:42:04 djm kernel:  <EOI> 
Jun 05 18:42:04 djm kernel: 
Jun 05 18:42:04 djm kernel:  [<ffffffff813952d1>] ? cpuidle_wrap_enter+0x41/0x80
Jun 05 18:42:04 djm kernel:  [<ffffffff813952ca>] ? cpuidle_wrap_enter+0x3a/0x80
Jun 05 18:42:04 djm kernel:  [<ffffffff81394870>] cpuidle_enter_tk+0x10/0x20
Jun 05 18:42:04 djm kernel:  [<ffffffff81395002>] cpuidle_idle_call+0xb2/0x260
Jun 05 18:42:04 djm kernel:  [<ffffffff8101e86c>] cpu_idle+0xec/0x150
Jun 05 18:42:04 djm kernel:  [<ffffffff814b439f>] rest_init+0x7f/0x90
Jun 05 18:42:04 djm kernel:  [<ffffffff818d4ed0>] start_kernel+0x3fc/0x408
Jun 05 18:42:04 djm kernel:  [<ffffffff818d48dd>] ? repair_env_string+0x5c/0x5c
Jun 05 18:42:04 djm kernel:  [<ffffffff818d4120>] ? early_idt_handlers+0x120/0x120
Jun 05 18:42:04 djm kernel:  [<ffffffff818d45dc>] x86_64_start_reservations+0x2a/0x2c
Jun 05 18:42:04 djm kernel:  [<ffffffff818d46d1>] x86_64_start_kernel+0xf3/0x100
Jun 05 18:42:04 djm kernel: Code: 84 00 00 00 00 00 65 48 8b 14 25 18 3b 01 00 48 8d 0c 92 55 48 8d 04 bd 00 00 00 00 48 89 ca 48 89 e5 48 c1 e2 04 48 29 ca f7
Jun 05 18:42:04 djm kernel: NMI backtrace for cpu 2
Jun 05 18:42:04 djm kernel: CPU 2 
Jun 05 18:42:04 djm kernel: Pid: 0, comm: swapper/2 Tainted: G        W    3.9.4-1-ARCH #1 Si SYX-P55-CD53/P55-CD53 (MS-7586)  
Jun 05 18:42:04 djm kernel: RIP: 0010:[<ffffffff812d6d0b>]  [<ffffffff812d6d0b>] intel_idle+0xab/0x100
Jun 05 18:42:04 djm kernel: RSP: 0018:ffff88023413de38  EFLAGS: 00000046
Jun 05 18:42:04 djm kernel: RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
Jun 05 18:42:04 djm kernel: RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
Jun 05 18:42:04 djm kernel: RBP: ffff88023413de60 R08: 0000000000000000 R09: 0000000000003317
Jun 05 18:42:04 djm kernel: R10: 0000000000000000 R11: 0000000000002d2b R12: 0000000000000004
Jun 05 18:42:04 djm kernel: R13: 0000000000000020 R14: 0000000000000003 R15: ffffffff81394840
Jun 05 18:42:04 djm kernel: FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
Jun 05 18:42:04 djm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 05 18:42:04 djm kernel: CR2: 00007f0f3f1baf30 CR3: 000000000180d000 CR4: 00000000000007e0
Jun 05 18:42:04 djm kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 05 18:42:04 djm kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 05 18:42:04 djm kernel: Process swapper/2 (pid: 0, threadinfo ffff88023413c000, task ffff88023409ece0)
Jun 05 18:42:04 djm kernel: Stack:
Jun 05 18:42:04 djm kernel:  000000023413de48 ffff88023fc9a170 ffffffff81879f60 0000aefcf760f06a
Jun 05 18:42:04 djm kernel:  0000000000000004 ffff88023413de70 ffffffff81394859 ffff88023413dea8
Jun 05 18:42:04 djm kernel:  ffffffff813952c2 0000000000000004 ffff88023fc9a170 0000000000000000
Jun 05 18:42:04 djm kernel: Call Trace:
Jun 05 18:42:04 djm kernel:  [<ffffffff81394859>] cpuidle_enter+0x19/0x20
Jun 05 18:42:04 djm kernel:  [<ffffffff813952c2>] cpuidle_wrap_enter+0x32/0x80
Jun 05 18:42:04 djm kernel:  [<ffffffff81394870>] cpuidle_enter_tk+0x10/0x20
Jun 05 18:42:04 djm kernel:  [<ffffffff81395002>] cpuidle_idle_call+0xb2/0x260
Jun 05 18:42:04 djm kernel:  [<ffffffff8101e86c>] cpu_idle+0xec/0x150
Jun 05 18:42:04 djm kernel:  [<ffffffff814bfb5a>] start_secondary+0x270/0x272
Jun 05 18:42:04 djm kernel: Code: 31 d2 48 83 c0 10 48 89 d1 0f 01 c8 0f ae f0 65 48 8b 04 25 f0 c6 00 00 48 8b 80 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01
Jun 05 18:42:04 djm kernel: NMI backtrace for cpu 3
Jun 05 18:42:04 djm kernel: CPU 3 
Jun 05 18:42:04 djm kernel: Pid: 0, comm: swapper/3 Tainted: G        W    3.9.4-1-ARCH #1 Si SYX-P55-CD53/P55-CD53 (MS-7586)  
Jun 05 18:42:04 djm kernel: RIP: 0010:[<ffffffff812d6d0b>]  [<ffffffff812d6d0b>] intel_idle+0xab/0x100
Jun 05 18:42:04 djm kernel: RSP: 0018:ffff88023413fe38  EFLAGS: 00000046
Jun 05 18:42:04 djm kernel: RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
Jun 05 18:42:04 djm kernel: RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
Jun 05 18:42:04 djm kernel: RBP: ffff88023413fe60 R08: 0000000000000000 R09: 0000000000005a95
Jun 05 18:42:04 djm kernel: R10: 0000000000000000 R11: 000000000000042d R12: 0000000000000004
Jun 05 18:42:04 djm kernel: R13: 0000000000000020 R14: 0000000000000003 R15: ffffffff81394840
Jun 05 18:42:04 djm kernel: FS:  0000000000000000(0000) GS:ffff88023fcc0000(0000) knlGS:0000000000000000
Jun 05 18:42:04 djm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 05 18:42:04 djm kernel: CR2: 00007f243f1eb000 CR3: 000000000180d000 CR4: 00000000000007e0
Jun 05 18:42:04 djm kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 05 18:42:04 djm kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 05 18:42:04 djm kernel: Process swapper/3 (pid: 0, threadinfo ffff88023413e000, task ffff88023409f540)
Jun 05 18:42:04 djm kernel: Stack:
Jun 05 18:42:04 djm kernel:  000000033413fe48 ffff88023fcda170 ffffffff81879f60 0000aefcf760f22b
Jun 05 18:42:04 djm kernel:  0000000000000004 ffff88023413fe70 ffffffff81394859 ffff88023413fea8
Jun 05 18:42:04 djm kernel:  ffffffff813952c2 0000000000000004 ffff88023fcda170 0000000000000000
Jun 05 18:42:04 djm kernel: Call Trace:
Jun 05 18:42:04 djm kernel:  [<ffffffff81394859>] cpuidle_enter+0x19/0x20
Jun 05 18:42:04 djm kernel:  [<ffffffff813952c2>] cpuidle_wrap_enter+0x32/0x80
Jun 05 18:42:04 djm kernel:  [<ffffffff81394870>] cpuidle_enter_tk+0x10/0x20
Jun 05 18:42:04 djm kernel:  [<ffffffff81395002>] cpuidle_idle_call+0xb2/0x260
Jun 05 18:42:04 djm kernel:  [<ffffffff8101e86c>] cpu_idle+0xec/0x150
Jun 05 18:42:04 djm kernel:  [<ffffffff814bfb5a>] start_secondary+0x270/0x272
Jun 05 18:42:04 djm kernel: Code: 31 d2 48 83 c0 10 48 89 d1 0f 01 c8 0f ae f0 65 48 8b 04 25 f0 c6 00 00 48 8b 80 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01
Jun 05 18:42:06 djm kernel: hub 1-1:1.0: hub_port_status failed (err = -110)

The above entries repeated every 3 minutes until I rebooted today. All seems fine so far today.

The machine is ~2.5 years old and has been very solid until now. All packages are up to date.

Other info:

Linux djm 3.9.4-1-ARCH #1 SMP PREEMPT Sat May 25 16:14:55 CEST 2013 x86_64 GNU/Linux
cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 30
model name	: Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
stepping	: 5
microcode	: 0x3
cpu MHz		: 1200.000
cache size	: 8192 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5348.47
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 30
model name	: Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
stepping	: 5
microcode	: 0x3
cpu MHz		: 1200.000
cache size	: 8192 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5348.47
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 30
model name	: Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
stepping	: 5
microcode	: 0x3
cpu MHz		: 1200.000
cache size	: 8192 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5348.47
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 30
model name	: Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
stepping	: 5
microcode	: 0x3
cpu MHz		: 1200.000
cache size	: 8192 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5348.47
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:
Command line: root=/dev/sda5 ro init=/usr/lib/systemd/systemd consoleblank=0 initrd=../initramfs-linux.img BOOT_IMAGE=../vmlinuz-linux

I'm not sure how to interpret the log entries. Has anyone seen something similar before or could suggest a way to understand what happened / what is happening?

Thanks

Offline

#2 2013-06-06 21:42:09

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: "Watchdog detected hard LOCKUP on cpu 1"

Apparently nouveau_fan_update locked some spinlock, called itself recursively in some indirect way and attempted to grab the same spinlock again, locking up one CPU core for good.

It's a bug and it may happen again. If you use this machine only remotely, blacklisting drivers for "human interface" devices that you don't use may increase stability.

I'm not sure if this SSH dropout was related.

You can try reporting this bug to nouveau mailing list with this backtrace from CPU1 attached.

Last edited by mich41 (2013-06-06 21:51:59)

Offline

#3 2013-06-06 22:08:56

_djm_
Member
Registered: 2009-11-30
Posts: 33
Website

Re: "Watchdog detected hard LOCKUP on cpu 1"

Thanks for your reply.

mich41 wrote:

Apparently nouveau_fan_update locked some spinlock, called itself recursively in some indirect way and attempted to grab the same spinlock again, locking up one CPU core for good.
It's a bug and it may happen again.

If you use this machine only remotely, blacklisting drivers for "human interface" devices that you don't use may increase stability.

I'm not sure if this SSH dropout was related.

You can try reporting this bug to nouveau mailing list with this backtrace from CPU1 attached.

I use the machine locally during the day. The SSH dropout happened within a couple of minutes of the above log entries, so I'd say they're related. The network connection stayed down locally until the reboot.

Hopefully it was just a one-off.

Offline

#4 2013-06-07 13:03:03

_djm_
Member
Registered: 2009-11-30
Posts: 33
Website

Re: "Watchdog detected hard LOCKUP on cpu 1"

_djm_ wrote:

Hopefully it was just a one-off.

Nope, happened again today. I'll report it to nouveau.

Offline

#5 2014-02-07 10:52:11

frostbittenking
Member
Registered: 2013-05-01
Posts: 54

Re: "Watchdog detected hard LOCKUP on cpu 1"

Nope, isn't nouveau. I have similar symptoms. Connection drops out, and system behaves strangely. Mouse jumps and whatnot.
Except, my watchdog trace is quite different, without mentioning nouveau. Except same symptoms, I see no similarities between the traces. The root/leaf depending on the point of view is the same (watchdog).

[ 2065.236141] WARNING: CPU: 6 PID: 0 at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xd0()
[ 2065.236144] Watchdog detected hard LOCKUP on cpu 6
[ 2065.236146] Modules linked in:
[ 2065.236149]  nfsv4 fuse coretemp kvm_intel kvm mxm_wmi iTCO_wdt nvidia(PO) iTCO_vendor_support evdev joydev microcode psmouse serio_raw pcspkr i2c_i801 snd_hda_codec_realtek r8169 mii snd_hda_intel snd_hda_codec drm snd_hwdep asus_atk0110 snd_pcm i2c_core snd_page_alloc snd_timer wmi i7core_edac snd button lpc_ich soundcore shpchp edac_core acpi_cpufreq processor nfs lockd sunrpc fscache sr_mod cdrom btrfs hid_generic raid6_pq libcrc32c usbhid hid xor dm_mod sd_mod usb_storage crc32c_intel ahci uhci_hcd libahci xhci_hcd ehci_pci firewire_ohci ehci_hcd libata firewire_core crc_itu_t usbcore scsi_mod usb_common ext4 crc16 mbcache jbd2
[ 2065.236224] CPU: 6 PID: 0 Comm: swapper/6 Tainted: P        W IO 3.12.9-2-ARCH #1
[ 2065.236227] Hardware name: System manufacturer System Product Name/P6T, BIOS 1303    03/08/2010
[ 2065.236230]  0000000000000009 ffff8801bfcc6c10 ffffffff814ed0d3 ffff8801bfcc6c58
[ 2065.236236]  ffff8801bfcc6c48 ffffffff81062aed ffff8801b6b68000 0000000000000000
[ 2065.236241]  ffff8801bfcc6d78 0000000000000000 ffff8801bfcc6ef8 ffff8801bfcc6ca8
[ 2065.236246] Call Trace:
[ 2065.236248]  <NMI>  [<ffffffff814ed0d3>] dump_stack+0x54/0x8d
[ 2065.236262]  [<ffffffff81062aed>] warn_slowpath_common+0x7d/0xa0
[ 2065.236268]  [<ffffffff81062b5c>] warn_slowpath_fmt+0x4c/0x50
[ 2065.236273]  [<ffffffff810f1e70>] ? restart_watchdog_hrtimer+0x50/0x50
[ 2065.236278]  [<ffffffff810f1f0c>] watchdog_overflow_callback+0x9c/0xd0
[ 2065.236284]  [<ffffffff8112c31e>] __perf_event_overflow+0x8e/0x2b0
[ 2065.236290]  [<ffffffff8112b188>] ? perf_event_update_userpage+0xe8/0x160
[ 2065.236295]  [<ffffffff8112cef4>] perf_event_overflow+0x14/0x20
[ 2065.236302]  [<ffffffff81030abd>] intel_pmu_handle_irq+0x1bd/0x3c0
[ 2065.236309]  [<ffffffff814f5f1b>] perf_event_nmi_handler+0x2b/0x50
[ 2065.236315]  [<ffffffff814f5671>] nmi_handle.isra.3+0xa1/0x1d0
[ 2065.236320]  [<ffffffff814f5870>] do_nmi+0xd0/0x340
[ 2065.236325]  [<ffffffff814f4cb1>] end_repeat_nmi+0x1e/0x2e
[ 2065.236332]  [<ffffffff810b624a>] ? cpu_startup_entry+0x11a/0x2f0
[ 2065.236337]  [<ffffffff810b624a>] ? cpu_startup_entry+0x11a/0x2f0
[ 2065.236342]  [<ffffffff810b624a>] ? cpu_startup_entry+0x11a/0x2f0
[ 2065.236345]  <<EOE>>  [<ffffffff81042238>] start_secondary+0x228/0x2d0
[ 2065.236354] ---[ end trace d010dd627b5a5282 ]---

My system:
Asus P6t (mobo)
Intel Core i7 920
Nvidia GTX 285
<rest ommitted due to irrelevance>...

Ooops, sorry. Necrobumping.  Didn't check the date exactly.

Last edited by frostbittenking (2014-02-07 11:25:56)

Offline

Board footer

Powered by FluxBB