
#1 2016-06-21 06:11:02

parchd
Member
Registered: 2014-03-08
Posts: 421

Computer falling off the bus and freezing

My session has been freezing a lot again - this has been a problem in the past but went away, so I'm wondering whether the fact that I switched back to the standard Arch kernel from linux-ck has anything to do with it.
The problem seems to coincide with problems returning from suspend which I've also posted about here in the past. However, I have some reason to believe all of this could be hardware dying - I just don't know why the problem went away for so long if that is the case.

Journal output below. This happened shortly after boot (in the middle of writing an important email, which is what finally drove me to post).

Jun 21 07:58:55 parchdPC kernel: Disabling IRQ #16
Jun 21 07:58:55 parchdPC kernel: [<ffffffffa117e460>] saa7134_alsa_irq [saa7134_alsa]
Jun 21 07:58:55 parchdPC kernel: [<ffffffffa0484120>] saa7134_irq [saa7134]
Jun 21 07:58:55 parchdPC kernel: [<ffffffffa0062680>] usb_hcd_irq [usbcore]
Jun 21 07:58:55 parchdPC kernel: handlers:
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff810510e5>] start_secondary+0x165/0x1a0
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff810bde78>] cpu_startup_entry+0x338/0x390
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff810bdb2a>] default_idle_call+0x2a/0x40
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff8103955f>] arch_cpu_idle+0xf/0x20
Jun 21 07:58:55 parchdPC kernel:  <EOI>  [<ffffffff81038eba>] ? mwait_idle+0xaa/0x1a0
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff815c7bc2>] common_interrupt+0x82/0x82
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff815c9abb>] do_IRQ+0x4b/0xd0
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff81030e6a>] handle_irq+0x1a/0x30
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff810d8e2f>] handle_fasteoi_irq+0x8f/0x160
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff810d58b9>] handle_irq_event+0x39/0x60
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff810d573e>] handle_irq_event_percpu+0xae/0x1f0
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff810d8554>] note_interrupt+0x234/0x280
Jun 21 07:58:55 parchdPC kernel:  [<ffffffff810d81d5>] __report_bad_irq+0x35/0xc0
Jun 21 07:58:55 parchdPC kernel:  <IRQ>  [<ffffffff812e5492>] dump_stack+0x63/0x81
Jun 21 07:58:55 parchdPC kernel: Call Trace:
Jun 21 07:58:55 parchdPC kernel:  ffff8801a7e07800 0000000000000000 0000000000000010 0000000000000000
Jun 21 07:58:55 parchdPC kernel:  ffff8801a7e07800 ffff8801a7e078a4 ffff8801afd03e90 ffffffff810d81d5
Jun 21 07:58:55 parchdPC kernel:  0000000000000086 7d69f0285033b8e2 ffff8801afd03e60 ffffffff812e5492
Jun 21 07:58:55 parchdPC kernel: Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 01/13/2010
Jun 21 07:58:55 parchdPC kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           O    4.6.2-1-ARCH #1
Jun 21 07:58:55 parchdPC kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Jun 21 07:58:44 parchdPC kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jun 21 07:58:44 parchdPC kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jun 21 07:57:49 parchdPC systemd[1]: Startup finished in 10.748s (kernel) + 6min 41.183s (userspace) = 6min 51.931s.
Jun 21 07:57:49 parchdPC systemd[1]: Started Update locate database.
Jun 21 07:56:28 parchdPC kernel: perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79800


#2 2016-06-21 14:41:46

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,784


The term "Freezing" is non-deterministic. 

Does the system become non-responsive for a period and then come back?
Does the kernel panic? (Keyboard lights will be flashing)
Does it just hang forever?  When it hangs, does the cursor respond to mouse movements?
Can you change to a different console using Ctrl-Alt-F2 (or F3 through F6)?
Can you ping your box? (This is a very low level network function)

It would appear that some piece of hardware in your system is raising an interrupt because it needs attention.  The problem is that the kernel does not know what to do with it because none of the drivers it has loaded are expecting an interrupt on irq 16 (or, if they are, the drivers determined the interrupt was not coming from the hardware they are controlling).  Basically, a piece of hardware is asserting an interrupt line that no driver will acknowledge.  The kernel does not know what to do with it and cannot clear it, so it disables the line ("Disabling IRQ #16" in your log), and that cuts off every other device that shares that interrupt.
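As a rough illustration of how the "nobody cared" decision comes about, here is a simplified Python model of the kernel's spurious-interrupt accounting. The window of 100,000 interrupts and the threshold of 99,900 unhandled ones are assumptions modeled on kernel/irq/spurious.c and may differ between kernel versions; the class and method names are invented for this sketch, and the real code also takes timing into account, which is left out here.

```python
class SpuriousIrqModel:
    """Toy model of per-IRQ bookkeeping; not the kernel's actual code."""

    WINDOW = 100_000      # assumed: interrupts examined per evaluation
    THRESHOLD = 99_900    # assumed: unhandled count that triggers a disable

    def __init__(self):
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        """Record one interrupt; `handled` is True if any driver claimed it."""
        if self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < self.WINDOW:
            return
        # End of a window: decide whether the line is stuck, then start over.
        self.count = 0
        if self.unhandled > self.THRESHOLD:
            self.disabled = True   # corresponds to "Disabling IRQ #16"
        self.unhandled = 0


def simulate(n_interrupts, handled_every=0):
    """Feed n interrupts; every `handled_every`-th one is claimed (0 = none)."""
    model = SpuriousIrqModel()
    for i in range(n_interrupts):
        handled = handled_every > 0 and i % handled_every == 0
        model.note_interrupt(handled)
    return model


if __name__ == "__main__":
    print("all unhandled:", simulate(100_000).disabled)
    print("1 in 100 handled:", simulate(100_000, handled_every=100).disabled)
```

The point of the model is that a device sharing the line can still be handled normally now and then; it is the overwhelming share of unclaimed interrupts that gets the whole line switched off.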

Your GPU seems to be involved here.  Which video chipset is it?
Have you tried booting with the irqpoll Linux command line option as suggested?
I note that systemd is reporting that the user space start up time was more than six minutes.  The system crashed shortly after that log entry (less than a minute).  Obviously, you had been working on your email during the six minutes that the user space stuff was starting.   Perhaps you should be looking at the output of systemd-analyze blame and systemd-analyze critical-chain for clues as to what is taking so long.  Good chance it is related.




#3 2016-06-21 20:48:10

mich41
Member
Registered: 2012-06-22
Posts: 796


parchd wrote:

I'm wondering whether the fact that I switched back to the standard Arch kernel from linux-ck has anything to do with it.

Possible. This might be a bug in the kernel or in the NVIDIA driver.

I suspect that your GPU uses IRQ 16. Then, when NV driver loses contact with it (which clearly happens for some reason), the GPU keeps spamming this IRQ but the driver doesn't handle those interrupts anymore and IRQ 16 ends up disabled. So not only do you have no video output, but sound and some USB ports are lost as well.

You may want to run grep 16: /proc/interrupts and see if nvidia indeed is there. Or just switch back to -ck and forget it.
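One caveat with a plain grep for 16: is that it also matches rows such as 116:. A minimal Python sketch that picks out exactly one row of /proc/interrupts and returns the part naming the controller and handlers follows. The sample text is made up for illustration (it is not output from the machine in this thread), and the function name is hypothetical.

```python
def find_irq(interrupts_text, irq):
    """Return the description of the row labelled `irq`, or None if absent."""
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    n_cpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        fields = rest.split()
        # Skip the per-CPU counters; the rest names the controller and handlers.
        return " ".join(fields[n_cpus:])
    return None


# Made-up sample in the general layout of /proc/interrupts.
SAMPLE = """\
           CPU0       CPU1
 16:       1203        877   IO-APIC  16-fasteoi   ehci_hcd:usb1, saa7134[0], saa7134[0]
116:          0          0   PCI-MSI 524288-edge      nvidia
"""

if __name__ == "__main__":
    row = find_irq(SAMPLE, 16)
    print(row)
    print("nvidia on IRQ 16:", row is not None and "nvidia" in row)
```

On a real system, pass the contents of /proc/interrupts instead of SAMPLE.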

BTW, this kernel output is upside-down for some reason. Took me a moment to figure it out.


#4 2016-07-06 16:57:48

parchd
Member
Registered: 2014-03-08
Posts: 421


Sorry I took a long time to get back to this; a lot of real life stuff got in the way.
I really appreciate the replies. Also, sorry the original post wasn't as informative as it could have been. It was a rushed post in a moment of frustration.
I actually now think this is a physical hardware issue: hot weather seems to make it more likely, and giving the computer a kick or pushing on the card seems to stop it happening for a while. I've answered the questions you asked anyway in case you have a further idea, but mostly just because it would be impolite for me not to when you've put an effort in for me.

ewaller wrote:

The term "Freezing" is non-deterministic. 
[...]

No response to ping most of the time, although on one occasion I was even able to ssh in. That was a one-off though. Most of the time it will do *something* in response to keyboard input such as ctrl+alt+del or sysrq, as in I hear it react, but it doesn't actually reboot using either of those. It stays stuck with whatever was on the screen when it happened.

ewaller wrote:

Your GPU seems to be involved here.  Which video chipset is it?
Have you tried booting with the irqpoll Linux command line option as suggested?
I note that systemd is reporting that the user space start up time was more than six minutes.  The system crashed shortly after that log entry (less than a minute).  Obviously, you had been working on your email during the six minutes that the user space stuff was starting.   Perhaps you should be looking at the output of systemd-analyze blame and systemd-analyze critical-chain for clues as to what is taking so long.  Good chance it is related.

GeForce 9800 GT. I'll admit I didn't try booting with irqpoll. I've also got no idea why user space stuff takes so long to start according to the journal - it doesn't take much more than 2 or 3 minutes before I can be writing an email, and adding up the output of systemd-analyze blame I only get to about 3 minutes.
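Adding the blame figures together is not expected to reproduce the journal's total: units start in parallel and blame reports per-unit initialization time, so the sum and the wall-clock figure can differ in either direction. A minimal Python sketch for doing the arithmetic anyway, assuming the usual "1min 2.345s unit-name" layout of systemd-analyze blame output. The unit names and times in the blame sample are made up; only the three durations from the "Startup finished" line come from the journal above.

```python
import re

UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}
SPAN = re.compile(r"(\d+(?:\.\d+)?)(min|ms|us|s|h)\b")


def to_seconds(text):
    """Convert a time span such as '6min 41.183s' to seconds."""
    return sum(float(value) * UNIT_SECONDS[unit]
               for value, unit in SPAN.findall(text))


def blame_total(blame_text):
    """Sum the time column of blame-style output (last field = unit name)."""
    total = 0.0
    for line in blame_text.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) == 2:
            total += to_seconds(parts[0])
    return total


# Hypothetical blame output, for illustration only.
BLAME_SAMPLE = """\
  1min 30.500s example-slow.service
      12.250s example-medium.service
        750ms example-fast.service
"""

if __name__ == "__main__":
    # Figures from the "Startup finished" journal line.
    kernel = to_seconds("10.748s")
    userspace = to_seconds("6min 41.183s")
    print("kernel + userspace = %.3f s" % (kernel + userspace))
    print("sum of sample blame rows = %.3f s" % blame_total(BLAME_SAMPLE))
```

systemd-analyze critical-chain is usually the better tool for finding which unit the boot was actually waiting on.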


mich41 wrote:
parchd wrote:

I'm wondering whether the fact that I switched back to the standard Arch kernel from linux-ck has anything to do with it.

I suspect that your GPU uses IRQ 16. Then, when NV driver loses contact with it (which clearly happens for some reason), the GPU keeps spamming this IRQ but the driver doesn't handle those interrupts anymore and IRQ 16 ends up disabled. So not only do you have no video output, but sound and some USB ports are lost as well.

You may want to run grep 16: /proc/interrupts and see if nvidia indeed is there. Or just switch back to -ck and forget it.

It doesn't, but I learnt a lot by checking!

mich41 wrote:

BTW, this kernel output is upside-down for some reason. Took me a moment to figure it out.


Sorry - I tend to use journalctl -r because I usually care about the most recent stuff in the journal.
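For anyone reading a pasted journalctl -r excerpt, flipping it back into chronological order is straightforward. A small Python sketch follows; the function name is made up, and the sample is four lines taken from the journal output in the first post, in the reversed order they were posted.

```python
def chronological(reversed_journal):
    """Return newest-first journal text reordered oldest-first."""
    return "\n".join(reversed(reversed_journal.splitlines()))


# Four lines from the first post, newest first as printed by journalctl -r.
EXCERPT = """\
Jun 21 07:58:55 parchdPC kernel: Disabling IRQ #16
Jun 21 07:58:55 parchdPC kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Jun 21 07:58:44 parchdPC kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jun 21 07:56:28 parchdPC kernel: perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79800
"""

if __name__ == "__main__":
    print(chronological(EXCERPT))
```

Read in this order, the GPU falls off the bus first, and IRQ 16 is reported and disabled about eleven seconds later. When pasting logs, plain journalctl (oldest first) with -e to jump to the end of the pager avoids the problem.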

Many thanks again!


#5 2016-07-07 10:34:01

mich41
Member
Registered: 2012-06-22
Posts: 796


Hot weather, kicks? Maybe the heatsink lost contact with the GPU and it's overheating?

