Hi, I need some advice on debugging this issue.
When my wife plays Civilization IV under Wine, Xorg locks up completely after some time (30 minutes on average).
Switching VTs doesn't work, but the network stays up, so I can SSH in and check the logs.
Here's what I think is relevant:
My kernel boot line:
kernel /vmlinuz-linux root=/dev/disk/by-uuid/a1468340-35da-456b-8e02-c0a263a8b0ab ro vga=792 resume=/dev/sda5 pcie_aspm=force consoleblank=0
dmesg:
[11439.712213] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[11439.712223] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[11440.425226] irq 16: nobody cared (try booting with the "irqpoll" option)
[11440.425233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P R O 3.10.9-1-pae #1
[11440.425235] Hardware name: System manufacturer System Product Name/P5QL-ASUS-SE, BIOS 0402 10/09/2008
[11440.425237] f4482a80 f4482a80 f4409f50 c050490d f4409f70 c01b51f9 c05db3c0 00000010
[11440.425242] 0032ff30 e08eb01a f4482a80 00000010 f4409f94 c01b5599 e0bc4f2a 00000010
[11440.425247] c0408110 00000000 f4482a80 f4482ad0 00000000 f4409fd4 c01b355d f4482b90
[11440.425252] Call Trace:
[11440.425259] [<c050490d>] dump_stack+0x16/0x18
[11440.425263] [<c01b51f9>] __report_bad_irq+0x29/0xd0
[11440.425266] [<c01b5599>] note_interrupt+0xf9/0x1a0
[11440.425270] [<c0408110>] ? cpuidle_enter_state+0x40/0xd0
[11440.425273] [<c01b355d>] handle_irq_event_percpu+0xcd/0x1f0
[11440.425276] [<c01b5f20>] ? unmask_irq+0x30/0x30
[11440.425279] [<c0509ff7>] ? nmi_stack_correct+0x2f/0x34
[11440.425282] [<c01b36b1>] handle_irq_event+0x31/0x50
[11440.425284] [<c01b5f20>] ? unmask_irq+0x30/0x30
[11440.425287] [<c01b5f6e>] handle_fasteoi_irq+0x4e/0xe0
[11440.425288] <IRQ> [<c051030c>] ? do_IRQ+0x3c/0xb0
[11440.425293] [<c050c9c1>] ? notifier_call_chain+0x41/0x60
[11440.425296] [<c05101b3>] ? common_interrupt+0x33/0x38
[11440.425300] [<c018007b>] ? hibernation_restore+0xeb/0x150
[11440.425315] [<f8a900e0>] ? is_processor_present+0x1f/0x69 [processor]
[11440.425318] [<c0408110>] ? cpuidle_enter_state+0x40/0xd0
[11440.425322] [<c040823e>] ? cpuidle_idle_call+0x9e/0x240
[11440.425326] [<c010999d>] ? arch_cpu_idle+0xd/0x30
[11440.425329] [<c0186b63>] ? cpu_startup_entry+0x1a3/0x210
[11440.425332] [<c04f3e71>] ? rest_init+0x71/0x80
[11440.425335] [<c06c1ab4>] ? start_kernel+0x39c/0x3a2
[11440.425338] [<c06c154f>] ? repair_env_string+0x51/0x51
[11440.425341] [<c06c1376>] ? i386_start_kernel+0x12c/0x12f
[11440.425342] handlers:
[11440.425352] [<f85c3490>] usb_hcd_irq [usbcore]
[11440.425360] [<f84c34c0>] ata_bmdma_interrupt [libata] <---- MY NOTE: IS ATA SOMEHOW RELATED?
[11440.425361] Disabling IRQ #16 <---- MY NOTE: IRQ16 DISABLED... WHY?
[11441.453258] nvidia 0000:01:00.0: irq 47 for MSI/MSI-X
[11441.454355] NVRM: RmInitAdapter failed! (0x23:0x2f:558)
[11441.454359] NVRM: rm_init_adapter(0) failed
At this point, if I rmmod the nvidia module and reprobe it, I get:
[11685.809853] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=io+mem
[11685.809879] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:0614)
NVRM: installed in this system is not supported by the 325.15
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[11685.809890] nvidia: probe of 0000:01:00.0 failed with error -1
[11685.810087] NVRM: The NVIDIA probe routine failed for 1 device(s).
[11685.810089] NVRM: None of the NVIDIA graphics adapters were initialized!
[11685.810090] [drm] Module unloaded
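A rough way to pull the "fallen off the bus" events out of a saved kernel log, offered as a sketch rather than a tested tool. The function name `find_bus_drops` is made up for illustration, and the pattern assumes the exact NVRM wording shown in the dmesg output above.

```python
import re

# Matches lines such as:
# [11439.712213] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
FALLEN_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: GPU at (?P<addr>[0-9a-fA-F:.]+) "
    r"has fallen off the bus"
)


def find_bus_drops(log_text):
    """Return (seconds_since_boot, pci_address) for each matching log line."""
    events = []
    for line in log_text.splitlines():
        match = FALLEN_RE.search(line)
        if match:
            events.append((float(match.group("ts")), match.group("addr")))
    return events
```

Running it on the text of `dmesg`, captured over SSH after a lockup, would give the uptime at which the card dropped, which makes it easier to compare several crashes.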
I should add that this problem started about a month ago, but unfortunately I can't say what changed.
The system was definitely working with nvidia driver 319.23 and kernel 3.9.6.
Since then, I've had this problem with:
linux-3.9.9 + nvidia 319.32
linux-3.10.9 + nvidia 325.15
The log messages suggested to me that it was an IRQ problem.
Before nvidia 325.15, the NVIDIA card shared IRQ 16 with an IDE/ATA adapter (module pata_jmicron).
With nvidia 325.15, the driver uses message signaled interrupts (MSI), so my /proc/interrupts looks like this:
           CPU0       CPU1
  0:      16188      16075   IO-APIC-edge      timer
  1:         10          6   IO-APIC-edge      i8042
  6:          2          1   IO-APIC-edge      floppy
  7:          1          0   IO-APIC-edge      parport0
  8:          0          1   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 16:         80         80   IO-APIC-fasteoi   uhci_hcd:usb1, pata_jmicron
 17:         22         19   IO-APIC-fasteoi   snd_ice1712
 18:         35         36   IO-APIC-fasteoi   uhci_hcd:usb3, ehci_hcd:usb4, uhci_hcd:usb7, i801_smbus
 19:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
 23:         16         19   IO-APIC-fasteoi   uhci_hcd:usb5, ehci_hcd:usb8
 44:       8590       8679   PCI-MSI-edge      ahci
 45:        292        282   PCI-MSI-edge      eth0
 46:        158        159   PCI-MSI-edge      snd_hda_intel
 47:        216        203   PCI-MSI-edge      nvidia
NMI:         14         11   Non-maskable interrupts
LOC:      14545      14270   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:         14         11   Performance monitoring interrupts
IWI:        167        220   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:      11746      11730   Rescheduling interrupts
CAL:         62         94   Function call interrupts
TLB:        199        175   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          0
MIS:          0
# lspci -v|grep "IRQ 16" -B3
00:1a.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4 (prog-if 00 [UHCI])
    Subsystem: ASUSTeK Computer Inc. P5Q Deluxe Motherboard
    Flags: bus master, medium devsel, latency 0, IRQ 16
--
03:00.0 IDE interface: JMicron Technology Corp. JMB368 IDE controller (prog-if 85 [Master SecO PriO])
    Subsystem: ASUSTeK Computer Inc. Device 827e
    Flags: bus master, fast devsel, latency 0, IRQ 16
So it seems that nvidia is now using IRQ 47, while pata_jmicron is still on IRQ 16 (shared with a USB controller).
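For anyone who wants to check the same thing without reading the table by eye, here is a minimal sketch that lists which handlers sit on each numbered IRQ line. The names `parse_interrupts` and `shared_irqs` are hypothetical, and the parsing assumes the column layout shown above (per-CPU counts, one chip/type column, then handler names); newer kernels print extra columns, so it would need adjusting there.

```python
def parse_interrupts(text):
    """Map each numbered IRQ to the list of handler names on that line."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    irqs = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():  # skip NMI, LOC, ERR and similar rows
            continue
        fields = rest.split()
        # Assumed layout: ncpus counters, one chip/type field, then handlers.
        handlers = " ".join(fields[ncpus + 1:])
        irqs[int(label)] = [h.strip() for h in handlers.split(",") if h.strip()]
    return irqs


def shared_irqs(irqs):
    """Keep only the IRQ lines that have more than one handler."""
    return {num: names for num, names in irqs.items() if len(names) > 1}


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        for num, names in sorted(shared_irqs(parse_interrupts(f.read())).items()):
            print(num, ", ".join(names))
```

On the table above, IRQ 16 would be reported as shared between uhci_hcd:usb1 and pata_jmicron, while the nvidia line (47) has a single handler.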
-EDIT-
What I have tried so far, without success:
setting the BIOS option "Plug and Play OS" to Yes (it was No)
setting the nvidia option UseEvents=True in xorg.conf
disabling, in the BIOS, the ATA controller that shared IRQ 16 with the GPU
Now I'm trying again with the old combination of linux-3.9.6 + nvidia-319.23; it has been working for 20 minutes so far...
Last edited by kokoko3k (2013-08-28 16:23:39)
Help me to improve ssh-rdp !
Retroarch User? Try my koko-aio shader !
Oh man, I think (I hope) I solved the issue.
And the log message was right, the card really was "falling" off the bus... I took it out, cleaned the connector and plugged it back in, and now it seems to work with the latest driver and kernel.
I will update the thread if other errors show up.
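In case it helps someone hitting the same message: a small sketch for checking over SSH whether the card still answers on the bus. It rests on an assumption not tested in this thread, namely that a PCI device which has stopped responding typically reads back as all ones (ffff:ffff) from its config space. The function names are invented for illustration; the sysfs `config` file and the address 0000:01:00.0 are the standard path and the address from the logs above.

```python
def decode_ids(raw):
    """Decode (vendor, device) from the first 4 bytes of PCI config space."""
    vendor = int.from_bytes(raw[0:2], "little")
    device = int.from_bytes(raw[2:4], "little")
    return vendor, device


def looks_present(raw):
    """Assumption: a device that no longer responds reads as 0xffff:0xffff."""
    return len(raw) >= 4 and decode_ids(raw) != (0xFFFF, 0xFFFF)


def read_config_header(pci_addr="0000:01:00.0"):
    """Read the vendor/device ID bytes that sysfs exposes for one PCI device."""
    with open(f"/sys/bus/pci/devices/{pci_addr}/config", "rb") as f:
        return f.read(4)


if __name__ == "__main__":
    raw = read_config_header()
    vendor, device = decode_ids(raw)
    state = "present" if looks_present(raw) else "not responding"
    print(f"{vendor:04x}:{device:04x} {state}")
```

With a healthy card this should print the same ID that the driver reported (10de:0614 here).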