#1 2007-12-10 13:47:02

slide_rule
Member
From: loglogdecalog
Registered: 2007-09-16
Posts: 33

kernel oops with new nvidia drivers [solved - doh!]

I just upgraded to kernel26-2.6.23.9-1 and nvidia-100.14.19. Now, when the fancy kdemod ksplash starts, I get this stack trace and KDE freezes (I can still ssh in):

NVRM: loading NVIDIA UNIX x86 Kernel Module  100.14.19  Wed Sep 12 14:12:24 PDT 2007
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000014c
 printing eip:
f9f0ce7c
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP 
Modules linked in: nvidia(P) w83627ehf hwmon_vid ipv6 ohci1394 ieee1394 firewire_ohci firewire_core crc_itu_t tsdev usbhid hid ff_memless usb_storage ide_core sky2 intel_agp agpgart ppp_generic sg evdev thermal processor fan button battery ac kqemu i2c_i801 i2c_dev i2c_core coretemp snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore slhc skge rtc ext3 jbd mbcache sr_mod cdrom sd_mod ehci_hcd uhci_hcd usbcore ahci ata_generic pata_jmicron libata
CPU:    0
EIP:    0060:[<f9f0ce7c>]    Tainted: P        VLI
EFLAGS: 00210246   (2.6.23-ARCH #1)
EIP is at os_set_mlock_capability+0xc/0x30 [nvidia]
eax: 00000000   ebx: 00000000   ecx: 00200202   edx: 00000000
esi: f6781000   edi: f6c48400   ebp: f37cbff0   esp: f3757ed8
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process ksplash (pid: 6803, ti=f3756000 task=f37a0000 task.ti=f3756000)
Stack: f9c34d35 f9c36d92 fa1ebd40 f5e5e480 f6d44b50 f9c38c26 f37ca000 0000005b 
       0000000c f9c35a32 fa1ebd40 f5e5e480 0000005b f6d44b50 f6d44b50 0000005b 
       fa1ebd40 f9f08f36 f37ca000 fa1ebd40 f5e5e480 0000005b f6d44b50 00000001 
Call Trace:
 [<f9c34d35>] _nv002553rm+0x5/0x10 [nvidia]
 [<f9c36d92>] rm_write_watch_init+0x3b/0x54 [nvidia]
 [<f9c38c26>] _nv002642rm+0x416/0x5e9 [nvidia]
 [<f9c35a32>] rm_ioctl+0x3e/0x6d [nvidia]
 [<f9f08f36>] nv_kern_ioctl+0xf6/0x3e0 [nvidia]
 [<f9f09258>] nv_kern_unlocked_ioctl+0x18/0x20 [nvidia]
 [<f9f09240>] nv_kern_unlocked_ioctl+0x0/0x20 [nvidia]
 [<c018ae0b>] do_ioctl+0x2b/0x90
 [<c018b09e>] vfs_ioctl+0x22e/0x2b0
 [<c018b17d>] sys_ioctl+0x5d/0x70
 [<c0104482>] sysenter_past_esp+0x6b/0xa1
 [<c0360000>] wait_for_completion+0x30/0xa0
 =======================
Code: 1b f4 21 c6 31 c0 c3 90 8d b4 26 00 00 00 00 64 a1 00 80 47 c0 8b 80 bc 00 00 00 c3 8d 76 00 64 a1 00 80 47 c0 8b 80 7c 04 00 00 <c7> 80 4c 01 00 00 ff ff ff ff 64 a1 00 80 47 c0 81 88 a0 01 00 
EIP: [<f9f0ce7c>] os_set_mlock_capability+0xc/0x30 [nvidia] SS:ESP 0068:f3757ed8

I've "fixed" it by changing "nvidia" to "nv" in my xorg.conf, but is there anything else I should do before filing a bug report?  I couldn't find anything similar on flyspray or in the forums.  The problem is definitely repeatable.
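
For a bug report, the most useful single line in a trace like the one above is the "EIP is at" line, since it names the function and module where the fault happened. A minimal Python sketch for pulling that out of a pasted oops follows. The name faulting_location and the regular expression are made up for illustration and are not part of any kernel tool.

```python
import re

# Matches lines such as:
#   EIP is at os_set_mlock_capability+0xc/0x30 [nvidia]
# The trailing "[module]" part is absent for symbols built into the kernel.
EIP_LINE = re.compile(
    r"EIP is at (?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def faulting_location(oops_text):
    """Return (symbol, offset, size, module) for the first 'EIP is at' line, or None."""
    match = EIP_LINE.search(oops_text)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),  # bytes into the function
        int(match.group("size"), 16),    # total size of the function
        match.group("module"),           # None for built-in kernel symbols
    )


sample = "EIP is at os_set_mlock_capability+0xc/0x30 [nvidia]"
print(faulting_location(sample))
```

Running it over two different reports and comparing the results is a quick way to tell whether they are likely the same fault.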

Last edited by slide_rule (2007-12-11 12:19:45)

Offline

#2 2007-12-10 14:18:01

Fackamato
Member
Registered: 2006-03-31
Posts: 579

Re: kernel oops with new nvidia drivers [solved - doh!]

It may be a bug in the nvidia driver... try http://www.nvnews.net/vbulletin/forumdisplay.php?f=14 (be sure to include a bug report as per http://www.nvnews.net/vbulletin/showthread.php?t=46678 ).

Offline

#3 2007-12-10 19:38:48

slide_rule
Member
From: loglogdecalog
Registered: 2007-09-16
Posts: 33

Re: kernel oops with new nvidia drivers [solved - doh!]

Thanks, I found this thread.  It seems like it is some kind of nvidia bug, but the thread over there has been quiet since October.  I guess I'm just lucky enough to have hit a corner case, because Google is startlingly unhelpful (either that or I haven't hit on the magic search terms).  I've thought about a few approaches to a fix:

1. Tweak nvidia-100.14.11 to work with kernel26-2.6.23.9-1 (my guess is that this isn't going to work, since it comes pre-built against a specific kernel, yes?)
2. Try to build a vanilla kernel and vanilla nvidia drivers
3. Downgrade kernel26 and nvidia drivers (I should have both packages in my cache)
4. Ignore it and wait for things to get cleared up.

Any thoughts?

Offline

#4 2007-12-10 19:50:43

lloeki
Member
From: France
Registered: 2007-02-20
Posts: 456

Re: kernel oops with new nvidia drivers [solved - doh!]

slide_rule,

1. => use ABS to roll your own nvidia and nvidia-utils pkg.tar.gz


To know recursion, you must first know recursion.

Offline

#5 2007-12-11 12:19:24

slide_rule
Member
From: loglogdecalog
Registered: 2007-09-16
Posts: 33

Re: kernel oops with new nvidia drivers [solved - doh!]

I found the problem, and it's my fault.  Without realizing it, I installed a new kernel image without mounting my /boot partition first.  This, apparently, has all kinds of interesting side effects.  Thanks for the replies, but this one's on my shoulders.
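
For anyone who wants to guard against the same mistake, here is a rough Python sketch that checks whether /boot has an entry in fstab but is missing from the list of mounted filesystems. The function names and the sample device names are hypothetical. It assumes the usual whitespace-separated layout of /etc/fstab and /proc/mounts, with the mount point in the second field.

```python
def fstab_mount_points(fstab_text):
    """Return the mount points listed in fstab-formatted text, skipping comments."""
    points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            points.append(fields[1])
    return points


def mounted_points(mounts_text):
    """Return the mount points listed in /proc/mounts-formatted text."""
    return [line.split()[1] for line in mounts_text.splitlines()
            if len(line.split()) >= 2]


def boot_missing(fstab_text, mounts_text, mount_point="/boot"):
    """True if mount_point has an fstab entry but is not currently mounted."""
    return (mount_point in fstab_mount_points(fstab_text)
            and mount_point not in mounted_points(mounts_text))


def check_live_system():
    """Run the check against the real files on this machine."""
    with open("/etc/fstab") as fstab, open("/proc/mounts") as mounts:
        return boot_missing(fstab.read(), mounts.read())


# Hypothetical sample data: /boot is configured but not mounted.
SAMPLE_FSTAB = """\
# <file system> <dir> <type> <options> <dump> <pass>
/dev/sda1 /boot ext2 defaults 0 1
/dev/sda3 / ext3 defaults 0 1
"""
SAMPLE_MOUNTS = "/dev/sda3 / ext3 rw 0 0\nproc /proc proc rw 0 0\n"

if boot_missing(SAMPLE_FSTAB, SAMPLE_MOUNTS):
    print("/boot is in fstab but not mounted; mount it before upgrading the kernel")
```

Calling check_live_system() before a kernel upgrade does the same check against the real files. It only catches the case where /boot is a separate partition listed in fstab.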

Offline

#6 2007-12-12 09:35:05

eNTi
Member
Registered: 2006-04-30
Posts: 109

Re: kernel oops with new nvidia drivers [solved - doh!]

i've got the same problem... but it hasn't anything to do with my /boot partition. i've not changed the default arch behavior, so it's always mounted at start.

i've added "-ignoreABI" in my kdmrc a few weeks ago, when the first issues of that kind arose. that worked for a while, but for the past few days (and upgrades) even that has stopped working. i really wonder what those guys do, because my system is going from bad to worse. can't anyone just write code that doesn't break working systems any more?

Offline

#7 2007-12-12 14:39:52

slide_rule
Member
From: loglogdecalog
Registered: 2007-09-16
Posts: 33

Re: kernel oops with new nvidia drivers [solved - doh!]

eNTi wrote:

i've got the same problem... i really wonder what those guys do, because my system is going from bad to worse. can't anyone just write code that doesn't break working systems any more?

There's probably a solution, but insulting the devs without posting any actual information isn't a good way to get help.

Also, you should probably start a new thread, unless you've got precisely the same symptoms I had.  Whether or not you do, more details than "the same problem" would be helpful.  And please leave the sarcastic rhetorical questions outside.

Offline

#8 2007-12-12 19:23:19

lloeki
Member
From: France
Registered: 2007-02-20
Posts: 456

Re: kernel oops with new nvidia drivers [solved - doh!]

There's probably a solution, but insulting the devs without posting any actual information isn't a good way to get help.

well, in fact he did:

i've added "-ignoreABI"

thus, no wonder.

this switch is not there just to give people a config file to modify. it's not the least bit surprising that things break with different ABIs. there's a damn reason why it doesn't want to start by default with mismatching ABIs.


To know recursion, you must first know recursion.

Offline

#9 2007-12-12 22:40:03

slide_rule
Member
From: loglogdecalog
Registered: 2007-09-16
Posts: 33

Re: kernel oops with new nvidia drivers [solved - doh!]

Agreed.  I would think switches like ignoreABI are under the "if you need this switch, you'll know, otherwise, don't touch it!" category.  I wasn't clear: I was hoping for more information about the problem ignoreABI 'solved.'

Offline

#10 2007-12-13 07:11:19

lloeki
Member
From: France
Registered: 2007-02-20
Posts: 456

Re: kernel oops with new nvidia drivers [solved - doh!]

-ignoreABI does not actually 'solve' anything. instead, it allows you to bypass a safety check so that you can try to use the nvidia driver with an xorg that is ahead of the current nvidia release. it was added in response to the outcry around xorg 7.1, when both nvidia and ati were seriously lagging behind, leaving some users stuck with an 'unusable' computer.
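
to make the idea concrete, here is a simplified Python model of that kind of safety check. it is only a sketch based on my understanding of how the X server's module loader compares ABI versions, not the actual xorg code. the name abi_check, its parameters and the version numbers are made up for illustration.

```python
def abi_check(module_abi, server_abi, ignore_abi=False):
    """Return (loads, message) for a driver built against module_abi.

    Both versions are (major, minor) tuples. Simplified model: the major
    versions must match, and the module's minor version must not be newer
    than the server's. With ignore_abi=True a mismatch is still reported,
    but it no longer stops the module from loading.
    """
    mod_major, mod_minor = module_abi
    srv_major, srv_minor = server_abi
    if mod_major != srv_major:
        problem = "module ABI major version (%d) doesn't match the server's (%d)" % (
            mod_major, srv_major)
    elif mod_minor > srv_minor:
        problem = "module ABI minor version (%d) is newer than the server's (%d)" % (
            mod_minor, srv_minor)
    else:
        return True, "ABI versions are compatible"
    # The mismatch is the same either way; the switch only changes
    # whether it is fatal.
    return ignore_abi, problem


# Hypothetical version numbers: a driver built for ABI 1.0 on a server with ABI 2.0.
print(abi_check((1, 0), (2, 0)))
print(abi_check((1, 0), (2, 0), ignore_abi=True))
```

the point is that the switch changes nothing about the mismatch itself. the driver still runs against an interface it was not built for, it is just allowed to try.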

anyway, you may want to try the 169.04 beta driver.

Last edited by lloeki (2007-12-13 07:12:59)


To know recursion, you must first know recursion.

Offline

#11 2007-12-29 16:46:08

Erroneous
Member
Registered: 2006-08-28
Posts: 35

Re: kernel oops with new nvidia drivers [solved - doh!]

I'm having the same problem, and I have only a / partition, with no separate /boot.

Arch Oops: 0002 [#2]
 Arch PREEMPT SMP
 Arch CPU:    1
 Arch EIP:    0060:[<e220de7c>]    Tainted: P      D VLI
 Arch EFLAGS: 00210246   (2.6.23-ARCH #1)
 Arch EIP is at os_set_mlock_capability+0xc/0x30 [nvidia]
 Arch eax: 00000000   ebx: 00000000   ecx: 00200202   edx: 00000000
 Arch esi: dc401800   edi: ddf54000   ebp: d2c05ff0   esp: d246bed8
 Arch ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
 Arch Process mythtv-setup (pid: 7182, ti=d246a000 task=df123020 task.ti=d246a000)
 Arch Stack: e1f35d35 e1f37d92 e24ecd40 deceec00 d7081d60 e1f39c26 d2c04000 0000005b
 Arch 0000000c e1f36a32 e24ecd40 deceec00 0000005b d7081d60 d7081d60 0000005b
 Arch e24ecd40 e2209f36 d2c04000 e24ecd40 deceec00 0000005b d7081d60 00000001
 Arch Call Trace:
 Arch [<e1f35d35>] _nv002553rm+0x5/0x10 [nvidia]
 Arch [<e1f37d92>] rm_write_watch_init+0x3b/0x54 [nvidia]
 Arch [<e1f39c26>] _nv002642rm+0x416/0x5e9 [nvidia]
 Arch [<e1f36a32>] rm_ioctl+0x3e/0x6d [nvidia]
 Arch [<e2209f36>] nv_kern_ioctl+0xf6/0x3e0 [nvidia]
 Arch [<e220a258>] nv_kern_unlocked_ioctl+0x18/0x20 [nvidia]
 Arch [<e220a240>] nv_kern_unlocked_ioctl+0x0/0x20 [nvidia]
 Arch [<c018ae0b>] do_ioctl+0x2b/0x90
 Arch [<c018b09e>] vfs_ioctl+0x22e/0x2b0
 Arch [<c018b17d>] sys_ioctl+0x5d/0x70
 Arch [<c0104482>] sysenter_past_esp+0x6b/0xa1
 Arch =======================
 Arch Code: 1b e4 f1 dd 31 c0 c3 90 8d b4 26 00 00 00 00 64 a1 00 80 47 c0 8b 80 bc 00 00 00 c3 8d 76 00 64 a1 00 80 47 c0 8b 80 7c 04 00 00 <c7> 80 4c 01 00 00 ff ff ff ff 64 a1 00 80 47 c0 81 88 a0 01 00
 Arch EIP: [<e220de7c>] os_set_mlock_capability+0xc/0x30 [nvidia] SS:ESP 0068:d246bed8

Not sure what to do about it. I found the same threads via Google but nothing else. I guess I can use the nv driver for a few months and see if it's magically fixed in the next update.

Offline
