I just upgraded to kernel26-2.6.23.9-1 and nvidia-100.14.19, and now, when the fancy kdemod ksplash screen comes up, I get this stack trace and KDE freezes (I can still ssh in):
NVRM: loading NVIDIA UNIX x86 Kernel Module 100.14.19 Wed Sep 12 14:12:24 PDT 2007
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000014c
printing eip:
f9f0ce7c
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in: nvidia(P) w83627ehf hwmon_vid ipv6 ohci1394 ieee1394 firewire_ohci firewire_core crc_itu_t tsdev usbhid hid ff_memless usb_storage ide_core sky2 intel_agp agpgart ppp_generic sg evdev thermal processor fan button battery ac kqemu i2c_i801 i2c_dev i2c_core coretemp snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore slhc skge rtc ext3 jbd mbcache sr_mod cdrom sd_mod ehci_hcd uhci_hcd usbcore ahci ata_generic pata_jmicron libata
CPU: 0
EIP: 0060:[<f9f0ce7c>] Tainted: P VLI
EFLAGS: 00210246 (2.6.23-ARCH #1)
EIP is at os_set_mlock_capability+0xc/0x30 [nvidia]
eax: 00000000 ebx: 00000000 ecx: 00200202 edx: 00000000
esi: f6781000 edi: f6c48400 ebp: f37cbff0 esp: f3757ed8
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process ksplash (pid: 6803, ti=f3756000 task=f37a0000 task.ti=f3756000)
Stack: f9c34d35 f9c36d92 fa1ebd40 f5e5e480 f6d44b50 f9c38c26 f37ca000 0000005b
0000000c f9c35a32 fa1ebd40 f5e5e480 0000005b f6d44b50 f6d44b50 0000005b
fa1ebd40 f9f08f36 f37ca000 fa1ebd40 f5e5e480 0000005b f6d44b50 00000001
Call Trace:
[<f9c34d35>] _nv002553rm+0x5/0x10 [nvidia]
[<f9c36d92>] rm_write_watch_init+0x3b/0x54 [nvidia]
[<f9c38c26>] _nv002642rm+0x416/0x5e9 [nvidia]
[<f9c35a32>] rm_ioctl+0x3e/0x6d [nvidia]
[<f9f08f36>] nv_kern_ioctl+0xf6/0x3e0 [nvidia]
[<f9f09258>] nv_kern_unlocked_ioctl+0x18/0x20 [nvidia]
[<f9f09240>] nv_kern_unlocked_ioctl+0x0/0x20 [nvidia]
[<c018ae0b>] do_ioctl+0x2b/0x90
[<c018b09e>] vfs_ioctl+0x22e/0x2b0
[<c018b17d>] sys_ioctl+0x5d/0x70
[<c0104482>] sysenter_past_esp+0x6b/0xa1
[<c0360000>] wait_for_completion+0x30/0xa0
=======================
Code: 1b f4 21 c6 31 c0 c3 90 8d b4 26 00 00 00 00 64 a1 00 80 47 c0 8b 80 bc 00 00 00 c3 8d 76 00 64 a1 00 80 47 c0 8b 80 7c 04 00 00 <c7> 80 4c 01 00 00 ff ff ff ff 64 a1 00 80 47 c0 81 88 a0 01 00
EIP: [<f9f0ce7c>] os_set_mlock_capability+0xc/0x30 [nvidia] SS:ESP 0068:f3757ed8
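You can cross-check the oops above by hand before filing a report. The following is a minimal Python sketch (my own reading of the trace, not an nvidia or kernel tool). It decodes the x86 page-fault error code printed after "Oops:", splits the "EIP is at" line, and decodes the instruction marked `<c7>` in the Code: line. It understands only the single form `movl $imm32, disp32(%reg)` (opcode C7, ModRM mod=10, reg=000), which is the one in this trace. The helper names are hypothetical.

```python
import re

# Excerpt of the first oops in this thread (lines copied verbatim).
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000014c
Oops: 0002 [#1]
EIP is at os_set_mlock_capability+0xc/0x30 [nvidia]
eax: 00000000 ebx: 00000000 ecx: 00200202 edx: 00000000
Code: 1b f4 21 c6 31 c0 c3 90 8d b4 26 00 00 00 00 64 a1 00 80 47 c0 8b 80 bc 00 00 00 c3 8d 76 00 64 a1 00 80 47 c0 8b 80 7c 04 00 00 <c7> 80 4c 01 00 00 ff ff ff ff 64 a1 00 80 47 c0 81 88 a0 01 00
"""

# ModRM r/m field -> 32-bit register name, in x86 encoding order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_pf_error(code):
    """Decode the x86 page-fault error code printed after 'Oops:'."""
    return {
        "protection_violation": bool(code & 0x1),  # clear = page not present
        "write": bool(code & 0x2),                 # clear = read access
        "user_mode": bool(code & 0x4),             # clear = kernel mode
    }


def parse_eip(text):
    """Split 'EIP is at func+0xOFF/0xSIZE [module]' into its parts."""
    m = re.search(r"EIP is at (\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]", text)
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16), m.group(4)


def decode_faulting_mov(code_line, regs):
    """Decode the <marked> instruction, handling ONLY 'movl $imm32, disp32(%reg)'."""
    marked = code_line.split("<", 1)[1].replace(">", " ").split()
    b = [int(x, 16) for x in marked]
    if len(b) < 10:
        raise ValueError("not enough bytes after the marker")
    modrm = b[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if b[0] != 0xC7 or mod != 2 or reg != 0 or rm == 4:
        raise ValueError("not the movl $imm32, disp32(%reg) form")
    disp = int.from_bytes(bytes(b[2:6]), "little")
    if disp >= 0x80000000:  # disp32 is a signed displacement
        disp -= 1 << 32
    imm = int.from_bytes(bytes(b[6:10]), "little")
    target = (regs[REG32[rm]] + disp) & 0xFFFFFFFF
    return REG32[rm], disp, imm, target


fault_addr = int(re.search(r"virtual address ([0-9a-f]+)", OOPS).group(1), 16)
error = decode_pf_error(int(re.search(r"Oops: ([0-9a-f]+)", OOPS).group(1), 16))
func, offset, size, module = parse_eip(OOPS)
regs = {name: int(val, 16) for name, val in
        re.findall(r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp): ([0-9a-f]{8})", OOPS)}
code_line = next(ln for ln in OOPS.splitlines() if ln.startswith("Code:"))
base_reg, disp, imm, target = decode_faulting_mov(code_line, regs)

print(f"{func}+{offset:#x}/{size:#x} in [{module}]")
print("error code:", error)
print(f"movl ${imm:#x}, {disp:#x}(%{base_reg}) -> writes to {target:#010x}")
print("matches reported fault address:", target == fault_addr)
```

Run against this trace, the sketch reports a kernel-mode write (error code 0002: write, page not present) of 0xffffffff to offset 0x14c from %eax. Since eax was 0, the target is exactly the reported address 0000014c. The instruction just before it (`8b 80 7c 04 00 00`, i.e. `mov 0x47c(%eax),%eax`) loaded that pointer. So the driver read a NULL pointer and then stored through it. The trace alone cannot say why the pointer was NULL.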
I've "fixed" it by changing "nvidia" to "nv" in my xorg.conf, but I'm wondering if there's anything else I should do before filing a bug report? I couldn't find anything similar on flyspray, nor in the forums. The problem is definitely repeatable.
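The workaround amounts to a one-word change in the Device section of xorg.conf. Here is a minimal Python sketch of that edit. It assumes a conventional layout with a `Driver "..."` line inside each `Section "Device"` block. The function name is hypothetical, and you should keep a backup of the real file before writing to it.

```python
import re


def set_device_driver(xorg_conf, new_driver):
    """Return xorg_conf with the Driver line of every Device section replaced.

    Assumes the usual layout: Section "Device" ... Driver "name" ... EndSection.
    Driver lines in other sections (e.g. InputDevice) are left alone.
    """
    out = []
    in_device = False
    for line in xorg_conf.splitlines(keepends=True):
        stripped = line.strip()
        if re.match(r'Section\s+"Device"', stripped, re.IGNORECASE):
            in_device = True
        elif stripped.lower() == "endsection":
            in_device = False
        elif in_device and re.match(r'Driver\s+"', stripped, re.IGNORECASE):
            # Keep the original indentation; swap only the quoted driver name.
            line = re.sub(r'"[^"]*"', lambda m: f'"{new_driver}"', line, count=1)
        out.append(line)
    return "".join(out)


SAMPLE = '''Section "InputDevice"
    Identifier "Keyboard0"
    Driver     "kbd"
EndSection

Section "Device"
    Identifier "Card0"
    Driver     "nvidia"
EndSection
'''

print(set_device_driver(SAMPLE, "nv"))
```

Switching back later is the same call with "nvidia". X has to be restarted either way.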
Last edited by slide_rule (2007-12-11 12:19:45)
It may be a bug in the nvidia driver. Try http://www.nvnews.net/vbulletin/forumdisplay.php?f=14 (be sure to include a bug report as per http://www.nvnews.net/vbulletin/showthread.php?t=46678).
Thanks, I found this thread. It seems to be some kind of nvidia bug, but the thread over there has been quiet since October. I guess I'm just lucky enough to have hit a corner case, because Google is startlingly unhelpful (either that or I haven't hit on the magic search terms). I've thought about a couple of approaches to a fix:
1. Tweak nvidia-100.14.11 to work with kernel26-2.6.23.9-1 (my guess is that this isn't going to work, since it comes pre-built against a specific kernel, yes?)
2. Try to build a vanilla kernel and vanilla nvidia drivers
3. Downgrade kernel26 and nvidia drivers (I should have both packages in my cache)
4. Ignore it and wait for things to get cleared up.
Any thoughts?
slide_rule,
Go with 1: use ABS to roll your own nvidia and nvidia-utils pkg.tar.gz packages.
To know recursion, you must first know recursion.
I found the problem, it's my fault. Without realizing it, I installed a new kernel image without mounting my /boot partition first. This, apparently, has all kinds of interesting side effects. Thanks for the replies, but this one's on my shoulders.
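Suppose a kernel package is installed while a separate /boot partition is unmounted. The new image is written into the (normally hidden) /boot directory on the root filesystem, and the bootloader keeps starting the previous kernel. Modules installed for the new kernel may then end up loaded into the old one. A pre-upgrade check for this is cheap. Below is a minimal Python sketch that assumes /boot, if separate, is listed in /etc/fstab. The function names are hypothetical.

```python
import os


def fstab_mount_points(fstab_text):
    """Return the mount points (second field) listed in fstab text."""
    points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        fields = line.split()
        if len(fields) >= 2:
            points.append(fields[1])
    return points


def boot_needs_mounting(fstab_text, boot_is_mounted):
    """True if fstab declares a separate /boot that is not currently mounted."""
    return "/boot" in fstab_mount_points(fstab_text) and not boot_is_mounted


if __name__ == "__main__":
    # Real check on this machine; run it before installing a kernel package.
    try:
        with open("/etc/fstab") as f:
            fstab = f.read()
    except OSError:
        fstab = ""  # no readable fstab: nothing to check
    if boot_needs_mounting(fstab, os.path.ismount("/boot")):
        print("/boot is in fstab but not mounted: run 'mount /boot' first")
    else:
        print("/boot looks fine")
```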
I've got the same problem... but it has nothing to do with my /boot partition. I haven't changed the default Arch behavior, so it's always mounted at startup.
I added "-ignoreABI" to my kdmrc a few weeks ago, when the first issues of that kind arose. That worked for a while, but for the past few days (and several upgrades) even that has stopped working. I really wonder what those guys are doing, because my system is going from bad to worse. Can't anyone just write code that doesn't break working systems anymore?
i've got the same problem... i really wonder, what those guys do, because my system is going from bad to worse. can't anyone just write code, that doesn't break working systems any more?
There's probably a solution, but insulting the devs without posting any actual information isn't a good way to get help.
Also, you should probably start a new thread, unless you've got precisely the same symptoms I had. Whether or not you do, more details than "the same problem" would be helpful. And please leave the sarcastic rhetorical questions outside.
There's probably a solution, but insulting the devs without posting any actual information isn't a good way to get help.
Well, in fact he did:
i've added "-ignoreABI"
So, no wonder.
This switch isn't there just to give people a config file to fiddle with. It's not the least bit surprising that things break with mismatched ABIs; there's a damn good reason X refuses to start by default when the ABIs don't match.
Agreed. I would think switches like ignoreABI are under the "if you need this switch, you'll know, otherwise, don't touch it!" category. I wasn't clear: I was hoping for more information about the problem ignoreABI 'solved.'
-ignoreABI does not actually 'solve' anything. It bypasses a safety check so that you can try to use the nvidia driver with an Xorg release newer than the current nvidia driver supports. It was added in response to the outcry around Xorg 7.1, when both nvidia and ATI were seriously lagging behind, leaving some users stuck with an 'unusable' computer.
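Roughly, the check that -ignoreABI disables compares the ABI version a driver module was built against with the one the X server provides. The sketch below is a simplified model of that comparison. It refuses a different major version, or a module minor version newer than the server's. It was written for illustration and is not taken from the X server source; the names and version numbers are hypothetical.

```python
from typing import NamedTuple


class AbiVersion(NamedTuple):
    major: int
    minor: int


def abi_check(server, module, ignore_abi=False):
    """Simplified model of an X server loader ABI check.

    Returns (load_allowed, message). With ignore_abi=True a mismatch is only
    reported, mirroring how -ignoreABI turns the error into a warning.
    """
    if module.major != server.major:
        problem = f"module ABI major {module.major} != server major {server.major}"
    elif module.minor > server.minor:
        problem = f"module ABI minor {module.minor} is newer than server minor {server.minor}"
    else:
        return True, "ABI versions compatible"
    if ignore_abi:
        return True, "WARNING (ignored): " + problem
    return False, "ERROR: " + problem


# Hypothetical versions: a driver built for an older server ABI.
server_abi = AbiVersion(2, 0)
driver_abi = AbiVersion(1, 0)
print(abi_check(server_abi, driver_abi))
print(abi_check(server_abi, driver_abi, ignore_abi=True))
```

The second call models the situation in this thread. Loading is allowed, but nothing guarantees that the driver actually works against the newer ABI, so crashes after bypassing the check are unsurprising.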
Anyway, you may want to try the 169.04 beta driver.
Last edited by lloeki (2007-12-13 07:12:59)
I'm having the same problem, and I don't have a separate /boot partition (everything is on /).
Arch Oops: 0002 [#2]
Arch PREEMPT SMP
Arch CPU: 1
Arch EIP: 0060:[<e220de7c>] Tainted: P D VLI
Arch EFLAGS: 00210246 (2.6.23-ARCH #1)
Arch EIP is at os_set_mlock_capability+0xc/0x30 [nvidia]
Arch eax: 00000000 ebx: 00000000 ecx: 00200202 edx: 00000000
Arch esi: dc401800 edi: ddf54000 ebp: d2c05ff0 esp: d246bed8
Arch ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Arch Process mythtv-setup (pid: 7182, ti=d246a000 task=df123020 task.ti=d246a000)
Arch Stack: e1f35d35 e1f37d92 e24ecd40 deceec00 d7081d60 e1f39c26 d2c04000 0000005b
Arch 0000000c e1f36a32 e24ecd40 deceec00 0000005b d7081d60 d7081d60 0000005b
Arch e24ecd40 e2209f36 d2c04000 e24ecd40 deceec00 0000005b d7081d60 00000001
Arch Call Trace:
Arch [<e1f35d35>] _nv002553rm+0x5/0x10 [nvidia]
Arch [<e1f37d92>] rm_write_watch_init+0x3b/0x54 [nvidia]
Arch [<e1f39c26>] _nv002642rm+0x416/0x5e9 [nvidia]
Arch [<e1f36a32>] rm_ioctl+0x3e/0x6d [nvidia]
Arch [<e2209f36>] nv_kern_ioctl+0xf6/0x3e0 [nvidia]
Arch [<e220a258>] nv_kern_unlocked_ioctl+0x18/0x20 [nvidia]
Arch [<e220a240>] nv_kern_unlocked_ioctl+0x0/0x20 [nvidia]
Arch [<c018ae0b>] do_ioctl+0x2b/0x90
Arch [<c018b09e>] vfs_ioctl+0x22e/0x2b0
Arch [<c018b17d>] sys_ioctl+0x5d/0x70
Arch [<c0104482>] sysenter_past_esp+0x6b/0xa1
Arch =======================
Arch Code: 1b e4 f1 dd 31 c0 c3 90 8d b4 26 00 00 00 00 64 a1 00 80 47 c0 8b 80 bc 00 00 00 c3 8d 76 00 64 a1 00 80 47 c0 8b 80 7c 04 00 00 <c7> 80 4c 01 00 00 ff ff ff ff 64 a1 00 80 47 c0 81 88 a0 01 00
Arch EIP: [<e220de7c>] os_set_mlock_capability+0xc/0x30 [nvidia] SS:ESP 0068:d246bed8
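The two oopses have the nvidia module loaded at different addresses (EIP f9f0ce7c vs e220de7c), so comparing raw addresses is misleading. Comparing `symbol+offset/size` frames instead shows whether they are the same crash. Below is a minimal Python sketch using copies of both traces, with the faulting EIP line first. The "Arch " prefix in the second post (presumably the syslog hostname) is skipped because the pattern is searched anywhere in the line.

```python
import re

# Matches "[<address>] symbol+0xOFF/0xSIZE [module]"; the module part is optional.
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\S+\+0x[0-9a-f]+/0x[0-9a-f]+)(?: \[(\w+)\])?")

TRACE_1 = """\
EIP: [<f9f0ce7c>] os_set_mlock_capability+0xc/0x30 [nvidia] SS:ESP 0068:f3757ed8
[<f9c34d35>] _nv002553rm+0x5/0x10 [nvidia]
[<f9c36d92>] rm_write_watch_init+0x3b/0x54 [nvidia]
[<f9c38c26>] _nv002642rm+0x416/0x5e9 [nvidia]
[<f9c35a32>] rm_ioctl+0x3e/0x6d [nvidia]
[<f9f08f36>] nv_kern_ioctl+0xf6/0x3e0 [nvidia]
[<f9f09258>] nv_kern_unlocked_ioctl+0x18/0x20 [nvidia]
[<f9f09240>] nv_kern_unlocked_ioctl+0x0/0x20 [nvidia]
[<c018ae0b>] do_ioctl+0x2b/0x90
[<c018b09e>] vfs_ioctl+0x22e/0x2b0
[<c018b17d>] sys_ioctl+0x5d/0x70
[<c0104482>] sysenter_past_esp+0x6b/0xa1
[<c0360000>] wait_for_completion+0x30/0xa0
"""

TRACE_2 = """\
Arch EIP: [<e220de7c>] os_set_mlock_capability+0xc/0x30 [nvidia] SS:ESP 0068:d246bed8
Arch [<e1f35d35>] _nv002553rm+0x5/0x10 [nvidia]
Arch [<e1f37d92>] rm_write_watch_init+0x3b/0x54 [nvidia]
Arch [<e1f39c26>] _nv002642rm+0x416/0x5e9 [nvidia]
Arch [<e1f36a32>] rm_ioctl+0x3e/0x6d [nvidia]
Arch [<e2209f36>] nv_kern_ioctl+0xf6/0x3e0 [nvidia]
Arch [<e220a258>] nv_kern_unlocked_ioctl+0x18/0x20 [nvidia]
Arch [<e220a240>] nv_kern_unlocked_ioctl+0x0/0x20 [nvidia]
Arch [<c018ae0b>] do_ioctl+0x2b/0x90
Arch [<c018b09e>] vfs_ioctl+0x22e/0x2b0
Arch [<c018b17d>] sys_ioctl+0x5d/0x70
Arch [<c0104482>] sysenter_past_esp+0x6b/0xa1
"""


def frames(trace_text):
    """Extract (symbol+0xoff/0xsize, module) pairs, ignoring raw addresses."""
    out = []
    for line in trace_text.splitlines():
        m = FRAME.search(line)
        if m:
            out.append((m.group(1), m.group(2)))
    return out


first = frames(TRACE_1)
second = frames(TRACE_2)
common = min(len(first), len(second))
same = first[:common] == second[:common]
print(f"{common} frames compared, identical: {same}")
print("extra frames:", first[common:] or second[common:])
```

Every shared frame matches, including the faulting instruction, so the second report looks like the same bug. The extra `wait_for_completion` frame in the first trace is most likely a leftover stack value rather than a real caller.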
Not sure what to do about it. I found the same threads via Google, but nothing else. I guess I can use the nv driver for a few months and see if it magically gets fixed in the next update.