
#1 2010-02-17 19:44:37

clanger
Member
Registered: 2010-01-04
Posts: 33

/proc/mtrr and incorrect GPU memory size

I've been trying to debug some random lockups that I think are related to an Nvidia GeForce 9500 GT, and I found a guide (http://www.gentoo.org/doc/en/nvidia-guide.xml) that talks about uncacheable registers reported in /proc/mtrr.

My /proc/mtrr looks like this:

> cat /proc/mtrr 
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-back
reg03: base=0x0cff00000 ( 3327MB), size=    1MB, count=1: uncachable
reg04: base=0x100000000 ( 4096MB), size=  512MB, count=1: write-back
reg05: base=0x120000000 ( 4608MB), size=  256MB, count=1: write-back

Which (apparently) is "not good" because there are uncacheable regions and no write-combining regions. However, I am in completely over my head here and have no idea what any of this means. From elsewhere online I found posts about ATI cards having trouble if all registers are write-back/uncacheable. The suggested fix in the Gentoo guide (change BIOS settings) doesn't work for me because my BIOS doesn't have the appropriate option. It also doesn't let me assign an IRQ to VGA, meaning my nvidia card is sharing an IRQ with the USB controllers. That could possibly be the cause of my crashes, but when I blacklisted ohci_hcd (the module servicing the interrupts) I still got a crash. I'm still not ruling this out though. The mobo is a Gigabyte MA770T-UD3P, in case you want to avoid it (I certainly will in future).
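
For anyone checking their own table, here is a minimal Python sketch that totals a /proc/mtrr dump by cache type. It assumes the line format shown in the listing above (size given with a K or M suffix); the names `MTRR_LINE`, `totals_by_type` and `POSTED` are made up for this example.

```python
import re
from collections import defaultdict

# One /proc/mtrr line; the size field is assumed to carry a K or M suffix.
MTRR_LINE = re.compile(
    r"reg\d+: base=0x[0-9a-fA-F]+ \(\s*\d+MB\), "
    r"size=\s*(\d+)([KM])B, count=\d+: ([\w-]+)"
)
KB_PER_UNIT = {"K": 1, "M": 1024}


def totals_by_type(mtrr_text):
    """Return {cache type: total size in KB} for a /proc/mtrr dump."""
    totals = defaultdict(int)
    for line in mtrr_text.splitlines():
        match = MTRR_LINE.search(line)
        if match:
            size, unit, cache_type = match.groups()
            totals[cache_type] += int(size) * KB_PER_UNIT[unit]
    return dict(totals)


# The dump posted above, copied verbatim.
POSTED = """\
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-back
reg03: base=0x0cff00000 ( 3327MB), size=    1MB, count=1: uncachable
reg04: base=0x100000000 ( 4096MB), size=  512MB, count=1: write-back
reg05: base=0x120000000 ( 4608MB), size=  256MB, count=1: write-back
"""

for cache_type, kb in sorted(totals_by_type(POSTED).items()):
    print(f"{cache_type}: {kb // 1024} MB")
```

For the dump above this gives 4096 MB of write-back, 1 MB of uncachable, and no write-combining entry at all, which matches the description in the post.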

Anyway, this led me to see which registers the geforce is using. The 9500GT has 512MB of RAM. This is confirmed by the nvidia splash screen during boot. However lspci -v gives me the following:

01:00.0 VGA compatible controller: nVidia Corporation G96 [GeForce 9500 GT] (rev a1) (prog-if 00 [VGA controller])
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at ef00 [size=128]
        [virtual] Expansion ROM at fb000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidia, nvidiafb

I understand this to be reporting that the card has 16+256+32M of RAM. I am wondering whether this could be related to my lockups? Are there any other instances of lspci reporting incorrect memory sizes? It seems like the GPU memory isn't using the uncacheable register, but is this the whole story? Is the lack of write-combining registers causing problems?
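
A side note on the lspci output, offered as a sketch rather than a diagnosis: the "Memory at" lines are PCI base address regions (BARs), i.e. windows of bus address space the card asks for. As far as I know they are not a measure of on-board video RAM, so they need not add up to 512MB. The Python below just pulls the three regions out of the listing; `BAR_LINE`, `memory_bars` and `POSTED` are names invented for this example.

```python
import re

# Matches lines such as:
#   Memory at d0000000 (64-bit, prefetchable) [size=256M]
BAR_LINE = re.compile(
    r"Memory at ([0-9a-f]+) \(\d+-bit, ((?:non-)?prefetchable)\) "
    r"\[size=(\d+)([KMG])\]"
)
UNIT_BYTES = {"K": 2**10, "M": 2**20, "G": 2**30}


def memory_bars(lspci_text):
    """Return (address, size in bytes, prefetchable?) per 'Memory at' line."""
    bars = []
    for line in lspci_text.splitlines():
        match = BAR_LINE.search(line)
        if match:
            addr, kind, size, unit = match.groups()
            bars.append(
                (int(addr, 16), int(size) * UNIT_BYTES[unit], kind == "prefetchable")
            )
    return bars


# The three memory regions from the lspci -v output above.
POSTED = """\
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
"""

for addr, size, prefetchable in memory_bars(POSTED):
    kind = "prefetchable" if prefetchable else "non-prefetchable"
    print(f"{addr:#010x}  {size >> 20:4d} MB  {kind}")
```

The prefetchable 256MB region at 0xd0000000 is the one a write-combining mapping would normally target, which is why that address keeps turning up in MTRR discussions.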

As I said, I am in over my head at this point. Any help at all, be it links or personal experience, would be very much appreciated. I've had the ultimatum "fix your crashes or you're moving to Windows" from my boss, which needless to say is not somewhere I want to go!

Thanks

Offline

#2 2010-02-18 00:36:01

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: /proc/mtrr and incorrect GPU memory size

Not much of a help here, but my notebook has an ATI card and this is what I have in /proc/mtrr:

reg00: base=0x000000000 (    0MB), size= 4096MB, count=1: write-back
reg01: base=0x100000000 ( 4096MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable

No crashes or random lockups here, at least when using only the notebook's display. When I also use an external display, I sometimes first get mouse pointer corruption and then a hard lockup.

I also have this

dmesg | grep -i write-combining
mtrr: type mismatch for d0000000,8000000 old: write-back new: write-combining
mtrr: type mismatch for d0000000,8000000 old: write-back new: write-combining
mtrr: type mismatch for d0000000,8000000 old: write-back new: write-combining
mtrr: type mismatch for d0000000,8000000 old: write-back new: write-combining
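
To make those messages easier to read, here is a small Python sketch that decodes one of them. It assumes the two numbers after "for" are the base address and the size, both in hexadecimal bytes; that is my reading of the message, not something stated in this thread. `MISMATCH` and `decode_mismatch` are names made up for the example.

```python
import re

# Assumed layout: "mtrr: type mismatch for <base hex>,<size hex> old: X new: Y"
MISMATCH = re.compile(
    r"mtrr: type mismatch for ([0-9a-f]+),([0-9a-f]+) "
    r"old: ([\w-]+) new: ([\w-]+)"
)


def decode_mismatch(line):
    """Return base/size in MB plus the old and requested cache types."""
    match = MISMATCH.search(line)
    if match is None:
        return None
    base, size, old, new = match.groups()
    return {
        "base_mb": int(base, 16) >> 20,  # bytes -> MB
        "size_mb": int(size, 16) >> 20,
        "old": old,
        "new": new,
    }


LINE = "mtrr: type mismatch for d0000000,8000000 old: write-back new: write-combining"
print(decode_mismatch(LINE))
```

Under that assumption the message says something asked for a 128MB write-combining region starting at 3328MB, and the request was refused because that range is already covered by a write-back register.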

Maybe the problem is something else, or maybe someone else has a clue about it. Also, you may want to try a new PSU; I've seen and experienced many strange lockups that stopped happening after changing the PSU.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline

#3 2010-02-18 05:36:27

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: /proc/mtrr and incorrect GPU memory size

IIRC the nvidia driver uses PAT instead of MTRR anyway, so don't worry about the MTRR.

$ cat /proc/driver/nvidia/registry
...
UsePageAttributeTable: 1

Edit: Added URL.

Last edited by brebs (2011-01-10 17:07:19)

Offline

#4 2010-02-18 12:24:49

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

Thanks for the info, perhaps I should look elsewhere for the causes of my problem.

I left my computer running last night without X and it didn't crash, so I'm pretty sure it is something graphical which is causing the problem, but I just don't know what :s

Offline

#5 2010-02-18 12:45:23

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: /proc/mtrr and incorrect GPU memory size

clanger wrote:

Kernel modules: nvidia, nvidiafb

Try without nvidiafb - it only increases the chances of crashes, in my experience.

You have 2 nvidia driver versions to try - the unstable and stable versions. See nvidia driver current versions.

Offline

#6 2010-02-18 13:04:44

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

I'm not sure how to disable nvidiafb. It is already in /etc/modprobe.d/framebuffer_blacklist.conf and doesn't appear to be loaded as a module. Am I missing something here?

Offline

#7 2010-02-18 13:07:19

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: /proc/mtrr and incorrect GPU memory size

Show your config:

zgrep -i nvidia /proc/config.gz

Offline

#8 2010-02-18 13:08:42

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

> zgrep -i nvidia /proc/config.gz
CONFIG_AGP_NVIDIA=m
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
# CONFIG_FB_NVIDIA_DEBUG is not set
CONFIG_FB_NVIDIA_BACKLIGHT=y
CONFIG_BACKLIGHT_MBP_NVIDIA=m

Does this mean nvidiafb is being compiled as a module into the kernel and the answer is to recompile the kernel with CONFIG_FB_NVIDIA not set?

I imagine that if I'm doing this I should unset CONFIG_AGP_NVIDIA since I don't even have an AGP slot on my mobo?

Last edited by clanger (2010-02-18 13:14:25)

Offline

#9 2010-02-18 13:21:17

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: /proc/mtrr and incorrect GPU memory size

clanger wrote:

CONFIG_FB_NVIDIA=m

Then lsmod must be showing it, yes?

$ lsmod | grep nvidia

I assume that lsmod shows nvidiafb, in which case: Try e.g.:

grep nvidia /etc/rc.d/*

Anyway, you can of course do this simple test:

stop Xorg
$ modprobe -r nvidiafb
$ modprobe -r nvidia
startx  (as your normal user)

Last edited by brebs (2010-02-18 13:22:27)

Offline

#10 2010-02-18 13:30:22

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

> lsmod | grep nvidia
nvidia               8795767  38 
i2c_core               15369  1 nvidia
agpgart                23331  2 nvidia,ati_agp

No sign of nvidiafb, yet it is still reported in lspci -v.
I can modprobe nvidiafb and it shows up in lsmod, but by default it is not there.
I guess the framebuffer_blacklist.conf is working?

If I leave X, remove both modules, and reload X, the nvidia module gets reloaded but nvidiafb does not.

Thanks for your timely help on this.

Last edited by clanger (2010-02-18 13:33:40)

Offline

#11 2010-02-18 13:45:19

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: /proc/mtrr and incorrect GPU memory size

clanger wrote:

I can modprobe nvidiafb and it shows up in lsmod, but by default it is not there.

OK, nvidiafb was a false alarm, lspci is just showing it as a potential, I suppose - ignore.
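
A rough Python sketch of that distinction: it lists modules that lspci names under "Kernel modules:" but that lsmod does not show as loaded. Treating "-" and "_" in module names as interchangeable is an assumption made here so that names like ati-agp and ati_agp compare equal; `normalise` and `listed_but_not_loaded` are invented names.

```python
import re


def normalise(name):
    """Strip whitespace and treat '-' and '_' as the same character."""
    return name.strip().replace("-", "_")


def listed_but_not_loaded(lspci_text, lsmod_text):
    """Modules lspci lists as candidates that do not appear in lsmod."""
    candidates = set()
    for line in lspci_text.splitlines():
        match = re.search(r"Kernel modules:\s*(.+)", line)
        if match:
            candidates.update(normalise(n) for n in match.group(1).split(","))
    loaded = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip the lsmod header row
            loaded.add(normalise(fields[0]))
    return sorted(candidates - loaded)


# Lines taken from the lspci -v and lsmod output posted earlier in the thread.
LSPCI = """\
        Kernel driver in use: nvidia
        Kernel modules: nvidia, nvidiafb
"""
LSMOD = """\
nvidia               8795767  38
i2c_core               15369  1 nvidia
agpgart                23331  2 nvidia,ati_agp
"""

print(listed_but_not_loaded(LSPCI, LSMOD))
```

With the output posted in this thread it reports only nvidiafb, i.e. a module that could drive the device but is not actually loaded.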

agpgart                23331  2 nvidia,ati_agp

What is loading ati_agp??

Add to /etc/modprobe.d/whatever.conf :
blacklist ati_agp

Offline

#12 2010-02-18 13:56:32

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

00:00.0 Host bridge: ATI Technologies Inc RX780/RX790 Chipset Host Bridge
        Subsystem: ATI Technologies Inc RX780/RX790 Chipset Host Bridge
        Flags: bus master, 66MHz, medium devsel, latency 32
        Memory at <ignored> (64-bit, non-prefetchable)
        Capabilities: <access denied>
        Kernel modules: ati-agp

Not sure exactly what it is doing. I rmmod'd it and everything still seems to be working. I will blacklist it and try out the substrate screensaver, which seems to pretty consistently cause a lockup.

Offline

#13 2010-02-18 14:46:01

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

Argh, still no luck. Had a lockup, with ati-agp unloaded, when loading a VirtualBox image. Nothing unusual in logs, had to turn off with the power button.

Edit: VirtualBox will crash <5 minutes after starting if the vboxnetadp or vboxnetflt modules are loaded. I can only guess that my kernel is configured wrong for my hardware and certain modules trigger this in some way. Removing ati-agp seems to have stopped lockups related to the GPU (i.e. when a graphics-heavy screensaver like pixelcity or substrate is running), but the problem still comes up in other ways.

Still, progress is being made, and that's something!

Thanks for your help brebs.

Last edited by clanger (2010-02-18 15:36:16)

Offline

#14 2010-02-18 17:44:56

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: /proc/mtrr and incorrect GPU memory size

Try recompiling the virtualbox modules and check if it still crashes.
A kernel update might have changed something, making vbox crash the system. It might also have something to do with (kernel?) performance counters. I don't see anything in dmesg now, but with vbox 3.1.2, loading vboxdrv would issue a warning to disable them, otherwise it might cause trouble.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline

#15 2010-02-18 19:21:20

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: /proc/mtrr and incorrect GPU memory size

clanger wrote:

nvidia card is sharing an IRQ with usb controllers, which could possibly be the cause of my crashes

So this is worth a try:

In /etc/modprobe.d/whatever.conf

options nvidia NVreg_EnableMSI=1 NVreg_UsePageAttributeTable=1

Then stop xorg, modprobe -r nvidia, and restart xorg, and check with:

cat /proc/driver/nvidia/registry
cat /proc/interrupts

You will see something like:

32:    1680505    1491409   PCI-MSI-edge      nvidia

Which shows that MSI is being used.
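
The same check can be scripted. This is a minimal Python sketch, assuming the column layout of the sample line above (IRQ number, per-CPU counts, controller type, then device names); the IO-APIC line in `HYPOTHETICAL_SHARED` is made up for contrast and is not from this thread.

```python
def irq_lines_for(interrupts_text, device):
    """Return (irq, controller) for every /proc/interrupts row naming device."""
    hits = []
    for line in interrupts_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue  # header row or blank line
        # Drop the per-CPU counters; what remains is controller + device names.
        rest = [f for f in fields[1:] if not f.isdigit()]
        names = [n.strip(",") for n in rest[1:]]
        if device in names:
            hits.append((fields[0].rstrip(":"), rest[0]))
    return hits


def uses_msi(interrupts_text, device):
    """True if the device appears and every row for it is an MSI interrupt."""
    hits = irq_lines_for(interrupts_text, device)
    return bool(hits) and all("MSI" in controller for _, controller in hits)


# The sample line from the post above.
POSTED = " 32:    1680505    1491409   PCI-MSI-edge      nvidia"
# Hypothetical shared legacy interrupt, for comparison only.
HYPOTHETICAL_SHARED = " 18:       1000       2000   IO-APIC-fasteoi   ohci_hcd:usb3, nvidia"

print(uses_msi(POSTED, "nvidia"))
print(uses_msi(HYPOTHETICAL_SHARED, "nvidia"))
```

In practice the text would come from reading /proc/interrupts; newer kernels may split the controller column differently, so treat the parsing as approximate.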

Offline

#16 2010-02-19 05:29:31

djolk
Member
Registered: 2008-03-07
Posts: 59

Re: /proc/mtrr and incorrect GPU memory size

I have a junky /proc/mtrr using an intel GMA on a new Dell Inspiron 1440.

[djolk@tunamelt Desktop]$ cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size=32768MB, count=1: write-back
reg01: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
reg02: base=0x0dde00000 ( 3550MB), size=    2MB, count=1: uncachable
reg03: base=0x0de000000 ( 3552MB), size=   32MB, count=1: uncachable

reg00 is way too enormous to be anything useful and my vid card doesn't show up at all. I get a warning at boot about no write combining (graphics may suffer), and they definitely do. I know the intel driver itself isn't super hot, but KDE is downright clunky and DVD playback is grainy, out of sync and tears.

I've read a few posts regarding rewriting your own mtrr table but you have to be careful because you get a lot of hard locks.

Offline

#17 2010-02-19 12:07:03

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

I left the computer running overnight with various programs running (including VirtualBox). It reset itself 4 hours after I left, and the login prompt was waiting for me when I arrived this morning. There was nothing in the logs relating to the lockup; I only know when it crashed because of the usual startup logs. I experienced a hard lockup shortly after logging in.

I have followed brebs' instructions and nvidia is using MSI now. I will see how it performs throughout the day. Fingers crossed.

Thanks for the help.

Offline

#18 2010-02-19 12:59:36

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

Still getting hard lockups. In fact, I think I am getting more now than I was previously. Perhaps the MSI/PageAttributeTable change is responsible. Off to read some more about exactly what these options do.

> cat /proc/driver/nvidia/registry 
EnableVia4x: 0
EnableALiAGP: 0
NvAGP: 0
ReqAGPRate: 15
EnableAGPSBA: 0
EnableAGPFW: 0
Mobile: 4294967295
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
RemapLimit: 0
UpdateMemoryTypes: 4294967295
UseVBios: 1
RMEdgeIntrCheck: 1
UsePageAttributeTable: 1
EnableMSI: 1
MapRegistersEarly: 0
RegistryDwords: ""
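
A small Python sketch for reading a dump like this one. It only splits "Key: value" lines; the interpretation in the comments (438 being the decimal form of file mode 0666, and 4294967295 being 0xFFFFFFFF, presumably "leave at driver default") is an assumption, not something the driver output states. `parse_registry` and `POSTED` are names invented for the example.

```python
def parse_registry(text):
    """Parse 'Key: value' lines; numeric values become ints, others stay strings."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        value = value.strip()
        settings[key.strip()] = int(value) if value.isdigit() else value
    return settings


# A few of the lines from the dump above.
POSTED = """\
Mobile: 4294967295
DeviceFileMode: 438
UsePageAttributeTable: 1
EnableMSI: 1
RegistryDwords: ""
"""

registry = parse_registry(POSTED)
print("MSI enabled:", registry["EnableMSI"] == 1)
print("PAT enabled:", registry["UsePageAttributeTable"] == 1)
print("device file mode:", oct(registry["DeviceFileMode"]))  # decimal 438 as octal
print("Mobile:", hex(registry["Mobile"]))  # all 32 bits set
```

The two lines that matter for the change suggested earlier are EnableMSI and UsePageAttributeTable, and both read 1 here, so the module options did take effect.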

Offline

#19 2010-02-19 13:53:54

Nexx
Member
Registered: 2007-08-08
Posts: 11

Re: /proc/mtrr and incorrect GPU memory size

djolk wrote:

I have a junky /proc/mtrr using an intel GMA on a new Dell Inspiron 1440.

[djolk@tunamelt Desktop]$ cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size=32768MB, count=1: write-back
reg01: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
reg02: base=0x0dde00000 ( 3550MB), size=    2MB, count=1: uncachable
reg03: base=0x0de000000 ( 3552MB), size=   32MB, count=1: uncachable

reg00 is way too enormous to be anything useful and my vid card doesn't show up at all. I get a warning at boot about no write combining (graphics may suffer), and they definitely do. I know the intel driver itself isn't super hot, but KDE is downright clunky and DVD playback is grainy, out of sync and tears.

I've read a few posts regarding rewriting your own mtrr table but you have to be careful because you get a lot of hard locks.

Add the following to the end of your kernel command line in /boot/grub/menu.lst
enable_mtrr_cleanup

I use this on my Fujitsu laptop and it corrects the entire MTRR table automatically at boot.

Offline

#20 2010-02-19 14:59:04

clanger
Member
Registered: 2010-01-04
Posts: 33

Re: /proc/mtrr and incorrect GPU memory size

Using the enable_mtrr_cleanup option changes my mtrr (see first post for original) to this:

> cat /proc/mtrr 
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-back
reg03: base=0x0cff00000 ( 3327MB), size=    1MB, count=1: uncachable

Which seems to be just removing the last 2 registers. Still no write-combining sections, and it appears to be missing 768MB of RAM. free confirms that around 768MB is not being found. If it fixes my stability problems I will be happy to make the sacrifice though.
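
The 768MB figure can be checked directly from the two tables. A minimal Python sketch, assuming all sizes are given in MB as they are in both dumps; `regions`, `BEFORE` and `AFTER` are names made up here.

```python
import re

# Captures (base in MB, size in MB, cache type) from each /proc/mtrr line.
MTRR_LINE = re.compile(
    r"base=0x[0-9a-fA-F]+ \(\s*(\d+)MB\), size=\s*(\d+)MB, count=\d+: ([\w-]+)"
)


def regions(mtrr_text):
    """Return a set of (base MB, size MB, cache type) tuples."""
    return {(int(b), int(s), t) for b, s, t in MTRR_LINE.findall(mtrr_text)}


# Table from the first post.
BEFORE = """\
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-back
reg03: base=0x0cff00000 ( 3327MB), size=    1MB, count=1: uncachable
reg04: base=0x100000000 ( 4096MB), size=  512MB, count=1: write-back
reg05: base=0x120000000 ( 4608MB), size=  256MB, count=1: write-back
"""
# Table after booting with enable_mtrr_cleanup.
AFTER = """\
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-back
reg03: base=0x0cff00000 ( 3327MB), size=    1MB, count=1: uncachable
"""

dropped = regions(BEFORE) - regions(AFTER)
for base, size, cache_type in sorted(dropped):
    print(f"dropped: {size} MB {cache_type} at {base} MB")
print("total dropped:", sum(size for _, size, _ in dropped), "MB")
```

Both dropped registers describe memory above the 4096MB mark, and their sizes (512MB + 256MB) add up to the 768MB that free reports missing.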

Offline

#21 2010-02-20 04:17:47

djolk
Member
Registered: 2008-03-07
Posts: 59

Re: /proc/mtrr and incorrect GPU memory size

enable_mtrr_cleanup doesn't have any effect on my system. I've tried compiling the git kernel from AUR and compiling it in, with no effect either.

I can't remember the error but I believe it was something to the effect of incorrect gran/chunk size.

I'm also 'missing' some ram.

Offline

#22 2010-02-20 04:51:22

anrxc
Member
From: Croatia
Registered: 2008-03-22
Posts: 834
Website

Re: /proc/mtrr and incorrect GPU memory size

This is how I solved mine,  lspci -v:
Intel GM965/GL960: Memory at d0000000 (64-bit, prefetchable) [size=256M]

Then added to /etc/rc.local:
# Fix MTRR on Intel graphics
/bin/echo "base=0xD0000000 size=0x10000000 type=write-combining" >| /proc/mtrr

Resulting /proc/mtrr

reg00: base=0x000000000 (    0MB), size= 1024MB, count=1: write-back
reg01: base=0x03f700000 ( 1015MB), size=    1MB, count=1: uncachable
reg02: base=0x03f800000 ( 1016MB), size=    8MB, count=1: uncachable
reg03: base=0x0d0000000 ( 3328MB), size=  256MB, count=2: write-combining
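
For other cards, the string written to /proc/mtrr can be derived from the lspci region instead of typed by hand. A hedged Python sketch: it only builds and sanity-checks the text and never writes to /proc/mtrr. The two checks (size a power of two, base aligned to the size) reflect my understanding of what MTRRs require and should be treated as an assumption; `is_valid_region` and `write_combining_entry` are invented names.

```python
def is_valid_region(base, size):
    """Assumed MTRR constraints: power-of-two size, base aligned to size."""
    power_of_two = size > 0 and size & (size - 1) == 0
    return power_of_two and base % size == 0


def write_combining_entry(base, size):
    """Build the text for a write-combining request; does not touch /proc/mtrr."""
    if not is_valid_region(base, size):
        raise ValueError(f"unsuitable region: base={base:#x} size={size:#x}")
    return f"base={base:#x} size={size:#x} type=write-combining"


# Values from the lspci line above: region at d0000000, size 256M.
print(write_combining_entry(0xD0000000, 256 * 2**20))
```

For the 256M region at d0000000 this yields the same base and size as the rc.local line above, only with lower-case hex digits.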

You can find a lot of discussion, and help, about this on Linux distribution forums.


You need to install an RTFM interface.

Offline

#23 2010-02-20 06:45:01

djolk
Member
Registered: 2008-03-07
Posts: 59

Re: /proc/mtrr and incorrect GPU memory size

I have been working on a script to add a write-combining line; however, I believe the first problem that needs to be solved is:

reg00: base=0x000000000 (    0MB), size=32768MB, count=1: write-back

Reg00 should not be thirty-two thousand megabytes! It overlaps all the other mtrr regions and I can't add a write-combining line.
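
The overlap is easy to confirm mechanically. A minimal Python sketch, using the base and size in MB from the table posted earlier in the thread; `overlapping_registers` and `POSTED` are names made up for this example.

```python
import re
from itertools import combinations

# Captures register number, base in MB and size in MB.
MTRR_LINE = re.compile(
    r"reg(\d+): base=0x[0-9a-fA-F]+ \(\s*(\d+)MB\), size=\s*(\d+)MB"
)


def overlapping_registers(mtrr_text):
    """Return pairs of register numbers whose address ranges intersect."""
    spans = [
        (int(reg), int(base), int(base) + int(size))  # (reg, start, end) in MB
        for reg, base, size in MTRR_LINE.findall(mtrr_text)
    ]
    return [
        (a[0], b[0])
        for a, b in combinations(spans, 2)
        if a[1] < b[2] and b[1] < a[2]  # half-open ranges intersect
    ]


# The table from the Dell Inspiron 1440.
POSTED = """\
reg00: base=0x000000000 (    0MB), size=32768MB, count=1: write-back
reg01: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
reg02: base=0x0dde00000 ( 3550MB), size=    2MB, count=1: uncachable
reg03: base=0x0de000000 ( 3552MB), size=   32MB, count=1: uncachable
"""

print(overlapping_registers(POSTED))
```

It reports reg00 overlapping each of reg01, reg02 and reg03, while the three uncachable registers do not overlap one another. Overlap by itself may not be an error: as far as I know, one large write-back region with uncachable holes is a common BIOS layout, but it does leave no room for a write-combining entry inside the covered range.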

There is a solution (ish) from ubuntu here:

http://ubuntuforums.org/showthread.php?t=1285176 but setting incorrect mtrrs causes lots of hard freezes so I have yet to get a working mtrr setup.

I am curious why it's happened. I've read in several places that it occurs frequently on computers made by Dell. I've also read that MTRR isn't used anymore and PAT replaces it, but if I touch my mtrr table my system tells me it's using it.

Offline

#24 2010-02-20 07:19:54

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: /proc/mtrr and incorrect GPU memory size

You could try compiling mtrr-uncover, to see if it helps.

Offline

#25 2010-02-20 17:22:08

djolk
Member
Registered: 2008-03-07
Posts: 59

Re: /proc/mtrr and incorrect GPU memory size

I believe I've tried that as well. I'll have to take another look at it though. I've looked at a bunch of stuff regarding this and I really think the only way to do it is to manually rewrite the mtrr table. I would like to find out WHY this happens, and I have no idea what my /proc/mtrr should look like. I'm also wondering if it counts as a bug...

Offline
