
#1 2010-07-07 07:42:01

cactus.ed
Member
From: The lonesome crowded west
Registered: 2007-08-11
Posts: 32

NVIDIA -- Fried two graphics cards in two months using tty console...

I do not game or over-clock. System has excellent airflow... I don't even use very much 3D.

In the past 2 months I have had two graphics cards fail (XFX GeForce 8600 GT and BFG 9800 GT).

At first my monitors would randomly go to sleep while I was using the console (NOT running the X server). I
tried everything to get the monitors to wake up, but nothing worked except a hard reboot. When I started the X server,
the fan ran at 100% for a couple of minutes before quieting down.

Kernel log:
Jun 18 16:43:48 ArchMain kernel: NVRM: os_map_kernel_space: won't map address 0x0 UC!
Jun 18 16:43:48 ArchMain kernel: NVRM: RmInitAdapter failed! (0x26:0xffffffff:1076)
Jun 18 16:43:48 ArchMain kernel: NVRM: rm_init_adapter(0) failed
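
For anyone wanting to check their own logs for the same messages, here is a rough sketch of a filter. The function name nvrm_failures is made up for illustration, and where the kernel log ends up depends on your syslog setup, so it just reads from stdin.

```shell
#!/bin/bash
# Hypothetical helper: print only the NVRM lines that report a failure.
# Reads log text on stdin, so it works with dmesg output or a saved log file.
nvrm_failures() {
  grep 'NVRM:' | grep -i 'failed'
}

# Example usage (may need root if dmesg is restricted):
#   dmesg | nvrm_failures
```

It only matches lines containing both "NVRM:" and "failed", so other NVRM messages (like the os_map_kernel_space one above) are not shown.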

After about a month, the screen started shaking and red and green dots began flashing at random. The card was dead.

Same story for the second card.

It took me a little while to figure out what was happening... My NVIDIA cards were overheating when X was not
running. They would heat up to 90-100C and then power off the monitors. When I started X, the fan would run at 100%
until the card cooled down.

From what I see, the problem is caused by a couple things:
    1. The GPU fan does not seem to activate when the X server is not running.
    2. The GPU seems to run at a max power state when the X server is not running.
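
A check along these lines could at least warn before the card gets that hot. This is only a sketch: check_gpu_temp and the 85C default are made up for illustration, and how to get a temperature reading without X running is an assumption left to the caller (I have not confirmed a tool that does this on these cards).

```shell
#!/bin/bash
# Hypothetical helper: compare a GPU temperature (whole degrees C) to a limit.
# Obtaining the reading without X is an assumption left to the caller.
check_gpu_temp() {
  local temp=$1
  local limit=${2:-85}   # 85 is an arbitrary example limit, not a vendor spec
  if [ "$temp" -ge "$limit" ]; then
    echo "WARNING: GPU at ${temp}C (limit ${limit}C)"
    return 1
  fi
  echo "OK: GPU at ${temp}C"
}

# Example usage with a hand-entered reading:
#   check_gpu_temp 95 || echo "start X or shut down"
```

The non-zero return status on a warning makes it easy to chain with another command, for example from a cron job or a loop on a spare tty.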

I do not start X (no kdm, no gdm, no xdm...) on boot. I use agetty as my login manager and have added the following to my .bashrc:

 
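# Start X automatically when logging in on the first virtual console.
if [ -z "$DISPLAY" ] && [ "$(tty)" = /dev/tty1 ]; then
  # exec replaces the login shell; the lines below run only if exec fails.
  exec /usr/bin/startx &>/dev/null
  clear
  exit
fi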

I believe the combination of the problems described above and the fact that I do not use a graphical login manager (leaving
my computer on for many hours with no X server running) caused the cards to overheat and quickly fail.

Has anybody else experienced this issue?

Kernel 2.6.34 (x86_64), NVIDIA 256.35

Last edited by cactus.ed (2010-07-07 07:43:40)


#2 2010-07-07 09:52:06

slumslayer
Member
From: Belgium
Registered: 2008-09-14
Posts: 66

Re: NVIDIA -- Fried two graphics cards in two months using tty console...

cactus.ed wrote:

From what I see, the problem is caused by a couple things:
    1. The GPU fan does not seem to activate when the X server is not running.
    2. The GPU seems to run at a max power state when the X server is not running.

Loading the nvidia driver even when not using X may fix this.

Maybe nvclock could help too (http://www.linuxhardware.org/nvclock/).

Nvclock Features wrote:

...
Low-level Overclocking for all Nvidia cards except for the riva128/riva128zx
Additional Coolbits overclocking for GeforceFX/6/7/8 (desktop) cards
Hardware monitoring (including temperature reading, fanspeed adjustments)
...


#3 2010-07-07 17:29:15

cactus.ed
Member
From: The lonesome crowded west
Registered: 2007-08-11
Posts: 32

Re: NVIDIA -- Fried two graphics cards in two months using tty console...

The NVIDIA module is loaded at boot... I tried nvclock, but it does not seem to be able to control the fan speed on my card(s), even with the force option.

Last edited by cactus.ed (2010-07-07 17:50:51)

