Just in case someone stumbles on this in the future, my GPU was dead. I borrowed another to test with and everything works as it did before.
Original Post:
I'll just start with my problem and provide all of the information that I believe to be relevant before I start pouring details into this post. When I turn on my machine, my monitors turn on, but they immediately say "no signal, going into sleep mode" or something to that effect. My machine boots up just fine, but I can't see anything. Here are all of the output files that I think other users would ask for. If more information is needed, let me know and I'll post it. I see there are other recent posts with the same or a similar issue. Not sure if it's common or caused by a recent update. My last pacman update was a couple of hours ago.
[dpad@machine Temporary]$ pacman -Q | grep nvidia
lib32-nvidia-utils 550.78-1
nvidia 550.78-2
nvidia-settings 550.78-1
nvidia-utils 550.78-1
/etc/mkinitcpio.conf
/boot/grub/grub.cfg
My keyboard is on and I can hear all of the typical sounds you'd hear when the machine fires up. HDD spinning, GPU fans turning on, the small popping sound coming from my speakers that are connected to my USB sound card, etc. Things were working just fine a couple of days ago. My issue began when I was trying to play Elden Ring with a friend but I was getting this crazy flickering effect where my primary monitor was flickering between my desktop and the game. This has happened with another game before and I fixed it by going to borderless window mode. This wasn't working though because Elden Ring prompts you to save changes or it reverts in 10 seconds, and the prompt window would never pop up.
I found a Reddit comment that mentioned enabling full composition pipeline in nvidia-settings, so I tried that. It didn't fix my issue, but switching to Proton version 8 did, so I just started playing. Soon after, my GPU crashed. My monitors blacked out and the fans on my PC spiked to full speed, but I could still hear the game audio in my speakers. I restarted the machine, but it took several presses of the reset button before my monitors actually came on. That's been happening ever since, even after disabling full composition pipeline in nvidia-settings. It feels like rolling the dice just turning my machine on. Prior to me changing settings with `sudo nvidia-settings`, there was no xorg.conf file in /etc/X11. nvidia-settings created one. I've since deleted it to see if that would fix my problem, but it didn't.
I've googled every combination of words I could think of to find a solution, to no avail. Unplugging my DP cables from my GPU and plugging them back in doesn't work. Pressing CTRL + ALT + F2/3/4/5 doesn't work. Occasionally holding shift down after restarting gets me into GRUB, and then everything works fine, but that usually takes several power cycles. I know hitting the reset button or holding power is a bad idea, but I don't know how else to force the machine off when I can't log in and see my screen.
I'm using the non-lts linux kernel, 6.8.9-arch1-2
Last edited by dpad (2024-05-13 23:08:04)
echo "Hello, friends!"
well this is the first thing that stands out:
May 11 11:26:47 dpadllc kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237
May 11 11:26:47 dpadllc kernel: NVRM: No NVIDIA GPU found.
May 11 11:26:47 dpadllc kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 237
May 11 11:26:47 dpadllc kernel: usbcore: registered new interface driver snd-usb-audio
May 11 11:26:47 dpadllc systemd-modules-load[395]: Failed to insert module 'nvidia_uvm': No such device
it sounds like a faulty graphics card to me BUT there are a few things to try...
remove 'kms' from the hooks in mkinitcpio.conf
HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block filesystems fsck)
and add
nvidia_drm.modeset=1
to your kernel parameters.
also try another graphics card if you have one
EDIT: and of course the usual troubleshooting can be done, reseat the graphics card, replug the gpu power connectors, try another power supply.... etc etc
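For anyone following along, here is a rough sketch of the two config changes suggested above. It only transforms text passed on stdin and touches nothing under /etc. The function names are made up for illustration, and it assumes kernel parameters are set through GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, which may not match every setup. The "loglevel=3 quiet" value is just a sample.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and filter stdin to stdout.

# Drop the 'kms' hook from a HOOKS=(...) line; other lines pass through.
# Assumes 'kms' is not the first entry in the list.
remove_kms_hook() {
    sed -E '/^HOOKS=/ s/ kms( |\))/\1/'
}

# Append nvidia_drm.modeset=1 to a GRUB_CMDLINE_LINUX_DEFAULT="..." line,
# unless the line already contains it.
add_modeset_param() {
    sed -E '/nvidia_drm\.modeset=1/! s/^(GRUB_CMDLINE_LINUX_DEFAULT=".*)"$/\1 nvidia_drm.modeset=1"/'
}

# Demo on sample lines (the GRUB value is an assumed example).
printf '%s\n' 'HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block filesystems fsck)' | remove_kms_hook
printf '%s\n' 'GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet"' | add_modeset_param
```

On a real system the edited lines would go into /etc/mkinitcpio.conf and /etc/default/grub, followed by `mkinitcpio -P` and `grub-mkconfig -o /boot/grub/grub.cfg` as root.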
Last edited by jonno2002 (2024-05-12 02:26:55)
07:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080 Ti] (rev a1)
but
May 11 11:26:46 dpadllc kernel: pci 0000:07:00.0: [1022:148a] type 00 class 0x130000 PCIe Endpoint
May 11 11:26:46 dpadllc kernel: pci 0000:07:00.0: enabling Extended Tags
May 11 11:26:46 dpadllc kernel: pci 0000:07:00.0: Adding to iommu group 13
1022 is AMD and in particular https://devicehunt.com/view/type/pci/ve … evice/148A and there's absolutely no trace of an nvidia chip in the journal.
You're not fixing that w/ any software/driver/kernel adjustments - the GPU might be badly seated in the slot or underpowered (forgot to attach the 6/8-pin power connector?), or the PSU is underdimensioned for the GPU, or the GPU is falling apart.
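The vendor ID comparison can be done mechanically. Below is a small sketch that pulls the vendor half out of a kernel "[vendor:device]" pair and names it. The helper names are hypothetical, and the 10de entry for NVIDIA is added here for contrast (it does not appear in the journal excerpt above).

```shell
#!/bin/sh
# Sketch: extract the PCI vendor ID from a kernel "pci ...: [vvvv:dddd]" line.
# Both function names are made up for this example.
pci_vendor_id() {
    sed -nE 's/.*\[([0-9a-f]{4}):[0-9a-f]{4}\].*/\1/p'
}

# Name the two vendor IDs relevant to this thread.
vendor_name() {
    case "$1" in
        10de) echo "NVIDIA" ;;
        1022) echo "AMD" ;;
        *)    echo "unknown" ;;
    esac
}

# The journal line quoted above, for the device at 0000:07:00.0.
line='May 11 11:26:46 dpadllc kernel: pci 0000:07:00.0: [1022:148a] type 00 class 0x130000 PCIe Endpoint'
id=$(printf '%s\n' "$line" | pci_vendor_id)
echo "$id $(vendor_name "$id")"
```

On a running system, `lspci -nn` prints the same [vendor:device] pairs next to each device name, so the two views can be compared directly.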
ceterum censeo: is there a parallel windows installation?
Thanks for the replies. I'll try the first suggestion at the end of the work day, I'm afraid to mess something up while it's working fine and lose a day of production.
"ceterum censeo: is there a parallel windows installation?"
No, there's no Windows install aside from a Win 10 Virtual Machine installed on /mnt/sdb. What's weird is that it's not an issue at all if I just put my machine to sleep, so I haven't done a full shut down since Saturday morning. My uptime is 1d, 7h right now with no issues. Nothing like this has ever happened before until that crash I mentioned in my original post. It wouldn't shock me if I fried something when I enabled full composition pipeline. That doesn't make much sense to me, but the nature of the crash was weird. Screens blacking out and fans ramping up to max RPM sounds like a serious issue. It hasn't crashed again since I disabled full composition pipeline.
While the machine has been running though, I've done all the usual things that I do with it and haven't had any problems at all. The issue seems to happen exclusively on a fresh boot.
Here's the output of journalctl -b, since this boot seems to be working. I'll admit I'm not advanced enough to understand half the things I see in this output, but grepping for nvidia shows a few lines that may mean something to someone else here.
May 12 01:36:26 dpadllc kernel: nvidia: loading out-of-tree module taints kernel.
May 12 01:36:26 dpadllc kernel: nvidia: module license 'NVIDIA' taints kernel.
May 12 01:36:26 dpadllc kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
May 12 01:36:26 dpadllc kernel: nvidia: module license taints kernel.
May 12 01:36:27 dpadllc kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237
May 12 01:36:27 dpadllc kernel: nvidia 0000:07:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
May 12 01:36:27 dpadllc (udev-worker)[454]: nvidia: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep nvidia /proc/devices | cut -d \ -f 1) 255'' failed with exit code 1.
May 12 01:36:27 dpadllc (udev-worker)[465]: nvidia: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep nvidia /proc/devices | cut -d \ -f 1) 255'' failed with exit code 1.
May 12 01:36:27 dpadllc (udev-worker)[454]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \ -f 4); do /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c $(grep nvidia /proc/devices | cut -d \ -f 1) ${i}; done'' failed with exit code 1.
May 12 01:36:27 dpadllc (udev-worker)[465]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \ -f 4); do /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c $(grep nvidia /proc/devices | cut -d \ -f 1) ${i}; done'' failed with exit code 1.
May 12 01:36:27 dpadllc kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 550.78 Sun Apr 14 06:23:31 UTC 2024
May 12 01:36:27 dpadllc kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
May 12 01:36:27 dpadllc (udev-worker)[465]: nvidia_modeset: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidia-modeset c $(grep nvidia /proc/devices | cut -d \ -f 1) 254'' failed with exit code 1.
May 12 01:36:27 dpadllc kernel: [drm] [nvidia-drm] [GPU ID 0x00000700] Loading driver
May 12 01:36:27 dpadllc kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:07:00.0 on minor 1
May 12 01:36:27 dpadllc systemd-modules-load[394]: Inserted module 'nvidia_uvm'
May 12 01:36:27 dpadllc kernel: nvidia-uvm: Loaded the UVM driver, major device number 235.
EDIT: Not 20 minutes after making this post, my screens went black. I'm going to open it up now and reseat my GPU.
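The difference between this boot and the failed one comes down to two kernel messages, so a quick way to tell them apart is sketched below. It reads journal text on stdin and reports which message it saw. The function name and the three status words are invented for this example, and the match strings are simply copied from the logs in this thread, so they may differ with other driver versions.

```shell
#!/bin/sh
# Sketch: classify a boot log by the NVIDIA messages seen in this thread.
# gpu_probe_status is a hypothetical helper, not part of any package.
gpu_probe_status() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'NVRM: No NVIDIA GPU found'; then
        echo "not-found"        # the driver loaded but saw no GPU on the bus
    elif printf '%s\n' "$log" | grep -q 'Initialized nvidia-drm'; then
        echo "initialized"      # nvidia-drm bound to the card
    else
        echo "unknown"          # neither message present
    fi
}

# Demo on one line from each boot quoted in this thread.
printf '%s\n' 'May 11 11:26:47 dpadllc kernel: NVRM: No NVIDIA GPU found.' | gpu_probe_status
printf '%s\n' 'May 12 01:36:27 dpadllc kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:07:00.0 on minor 1' | gpu_probe_status
```

To check a real boot, something like `journalctl -k -b -1 | gpu_probe_status` would feed in the kernel messages from the previous boot.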
Last edited by dpad (2024-05-13 14:34:16)
"I fried something when I enabled full composition pipeline"
Unlikely.
There're obvious issues w/ the bus your 4TB seagate HDD is attached to and also the LG BR writer flares up.
I'd not be surprised if the latter causes PCI issues that ultimately affect the GPU and I'd suggest to detach that (SATA cable only, you don't have to screw it out of the case).
I've had the LG BR device in here since I first built the PC in 2016, and never used it, I'll just unplug that entirely. I need the 4TB drive but I'll unplug it for now and see if it boots at all.
I had to pull my GPU out to unplug the SATA cables, so I had a chance to make sure it was seated well again. It's still not working. The monitors don't even flash on to say no signal now, they just sit here in sleep mode.
I don't know why this would've happened as soon as I started messing with nvidia-settings, but I went ahead and RMA'd the GPU since nothing I've tried will make my screens come back on, and it's still under warranty. I'll mark the thread as solved now, and when I get my replacement, I'll just make another post if the issue persists. It cut out right in the middle of watching YouTube this morning, but audio was still playing, so I'm feeling pretty confident that the card is either dead or dying. I'm leaving that Blu-ray drive unplugged since Seth pointed out that there were issues with it. That thing has been sitting in my case since 2016 with the same Spirited Away BR disc in it the entire time anyway. Thanks for the tips and assistance.