
#1 2022-11-18 22:14:55

Mrr7782
Member
Registered: 2022-03-22
Posts: 7

Is my graphics card bad?

I installed an MSI Radeon RX 6600 into my PC about six or seven months ago. Up until that point, everything worked perfectly. I did this because I wanted a better GPU as I was planning on playing some VR games with a Valve Index (I had an RX 570 before the upgrade). After rebuilding my whole PC as the GPU wouldn't fit into my case, and reinstalling GRUB as my motherboard couldn't find it for some reason, I got into my system and thought everything was good. Ran some benchmarks, the performance was a lot better.

I then installed SteamVR and Beat Saber just to test everything out. This worked fine, but when I installed and tried to play Half-Life: Alyx, my PC hard reset and I saw a hardware error message during boot. What the message was exactly doesn't matter as it's different every time. Since then, even just starting SteamVR causes my whole system to instantly crash. I tried to find solutions online - I did manage to find some and tried changing my boot parameters to pcie_aspm=off iommu=pt amdgpu.noretry=0 amdgpu.lockup_timeout=1000 amdgpu.gpu_recovery=1, but this didn't help. I just gave up on SteamVR on Linux entirely and installed Windows 10 onto my second hard drive.
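A minimal sketch of how one might confirm that those boot parameters actually reached the running kernel, by reading /proc/cmdline. The EXPECTED values are copied from the paragraph above; the function names are made up for illustration and are not part of any tool.

```python
import shlex
from pathlib import Path

# Parameters the post says were added to the kernel command line.
EXPECTED = {
    "pcie_aspm": "off",
    "iommu": "pt",
    "amdgpu.noretry": "0",
    "amdgpu.lockup_timeout": "1000",
    "amdgpu.gpu_recovery": "1",
}


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(text):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline_text, expected=EXPECTED):
    """Return the expected parameters that are absent or set to another value."""
    active = parse_cmdline(cmdline_text)
    return {k: v for k, v in expected.items() if active.get(k) != v}


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        for name, value in missing_params(cmdline.read_text()).items():
            print(f"not active as expected: {name}={value}")
```

If this prints nothing on the affected machine, the parameters were applied and simply did not help, rather than having been lost in the bootloader configuration.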

That wasn't the only problem though. About one time every two weeks, my system crashes and hard resets at random. This usually happens while I'm watching YouTube, but it actually happened today about 20 seconds after I logged in and KDE loaded. There was nothing in my kernel log as my PC just got hard reset, but I did manage to get some logs a few months back and the log was just completely full of amdgpu driver errors, mainly "error -125 couldn't initialise parser" or something like that.

This week, I tried playing HL:A in Windows. I was able to play for about 10 seconds, then saw nothing but grey and heard the "USB disconnected" and "USB connected" sounds over and over for around a minute, then could play for another 10 seconds, and so on. After doing some research on this, I found out that it might be due to my wi-fi interfering with the Index's Bluetooth base stations as they both use 2.4GHz. Switching to 5GHz or using Ethernet isn't an option in my case so I tried turning the wi-fi router off. This didn't resolve the issue. I then found another person on Reddit with the same issue who fixed this by plugging the headset into a different USB port. Almost everything I own is wired, so I always have a lot of USB cables connected to my PC, so I'm pretty sure the USB ports on my motherboard weren't the issue, but I tried a different USB port I was 100% sure was OK just in case. Nothing. Still grey. Disconnected. Connected. After some more searching, I found someone with the same problems - grey, keeps disconnecting and reconnecting. This person fixed their problem by trying their old GPU and seeing that that worked without any problems. I would do that, had I not made the mistake of immediately selling my old GPU as soon as I booted my PC and saw the new one "worked".

My question is: do you think my GPU is bad? Or could this be something else? I'm pretty sure it is since none of this has ever happened to me before the upgrade, but I want others' opinions on this just to be sure.

If you need any additional info I didn't include in this post, tell me and I'll post it as soon as I can.

Thank you for reading.

PC specs:
Motherboard: ASUS TUF GAMING B550-PLUS
GPU: MSI Radeon RX 6600 MECH 2X 8G
CPU: AMD Ryzen 5 1600X
RAM: 16GB DDR4 2400MHz
PSU: Corsair CX450M

Offline

#2 2022-11-18 23:11:17

Slithery
Administrator
From: Norfolk, UK
Registered: 2013-12-01
Posts: 5,624

Re: Is my graphics card bad?

Mrr7782 wrote:

...and reinstalling GRUB as my motherboard couldn't find it for some reason...

That's the expected behavior. On UEFI systems the location of the bootloader is stored on the NVRAM on the motherboard.

Almost everything I own is wired, so I always have a lot of USB cables connected to my PC, so I'm pretty sure the USB ports on my motherboard weren't the issue, but I tried a different USB port I was 100% sure was OK just in case.

The USB ports on a motherboard don't all have separate controllers. You'll usually only have a couple of controllers with inbuilt hubs, meaning that half of your ports use each controller. VR USB headset issues are often caused by more than one device attempting to use the same controller - you need to either consult your motherboard manual or look at the output of lsusb -t to check that the headset is the only device connected to its controller. The USB controller(s) that are connected directly to the CPU usually have fewer issues than any that bridge to it using 3rd party chips.
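As a rough sketch of the lsusb -t check, the Python below groups non-hub devices by bus number so a bus with nothing else on it is easy to spot. It assumes the usual tree layout printed by lsusb -t (a "/:  Bus NN.Port N:" line per root hub, "|__ Port N: Dev N, If N, Class=..." lines beneath it); the function name is made up for illustration. Note that a bus is not exactly a controller: one xHCI controller commonly shows up as two buses (a USB 2 and a USB 3 root hub), so the motherboard manual is still worth a look.

```python
import re
import shutil
import subprocess

# Root hub line, e.g. "/:  Bus 01.Port 1: Dev 1, Class=root_hub, ..."
ROOT_RE = re.compile(r"^/:\s+Bus (\d+)\.Port \d+:")
# Device/interface line, e.g. "|__ Port 2: Dev 2, If 0, Class=Audio, ..."
CHILD_RE = re.compile(r"\|__ Port \d+: Dev (\d+), If \d+, Class=([^,]*),")


def devices_per_bus(lsusb_tree):
    """Map bus number -> sorted device numbers, ignoring hubs.

    A device with several interfaces appears on several lines but is
    counted once, because device numbers are collected in a set.
    """
    buses = {}
    current = None
    for line in lsusb_tree.splitlines():
        root = ROOT_RE.match(line)
        if root:
            current = int(root.group(1))
            buses.setdefault(current, set())
            continue
        child = CHILD_RE.search(line)
        if child and current is not None and child.group(2).strip() != "Hub":
            buses[current].add(int(child.group(1)))
    return {bus: sorted(devs) for bus, devs in buses.items()}


if __name__ == "__main__":
    if shutil.which("lsusb"):
        tree = subprocess.run(["lsusb", "-t"], capture_output=True,
                              text=True, check=False).stdout
        for bus, devs in sorted(devices_per_bus(tree).items()):
            print(f"bus {bus}: {len(devs)} non-hub device(s) {devs}")
```

A bus that lists only the headset's devices after plugging it in is the kind of port to aim for.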


No, it didn't "fix" anything. It just shifted the brokeness one space to the right. - jasonwryan
Closing -- for deletion; Banning -- for muppetry. - jasonwryan

aur - dotfiles

Online

#3 2022-11-19 00:10:53

Mrr7782
Member
Registered: 2022-03-22
Posts: 7

Re: Is my graphics card bad?

Slithery wrote:

That's the expected behavior. On UEFI systems the location of the bootloader is stored on the NVRAM on the motherboard.

Oh, that's good to know. Thank you for telling me that!

Slithery wrote:

The USB ports on a motherboard don't all have separate controllers (...) VR USB headset issues are often caused by more than 1 device attempting to use the same controller - you need to either consult your motherboard manual or look at the output of lsusb -t to check that the headset is the only device connected.

Used that command and found a bus that only has one device connected to it which I don't really need, I'll try disconnecting it and connecting the headset there next week.

But then there's still the issue of my PC hard-resetting for no reason at random. Any idea what I could do about that? Since my PC just hard-resets, there's nothing in the kernel log, it just ends. A few months ago, it didn't use to hard-reset, but just become unresponsive instead, and when it did that, the kernel log was full of amdgpu errors.
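For the missing logs, a small sketch of pulling kernel messages from the boot before the crash and keeping only the amdgpu lines. It assumes the journal is stored persistently (otherwise earlier boots are not kept), and after a hard reset the last few messages may never have reached the disk. The helper names are made up for illustration; the journalctl options used are -k (kernel messages), -b -1 (previous boot) and --no-pager.

```python
import shutil
import subprocess


def previous_boot_cmd(boot_offset=-1):
    """Build the journalctl call for kernel messages of an earlier boot."""
    return ["journalctl", "-k", "-b", str(boot_offset), "--no-pager"]


def amdgpu_lines(log_text):
    """Keep only the log lines that mention amdgpu."""
    return [line for line in log_text.splitlines() if "amdgpu" in line]


if __name__ == "__main__":
    if shutil.which("journalctl"):
        result = subprocess.run(previous_boot_cmd(), capture_output=True,
                                text=True, check=False)
        for line in amdgpu_lines(result.stdout):
            print(line)
```

Running this right after the next crash, before anything else, gives the best chance of catching whatever was logged last.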

Offline

#4 2022-11-19 00:25:18

jonno2002
Member
Registered: 2016-11-21
Posts: 425

Re: Is my graphics card bad?

Mrr7782 wrote:

That wasn't the only problem though. About one time every two weeks, my system crashes and hard resets at random. This usually happens while I'm watching YouTube, but it actually happened today about 20 seconds after I logged in and KDE loaded. There was nothing in my kernel log as my PC just got hard reset, but I did manage to get some logs a few months back and the log was just completely full of amdgpu driver errors, mainly "error -125 couldn't initialise parser" or something like that.

this might help: https://bbs.archlinux.org/viewtopic.php … 0#p2013250

Offline

#5 2022-11-19 01:46:13

Mrr7782
Member
Registered: 2022-03-22
Posts: 7

Re: Is my graphics card bad?

I have had system freezes in the past due to my CPU and that was the fix for it, so I'm pretty sure it's not the CPU causing the instability as I don't get freezes but just straight up hard-resets, but thank you for that suggestion.

Offline

#6 2022-11-19 15:47:33

seth
Member
Registered: 2012-09-03
Posts: 33,337

Re: Is my graphics card bad?

I'm pretty sure it is since none of this has ever happened to me before the upgrade

Do you still have the old GPU for a cross test?

This week, I tried playing HL:A in Windows, this ended up with me being able to play for about 10 seconds and then seeing nothing but grey and hearing the "USB disconnected" and "USB connected" sounds over and over for around a minute and then being able to play for another 10 seconds and so on

I'd almost say you now exceed the TDP, but the new GPU apparently draws 20W *less* than the old one.
Did you maybe just forget to connect the dedicated power supply to the new GPU (or is it seated loose/badly)?

Ceterum censeo: 3rd link below…

Online

#7 2022-11-19 16:51:41

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 6,704

Re: Is my graphics card bad?

seth wrote:

Do you still have the old GPU for a cross test?

Mrr7782 wrote:

After some more searching, I found someone with the same problems - grey, keeps disconnecting and reconnecting. This person fixed their problem by trying their old GPU and seeing that that worked without any problems. I would do that, had I not made the mistake of immediately selling my old GPU as soon as I booted my PC and saw the new one "worked".


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#8 2022-11-24 14:22:38

Mrr7782
Member
Registered: 2022-03-22
Posts: 7

Re: Is my graphics card bad?

Thank you for your suggestions, everyone. I'm back home and will experiment with the Index again tomorrow.

seth wrote:

I'd almost say you now exceed the TDP, but the new GPU apparently draws 20W *less* than the old one.
Did you maybe just forget to connect the dedicated power supply to the new GPU (or is it seated loose/badly)?

I disconnected and reconnected the PSU cable to my GPU just in case, but it happened again at the worst possible time. I turned my PC on and did sudo pacman -Syu and... my PC hard-reset during a kernel upgrade. Fortunately for me, this is the second time it's happened, so I knew exactly what to do. My system wasn't bootable, but I fixed it in about 5 minutes by booting into the Arch Linux installation ISO, mounting my disk, arch-chrooting into it and reinstalling all the packages that got updated.

I now know how to fix this as I just said, but it's still extremely annoying that whenever my GPU feels like it, it'll just go "haha you're not booting this PC again, bye".

Since I've seen other people online that also use the RX 6600 on Arch, do you think the problem here could actually be my GPU being bad? I need a clear answer or a way to find out for sure as I'm seriously getting tired of this and I really feel like that might be it and that I can't do anything about it myself other than replacing it.

Offline

#9 2022-11-24 14:40:31

seth
Member
Registered: 2012-09-03
Posts: 33,337

Re: Is my graphics card bad?

I need a clear answer or a way to find out for sure

I really feel like that might be it and that I can't do anything about it myself other than replacing it.

We cannot tell you remotely whether there's a power leak or cold solder or blown capacitor or whatnot on the GPU and the only test to rule that out is to replace it.
What I can tell you is that when the HW spontaneously hard-reboots, that's a HW issue - underpowered, overheated or defective.

UNLESS (you ignored that): 3rd link below.
Windows might be rebooting the system.

Online

#10 2022-11-24 14:47:50

Mrr7782
Member
Registered: 2022-03-22
Posts: 7

Re: Is my graphics card bad?

seth wrote:

We cannot tell you remotely whether there's a power leak or cold solder or blown capacitor or whatnot on the GPU and the only test to rule that out is to replace it.

I understand that. Actually, I'll try asking my younger brother if I can borrow his PC for some time and see if his GPU works fine if he lets me.

seth wrote:

UNLESS (you ignored that): 3rd link below.
Windows might be rebooting the system.

I disabled fast boot and hibernation as soon as I installed it.

Offline

#11 2022-11-24 16:37:26

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 18,869

Re: Is my graphics card bad?

Mrr7782 wrote:

I disabled fast boot and hibernation as soon as I installed it.

That does not mean that it remains off.   Thanks Microsoft.


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way

Offline

#12 2022-11-25 00:40:08

Mrr7782
Member
Registered: 2022-03-22
Posts: 7

Re: Is my graphics card bad?

ewaller wrote:

That does not mean that it remains off.   Thanks Microsoft.

You're right, I will check that in a few hours.

It just happened again and I really don't like that this is the 3rd time it's happened in the last 14 days, but this time it didn't hard-reset, but my PC did completely freeze. Couldn't switch to a tty, but I heard a lot of HDD activity, or at least something loud that I assumed was my HDD. I let it run for a while so that I'd have some log info and I didn't get much, but I did find something using journalctl. After reading it, it actually looks more like a KDE or Krita bug this time as something probably tried allocating 16GB of VRAM and that made the amdgpu module freeze my whole PC from what I understand. (GitHub Gist)

Last edited by Mrr7782 (2022-11-25 00:41:31)

Offline

#13 2022-11-25 08:41:26

seth
Member
Registered: 2012-09-03
Posts: 33,337

Re: Is my graphics card bad?

this time it didn't hard-reset, but my PC did completely freeze

So it's not at all clear whether this is the same issue to begin with.
But the last one looks like a KDE issue. I don't think the amdgpu module froze anything here (the buffer allocation was simply rejected), but given the HDD activity, I suspect you ran out of memory (OOM) and what you heard was the system swapping? In that case

Couldn't switch to a tty

it could take a minute until you get a reaction for this.

Since krita and plasmashell both wanted to allocate a 17GB buffer I suspect that the "snapping" triggered a resize misbehavior, causing the window to take the maximum size (think extremely huge…) and the compositor and maybe some taskbar and krita on top of that trying to allocate GL buffers for that.

If you're looking for a *potential* common cause of that and reboots, run memtest86+ for at least a night.

Online

#14 2022-11-25 09:22:06

Awebb
Member
Registered: 2010-05-06
Posts: 6,044

Re: Is my graphics card bad?

ewaller wrote:
Mrr7782 wrote:

I disabled fast boot and hibernation as soon as I installed it.

That does not mean that it remains off.   Thanks Microsoft.

In "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Power" there is a DWORD called "HiberbootEnabled". 0 means off, 1 means on. Might be an avenue for scripting.
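A minimal sketch of that scripting idea, using Python's standard winreg module to read the value from within Windows. The key path and value name are the ones given above; the function names are made up for illustration. The import is guarded because winreg only exists on Windows, and the script only reads the value, it does not change it.

```python
try:
    import winreg  # standard library, Windows only
except ImportError:
    winreg = None

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Power"
VALUE_NAME = "HiberbootEnabled"


def describe(value):
    """Translate the DWORD into words: 0 means off, 1 means on."""
    return {0: "off", 1: "on"}.get(value, f"unexpected value {value!r}")


def read_hiberboot():
    """Read HiberbootEnabled from HKEY_LOCAL_MACHINE (Windows only)."""
    if winreg is None:
        raise RuntimeError("winreg is only available on Windows")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value


if __name__ == "__main__":
    if winreg is not None:
        print("fast boot is", describe(read_hiberboot()))
```

Run at login or after Windows updates, something like this would at least show when the setting has been switched back on.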

Offline

#15 2022-11-25 09:36:18

Mrr7782
Member
Registered: 2022-03-22
Posts: 7

Re: Is my graphics card bad?

So I've just booted with my younger brother's GPU (RX 570) in my PC and what I've already noticed is that it only showed me one amdgpu warning when booting.

kernel: amdgpu: SRAT table not found

I normally have this in my kernel log even when booting with my GPU, but mine also gives me this:

kernel: amdgpu 0000:0b:00.0: amdgpu: PSP runtime database doesn't exist

It's just a warning but I'm posting it anyway because I have absolutely no idea what it's supposed to mean.

Offline

#16 2022-11-25 15:04:35

seth
Member
Registered: 2012-09-03
Posts: 33,337

Online
