I installed an MSI Radeon RX 6600 into my PC about six or seven months ago. Up until that point, everything worked perfectly. I did this because I wanted a better GPU as I was planning on playing some VR games with a Valve Index (I had an RX 570 before the upgrade). After rebuilding my whole PC as the GPU wouldn't fit into my case, and reinstalling GRUB as my motherboard couldn't find it for some reason, I got into my system and thought everything was good. Ran some benchmarks, the performance was a lot better.
I then installed SteamVR and Beat Saber just to test everything out. This worked fine, but when I installed and tried to play Half-Life: Alyx, my PC hard reset and I saw a hardware error message during boot. What the message was exactly doesn't matter as it's different every time. Since then, even just starting SteamVR causes my whole system to instantly crash. I tried to find solutions online - I did manage to find some and tried changing my boot parameters to pcie_aspm=off iommu=pt amdgpu.noretry=0 amdgpu.lockup_timeout=1000 amdgpu.gpu_recovery=1, but this didn't help. I just gave up on SteamVR on Linux entirely and installed Windows 10 onto my second hard drive.
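Before drawing conclusions from boot parameters like the ones above, it can be worth confirming that the running kernel actually received them. A minimal sketch, assuming the parameters are exactly those quoted in the post and that `/proc/cmdline` is readable; the function names are made up for illustration, and quoted values containing spaces are not handled:

```python
from pathlib import Path

# Parameters quoted in the post above; adjust to whatever you actually set.
EXPECTED = {
    "pcie_aspm": "off",
    "iommu": "pt",
    "amdgpu.noretry": "0",
    "amdgpu.lockup_timeout": "1000",
    "amdgpu.gpu_recovery": "1",
}


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline_text, expected=EXPECTED):
    """Return the expected parameters that are absent or set to another value."""
    active = parse_cmdline(cmdline_text)
    return {k: v for k, v in expected.items() if active.get(k) != v}


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    path = Path("/proc/cmdline")
    if path.exists():
        for name, value in missing_params(path.read_text()).items():
            print(f"not active: {name}={value}")
```

If nothing is printed, every listed parameter reached the kernel with the expected value; whether the amdgpu module honours each of them is a separate question.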
That wasn't the only problem though. About one time every two weeks, my system crashes and hard resets at random. This usually happens while I'm watching YouTube, but it actually happened today about 20 seconds after I logged in and KDE loaded. There was nothing in my kernel log as my PC just got hard reset, but I did manage to get some logs a few months back and the log was just completely full of amdgpu driver errors, mainly "error -125 couldn't initialise parser" or something like that.
This week, I tried playing HL:A in Windows, this ended up with me being able to play for about 10 seconds and then seeing nothing but grey and hearing the "USB disconnected" and "USB connected" sounds over and over for around a minute and then being able to play for another 10 seconds and so on. After doing some research on this, I found out that it might be due to my wi-fi interfering with the Index's Bluetooth base stations as they both use 2.4GHz. Switching to 5GHz or using Ethernet isn't an option in my case so I tried turning the wi-fi router off. This didn't resolve the issue. I then found another person on Reddit with the same issue who fixed this by plugging the headset into a different USB port. Almost everything I own is wired, so I always have a lot of USB cables connected to my PC, so I'm pretty sure the USB ports on my motherboard weren't the issue, but I tried a different USB port I was 100% sure was OK just in case. Nothing. Still grey. Disconnected. Connected. After some more searching, I found someone with the same problems - grey, keeps disconnecting and reconnecting. This person fixed their problem by trying their old GPU and seeing that that worked without any problems. I would do that, had I not made the mistake of immediately selling my old GPU as soon as I booted my PC and saw the new one "worked".
My question is: do you think my GPU is bad? Or could this be something else? I'm pretty sure it is since none of this has ever happened to me before the upgrade, but I want others' opinions on this just to be sure.
If you need any additional info I didn't include in this post, tell me and I'll post it as soon as I can.
Thank you for reading.
PC specs:
Motherboard: ASUS TUF GAMING B550-PLUS
GPU: MSI Radeon RX 6600 MECH 2X 8G
CPU: AMD Ryzen 5 1600X
RAM: 16GB DDR4 2400MHz
PSU: Corsair CX450M
...and reinstalling GRUB as my motherboard couldn't find it for some reason...
That's the expected behavior. On UEFI systems the location of the bootloader is stored on the NVRAM on the motherboard.
Almost everything I own is wired, so I always have a lot of USB cables connected to my PC, so I'm pretty sure the USB ports on my motherboard weren't the issue, but I tried a different USB port I was 100% sure was OK just in case.
The USB ports on a motherboard don't all have separate controllers; you'll usually only have a couple of controllers with inbuilt hubs, meaning that half of your ports use each controller. VR USB headset issues are often caused by more than 1 device attempting to use the same controller - you need to either consult your motherboard manual or look at the output of lsusb -t to check that the headset is the only device connected. The USB controller(s) that are connected directly to the CPU usually have fewer issues than any that bridge to it using 3rd party chips.
That's the expected behavior. On UEFI systems the location of the bootloader is stored on the NVRAM on the motherboard.
Oh, that's good to know. Thank you for telling me that!
The USB ports on a motherboard don't all have separate controllers (...) VR USB headset issues are often caused by more than 1 device attempting to use the same controller - you need to either consult your motherboard manual or look at the output of lsusb -t to check that the headset is the only device connected.
Used that command and found a bus that only has one device connected to it which I don't really need, I'll try disconnecting it and connecting the headset there next week.
But then there's still the issue of my PC hard-resetting for no reason at random. Any idea what I could do about that? Since my PC just hard-resets, there's nothing in the kernel log, it just ends. A few months ago, it didn't use to hard-reset, but just became unresponsive instead, and when it did that, the kernel log was full of amdgpu errors.
That wasn't the only problem though. About one time every two weeks, my system crashes and hard resets at random. This usually happens while I'm watching YouTube, but it actually happened today about 20 seconds after I logged in and KDE loaded. There was nothing in my kernel log as my PC just got hard reset, but I did manage to get some logs a few months back and the log was just completely full of amdgpu driver errors, mainly "error -125 couldn't initialise parser" or something like that.
this might help: https://bbs.archlinux.org/viewtopic.php … 0#p2013250
this might help: https://bbs.archlinux.org/viewtopic.php … 0#p2013250
I have had system freezes in the past due to my CPU and that was the fix for it, so I'm pretty sure it's not the CPU causing the instability as I don't get freezes but just straight up hard-resets, but thank you for that suggestion.
I'm pretty sure it is since none of this has ever happened to me before the upgrade
Do you still have the old GPU for a cross test?
This week, I tried playing HL:A in Windows, this ended up with me being able to play for about 10 seconds and then seeing nothing but grey and hearing the "USB disconnected" and "USB connected" sounds over and over for around a minute and then being able to play for another 10 seconds and so on
I'd almost say you now exceed the TDP, but the new GPU apparently draws 20W *less* than the old one.
Did you maybe just forget to connect the dedicated power supply to the new GPU (or is it seated loose/badly)?
Ceterum censeo: 3rd link below…
Do you still have the old GPU for a cross test?
After some more searching, I found someone with the same problems - grey, keeps disconnecting and reconnecting. This person fixed their problem by trying their old GPU and seeing that that worked without any problems. I would do that, had I not made the mistake of immediately selling my old GPU as soon as I booted my PC and saw the new one "worked".
Thank you for your suggestions, everyone. I'm back home and will experiment with the Index again tomorrow.
I'd almost say you now exceed the TDP, but the new GPU apparently draws 20W *less* than the old one.
Did you maybe just forget to connect the dedicated power supply to the new GPU (or is it seated loose/badly)?
I disconnected and reconnected the PSU cable to my GPU just in case, but it happened again at the worst time possible. I turned my PC on and did sudo pacman -Syu and... my PC hard-reset during a kernel upgrade. Fortunately for me, this is the second time it's happened, so I knew exactly what to do. My system wasn't bootable, but I fixed it in about 5 minutes by booting into the Arch Linux installation ISO, mounting my disk, arch-chrooting into it and reinstalling all the packages that got updated.
I now know how to fix this as I just said, but it's still extremely annoying that whenever my GPU feels like it, it'll just go "haha you're not booting this PC again, bye".
Since I've seen other people online that also use the RX 6600 on Arch, do you think the problem here could actually be my GPU being bad? I need a clear answer or a way to find out for sure as I'm seriously getting tired of this and I really feel like that might be it and that I can't do anything about it myself other than replacing it.
I need a clear answer or a way to find out for sure
I really feel like that might be it and that I can't do anything about it myself other than replacing it.
We cannot tell you remotely whether there's a power leak or cold solder or blown capacitor or whatnot on the GPU and the only test to rule that out is to replace it.
What I can tell you is that when the HW spontaneously hard-reboots, that's a HW issue - underpowered, overheated or defective.
UNLESS (you ignored that): 3rd link below.
Windows might be rebooting the system.
We cannot tell you remotely whether there's a power leak or cold solder or blown capacitor or whatnot on the GPU and the only test to rule that out is to replace it.
I understand that. Actually, I'll try asking my younger brother if I can borrow his PC for some time and see if his GPU works fine if he lets me.
UNLESS (you ignored that): 3rd link below.
Windows might be rebooting the system.
I disabled fast boot and hibernation as soon as I installed it.
I disabled fast boot and hibernation as soon as I installed it.
That does not mean that it remains off. Thanks Microsoft.
That does not mean that it remains off. Thanks Microsoft.
You're right, I will check that in a few hours.
It just happened again and I really don't like that this is the 3rd time it's happened in the last 14 days, but this time it didn't hard-reset, but my PC did completely freeze. Couldn't switch to a tty, but I heard a lot of HDD activity, or at least something loud that I assumed was my HDD. I let it run for a while so that I'd have some log info and I didn't get much, but I did find something using journalctl. After reading it, it actually looks more like a KDE or Krita bug this time as something probably tried allocating 16GB of VRAM and that made the amdgpu module freeze my whole PC from what I understand. (GitHub Gist)
Last edited by Mrr7782 (2022-11-25 00:41:31)
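For pulling the amdgpu messages out of the journal after a freeze like the one described above, something along these lines might help. It is only a sketch: it assumes a persistent journal (otherwise there is no previous boot to read), the function names are made up for illustration, and after a true hard reset the last messages may never have reached the disk at all.

```python
import subprocess


def amdgpu_lines(log_text):
    """Return the log lines that mention amdgpu (case-insensitive)."""
    return [line for line in log_text.splitlines() if "amdgpu" in line.lower()]


def previous_boot_kernel_log():
    """Kernel messages from the previous boot; needs a persistent journal."""
    try:
        result = subprocess.run(
            # -k: kernel messages only, -b -1: the boot before the current one
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True,
            text=True,
        )
    except OSError:
        return ""  # journalctl is not available on this system
    return result.stdout


if __name__ == "__main__":
    for line in amdgpu_lines(previous_boot_kernel_log()):
        print(line)
```

Run it right after rebooting from the freeze, so that "-b -1" still refers to the boot that locked up.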
this time it didn't hard-reset, but my PC did completely freeze
So it's nowhere near clear whether this is the same issue tbw.
But the last one looks like a KDE issue. I don't think the amdgpu module froze anything here (the buffer allocation was simply rejected), but given the HDD activity, I suspect you ran out of memory (OOM) and what you heard was the system swapping. In that case
Couldn't switch to a tty
it could take a minute until you get a reaction for this.
Since krita and plasmashell both wanted to allocate a 17GB buffer I suspect that the "snapping" triggered a resize misbehavior, causing the window to take the maximum size (think extremely huge…) and the compositor and maybe some taskbar and krita on top of that trying to allocate GL buffers for that.
If you're looking for a *potential* common cause of that and reboots, run memtest86+ for at least a night.
Mrr7782 wrote: I disabled fast boot and hibernation as soon as I installed it.
That does not mean that it remains off. Thanks Microsoft.
In "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Power" there is a DWORD called "HiberbootEnabled". 0 means off, 1 means on. Might be an avenue for scripting.
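As a starting point for that kind of scripting, here is a small sketch that reads the value with Python's standard winreg module. It has to be run from Windows; the key path and value name are the ones given above, while the function names are made up for illustration.

```python
import sys

# Key path and value name as given in the post above (under HKEY_LOCAL_MACHINE).
REG_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Power"
VALUE_NAME = "HiberbootEnabled"


def fast_startup_enabled(dword):
    """Interpret the HiberbootEnabled DWORD: 0 means off, 1 means on."""
    return dword == 1


def read_hiberboot():
    """Read the DWORD from the registry (Windows only)."""
    import winreg  # standard library, but only present on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, REG_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        state = "on" if fast_startup_enabled(read_hiberboot()) else "off"
        print(f"Fast startup is {state}")
    else:
        print("This check only works when run from Windows.")
```

Running it again after larger Windows updates would show whether the setting was switched back on, which is the concern raised earlier in the thread.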
So I've just booted with my younger brother's GPU (RX 570) in my PC and what I've already noticed is that it only showed me one amdgpu warning when booting.
kernel: amdgpu: SRAT table not found
I normally have this in my kernel log even when booting with my GPU, but mine also gives me this:
kernel: amdgpu 0000:0b:00.0: amdgpu: PSP runtime database doesn't exist
It's just a warning but I'm posting it anyway because I have absolutely no idea what it's supposed to mean.
https://en.wikipedia.org/wiki/AMD_Platf … _Processor
The warning in and by itself would be harmless
I'd first like to say I'm sorry for not posting any updates for so long, and that I think I might have found out what my problem is.
On the 25th, I tried playing HL:A (on Windows 10) again as that game caused problems 100% of the time with my GPU. With my younger brother's GPU (RX 570), I was able to play for around 30 minutes without the headset disconnecting or losing tracking once. I wasn't able to test if it fixed my PC hard-resetting at random because I didn't have it for long enough (again, it only does this like once per two weeks), BUT I did manage to play some VR games on Linux (previously an instant hard-reset) and they worked fine, so I think it would be safe to say my GPU really is the problem and that the only solution is to replace it.
In "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Power" there is a DWORD called "HiberbootEnabled". 0 means off, 1 means on.
HiberbootEnabled is set to 0.
I am planning on running memtest86+ just in case to make sure nothing else is wrong and will post an update when I do that. I'll obviously also replace my GPU, post an update, and mark this thread as solved when I test everything and feel like the issue has been solved.
Update: I replaced my GPU yesterday so we'll see how it goes. I don't really have any time to test it right now due to school, but I'll post an update as soon as I do anything or if something happens. I've also noticed that my secondary monitor (connected via HDMI → VGA) is now normal. I didn't mention this before as I thought it was the cable or converter, but with my old GPU, the image was a bit cropped on the right - about 20 pixels were just farther right than the monitor displayed. I changed neither the cable nor the converter, but the image's no longer cropped, so that's a good sign.