I have a formerly functional RX 570 that won't output after GRUB without
modprobe.blacklist=amdgpu
as a kernel parameter and running
modprobe amdgpu
from tty after boot causes the graphical environment to freeze. SSH is available throughout and can control the system.
It worked until October 2023, at which point this issue arose (maybe update related?). The same box boots MX Linux to a graphical environment just fine with this card, and trying other power cables and PSUs under Arch made no difference, which leads me to believe it is software related. However, an RX 560 in this box has no issues under Arch, so I can't yet rule out hardware.
journalctl --dmesg --boot -1 --grep "amdgpu" | grep fail
yields
amdgpu 0000:2b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
amdgpu 0000:2b:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu: probe of 0000:2b:00.0 failed with error -110
A lengthier exploration of
journalctl --dmesg --boot -1 --grep "amdgpu"
yields several repetitions of
WARNING: CPU: 14 PID: 296 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:655 amdgpu_irq_put+0x46/0x70 [amdgpu]
Modules linked in: uas usbhid usb_storage amdgpu(+) drm_ttm_helper ttm video gpu_sched drm_buddy drm_display_helper crc32c_intel cec xhci_pci xhci_pci_renesas wmi nvme nvme_core nvme_common
RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
? amdgpu_irq_put+0x46/0x70 [amdgpu 2141defad0ee6bd6288fdd2ba8ce314a51f6ea25]
? amdgpu_irq_put+0x46/0x70 [amdgpu 2141defad0ee6bd6288fdd2ba8ce314a51f6ea25]
? amdgpu_irq_put+0x46/0x70 [amdgpu 2141defad0ee6bd6288fdd2ba8ce314a51f6ea25]
gmc_v8_0_hw_fini+0x1b/0xa0 [amdgpu 2141defad0ee6bd6288fdd2ba8ce314a51f6ea25]
amdgpu_device_fini_hw+0x1ce/0x2b0 [amdgpu 2141defad0ee6bd6288fdd2ba8ce314a51f6ea25]
amdgpu_driver_load_kms.cold+0x54/0x6a [amdgpu 2141defad0ee6bd6288fdd2ba8ce314a51f6ea25]
amdgpu_pci_probe+0x12b/0x370 [amdgpu 2141defad0ee6bd6288fdd2ba8ce314a51f6ea25]
amdgpu: probe of 0000:2b:00.0 failed with error -110
[drm] amdgpu: ttm finalized
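The -110 in the log lines above is a negated errno; on Linux, 110 is ETIMEDOUT, which would fit the gfx ring test timing out rather than the card not being detected at all. A minimal sketch for pulling the failing PCI address and error code out of kernel log text follows. The function name probe_errors is hypothetical, and it assumes only the "probe of ... failed with error" wording shown above.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<pci-address> <error-code>" for every driver probe failure.
probe_errors() {
    sed -n 's/.*probe of \([0-9a-f:.]*\) failed with error \(-[0-9]*\).*/\1 \2/p'
}

# Example:
#   journalctl --dmesg --boot -1 --grep "amdgpu" | probe_errors
```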
I have tried several combinations of
amdgpu.aspm=0 amdgpu.bapm=0 amdgpu.dc=0 amdgpu.runpm=0
and other similar kernel parameters without success. The only thing that seems to provide a functional graphical environment is blacklisting amdgpu. Is the hardware faulty and MX just isn't sensitive to it, or is this a software issue that can be resolved? If software is suspected, any pointers would be appreciated. Thanks.
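To confirm which amdgpu parameters the running kernel actually received, the kernel command line can be filtered for entries starting with "amdgpu.". This is only a sketch: amdgpu_params is a made-up helper name, and it assumes the parameters are separated by single spaces, as they are in /proc/cmdline.

```shell
# Hypothetical helper: print every amdgpu.* option found in a kernel
# command line supplied on stdin, one per line.
amdgpu_params() {
    tr ' ' '\n' | grep '^amdgpu\.'
}

# Example:
#   amdgpu_params < /proc/cmdline
```

Note that modprobe.blacklist=amdgpu is deliberately not matched, since it is a modprobe option rather than an amdgpu module parameter.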
Last edited by brb78 (2024-02-01 03:27:28)
Offline
Keep in mind the RX 560 (Polaris 11/21) and RX 570 (Polaris 10/20) are from different families.
This very much sounds like a kernel issue. What kernel version is MX Linux using?
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
Offline
I tested an RX 580 in this box, which is also Polaris 10/20 like the RX 570, and it works with no issue.
The RX 570 in question also seems to work with Proxmox 7.3-6 running kernel 5.15.85-1-pve.
MX-21.2.1, which I believe ships with kernel 5.10, ran successfully on this box with the RX 570 in question. The Arch kernels I most recently tested were 6.1.65-1-lts and 6.7.2.arch1-1, although the problem has been present with earlier versions of both, dating back to approximately October 26, 2023.
Last edited by brb78 (2024-02-02 15:41:22)
Offline
Then I suggest trying linux-lts515 (Linux LTS 5.15) from the AUR.
https://wiki.archlinux.org/title/Arch_User_Repository for info about the AUR and how to use it.
Offline
I had to downgrade to the RX 560 for the work week; I'll test that kernel this weekend and provide an update. Thanks.
Last edited by brb78 (2024-02-08 16:49:34)
Offline
Online
I seem to have run into an issue with the PGP keys for that kernel:
:: (1/1) Parsing SRCINFO: linux-lts515
gpg: error reading key: No public key
pub rsa4096 2011-09-23 [SC]
647F28654894E3BD457199BE38DBBDC86092693E
uid [ unknown] Greg Kroah-Hartman <gregkh@linuxfoundation.org>
uid [ unknown] Greg Kroah-Hartman <gregkh@kernel.org>
uid [ unknown] Greg Kroah-Hartman (Linux kernel stable release signing key) <greg@kroah.com>
sub rsa4096 2011-09-23 [E]
:: PGP keys need importing:
-> ABAF11C65A2970B130ABE3C479BE3E4300411886, required by: linux-lts515
:: Import? [Y/n]
:: Importing keys with gpg...
gpg: keyserver receive failed: No data
-> problem importing keys
I've tried editing
~/.gnupg/gpg.conf
to point to a specific alternative keyserver (hkp://pgp.rediris.es), manually importing via
gpg --keyserver hkp://pgp.rediris.es --recv-keys ABAF11C65A2970B130ABE3C479BE3E4300411886
running
pacman-key --refresh-keys
and curling the key via
curl -sS https://keys.openpgp.org/vks/v1/by-fingerprint/ABAF11C65A2970B130ABE3C479BE3E4300411886 | gpg --import
all with no success. I will have to get the kernel installed before I can test the RX 570 against it.
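For reference, makepkg checks source signatures against the invoking user's GnuPG keyring rather than pacman's, so pacman-key --refresh-keys would not be expected to help here. The fingerprints a package asks for are listed as validpgpkeys entries in its .SRCINFO. A rough sketch for listing them so they can be fetched one at a time; srcinfo_keys is a hypothetical helper name:

```shell
# Hypothetical helper: print each fingerprint from "validpgpkeys = <FPR>"
# lines in .SRCINFO text supplied on stdin.
srcinfo_keys() {
    awk '$1 == "validpgpkeys" && $2 == "=" { print $3 }'
}

# Example, run from the package's build directory:
#   srcinfo_keys < .SRCINFO | while read -r fpr; do
#       gpg --recv-keys "$fpr"
#   done
```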
Offline
That's interesting; based on that link and the links in the seventh post of that thread, it seems I'm not alone. However, to complicate things, I'm now successfully running an RX 7600 GPU in this box.
That's an RX 560, RX 580, and RX 7600 all running on the daily-driven Arch install on my main box, and an RX 570 that will seemingly run anywhere except that Arch install, including on an alternate live OS on the same box.
If I manage to get this PGP issue sorted I can test the kernel-related theory (I'm currently running 6.1.65-1-lts). I have a fair number of AMD GPUs at my disposal and this one alone seems not to like my main setup, leading me to believe it isn't the kernel alone but maybe the kernel combined with "sensitive" hardware. I have two RX 460s and a few other RX 580s I could test in the interim, but the current sample should have enough architecture overlap with my sole RX 570 that testing them wouldn't yield much more insight.
Last edited by brb78 (2024-02-08 21:42:40)
Offline
gpg --keyserver hkp://pgp.rediris.es --search-keys 'ABAF11C65A2970B130ABE3C479BE3E4300411886'
gpg --keyserver hkp://pgp.rediris.es --search-keys '647F28654894E3BD457199BE38DBBDC86092693E'
Online
I ran the above and it returned
gpg: data source: http://130.206.1.111:11371
(1) Linus Torvalds <torvalds@kernel.org>
Linus Torvalds <torvalds@linux-foundation.org>
2048 bit RSA key 79BE3E4300411886, created: 2011-09-20
Keys 1-1 of 1 for "ABAF11C65A2970B130ABE3C479BE3E4300411886". Enter number(s), N)ext, or Q)uit > 1
gpg: Note: third-party key signatures using the SHA1 algorithm are rejected
gpg: (use option "--allow-weak-key-signatures" to override)
gpg: key 79BE3E4300411886: 150007 signatures not checked due to missing keys
gpg: key 79BE3E4300411886: "Linus Torvalds <torvalds@kernel.org>" not changed
gpg: Total number processed: 1
gpg: unchanged: 1
gpg: data source: http://130.206.1.111:11371
(1) Greg Kroah-Hartman <gregkh@kernel.org>
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman (Linux kernel stable release signing key) <greg@kro
4096 bit RSA key 38DBBDC86092693E, created: 2011-09-23
Keys 1-1 of 1 for "647F28654894E3BD457199BE38DBBDC86092693E". Enter number(s), N)ext, or Q)uit > 1
gpg: Note: third-party key signatures using the SHA1 algorithm are rejected
gpg: (use option "--allow-weak-key-signatures" to override)
gpg: key 38DBBDC86092693E: 1 duplicate signature removed
gpg: key 38DBBDC86092693E: 7 signatures not checked due to missing keys
gpg: key 38DBBDC86092693E: "Greg Kroah-Hartman <gregkh@linuxfoundation.org>" not changed
gpg: Total number processed: 1
gpg: unchanged: 1
I'm not particularly PGP adept, but it seems both keys were already present, yet I am still prompted
:: PGP keys need importing:
-> ABAF11C65A2970B130ABE3C479BE3E4300411886, required by: linux-lts515
:: Import? [Y/n]
and rejected with
gpg: keyserver receive failed: No data
-> problem importing keys
Perhaps a reboot is in order. I'll try again to build the 5.15 LTS kernel after that, unless there are any other suggestions to aid my PGP woes. Thanks all.
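The "not changed" lines suggest both keys were already in the keyring when the search ran. One way to check that without a keyserver round trip is to look for the fingerprint in gpg's machine-readable listing, where fpr records carry the fingerprint in the tenth colon-separated field. A sketch, with has_fpr as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if fingerprint $1 appears as an "fpr" record
# in colon-format key listing text supplied on stdin.
has_fpr() {
    awk -F: -v want="$1" '$1 == "fpr" && $10 == want { found = 1 } END { exit !found }'
}

# Example:
#   gpg --with-colons --list-keys | has_fpr ABAF11C65A2970B130ABE3C479BE3E4300411886 \
#       && echo "key is in the keyring"
```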
Offline
You'll still have to import them
gpg --keyserver hkp://pgp.rediris.es --receive-keys 'ABAF11C65A2970B130ABE3C479BE3E4300411886'
gpg --keyserver hkp://pgp.rediris.es --receive-keys '647F28654894E3BD457199BE38DBBDC86092693E'
Online
Shouldn't one of the steps tried in post 7, such as
curl -sS https://keys.openpgp.org/vks/v1/by-fingerprint/ABAF11C65A2970B130ABE3C479BE3E4300411886 | gpg --import
have done that, or am I missing something basic about working with keys?
ls -la ~/.gnupg
total 80K
-rw------- 1 brian video 600 Jan 26 2022 random_seed
drwx------ 4 brian video 4.0K Feb 9 03:15 .
drwx--x---+ 87 brian video 4.0K Feb 9 02:54 ..
-rw-r--r-- 1 brian users 21 Feb 9 03:15 .#lk0x000055c2b0d26a70.r7-3800xt.3955829
-rw-r--r-- 1 brian users 0 Feb 8 22:08 gpg.conf
drwx------ 2 brian video 4.0K Feb 18 2020 crls.d
drwx------ 2 brian video 4.0K Feb 18 2020 private-keys-v1.d
-rw------- 1 brian video 1.2K Oct 14 2021 trustdb.gpg
-rw-r--r-- 1 brian users 27K Feb 8 15:27 pubring.kbx
-rw-r--r-- 1 brian users 23K Feb 8 14:59 pubring.kbx~
This leads me to believe the permissions are correct; I'm not sure what I'm missing.
Last edited by brb78 (2024-02-09 08:27:37)
Offline
gpg -k greg
gpg -k linus
Are you building in a clean chroot?
You could also change the keyserver in ~/.gnupg/gpg.conf and ~/.gnupg/dirmngr.conf
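As a sketch of that suggestion, the helper below makes a single keyserver line the only one in a GnuPG config file. The name set_keyserver is hypothetical and hkps://keyserver.ubuntu.com is only an example URL, not one taken from this thread. The directory listing above shows gpg.conf at 0 bytes, so the earlier keyserver edit may not have been saved.

```shell
# Hypothetical helper: make "keyserver <url>" the only keyserver line in
# the config file $1, creating the file if it does not exist.
set_keyserver() {
    conf=$1
    url=$2
    tmp=$(mktemp) || return 1
    if [ -f "$conf" ]; then
        # Keep every existing line except old keyserver settings.
        grep -v '^keyserver ' "$conf" > "$tmp" || true
    fi
    printf 'keyserver %s\n' "$url" >> "$tmp"
    mv "$tmp" "$conf"
}

# Example (the URL is an assumption):
#   set_keyserver ~/.gnupg/dirmngr.conf hkps://keyserver.ubuntu.com
#   gpgconf --kill dirmngr   # so dirmngr rereads its config on next use
```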
Online
In the interest of getting this tested, I passed
--mflags "--skippgpcheck"
to
yay
I booted into linux-5.15-lts, and on
startx
it returned something along the lines of "error: no screens detected" and failed to start the X server. This was with the previously working GPUs still installed; I have not yet had the opportunity to test the RX 570 against linux-5.15-lts. I probably need to sort out the X server error before that's productive.
Last edited by brb78 (2024-02-09 16:41:32)
Offline
If you need help, please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General
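Before posting the full log, it can help to pull out just the error and warning lines, which Xorg marks with (EE) and (WW). A small sketch; xorg_problems is a hypothetical helper, and the path in the example is the usual location for a rootless startx session, which may differ on this system (a root X server typically logs to /var/log/Xorg.0.log).

```shell
# Hypothetical helper: print only the error (EE) and warning (WW) lines
# from Xorg log text supplied on stdin.
xorg_problems() {
    grep -E '\((EE|WW)\)'
}

# Example:
#   xorg_problems < ~/.local/share/xorg/Xorg.0.log
```

The marker legend near the top of the log also contains both strings, so it will show up in the output as well.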
Online