I booted Arch a few minutes ago and the console was displaying in a lower resolution than usual. I then ran startx and it failed with the error "no screens found". The /dev/dri directory did not exist. The outputs of "dmesg --level=warn,err" and "lspci -k" seemed normal at a quick glance. My xorg.conf.d is empty.
After switching back to radeon by removing the options files from /etc/modprobe.d, the console and startx work as usual. Has anyone else experienced this breakage? My GPU is an AMD R9 280X.
Besides enabling amdgpu, I have not touched anything in the system directories.
I am posting this in a hurry and will update with logs later, but it should be useful for now in case someone else searches for this issue.
Thanks.
Last edited by triantad (2022-01-13 08:37:35)
Offline
This could also just be a case of early KMS. If you adjust things based on a single reproduction, you haven't really verified much. To lower the chance of running into early KMS issues, make sure you set up https://wiki.archlinux.org/title/Kernel … _KMS_start with the amdgpu module.
Offline
Similar here. I have an R9 280X and an RX 580. Everything works perfectly on 5.15, but since 5.16 my 280X can't load the amdgpu driver.
I have "radeon.si_support=0 amdgpu.si_support=1" set in GRUB_CMDLINE_LINUX_DEFAULT.
lspci -nnk shows this on 5.16:
27:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X] [1002:6798]
	Subsystem: Tul Corporation / PowerColor Device [148c:3001]
	Kernel modules: radeon, amdgpu
...
28:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7)
	Subsystem: Sapphire Technology Limited Radeon RX 570 Pulse 4GB [1da2:e353]
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
27:00.0 should show "Kernel driver in use: amdgpu" too, but this seems broken on 5.16.
I have switched to 5.15.14-1-lts for now and it works fine.
Offline
Have you checked dmesg for errors (the full output, not just the error-level filter; many, many problems are logged at the normal log level)? This "hack" was always experimental and it's possible they have decided to remove support for it now, but I would have expected to read more about it if that were intentional, and at least config-wise support should still be around: https://github.com/archlinux/svntogit-p … 6316-L6318
Last edited by V1del (2022-01-13 13:19:36)
Offline
V1del's suggestion to include the module in /etc/mkinitcpio.conf is a good one and may solve your problem. I booted into 5.16 with my Radeon RX 560D without issues. Remember to rebuild your image after modifying that file.
Offline
The dmesg lines about my 280X and amdgpu are below:
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=3d4e69bf-6a08-4dbd-a0db-0a943649bb84 rw rootflags=subvol=@ resume=UUID=8b0d1430-cdcc-40f0-844b-16d45585ecdb loglevel=3 quiet amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 radeon.si_support=0 amdgpu.si_support=1 amdgpu.dc=1 amdgpu.runpm=1 pcie_acs_override=downstream,multifunction
[ 0.041414] Kernel command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=3d4e69bf-6a08-4dbd-a0db-0a943649bb84 rw rootflags=subvol=@ resume=UUID=8b0d1430-cdcc-40f0-844b-16d45585ecdb loglevel=3 quiet amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 radeon.si_support=0 amdgpu.si_support=1 amdgpu.dc=1 amdgpu.runpm=1 pcie_acs_override=downstream,multifunction
[ 0.311733] pci 0000:27:00.0: [1002:6798] type 00 class 0x030000
[ 0.311744] pci 0000:27:00.0: reg 0x10: [mem 0xe0000000-0xefffffff 64bit pref]
[ 0.311751] pci 0000:27:00.0: reg 0x18: [mem 0xfce00000-0xfce3ffff 64bit]
[ 0.311756] pci 0000:27:00.0: reg 0x20: [io 0xf000-0xf0ff]
[ 0.311765] pci 0000:27:00.0: reg 0x30: [mem 0xfce40000-0xfce5ffff pref]
[ 0.311778] pci 0000:27:00.0: BAR 0: assigned to efifb
[ 0.311811] pci 0000:27:00.0: supports D1 D2
[ 0.311812] pci 0000:27:00.0: PME# supported from D1 D2 D3hot
[ 0.311851] pci 0000:27:00.0: 63.008 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x8 link at 0000:00:03.1 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
[ 0.311883] pci 0000:27:00.1: [1002:aaa0] type 00 class 0x040300
[ 0.311893] pci 0000:27:00.1: reg 0x10: [mem 0xfce60000-0xfce63fff 64bit]
[ 0.311939] pci 0000:27:00.1: supports D1 D2
[ 0.314647] pci 0000:27:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 0.314647] pci 0000:27:00.0: vgaarb: bridge control possible
[ 0.314647] pci 0000:27:00.0: vgaarb: setting as boot device
[ 0.339588] pci 0000:27:00.1: D0 power state depends on 0000:27:00.0
[ 0.340178] pci 0000:27:00.0: Adding to iommu group 16
[ 0.340189] pci 0000:27:00.1: Adding to iommu group 16
[ 4.843482] [drm] amdgpu kernel modesetting enabled.
[ 4.843540] amdgpu: Ignoring ACPI CRAT on non-APU system
[ 4.843543] amdgpu: Virtual CRAT table created for CPU
[ 4.843550] amdgpu: Topology: Add CPU node
[ 4.843660] amdgpu 0000:28:00.0: enabling device (0006 -> 0007)
[ 4.843727] amdgpu 0000:28:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 4.843761] amdgpu 0000:28:00.0: amdgpu: Fetched VBIOS from VFCT
[ 4.843763] amdgpu: ATOM BIOS: 113-4E353CU-O4B
[ 4.843829] amdgpu 0000:28:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 4.843831] amdgpu 0000:28:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 4.843854] [drm] amdgpu: 4096M of VRAM memory ready
[ 4.843855] [drm] amdgpu: 4096M of GTT memory ready.
[ 4.846272] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[ 5.290796] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 5.290874] amdgpu: SRAT table not found
[ 5.290875] amdgpu: Virtual CRAT table created for GPU
[ 5.290936] amdgpu: Topology: Add dGPU node [0x67df:0x1002]
[ 5.290940] kfd kfd: amdgpu: added device 1002:67df
[ 5.290953] amdgpu 0000:28:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 5.291069] amdgpu 0000:28:00.0: [drm] Cannot find any crtc or sizes
[ 5.294313] amdgpu 0000:28:00.0: amdgpu: Using BACO for runtime pm
[ 5.294574] [drm] Initialized amdgpu 3.44.0 20150101 for 0000:28:00.0 on minor 0
[ 6.410807] radeon 0000:27:00.0: SI support disabled by module param
[ 6.553163] snd_hda_intel 0000:27:00.1: enabling device (0000 -> 0002)
[ 6.553261] snd_hda_intel 0000:27:00.1: Force to non-snoop mode
[ 6.667103] snd_hda_intel 0000:28:00.1: bound 0000:28:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 6.668985] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.1/0000:27:00.1/sound/card0/input3
[ 6.669033] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.1/0000:27:00.1/sound/card0/input4
[ 6.669103] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.1/0000:27:00.1/sound/card0/input5
[ 6.669151] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.1/0000:27:00.1/sound/card0/input6
[ 6.669193] input: HDA ATI HDMI HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:03.1/0000:27:00.1/sound/card0/input7
[ 6.669251] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:03.1/0000:27:00.1/sound/card0/input8
[ 27.572813] amdgpu 0000:28:00.0: [drm] Cannot find any crtc or sizes
The full log is here if it's helpful: https://pastebin.com/ms34Mj8a
I did have amdgpu in mkinitcpio.conf, as "MODULES=(amdgpu)".
Offline
V1del's suggestion did not help (added amdgpu to MODULES=(), rebuilt the image, enabled amdgpu support in modprobe.d). In any case, with amdgpu I get this line in the journal which is also present in xzsk2's log:
kernel: [drm] Unsupported asic. Remove me when IP discovery init is in place.
Link to the full log: https://dpaste.org/bPHZ
Last edited by triantad (2022-01-13 17:23:07)
Offline
@triantad Is the GPU in the system a 1002:679 Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X]?
If so, amdgpu loaded but no options were specified, so you got the documented error. I believe you need to enable AMDGPU#Enable_Southern_Islands_(SI)_and_Sea_Islands_(CIK)_support or use the radeon module.
Offline
Note that if you did it in the order you just specified, your modprobe adjustment will not be in the initramfs. You need to generate the image after setting up all the modprobe files to ensure the options are actually carried over.
Offline
@V1del: I tried again for good measure, leaving the image generation last, with the same result. modconf *is* in HOOKS=() by default if it matters in this case (as mentioned in the Wiki), and I also tried putting *both* amdgpu and radeon in MODULES=() in this order.
@loqs: Yes this is the GPU. I have followed the instructions in this Wiki page in the past successfully, even with just the modprobe.d files and a reboot.
There seem to be some suspicious changes in this diff (not that I know enough to make much sense out of it) if anyone wants to bother taking a look:
https://scm.linefinity.com/common/linux … hitespace=
Thank you for the suggestions. I will stay with linux-lts for now, since it works.
P.S.: This was a nice opportunity to join the forums of Arch Linux, the distro which I have been using for the last few years and which I absolutely love. It just feels nice and chill.
Last edited by triantad (2022-01-13 21:21:00)
Offline
Could you please post the contents of the /etc/modprobe.d file(s) setting the options for the amdgpu and radeon modules?
From the console of 5.16 does the output of
systool -v -m amdgpu -m radeon
show si_support=0 for radeon and 1 for amdgpu?
Edit:
https://github.com/torvalds/linux/commi … d1f5262803 is the commit that introduced the error you are encountering.
Edit2:
Ah, I believe the issue is that CHIP_TAHITI = 0, so flags = ent->driver_data is 0 and the flags == 0 test matches. I think the test should be changed to flags == CHIP_IP_DISCOVERY.
Can you test with the code changed and report the issue to https://gitlab.freedesktop.org/drm/amd/-/issues ?
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 99370bdd8c5b..a723f3e68f92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1928,7 +1928,7 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 		return -ENODEV;
 	}
-	if (flags == 0) {
+	if (flags == CHIP_IP_DISCOVERY) {
 		DRM_INFO("Unsupported asic. Remove me when IP discovery init is in place.\n");
 		return -ENODEV;
 	}
Last edited by loqs (2022-01-13 23:50:58)
Offline
/etc/modprobe.d/amdgpu.conf:
options amdgpu si_support=1
options amdgpu cik_support=1
/etc/modprobe.d/radeon.conf:
options radeon si_support=0
options radeon cik_support=0
With *both* 5.16 and the LTS,
systool -v -m amdgpu -m radeon
shows si_support = "0".
EDIT: Again with both 5.16 and LTS and without the modprobe.d files, the above command shows si_support="1".
With 5.16, lspci -k does not print a "Kernel driver in use" line, only a "Kernel modules: radeon, amdgpu" line, but with the LTS kernel it also prints "Kernel driver in use: amdgpu".
EDIT: typo in radeon.conf
Last edited by triantad (2022-01-13 22:41:11)
Offline
See my edits to post #11. I think it is looking like a bug limited to Tahiti-based cards.
Offline
/etc/modprobe.d/radeon.conf disables amdgpu again as well.
Offline
"/etc/modprobe.d/radeon.conf disables amdgpu again as well."
Sorry, this was a typo, I corrected my post.
@loqs: All right, I will try the code a bit later or tomorrow.
Last edited by triantad (2022-01-13 22:36:54)
Offline
linux 5.16 with the change applied, pkgrel 1.1 to separate it from the official packages:
https://drive.google.com/file/d/1Ubg_Hf … sp=sharing linux-5.16.arch1-1.1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/13D6mZ1 … sp=sharing linux-headers-5.16.arch1-1.1-x86_64.pkg.tar.zst
Offline
I should've checked the forums before I spent all day bisecting the kernel.
loqs is correct: eb4fd29afd4a is what broke it, and the culprit is that flags == 0 is actually a check for flags == CHIP_TAHITI. I guess nobody in the testing loop was still rocking a 10-year-old GPU like we all are. Changing the conditional to check for CHIP_IP_DISCOVERY fixes the bug for me.
Offline
Thank you for confirming the source of the issue and the potential fix. Someone has opened an upstream bug report for the issue: https://gitlab.freedesktop.org/drm/amd/-/issues/1860
Edit:
https://gitlab.freedesktop.org/agd5f/li … fe26f6403c
Last edited by loqs (2022-01-15 15:01:15)
Offline
loqs was right indeed!
Last edited by triantad (2022-01-15 23:03:01)
Offline