
#1 2022-09-28 14:26:46

tinux
Member
Registered: 2019-05-10
Posts: 6

[amdgpu] issues in X

Something changed in the last few days, probably in an update, that completely messes up X.

The exact issues differ slightly from boot to boot, but they only occur on my AMD Radeon VII (I also briefly tried a Radeon Pro VII: same issue), not on an NVIDIA GPU.

Among the problems:

- no visible mouse cursor
- no text rendered in alacritty (selecting it doesn't reveal the text either)
- other terminals (xfce4-terminal, gnome-terminal, rxvt-unicode) look normal, though
- about 50% of the letters are randomly missing in, e.g., the rofi menu, the GNOME app drawer, and other programs
- etc.

I checked `dmesg` and found this:

```
$ sudo dmesg | grep -i amdgpu
[sudo] password for tinux:
[    4.338939] [drm] amdgpu kernel modesetting enabled.
[    4.345004] amdgpu: Ignoring ACPI CRAT on non-APU system
[    4.345007] amdgpu: Virtual CRAT table created for CPU
[    4.345013] amdgpu: Topology: Add CPU node
[    4.345136] amdgpu 0000:0b:00.0: vgaarb: deactivate vga console
[    4.345173] amdgpu 0000:0b:00.0: enabling device (0006 -> 0007)
[    4.345456] amdgpu 0000:0b:00.0: amdgpu: Fetched VBIOS from VFCT
[    4.345458] amdgpu: ATOM BIOS: 113-D3600200-106
[    4.347703] amdgpu 0000:0b:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    4.347732] amdgpu 0000:0b:00.0: amdgpu: MEM ECC is not presented.
[    4.347732] amdgpu 0000:0b:00.0: amdgpu: SRAM ECC is not presented.
[    4.347737] amdgpu 0000:0b:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[4] ras_mask[4]
[    4.347748] amdgpu 0000:0b:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[    4.347750] amdgpu 0000:0b:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    4.347751] amdgpu 0000:0b:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    4.347798] [drm] amdgpu: 16368M of VRAM memory ready
[    4.347799] [drm] amdgpu: 16011M of GTT memory ready.
[    4.356958] amdgpu 0000:0b:00.0: amdgpu: PSP runtime database doesn't exist
[    4.356961] amdgpu 0000:0b:00.0: amdgpu: PSP runtime database doesn't exist
[    4.356991] amdgpu: hwmgr_sw_init smu backed is vega20_smu
[    4.549477] amdgpu 0000:0b:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available
[    4.549479] amdgpu 0000:0b:00.0: amdgpu: DTM: optional dtm ta ucode is not available
[    4.549479] amdgpu 0000:0b:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    4.549480] amdgpu 0000:0b:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    5.021488] [drm:smu_v11_0_i2c_xfer.cold [amdgpu]] *ERROR* Received I2C_NAK_7B_ADDR_NOACK !!!
[    5.021724] [drm:smu_v11_0_i2c_xfer [amdgpu]] *ERROR* WriteI2CData() - I2C error occurred :1
[    5.021925] [drm:amdgpu_ras_eeprom_init [amdgpu]] *ERROR* Failed to read EEPROM table header, res:-5
[    5.022101] amdgpu 0000:0b:00.0: amdgpu: Failed to initialize ras recovery! (-5)
[    5.023810] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    5.023934] amdgpu: sdma_bitmap: ffff
[    5.086722] amdgpu: HMM registered 16368MB device memory
[    5.086766] amdgpu: SRAT table not found
[    5.086766] amdgpu: Virtual CRAT table created for GPU
[    5.086887] amdgpu: Topology: Add dGPU node [0x66af:0x1002]
[    5.086892] kfd kfd: amdgpu: added device 1002:66af
[    5.086909] amdgpu 0000:0b:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 16, active_cu_number 60
[    5.086981] amdgpu 0000:0b:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[    5.086982] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    5.086983] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    5.086984] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    5.086985] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    5.086985] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    5.086986] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    5.086987] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    5.086988] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    5.086989] amdgpu 0000:0b:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    5.086989] amdgpu 0000:0b:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
[    5.086990] amdgpu 0000:0b:00.0: amdgpu: ring page0 uses VM inv eng 1 on hub 1
[    5.086991] amdgpu 0000:0b:00.0: amdgpu: ring sdma1 uses VM inv eng 4 on hub 1
[    5.086992] amdgpu 0000:0b:00.0: amdgpu: ring page1 uses VM inv eng 5 on hub 1
[    5.086993] amdgpu 0000:0b:00.0: amdgpu: ring uvd_0 uses VM inv eng 6 on hub 1
[    5.086993] amdgpu 0000:0b:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[    5.086994] amdgpu 0000:0b:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[    5.086995] amdgpu 0000:0b:00.0: amdgpu: ring uvd_1 uses VM inv eng 9 on hub 1
[    5.086996] amdgpu 0000:0b:00.0: amdgpu: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
[    5.086996] amdgpu 0000:0b:00.0: amdgpu: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
[    5.086997] amdgpu 0000:0b:00.0: amdgpu: ring vce0 uses VM inv eng 12 on hub 1
[    5.086998] amdgpu 0000:0b:00.0: amdgpu: ring vce1 uses VM inv eng 13 on hub 1
[    5.086999] amdgpu 0000:0b:00.0: amdgpu: ring vce2 uses VM inv eng 14 on hub 1
[    5.096745] amdgpu: Detected AMDGPU DF Counters. # of Counters = 8.
[    5.096756] amdgpu: Detected AMDGPU 2 Perf Events.
[    5.097286] [drm] Initialized amdgpu 3.47.0 20150101 for 0000:0b:00.0 on minor 0
[    5.105698] fbcon: amdgpudrmfb (fb0) is primary device
[    5.289246] amdgpu 0000:0b:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[   19.859947] snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
```

There are some obvious errors (no idea whether they are related), but downgrading the kernel or amdgpu does not seem to solve the issue.
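To pull just the failure lines out of a long kernel log like this one, a small filter can help. This is a minimal sketch: the function name `amdgpu_errors` is hypothetical, and the match patterns (`*ERROR*` and `Failed`) are a heuristic based on the messages above, not an official list. util-linux `dmesg --level=err,warn` is an alternative that filters by log priority instead of by text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a kernel log on stdin and print only the
# amdgpu lines that look like failures. The patterns are a heuristic
# taken from the log in this post, not an exhaustive list.
amdgpu_errors() {
    grep -i 'amdgpu' | grep -E '\*ERROR\*|Failed'
}

# Usage (root is needed when kernel.dmesg_restrict=1):
#   sudo dmesg | amdgpu_errors
```

For the log above, this keeps the four I2C/EEPROM/RAS lines and drops the informational ones.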

I searched the internet and the forum and have already tried:

- disabling power management (`amdgpu.dpm=0`)
- reinstalling amd-ucode
- setting a bunch of options for X in 20-amdgpu.conf (e.g. `Option "SWCursor" "True"`)

But nothing helped.
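For reference, a minimal 20-amdgpu.conf carrying the software-cursor option might look like the one written below. This is a sketch under assumptions: the helper name `write_amdgpu_conf` and the `Identifier "AMD"` value are arbitrary, and the `OutputClass` layout follows the common Arch wiki pattern for the amdgpu Xorg driver. The target directory is a parameter so it can be tried outside `/etc` first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal Xorg snippet for the amdgpu DDX
# that enables the software cursor. Defaults to /etc/X11/xorg.conf.d,
# which requires root; pass another directory to test it safely.
write_amdgpu_conf() {
    local dir="${1:-/etc/X11/xorg.conf.d}"
    mkdir -p "$dir"
    cat > "$dir/20-amdgpu.conf" <<'EOF'
Section "OutputClass"
    Identifier "AMD"
    MatchDriver "amdgpu"
    Driver "amdgpu"
    Option "SWCursor" "True"
EndSection
EOF
}

# Usage:
#   sudo bash -c "$(declare -f write_amdgpu_conf); write_amdgpu_conf"
```

X has to be restarted for the snippet to take effect.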

So I'm at a loss as to what to try next. Currently, the PC is basically unusable because X is so badly broken.

Any help is highly appreciated.


#2 2022-09-28 14:37:28

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,668

Re: [amdgpu] issues in X

Are you running testing, and hence Mesa 22.2.0 instead of the current 22.1.7? There are some known issues with that version: https://bugs.archlinux.org/task/76019
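To answer that quickly, one can check whether the testing repository is enabled in pacman.conf and which Mesa version is installed. A minimal sketch with hypothetical helper names; it assumes the repository section is literally named `[testing]`, as it was when this thread was written, and that `pacman -Q mesa` prints `name version` (e.g. `mesa 22.2.0-1`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an uncommented "[testing]" section
# header exists in the given pacman.conf (default /etc/pacman.conf).
testing_enabled() {
    grep -Eq '^[[:space:]]*\[testing\][[:space:]]*$' "${1:-/etc/pacman.conf}"
}

# Hypothetical helper: read `pacman -Q mesa` output on stdin and print
# the upstream version, dropping an optional epoch ("1:") and the
# pkgrel suffix ("-1").
mesa_version() {
    awk '{print $2}' | sed -e 's/^[0-9]*://' -e 's/-[^-]*$//'
}

# Usage:
#   testing_enabled && echo "testing is enabled"
#   pacman -Q mesa | mesa_version
```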


Moving to the Testing subforum.

Last edited by V1del (2022-09-28 14:38:58)


#3 2022-09-28 16:42:54

headkase
Member
Registered: 2011-12-06
Posts: 1,976

Re: [amdgpu] issues in X

Regarding Mesa 22.2, see here. Specifically:

> As part of today's Mesa 22.2 release over 22.2-rc3, there are nearly 150 patches back-ported from Mesa 22.3/Git to the 22.2 series for bug fixing. Normally this would have led to another release candidate, but it seems Mesa 22.2.0 was just kicked out the door to get the release out there. So if you are particularly concerned about stability/bugs, this may definitely be a release where it's worth waiting for Mesa 22.2.1 until its tires have been kicked a bit more.

My reading of that is that the Mesa team fell behind schedule, so 22.2 was kicked out the door to let 22.3 become the focus. What this means for Arch is that 22.2.0 in Testing really is "testing" this time, not the usual "fine enough". Expect Mesa 22.2 to be in better shape with the 22.2.1 point release.
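If you want to stay on testing but hold Mesa back until 22.2.1, pacman's `IgnorePkg` option in `/etc/pacman.conf` skips a package during upgrades, and a simple version check tells you when to lift the hold. A minimal sketch: `version_at_least` is a hypothetical helper, and GNU `sort -V` only approximates pacman's own `vercmp` ordering, which is adequate for plain dotted versions such as 22.2.0 and 22.2.1.

```shell
#!/usr/bin/env bash
# To hold Mesa back, add this line to the [options] section of
# /etc/pacman.conf:
#   IgnorePkg = mesa

# Hypothetical helper: succeed if version $1 is at least version $2,
# using GNU sort's version ordering (an approximation of vercmp).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   version_at_least 22.2.1 22.2.1 && echo "ok to remove the IgnorePkg entry"
```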


#4 2022-09-29 07:34:59

tinux
Member
Registered: 2019-05-10
Posts: 6

Re: [amdgpu] issues in X

@V1del @headkase Thank you for your answers. I was indeed on testing, and reverting Mesa did solve the issue. I was so fixated on amdgpu that Mesa totally slipped my mind. I guess those errors in dmesg didn't help...
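For anyone hitting the same problem, reverting usually means reinstalling the previous Mesa package from pacman's cache with `pacman -U`. A minimal sketch for locating that file: `find_cached_pkg` is hypothetical, it assumes the default cache directory `/var/cache/pacman/pkg` and the usual `name-version-pkgrel-arch.pkg.tar.*` file naming, and 22.1.7 in the usage line is only the version mentioned in this thread. Other packages built from the Mesa source (e.g. lib32-mesa) may need the same treatment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list cached package files for a given package
# name and version (include the epoch, e.g. "1:22.1.7", if the package
# has one). Signature files (*.sig) are excluded.
find_cached_pkg() {
    local name=$1 version=$2 dir=${3:-/var/cache/pacman/pkg}
    find "$dir" -maxdepth 1 -type f -name "${name}-${version}-*.pkg.tar.*" ! -name '*.sig' | sort
}

# Usage:
#   sudo pacman -U $(find_cached_pkg mesa 22.1.7)
```

Because the name is matched as a prefix followed directly by the version, `lib32-mesa` or `mesa-vdpau` files are not picked up by `find_cached_pkg mesa ...`.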

In any case, thanks a lot!


#5 2022-09-29 11:25:52

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,668

Re: [amdgpu] issues in X

Note that a new release has been made with LTO disabled, intended to fix this, so you might want to try the updated testing version again.

Last edited by V1del (2022-09-29 11:26:12)
