#1 2022-11-10 23:03:13

lorber13
Member
Registered: 2022-04-05
Posts: 12

Problem with amdgpu firmware

Hello,
I am having this kernel error on my Lenovo ThinkPad T14s gen 2 amd:

 [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3 

I opened this thread at the Lenovo forums:
https://forums.lenovo.com/t5/Other-Linu … -p/5175183
and this is the answer I got:

I believe you need to update your amdgpu firmware. You have:

[    3.076548] [drm] Loading DMUB firmware via PSP: version=0x0101001F
[    3.103134] [drm] Found VCN firmware Version ENC: 1.17 DEC: 5 VEP: 0 Revision: 2

On my system:

Oct 14 08:07:54 fedora kernel: [drm] Loading DMUB firmware via PSP: version=0x02020013
Oct 14 08:07:54 fedora kernel: [drm] use_doorbell being set to: [true]
Oct 14 08:07:54 fedora kernel: [drm] Found VCN firmware Version ENC: 1.21 DEC: 2 VEP: 0 Revision: 10

You should be able to grab it from linux firmware (https://git.kernel.org/pub/scm/linux/ke … mware.git/)

If I look at the output of dmesg, I clearly see that the VCN firmware is out-of-date:

dmesg | grep VCN
[    3.077470] [drm] Found VCN firmware Version ENC: 1.17 DEC: 5 VEP: 0 Revision: 2

I have currently installed the latest "linux-firmware" package from the Arch repo, so why is this VCN firmware still out-of-date?

pacman -Q linux-firmware
linux-firmware 20221012.8b07c1f-1
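
To compare against the values quoted above, here is a minimal sketch that pulls only the firmware version fields out of a kernel log. The helper name `fw_versions` is made up for illustration, and it only understands the two `[drm]` message formats shown in this post.

```shell
# fw_versions: hypothetical helper. Reads a kernel log on stdin and prints
# the DMUB and VCN version fields from the message formats quoted above.
fw_versions() {
    sed -n \
        -e 's/.*Loading DMUB firmware via PSP: version=\(0x[0-9A-Fa-f]*\).*/DMUB \1/p' \
        -e 's/.*Found VCN firmware Version \(ENC: [0-9.]* DEC: [0-9]*\).*/VCN \1/p'
}
# Usage (assumes the systemd journal is available):
#   journalctl -k -b | fw_versions
```

This only shows what the driver reports at boot; it does not say which version is the newest one for a given chip.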

#2 2022-11-11 09:09:48

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,768

Re: Problem with amdgpu firmware

I'm loading 1.24 here. Are you running an older kernel, with an early-KMS-enabled initramfs image that you haven't rebuilt?

pacman -Q linux
sudo lsinitcpio /boot/initramfs-linux.img | grep vcn
uname -a 

#3 2022-11-11 16:13:29

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

pacman -Q linux
linux 6.0.7.arch1-1

The second command has no output.

uname -a
Linux ThinkPad-di-Lorenzo 6.0.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 03 Nov 2022 18:01:58 +0000 x86_64 GNU/Linux
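
The installed package and the running kernel can also be compared mechanically. A minimal sketch, assuming the stock Arch naming where pacman writes `.arch` and `uname -r` writes `-arch`; `same_kernel` is a made-up helper name:

```shell
# same_kernel: hypothetical helper. Succeeds when a pacman package version
# (e.g. 6.0.7.arch1-1) corresponds to a `uname -r` release (e.g. 6.0.7-arch1-1).
same_kernel() {
    pkg_ver=$1
    running=$2
    # Assumption: the only difference is ".arch" (pacman) vs "-arch" (uname).
    normalized=$(printf '%s\n' "$pkg_ver" | sed 's/\.arch/-arch/')
    [ "$normalized" = "$running" ]
}
# Usage:
#   same_kernel "$(pacman -Q linux | cut -d' ' -f2)" "$(uname -r)" \
#       || echo "running kernel differs from the installed package"
```

For the values above the two match, so a stale kernel does not look like the cause here.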

#4 2022-11-11 16:34:38

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,768

Re: Problem with amdgpu firmware

Do we have any prior messages that indicate why loading the firmware would fail? While it's a stretch, you can make something appear in the second output, and maybe alleviate a potential race condition, by enabling https://wiki.archlinux.org/title/Kernel … _KMS_start
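
For reference, early KMS with mkinitcpio boils down to listing the GPU driver in the MODULES array of /etc/mkinitcpio.conf, e.g. `MODULES=(amdgpu)`, and regenerating the initramfs. A minimal sketch for checking whether that is already in place; `has_early_amdgpu` is a made-up helper name, and mkinitcpio is assumed to be the initramfs generator:

```shell
# has_early_amdgpu: hypothetical check. Succeeds when the (uncommented)
# MODULES line of a mkinitcpio.conf-style file lists amdgpu.
has_early_amdgpu() {
    grep -Eq '^[[:space:]]*MODULES=\(([^)]*[[:space:]])?amdgpu([[:space:]][^)]*)?\)' "$1"
}
# Usage:
#   has_early_amdgpu /etc/mkinitcpio.conf || echo "amdgpu is not in MODULES"
```

After editing the file, `sudo mkinitcpio -P` rebuilds the images for all presets.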

FWIW, less of a stretch: do you have a parallel Windows installation, and did you ensure that fast boot is disabled there? https://wiki.archlinux.org/title/Dual_b … ibernation A hibernating Windows could easily leave the firmware in an undefined state.

#5 2022-11-11 20:41:28

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

V1del wrote:

Do we have any prior messages that indicate why loading the firmware would fail?

How can I check this? Should I look into the dmesg?

V1del wrote:

While it's a stretch, you can make something appear in the second output, and maybe alleviate a potential race condition, by enabling https://wiki.archlinux.org/title/Kernel … _KMS_start

Thanks, I will try it.

V1del wrote:

FWIW, less of a stretch: do you have a parallel Windows installation, and did you ensure that fast boot is disabled there? https://wiki.archlinux.org/title/Dual_b … ibernation A hibernating Windows could easily leave the firmware in an undefined state.

I do not have Windows installed.

#6 2022-11-11 21:03:55

seth
Member
Registered: 2012-09-03
Posts: 51,648

Re: Problem with amdgpu firmware

Should I look into the dmesg?

Or post it, yes.
Since the ringbuffer will overflow and this happens early in the boot, you might want to look at the system journal instead.
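
A minimal sketch for looking at what precedes the error in the journal, since `journalctl -k -b` prints the kernel messages of the current boot from the start. The helper name `before_dmub_error` and the default of 20 lines are arbitrary choices for illustration:

```shell
# before_dmub_error: hypothetical helper. Reads a kernel log on stdin and
# prints each DMUB idle error together with the N lines leading up to it.
before_dmub_error() {
    grep -B "${1:-20}" 'Error waiting for DMUB idle'
}
# Usage (assumes the systemd journal is available):
#   journalctl -k -b | before_dmub_error 30
```

This is only for a first look; the complete log is still more useful when posting.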

@V1del, also a renoir chip?

#7 2022-11-11 21:20:06

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

V1del wrote:

While it's a stretch, you can make something appear in the second output, and maybe alleviate a potential race condition, by enabling https://wiki.archlinux.org/title/Kernel … _KMS_start

I just enabled it. The error in the journal remains the same, but the output of the command "sudo lsinitcpio /boot/initramfs-linux.img | grep vcn" changes:

sudo lsinitcpio /boot/initramfs-linux.img | grep vcn
usr/lib/firmware/amdgpu/aldebaran_vcn.bin.xz
usr/lib/firmware/amdgpu/arcturus_vcn.bin.xz
usr/lib/firmware/amdgpu/beige_goby_vcn.bin.xz
usr/lib/firmware/amdgpu/dimgrey_cavefish_vcn.bin.xz
usr/lib/firmware/amdgpu/green_sardine_vcn.bin.xz
usr/lib/firmware/amdgpu/navi10_vcn.bin.xz
usr/lib/firmware/amdgpu/navi12_vcn.bin.xz
usr/lib/firmware/amdgpu/navi14_vcn.bin.xz
usr/lib/firmware/amdgpu/navy_flounder_vcn.bin.xz
usr/lib/firmware/amdgpu/picasso_vcn.bin.xz
usr/lib/firmware/amdgpu/raven2_vcn.bin.xz
usr/lib/firmware/amdgpu/raven_vcn.bin.xz
usr/lib/firmware/amdgpu/renoir_vcn.bin.xz
usr/lib/firmware/amdgpu/sienna_cichlid_vcn.bin.xz
usr/lib/firmware/amdgpu/vangogh_vcn.bin.xz
usr/lib/firmware/amdgpu/vcn_3_1_2.bin.xz
usr/lib/firmware/amdgpu/yellow_carp_vcn.bin.xz

But the VCN firmware version reported in dmesg is still the same:

dmesg | grep VCN
[    1.643557] [drm] VCN decode is enabled in VM mode
[    1.643558] [drm] VCN encode is enabled in VM mode
[    1.671923] [drm] Found VCN firmware Version ENC: 1.17 DEC: 5 VEP: 0 Revision: 2
[    1.671931] amdgpu 0000:05:00.0: amdgpu: Will use PSP to load VCN firmware
[    2.656048] [drm] VCN decode and encode initialized successfully(under DPG Mode).

#8 2022-11-11 21:49:15

seth
Member
Registered: 2012-09-03
Posts: 51,648

Re: Problem with amdgpu firmware

grep VCN

Don't grep.

ftr, https://gitlab.freedesktop.org/drm/amd/-/issues/1887
Is it only the message or are there other symptoms/actual problems along it?

#9 2022-11-11 22:10:22

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

seth wrote:

Don't grep.

Here is my entire dmesg output:
https://gist.github.com/lorber13/bbd1f8 … 7baf4902aa

seth wrote:

Is it only the message or are there other symptoms/actual problems along it?

I can trigger the error every time I turn off the laptop screen (for example when I want to switch to an external monitor). I asked at the Lenovo forums and they told me that my VCN firmware is out-of-date. This is the only thing I know. Idk why the VCN is out-of-date since I have installed the latest linux-firmware package which should have the newer version of the VCN.

#10 2022-11-11 22:26:40

seth
Member
Registered: 2012-09-03
Posts: 51,648

Re: Problem with amdgpu firmware

I can trigger the error every time I turn off the laptop screen

Given https://gitlab.freedesktop.org/drm/amd/ … te_1433226 that's not surprising.

wrt firmware version and

find /usr/lib/firmware/ -iname '*vcn*'

I'm not sure whether there's actually sth. out of date or people are just comparing different FWs for different chips.
Unfortunately the linux-firmware upstream commit messages don't indicate versions and idk how to decode the FW bins (to get to the version tag directly)
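
To narrow the list down to one chip, a minimal sketch that filters by file name prefix. `chip_firmware` is a made-up helper, the default directory is the one used in the `find` above, and which chip name applies to a given GPU is an assumption the caller has to supply:

```shell
# chip_firmware: hypothetical helper. Lists firmware blobs whose file name
# starts with "<chip>_" under a firmware directory, sorted by name.
chip_firmware() {
    chip=$1
    dir=${2:-/usr/lib/firmware/amdgpu}
    find "$dir" -type f -name "${chip}_*" | sort
}
# Usage:
#   chip_firmware renoir
```

This only groups the files; it does not decode the version stored inside a blob.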

#11 2022-11-12 09:53:27

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

find /usr/lib/firmware/ -iname '*vcn*'
/usr/lib/firmware/amdgpu/raven2_vcn.bin.xz
/usr/lib/firmware/amdgpu/beige_goby_vcn.bin.xz
/usr/lib/firmware/amdgpu/vcn_3_1_2.bin.xz
/usr/lib/firmware/amdgpu/sienna_cichlid_vcn.bin.xz
/usr/lib/firmware/amdgpu/renoir_vcn.bin.xz
/usr/lib/firmware/amdgpu/aldebaran_vcn.bin.xz
/usr/lib/firmware/amdgpu/yellow_carp_vcn.bin.xz
/usr/lib/firmware/amdgpu/dimgrey_cavefish_vcn.bin.xz
/usr/lib/firmware/amdgpu/raven_vcn.bin.xz
/usr/lib/firmware/amdgpu/navi12_vcn.bin.xz
/usr/lib/firmware/amdgpu/green_sardine_vcn.bin.xz
/usr/lib/firmware/amdgpu/navy_flounder_vcn.bin.xz
/usr/lib/firmware/amdgpu/picasso_vcn.bin.xz
/usr/lib/firmware/amdgpu/arcturus_vcn.bin.xz
/usr/lib/firmware/amdgpu/navi14_vcn.bin.xz
/usr/lib/firmware/amdgpu/vangogh_vcn.bin.xz
/usr/lib/firmware/amdgpu/navi10_vcn.bin.xz

seth wrote:

I'm not sure whether there's actually sth. out of date or people are just comparing different FWs for different chips.
Unfortunately the linux-firmware upstream commit messages don't indicate versions and idk how to decode the FW bins (to get to the version tag directly)

Should I try a Fedora 37 live session and see if it loads the up-to-date firmware for my chip? That would show whether the problem is out-of-date firmware or just a comparison between different chips.

#12 2022-11-12 17:32:25

seth
Member
Registered: 2012-09-03
Posts: 51,648

Re: Problem with amdgpu firmware

Why should you not?
Even if V1del doesn't use a renoir chip, that's no proof that the current renoir VCN version is supposed to be 1.17. But if you get that version on F37 as well, it's pretty clear that the version disparity hinges on different chips, and thus different firmware.

#13 2022-11-13 09:14:51

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

seth wrote:

Why should you not?
Even if V1del doesn't use a renoir chip, that's no proof that the current renoir VCN version is supposed to be 1.17. But if you get that version on F37 as well, it's pretty clear that the version disparity hinges on different chips, and thus different firmware.

I tried a Fedora 37 live session, but the live image ships with an outdated version of linux-firmware (the August version). Is there a way to try Fedora live with the latest version of linux-firmware? I do not want to install it.

#14 2022-11-13 09:30:02

seth
Member
Registered: 2012-09-03
Posts: 51,648

Re: Problem with amdgpu firmware

Did it say the FW version is 1.17?

Edit: https://git.kernel.org/pub/scm/linux/ke … ir_vcn.bin
Cause renoir_vcn saw its last update in April.

Last edited by seth (2022-11-13 09:31:01)

#15 2022-11-16 17:17:26

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

seth wrote:

Did it say the FW version is 1.17?

Edit: https://git.kernel.org/pub/scm/linux/ke … ir_vcn.bin
Cause renoir_vcn saw its last update in April.

Yes, it says 1.17

#16 2022-11-16 17:21:27

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,768

Re: Problem with amdgpu firmware

sienna_cichlid here, but I doubt the VCN version is of much interest (that should only matter for HW video decoding/encoding); PSP/DMUB is much more relevant. FWIW, have you tried a firmware update from your EFI/mainboard vendor? Also, if this is just a one-off message with no noticeable negative impact, I'd just ignore it for now.

Last edited by V1del (2022-11-16 17:25:30)

#17 2022-11-16 17:35:53

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

V1del wrote:

sienna_cichlid here, but I doubt the VCN version is of much interest (that should only matter for HW video decoding/encoding); PSP/DMUB is much more relevant. FWIW, have you tried a firmware update from your EFI/mainboard vendor? Also, if this is just a one-off message with no noticeable negative impact, I'd just ignore it for now.

I recently installed the latest version of the UEFI firmware, but the error remains the same. I had already asked for help in the Lenovo forums because I thought it was a UEFI firmware problem. They told me that the problem is this outdated VCN firmware. This message is related to the external monitor, which doesn't turn off and just shows a black screen instead. So it isn't only a journal error. I think I am going to try Fedora 37, install the latest linux-firmware package from their repository, and see what happens.

#18 2022-11-16 18:05:43

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,768

Re: Problem with amdgpu firmware

The error message as is comes from the DMUB and not the VCN firmware. Did you follow the link seth posted in #10 and try booting with the drm.vblankoffdelay=0 kernel parameter?
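
To confirm the parameter actually reached the kernel after a reboot, a minimal sketch that checks the command line the kernel was booted with. `has_kernel_param` is a made-up helper name; it does an exact whole-word match only:

```shell
# has_kernel_param: hypothetical helper. Succeeds when the given parameter
# appears as a whole space-separated word in a kernel command line string.
has_kernel_param() {
    param=$1
    cmdline=$2
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
# Usage:
#   has_kernel_param drm.vblankoffdelay=0 "$(cat /proc/cmdline)" && echo active
```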

#19 2022-11-16 18:13:02

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

V1del wrote:

The error message as is comes from the DMUB and not the VCN firmware. Did you follow the link seth posted in #10 and try booting with the drm.vblankoffdelay=0 kernel parameter?

This parameter does not fix the issue for me

#20 2022-11-16 20:10:20

seth
Member
Registered: 2012-09-03
Posts: 51,648

Re: Problem with amdgpu firmware

The parameter won't deal w/ "I can trigger the error every time I turn off the laptop screen", that's a different vector to the same symptom.
The OP there gets their log spammed, the linked poster experiences actual "different problems like stuttering and failure to resume from sleep for different cards"

Itr. you didn't answer

seth wrote:

Is it only the message or are there other symptoms/actual problems along it?

#21 2022-11-16 20:50:54

lorber13
Member
Registered: 2022-04-05
Posts: 12

Re: Problem with amdgpu firmware

seth wrote:

Itr. you didn't answer

seth wrote:

Is it only the message or are there other symptoms/actual problems along it?

I looked further into the actual triggers and problems linked to the journal message.
I will try to explain what happens in more detail, because there seem to be two problems that I think are linked to each other.
What I can say with certainty is that:
- the message appears every time I turn off the laptop screen
- when I lock the screen (like when I hit Super key + L on GNOME), the external monitor turns off for a second, then the laptop monitor turns on for a fraction of a second, and then turns off. The external monitor, which was initially turned off when I locked the session, turns on and shows a black screen. The message appears every time this chain of events occurs.
The second problem is a bit difficult to understand because there are multiple actors. I think that the message in the journal is triggered by the laptop screen turning on and off (idk why this happens though). I am not sure that the external monitor showing a black screen is linked to the journal error.
I do not have problems related to sleep, or resuming from sleep.

#22 2022-11-16 21:29:59

seth
Member
Registered: 2012-09-03
Posts: 51,648

Re: Problem with amdgpu firmware

The message is indicative of the output juggling, but the output juggling is most likely a gnome issue.
Is this gnome on xorg or wayland - and does that matter?
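
A minimal sketch for checking this from a terminal inside the session, assuming the login manager exported `XDG_SESSION_TYPE` (typically `x11` or `wayland`); `session_type` is a made-up helper name:

```shell
# session_type: hypothetical helper. Prints the session type exported by the
# login manager, or "unknown" when the variable is unset or empty.
session_type() {
    printf '%s\n' "${XDG_SESSION_TYPE:-unknown}"
}
# Usage:
#   session_type
```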
