
#1 2019-03-15 14:02:56

natervance
Member
Registered: 2017-04-20
Posts: 52

[SOLVED] amdgpu IB test failed on gfx

I am using the amdgpu driver for my gpu:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev ef)
	Subsystem: Gigabyte Technology Co., Ltd Radeon RX 570 Gaming 4G
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

On roughly two out of three boots, starting X fails because it cannot connect to the X server. Looking through the dmesg log from a failed boot, I find the following:

$ grep -B6 "ERROR" dmesg.bad 
[    3.679927] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe400 flags=0x0050]
[    3.679933] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe580 flags=0x0050]
[    3.679936] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe540 flags=0x0050]
[    3.679941] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe480 flags=0x0050]
[    3.679946] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe5c0 flags=0x0050]
[    3.679950] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe4c0 flags=0x0050]
[    4.693376] amdgpu 0000:01:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
[    4.693422] [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test failed (-110).

When I get lucky and the system behaves, neither the page faults nor the IB test errors appear, so I think they must be related. I can confirm that this issue existed before Linux 5.0 (I had hoped the release would fix it because of its amdgpu changes, but no luck). I also see the same behavior with and without early KMS.

Last edited by natervance (2019-03-16 15:00:43)


#2 2019-03-15 14:53:41

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868


Which motherboard and processor do you have?

Have you configured microcode updating?

Please post the full dmesg from both a failed and a successful boot.
You may want to use a pastebin client.


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)


#3 2019-03-15 15:15:05

natervance
Member
Registered: 2017-04-20
Posts: 52


Here is the bad dmesg: http://dpaste.com/2SMCCWY
And the good one: http://dpaste.com/1JN3VGB

My motherboard is a Gigabyte:

# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: Gigabyte Technology Co., Ltd.
	Product Name: F2A88XM-D3H
	Version: x.x
	Serial Number: To be filled by O.E.M.
	Asset Tag: To be filled by O.E.M.
	Features:
		Board is a hosting board
		Board is replaceable
	Location In Chassis: To be filled by O.E.M.
	Chassis Handle: 0x0003
	Type: Motherboard
	Contained Object Handles: 0

EDIT: I also do have microcode updating with amd-ucode. I boot using the following grub menu entry:

menuentry 'Arch Linux' --class arch --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-/dev/sda3' {
	load_video
	set gfxpayload=keep
	insmod gzio
	insmod part_gpt
	insmod fat
	set root='hd0,gpt2'
	if [ x$feature_platform_search_hint = xy ]; then
	  search --no-floppy --fs-uuid --set=root --hint-ieee1275='ieee1275//disk@0,gpt2' --hint-bios=hd0,gpt2 --hint-efi=hd0,gpt2 --hint-baremetal=ahci0,gpt2  3EB6-26DF
	else
	  search --no-floppy --fs-uuid --set=root 3EB6-26DF
	fi
	echo	'Loading Linux linux ...'
	linux	/vmlinuz-linux root=/dev/sda3 rw  quiet
	echo	'Loading initial ramdisk ...'
	initrd	/amd-ucode.img /initramfs-linux.img
}

Last edited by natervance (2019-03-15 15:16:52)


#4 2019-03-16 12:08:45

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868


Nothing really stands out in the logs, so we'll have to try to narrow down the potential causes.

Are you using early or late KMS start? See https://wiki.archlinux.org/index.php/Ke … de_setting
Does the behaviour improve if you switch to the other method?

Your motherboard has several revisions. Which revision do you have, and what version is your BIOS/EFI firmware?
https://www.gigabyte.com/Motherboard/GA … -rev-31#ov
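
In case it helps anyone answering the same question: a minimal sketch for reading the board and firmware details without root, assuming the kernel exports DMI data under /sys/class/dmi/id (it does on typical x86 systems). The function name `show_board_info` and its optional directory argument are made up for illustration. Note that the board in this thread reports "Version: x.x" in dmidecode, so the hardware revision may have to be read off the PCB itself.

```shell
#!/bin/sh
# show_board_info is a hypothetical helper, not part of any package.
# It prints the board name, board version, and firmware version/date from the DMI sysfs export.
# The optional argument overrides the sysfs directory (only useful for testing).
show_board_info() {
    dmi_dir=${1:-/sys/class/dmi/id}
    for field in board_name board_version bios_version bios_date; do
        if [ -r "$dmi_dir/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dmi_dir/$field")"
        else
            printf '%s: (not readable)\n' "$field"
        fi
    done
}
```

Calling `show_board_info` with no argument reads the live system; `dmidecode -s bios-version` as root should report the same firmware version.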

A cold boot is a boot where the system has been powered down completely (disconnected from the power outlet).
A warm boot is one where the system has shut down but stays connected to the power outlet (a reboot is like this).
Is there a difference in failure rate between cold and warm boots?

Note: there is no need to start X to determine failure or success.
Boot to the multi-user target (see the systemd wiki page), log in on the console and run:

dmesg | grep -i amd-vi
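
To compare many boots, the same check can be run against saved logs. This is only a rough sketch: `check_boot_log` is a hypothetical helper name, and the two patterns are taken from the dmesg excerpt in post #1, so they are assumptions about what a "bad" boot looks like on this particular machine.

```shell
#!/bin/sh
# check_boot_log is a hypothetical helper: it classifies a saved kernel log
# (e.g. created with `dmesg > dmesg.txt`) as "bad" or "good" for this symptom.
check_boot_log() {
    log=$1
    # grep -c prints 0 and returns non-zero when nothing matches, hence "|| true".
    faults=$(grep -c 'AMD-Vi: Event logged \[IO_PAGE_FAULT' "$log" || true)
    ib_errors=$(grep -c 'IB test failed on' "$log" || true)
    if [ "$faults" -gt 0 ] || [ "$ib_errors" -gt 0 ]; then
        verdict=bad
    else
        verdict=good
    fi
    echo "$verdict page_faults=$faults ib_test_failures=$ib_errors"
}
```

Saving one log per boot and noting whether it was a cold or warm boot would give a failure rate for each case.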


#5 2019-03-16 15:00:28

natervance
Member
Registered: 2017-04-20
Posts: 52


My motherboard is revision 3.0 (the earlier revision), and it had the F5 BIOS. I updated to the latest version (F10a) this morning following your line of questioning, and everything appears to be resolved!

For posterity's sake: I initially had late KMS and switched to early KMS while trying to debug this issue (I am currently on early KMS). Cold vs. warm booting didn't seem to make much of a difference, but I had a fairly small sample size. The motherboard firmware update is what did it! Thanks, I never would have guessed. I'll mark this as resolved.
