I am using the amdgpu driver for my GPU:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev ef)
Subsystem: Gigabyte Technology Co., Ltd Radeon RX 570 Gaming 4G
Kernel driver in use: amdgpu
Kernel modules: amdgpu
Randomly when I boot (about 2/3 of the time), starting X fails because it cannot connect to the X server. Looking through the dmesg logs, I find the following:
$ grep -B6 "ERROR" dmesg.bad
[ 3.679927] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe400 flags=0x0050]
[ 3.679933] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe580 flags=0x0050]
[ 3.679936] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe540 flags=0x0050]
[ 3.679941] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe480 flags=0x0050]
[ 3.679946] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe5c0 flags=0x0050]
[ 3.679950] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0xfffbe4c0 flags=0x0050]
[ 4.693376] amdgpu 0000:01:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
[ 4.693422] [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test failed (-110).
When I get lucky and the system behaves, neither the page faults nor the IB test errors appear, so I think they must be related. I can confirm that this issue existed before Linux 5.0 (I had hoped the release would fix it because it adds support for AMD GPUs, but no luck). I have also seen the same behavior with and without early KMS.
Last edited by natervance (2019-03-16 15:00:43)
Which motherboard / processor?
Have you configured microcode updating?
Please post the full dmesg (both failed and successful).
You may want to use a pastebin client.
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
(A works at time B) && (time C > time B ) ≠ (A works at time C)
Here is the bad dmesg: http://dpaste.com/2SMCCWY
And the good one: http://dpaste.com/1JN3VGB
My motherboard is a Gigabyte:
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: F2A88XM-D3H
Version: x.x
Serial Number: To be filled by O.E.M.
Asset Tag: To be filled by O.E.M.
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: To be filled by O.E.M.
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
EDIT: I do have microcode updating set up with amd-ucode. I boot using the following grub menu entry:
menuentry 'Arch Linux' --class arch --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-/dev/sda3' {
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_gpt
    insmod fat
    set root='hd0,gpt2'
    if [ x$feature_platform_search_hint = xy ]; then
        search --no-floppy --fs-uuid --set=root --hint-ieee1275='ieee1275//disk@0,gpt2' --hint-bios=hd0,gpt2 --hint-efi=hd0,gpt2 --hint-baremetal=ahci0,gpt2 3EB6-26DF
    else
        search --no-floppy --fs-uuid --set=root 3EB6-26DF
    fi
    echo 'Loading Linux linux ...'
    linux /vmlinuz-linux root=/dev/sda3 rw quiet
    echo 'Loading initial ramdisk ...'
    initrd /amd-ucode.img /initramfs-linux.img
}
Last edited by natervance (2019-03-15 15:16:52)
Nothing really stands out in the logs, so we will have to try to narrow down potential causes.
Are you using early KMS start or late KMS start? See https://wiki.archlinux.org/index.php/Ke … de_setting
Does the behaviour improve if you switch to the other method?
Your motherboard has several revisions. Which revision do you have, and what version is your BIOS/EFI firmware?
https://www.gigabyte.com/Motherboard/GA … -rev-31#ov
A cold boot is a boot where the system has been powered down completely (disconnected from the power outlet).
A warm boot is one where the system has shut down but stays connected to the power outlet (a reboot is like this).
Is there a difference in failure rate between cold and warm boots?
Note: there is no need to start X to determine failure or success.
Boot to the multi-user target (see the systemd wiki page), log in on the console and run
dmesg | grep -i amd-vi
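The same check can be wrapped in a small shell function so that each boot gets a one-word verdict. This is only a sketch: it assumes the two messages seen in the bad dmesg in this thread (the AMD-Vi IO_PAGE_FAULT events and the "IB test failed" error) are the only failure markers of interest, and the name classify_boot is made up for illustration.

```shell
# classify_boot: hypothetical helper, not part of any package.
# Reads a kernel log on stdin and prints "bad" if it contains either
# failure signature from the bad dmesg in this thread, otherwise "good".
classify_boot() {
    if grep -Eq 'AMD-Vi: Event logged \[IO_PAGE_FAULT|IB test failed'; then
        echo bad
    else
        echo good
    fi
}
```

Run it as `dmesg | classify_boot` right after logging in, or as `classify_boot < dmesg.bad` on a saved log. Noting the verdict next to "cold" or "warm" for each boot gives a rough failure rate for both cases.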
My motherboard is revision 3.0 (the earlier revision), and had the F5 BIOS. I updated to the latest version (F10a) this morning following your line of questioning, and everything appears to be resolved!
For posterity's sake, I initially had late KMS and switched to early KMS while trying to debug this issue (I am currently on early KMS). Cold vs. warm booting didn't seem to make much of a difference, but I had a fairly small sample size. But the motherboard firmware update is what did it! Thanks, I never would have guessed. I'll mark this as resolved.
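For anyone landing here with the same symptoms, the installed firmware version can be read without rebooting into the setup screen. A rough sketch, assuming dmidecode prints its usual "Version:" line in the BIOS Information block; the function name bios_version is made up for illustration.

```shell
# bios_version: hypothetical helper, not part of dmidecode.
# Reads `dmidecode -t bios` output on stdin and prints the value of the
# first "Version:" line (for example F5 or F10a on this board).
bios_version() {
    awk -F': ' '/^[[:space:]]*Version:/ { print $2; exit }'
}
```

Run it as root with `dmidecode -t bios | bios_version`. Running `dmidecode -s bios-version` prints the same string directly; the function is mainly useful on a saved dump.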