
#1 2018-06-25 18:29:09

ghawk1ns
Member
Registered: 2018-06-23
Posts: 4

amdgpu causing kernel panic on boot

Hey all,

Just got through a fresh Arch install and am running into some trouble on about 50% of my boots. On boot, right after the shell asks for a login, the kernel panics. Looking through the logs, I see amdgpu errors showing up in the stack trace.

edit-0: kernel param `modprobe.blacklist=amdgpu` stops the crashing, so I am certain amdgpu is the issue. Now to figure out how I can get a proper working gpu driver.

edit-1: I tried setting 'amdgpu.dc=0', which resulted in the screen freezing at login 100% of the time, but the shell still accepted keyboard input. I could log in, restart the machine, etc., but there was no output on the monitor.

edit-2: I tried setting `mem_encrypt=off amd_iommu=off`, based on this issue on freedesktop about Raven Ridge failing to start, but the results were the same as before.

edit-3: After looking around the net, it looks like a few others are having issues with Raven Ridge APUs and amdgpu, though the theme seems to be that the drivers and kernel do not fully support Raven Ridge yet :(

Does anyone have an idea as to what's going on? Thanks!


journalctl logs from boot to crash


kernel: CPU: 2 PID: 331 Comm: systemd-udevd Not tainted 4.17.2-1-ARCH #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. AB350M-DS3H/AB350M-DS3H-CF, BIOS F23d 04/17/2018
kernel: RIP: 0010:prefetch_freepointer.isra.17+0xf/0x20
kernel: RSP: 0018:ffffa34602483930 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: f1826c313e9ce59c RCX: 000000000008d402
kernel: RDX: f1826c313e9ce59c RSI: ffff8893de806fb0 RDI: ffff8893de806ea0
kernel: RBP: ffff8893cc05e800 R08: 00000000000000ff R09: ffff8893cc054800
kernel: R10: ffff8893cc054800 R11: ffff8893cc052497 R12: 00000000014080c0
kernel: R13: 0000000000000288 R14: ffff8893de806e80 R15: ffff8893de806e80
kernel: FS:  00007f82df1d0d40(0000) GS:ffff8893dec80000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00005594a6d9d688 CR3: 000000040bf50000 CR4: 00000000003406e0
kernel: Call Trace:
kernel:  kmem_cache_alloc_trace+0xbd/0x1d0
kernel:  ? dm_hw_init.cold.28+0x239/0xb5a [amdgpu]
kernel:  dm_hw_init.cold.28+0x239/0xb5a [amdgpu]
kernel:  ? printk+0x58/0x6f
kernel:  amdgpu_device_init.cold.14+0x1001/0x117b [amdgpu]
kernel:  amdgpu_driver_load_kms+0x86/0x2c0 [amdgpu]
kernel:  drm_dev_register+0x129/0x160 [drm]
kernel:  amdgpu_pci_probe+0x13c/0x1c0 [amdgpu]
kernel:  ? _raw_spin_unlock_irqrestore+0x20/0x40
kernel:  local_pci_probe+0x41/0x90
kernel:  pci_device_probe+0x189/0x1a0
kernel:  driver_probe_device+0x2b9/0x460
kernel:  __driver_attach+0xb6/0xe0
kernel:  ? driver_probe_device+0x460/0x460
kernel:  bus_for_each_dev+0x76/0xc0
kernel:  bus_add_driver+0x152/0x230
kernel:  ? 0xffffffffc1100000
kernel:  driver_register+0x6b/0xb0
kernel:  ? 0xffffffffc1100000
kernel:  do_one_initcall+0x46/0x1f5
kernel:  ? kmem_cache_alloc_trace+0x181/0x1d0
kernel:  ? do_init_module+0x22/0x210
kernel:  do_init_module+0x5a/0x210
kernel:  load_module+0x247a/0x29f0
kernel:  ? vmap_page_range_noflush+0x276/0x350
kernel:  ? __se_sys_init_module+0x10c/0x170
kernel:  __se_sys_init_module+0x10c/0x170
kernel:  do_syscall_64+0x5b/0x170
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7f82deae1f3a
kernel: RSP: 002b:00007ffcece40708 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
kernel: RAX: ffffffffffffffda RBX: 000055befcf7ccc0 RCX: 00007f82deae1f3a
kernel: RDX: 00007f82de393ecd RSI: 000000000057c4b0 RDI: 000055befd94a100
kernel: RBP: 00007f82de393ecd R08: 0000000000000006 R09: 0000000000000005
kernel: R10: 000055befcf67010 R11: 0000000000000246 R12: 000055befd94a100
kernel: R13: 000055befcf834f0 R14: 0000000000020000 R15: 000055befcf7ccc0
kernel: Code: 8c de 10 03 00 00 31 c9 e9 21 fe ff ff e8 7a ac e5 ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 d2 74 0e 8b 07 48 01 c2 <48> 33 12 48 33 16 0f 18 0a c3 0f 1f 80 00 00 00 00 0f 1f 44 00 
kernel: RIP: prefetch_freepointer.isra.17+0xf/0x20 RSP: ffffa34602483930
kernel: ---[ end trace d6b637a2a65cd49a ]---
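
For anyone comparing their own log against the trace above, here is a rough sketch of how the amdgpu frames could be pulled out of a kernel log. The function names `amdgpu_frames` and `certain_frames` are made up for illustration. The filter is deliberately crude: it matches any line tagged `[amdgpu]`, and it drops frames prefixed with `?`, which the kernel uses to mark stack entries it could not verify.

```shell
# amdgpu_frames: read kernel log lines on stdin and keep only those
# tagged with the amdgpu module (hypothetical helper name).
amdgpu_frames() {
    grep -F '[amdgpu]'
}

# certain_frames: drop call-trace entries marked with "? ", which the
# kernel's unwinder flags as unreliable (hypothetical helper name).
certain_frames() {
    grep -v -F ' ? '
}
```

Assuming the journal is persistent, the previous boot's kernel messages could be filtered with `journalctl -k -b -1 | amdgpu_frames | certain_frames`.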

hardware

* MB - Gigabyte AB350M-DS3H (latest bios)
* CPU - AMD Ryzen 3 2200G Processor with Radeon Vega 8 Graphics
* SSD - Samsung PCIe NVMe - M.2 Internal SSD (MZ-V6E250BW)
* MEM - DDR4 DRAM 3000MHz 
* PCIe - ASUS PCE-AC55BT Wireless card
* USB - mouse/keyboard 

I verified vendor compatibility of everything on Gigabyte's support page


Here are the steps I've taken:

# fdisk /dev/nvme0n1
	* 1 550MB EFI partition
	* 2 100% Linux partition

# mkfs.fat -F32 /dev/nvme0n1p1

# modprobe dm_crypt
# cryptsetup --key-size 512 --hash sha512 luksFormat /dev/nvme0n1p2
# cryptsetup luksOpen /dev/nvme0n1p2 cryptroot
# mkfs.ext4 /dev/mapper/cryptroot

# mount /dev/mapper/cryptroot /mnt
# mkdir -p /mnt/boot/efi
# mount /dev/nvme0n1p1 /mnt/boot/efi

# pacstrap /mnt base zsh git sudo grub efibootmgr
# genfstab -U /mnt >> /mnt/etc/fstab

# arch-chroot /mnt
# ln -fs /usr/share/zoneinfo/America/Los_Angeles /etc/localtime
# hwclock --systohc
// uncomment en_US.UTF-8 UTF-8 in /etc/locale.gen
# locale-gen

// create /etc/hostname and add hostname
// add entries to /etc/hosts
// add 'encrypt' to HOOKS in /etc/mkinitcpio.conf
# mkinitcpio -p linux

// edit /etc/default/grub
// 	* add GRUB_CMDLINE_LINUX="cryptdevice=/dev/nvme0n1p2:cryptroot"
// 	* uncomment GRUB_ENABLE_CRYPTODISK=y 

# grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=GRUB
# grub-mkconfig -o /boot/grub/grub.cfg

// set passwd / exit / unmount / reboot

Other issues that I believe are unrelated:

	* Ethernet link is always DOWN by default / no ethernet connection

	* AMD-Vi: Unable to write to IOMMU perf counter
	* kvm: disabled by bios
	* EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
	* [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for 

	* ACPI BIOS Error (bug): Failure creating [\_SB.SMIC], AE_ALREADY_EXISTS (20180313/dswload2-316)
	* ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180313/psobject-220)
	* ACPI Error: Method parse/execution failed \, AE_ALREADY_EXISTS (20180313/psparse-516)
	* ACPI Error: Invalid zero thread count in method (20180313/dsmethod-760)

Last edited by ghawk1ns (2018-06-26 05:11:49)

Offline

#2 2018-06-25 18:38:15

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: amdgpu causing kernel panic on boot

If you add the boot option amdgpu.dc=0 does that make any difference?
Edit:
How are you attempting to disable the amdgpu module?

Last edited by loqs (2018-06-25 18:50:14)

Offline

#3 2018-06-25 18:54:19

ghawk1ns
Member
Registered: 2018-06-23
Posts: 4

Re: amdgpu causing kernel panic on boot

I'll give that a try, I was trying to blacklist the module in the kernel params with:

modprobe.blacklist=amdgpu
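
As a minimal sketch of how that parameter could be made persistent with GRUB (the bootloader used in the install steps of the first post), the line below would go in /etc/default/grub. The `quiet` entry is only an assumed existing default, not something taken from this thread.

```shell
# /etc/default/grub (fragment)
# "quiet" is an assumed pre-existing value; the blacklist entry is appended to it.
GRUB_CMDLINE_LINUX_DEFAULT="quiet modprobe.blacklist=amdgpu"
```

The change only takes effect after regenerating the config with `grub-mkconfig -o /boot/grub/grub.cfg` and rebooting. For a one-off test, the same parameter can instead be appended to the `linux` line by pressing `e` at the GRUB menu.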

Offline

#4 2018-06-26 02:44:32

ghawk1ns
Member
Registered: 2018-06-23
Posts: 4

Re: amdgpu causing kernel panic on boot

I removed amdgpu from the blacklist and set amdgpu.dc=0

I am now getting a screen lockup 100% of the time, but the keyboard is still functional: I can restart the machine, etc.

Last edited by ghawk1ns (2018-06-26 03:02:34)

Offline

#5 2018-07-07 22:55:07

tunix
Member
From: İzmir, Turkey
Registered: 2018-07-07
Posts: 3
Website

Re: amdgpu causing kernel panic on boot

Hi @ghawk1ns,

Have you found a solution to your problems? I'm planning to build a PC with a Ryzen 3 2200G and an MSI mobo and run Arch on it. Arch seems to run the 4.17 kernel, so do you still have issues? Are you able to use it at 4K resolution? What about the other issues you have, like the ethernet issues for instance?

Offline

#6 2018-07-07 23:04:05

seth
Member
Registered: 2012-09-03
Posts: 49,976

Re: amdgpu causing kernel panic on boot

amdgpu.dpm=0 amdgpu.aspm=0 amdgpu.bapm=0

Do you have stuff like TLP or laptop-mode-tools installed?
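
When testing parameters like these, it helps to confirm that they actually reached the kernel before drawing conclusions. The following is a small sketch of such a check. The helper names `has_param` and `report_params` are invented for this example. The command line is passed in as a string so the check is not tied to one source.

```shell
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole
# space-separated word in CMDLINE (hypothetical helper name).
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_params CMDLINE PARAM...: print one status line per parameter
# (hypothetical helper name).
report_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        if has_param "$cmdline" "$p"; then
            echo "$p: set"
        else
            echo "$p: missing"
        fi
    done
}
```

On the running system this could be called as `report_params "$(cat /proc/cmdline)" amdgpu.dpm=0 amdgpu.aspm=0 amdgpu.bapm=0`. Once the module is loaded, the values it picked up should also be readable under /sys/module/amdgpu/parameters/.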

Online

#7 2018-07-09 00:52:36

nomorewindows
Member
Registered: 2010-04-03
Posts: 3,362

Re: amdgpu causing kernel panic on boot

I have a triple head setup with Intel + amd/ati rv710... the ati card seems to be acting up when I run xrandr to bring up the screens attached to the ati card.
I can't tell at this time whether it is something with the motherboard or the ati card, or whether the amdgpu/ati drivers have a problem.  I have gone back to a linux-lts kernel, with the same result.

Last edited by nomorewindows (2018-07-09 01:00:12)


I may have to CONSOLE you about your usage of ridiculously easy graphical interfaces...
Look ma, no mouse.

Offline

#8 2018-07-09 06:42:16

seth
Member
Registered: 2012-09-03
Posts: 49,976

Re: amdgpu causing kernel panic on boot

Do you get the very same backtrace in dmesg and did you try the kernel parameters in comment #6 and do you use some power saving config tools?

Online

#9 2018-07-10 12:29:36

nomorewindows
Member
Registered: 2010-04-03
Posts: 3,362

Re: amdgpu causing kernel panic on boot

I don't get anything in dmesg.



Offline

#10 2018-07-10 12:32:38

seth
Member
Registered: 2012-09-03
Posts: 49,976

Re: amdgpu causing kernel panic on boot

That's not a very conclusive statement, but if you don't see the same backtraces, you're unlikely to face the same problem.

Online

#11 2018-07-13 05:39:37

nomorewindows
Member
Registered: 2010-04-03
Posts: 3,362

Re: amdgpu causing kernel panic on boot

I ran xrandr --listproviders, and my amd ve/7000, which had never previously been on the list (older card), is now there, so I figured it was conflicting with the newer ati hd 4350 and took it out.  No better, so I put the card back in.  xf86-video-ati and xf86-video-amdgpu both date back to May, so no change there; it must be something in dri or the linux kernel that's messing up, but still no kernel panic of any kind in dmesg.



Offline

#12 2018-07-13 06:18:10

seth
Member
Registered: 2012-09-03
Posts: 49,976

Re: amdgpu causing kernel panic on boot

Please open a new thread and describe the setup and the problem (no, "acting up" does not describe it: list the issued commands and the perceived reaction, and also whether you can switch to another VT or ssh into the machine), and attach a complete dmesg, xorg log and the output of "xrandr -q" and "xrandr --listproviders".

Online
