
#1 2020-03-25 14:38:19

vfio_experte
Member
Registered: 2020-03-25
Posts: 1

Linux kernel 5.4.*: PCI passthrough AMD GPU crash

I have an AMD Ryzen 1700X on an ASRock X370 Taichi.
After I shut down or reboot the VM, my AMD RX Vega 64 card crashes and its PCI header is corrupt.
I run Arch Linux in a QEMU VM, and it does not work with the Vega 64 GPU.
With Windows 10 and Windows 7 guests, the GPU and its reset work.
I also use an AMD Radeon R7 260X, and the vfio-pci reset works with it.
The GPU is passed through to the QEMU VM with vfio.
Arch Linux kernel 5.5.10 and linux-lts 5.4 trigger this bug on my KVM server.
I downgraded the kernel to 5.3.5, and the corrupt header is fixed.
I tested mesa beta 20.0.1 and Arch Linux's mesa 19.3.4, and the bug is not fixed with either.
See lspciv1.log (from lspci -v > lspciv1.log) for the 5.3.5 kernel running in the VM, captured after the VM shut down.
See lspci_header_corupt.log (from lspci -v > lspci_header_corupt.log) for the 5.4.26 or 5.5.10 kernel running in the VM, captured after the VM shut down.
See vfio_5.3.5.log (from dmesg > vfio_5.3.5.log) for the 5.3.5 kernel running in the VM, captured after the VM shut down.
See vfio_5.4.26.log (from dmesg > vfio_5.4.26.log) for the 5.4.26 or 5.5.10 kernel running in the VM, captured after the VM shut down.

Once the header is corrupt, the GPU no longer cools itself: the fan runs at 0 RPM.
After 30 to 40 minutes the GPU fan goes to 100%, and I have to disconnect the PC from power and wait 15 minutes for the GPU to cool down.

lspci_header_corupt.log
---
11:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu

11:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
---
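For anyone trying to reproduce this: the "rev ff", "prog-if ff" and "Unknown header type 7f" lines are what lspci prints when every config-space read returns all ones (0xff), i.e. the card has stopped answering. lspci masks bit 7 (the multi-function flag) off the header-type byte, so 0xff is shown as 7f. Below is a minimal Python sketch, assuming the standard sysfs path /sys/bus/pci/devices/<BDF>/config and the standard PCI header offsets; the helper names inspect_config and read_sysfs_config are hypothetical, not part of any existing tool.

```python
from pathlib import Path

# Standard PCI configuration-space offsets (common part of the header).
VENDOR_ID = 0x00    # 16-bit, little-endian
REVISION_ID = 0x08
PROG_IF = 0x09
HEADER_TYPE = 0x0E


def inspect_config(config: bytes) -> dict:
    """Decode a few header fields and flag an all-ones (unresponsive) read."""
    if len(config) < 64:
        raise ValueError("need at least the 64-byte standard header")
    vendor = int.from_bytes(config[VENDOR_ID:VENDOR_ID + 2], "little")
    return {
        "vendor": vendor,
        "revision": config[REVISION_ID],
        "prog_if": config[PROG_IF],
        # lspci strips bit 7 (multi-function flag) before printing the type,
        # which is why an all-ones byte shows up as "header type 7f".
        "header_type": config[HEADER_TYPE] & 0x7F,
        # Vendor ID 0xffff means the reads came back as all ones:
        # the device is not responding (e.g. stuck after a failed reset).
        "unresponsive": vendor == 0xFFFF,
    }


def read_sysfs_config(bdf: str) -> bytes:
    """Read the first 64 bytes of config space; these are readable without root."""
    path = Path("/sys/bus/pci/devices") / bdf / "config"
    with path.open("rb") as f:
        return f.read(64)
```

Usage on the host after the VM shuts down: inspect_config(read_sysfs_config("0000:11:00.0")). A result with "unresponsive": True corresponds to the lspci output above.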

vfio_5.4.26.log
---
usb 1-6: reset low-speed USB device number 3 using xhci_hcd
[  328.721093] usb 1-5: reset full-speed USB device number 2 using xhci_hcd
[  329.114761] AMD-Vi: Completion-Wait loop timed out
[  329.240773] AMD-Vi: Completion-Wait loop timed out
[  329.377244] AMD-Vi: Completion-Wait loop timed out
[  329.513137] AMD-Vi: Completion-Wait loop timed out
[  329.639226] AMD-Vi: Completion-Wait loop timed out
[  329.785674] AMD-Vi: Completion-Wait loop timed out
[  329.917530] AMD-Vi: Completion-Wait loop timed out
[  329.970340] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b991f30]
[  330.142445] AMD-Vi: Completion-Wait loop timed out
[  330.268692] AMD-Vi: Completion-Wait loop timed out
[  330.393851] AMD-Vi: Completion-Wait loop timed out
[  330.522536] AMD-Vi: Completion-Wait loop timed out
[  330.651075] AMD-Vi: Completion-Wait loop timed out
[  330.777420] AMD-Vi: Completion-Wait loop timed out
[  330.902383] AMD-Vi: Completion-Wait loop timed out
[  330.970220] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b991f60]
[  331.094932] AMD-Vi: Completion-Wait loop timed out
[  331.219859] AMD-Vi: Completion-Wait loop timed out
[  331.344575] AMD-Vi: Completion-Wait loop timed out
[  331.469397] AMD-Vi: Completion-Wait loop timed out
[  331.696964] AMD-Vi: Completion-Wait loop timed out
[  331.828525] AMD-Vi: Completion-Wait loop timed out
[  331.960548] AMD-Vi: Completion-Wait loop timed out
[  331.970138] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b991f90]
[  332.136662] AMD-Vi: Completion-Wait loop timed out
[  332.136701] vfio-pci 0000:11:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  332.317830] AMD-Vi: Completion-Wait loop timed out
[  332.317883] vfio-pci 0000:11:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  332.970395] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b991fc0]
[  333.569529] AMD-Vi: Completion-Wait loop timed out
[  333.714534] AMD-Vi: Completion-Wait loop timed out
[  333.859560] AMD-Vi: Completion-Wait loop timed out
[  333.969964] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b991ff0]
[  334.113547] AMD-Vi: Completion-Wait loop timed out
[  334.256571] AMD-Vi: Completion-Wait loop timed out
[  334.437989] AMD-Vi: Completion-Wait loop timed out
[  334.970196] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990020]
[  335.633017] AMD-Vi: Completion-Wait loop timed out
[  335.764017] AMD-Vi: Completion-Wait loop timed out
[  335.895029] AMD-Vi: Completion-Wait loop timed out
[  335.969785] vfio-pci 0000:11:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  335.969789] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990050]
[  336.094965] AMD-Vi: Completion-Wait loop timed out
[  336.219485] AMD-Vi: Completion-Wait loop timed out
[  336.344249] AMD-Vi: Completion-Wait loop timed out
[  336.506775] AMD-Vi: Completion-Wait loop timed out
[  336.506829] vfio-pci 0000:11:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  336.632634] AMD-Vi: Completion-Wait loop timed out
[  336.757176] AMD-Vi: Completion-Wait loop timed out
[  336.882142] AMD-Vi: Completion-Wait loop timed out
[  336.898187] vfio-pci 0000:11:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  336.969692] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990080]
[  337.098498] AMD-Vi: Completion-Wait loop timed out
[  337.134891] vfio-pci 0000:11:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  337.398739] AMD-Vi: Completion-Wait loop timed out
[  337.530665] AMD-Vi: Completion-Wait loop timed out
[  337.659815] AMD-Vi: Completion-Wait loop timed out
[  337.788946] AMD-Vi: Completion-Wait loop timed out
[  337.914170] AMD-Vi: Completion-Wait loop timed out
[  338.094337] AMD-Vi: Completion-Wait loop timed out
[  338.218983] AMD-Vi: Completion-Wait loop timed out
[  338.343795] AMD-Vi: Completion-Wait loop timed out
[  338.468540] AMD-Vi: Completion-Wait loop timed out
[  338.593370] AMD-Vi: Completion-Wait loop timed out
[  338.593395] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9900b0]
[  338.773403] AMD-Vi: Completion-Wait loop timed out
[  338.908973] AMD-Vi: Completion-Wait loop timed out
[  338.969486] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9900e0]
[  339.386742] AMD-Vi: Completion-Wait loop timed out
[  339.542193] AMD-Vi: Completion-Wait loop timed out
[  339.679115] AMD-Vi: Completion-Wait loop timed out
[  339.808915] AMD-Vi: Completion-Wait loop timed out
[  339.968841] AMD-Vi: Completion-Wait loop timed out
[  339.969403] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990110]
[  340.969308] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990140]
[  341.969212] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990170]
[  342.969126] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9902e0]
[  342.969128] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990300]
[  343.969025] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990620]
[  344.968945] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990650]
[  345.968847] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990680]
[  346.968892] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9906b0]
[  347.968689] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9906e0]
[  348.968590] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990710]
[  349.968503] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990880]
[  349.968506] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9908a0]
[  349.968508] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9908c0]
[  350.968395] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9908f0]
[  351.968311] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990920]
[  352.873561] vfio-pci 0000:11:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  352.873610] vfio-pci 0000:11:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  352.873985] vfio-pci 0000:11:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  352.874023] vfio-pci 0000:11:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  352.968226] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990950]
[  353.106032] AMD-Vi: Completion-Wait loop timed out
[  353.233985] AMD-Vi: Completion-Wait loop timed out
[  353.361883] AMD-Vi: Completion-Wait loop timed out
[  353.499169] AMD-Vi: Completion-Wait loop timed out
[  353.627469] AMD-Vi: Completion-Wait loop timed out
[  353.968452] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990980]
[  354.241115] usb 1-6: reset low-speed USB device number 3 using xhci_hcd
[  354.771087] usb 1-5: reset full-speed USB device number 2 using xhci_hcd
[  354.968349] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9909b0]
[  355.643800] AMD-Vi: Completion-Wait loop timed out
[  355.775981] AMD-Vi: Completion-Wait loop timed out
[  355.907697] AMD-Vi: Completion-Wait loop timed out
[  355.968106] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b9909e0]
[  356.967871] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990a10]
[  357.967790] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990a40]
[  358.967763] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990a70]
[  359.876824] kvm_get_msr_common: 44 callbacks suppressed
[  359.876826] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x3a
[  359.876829] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0xd90
[  359.876839] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x570
[  359.876841] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x571
[  359.876842] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x572
[  359.876844] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x560
[  359.876845] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x561
[  359.876846] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x580
[  359.876848] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x581
[  359.876849] kvm [1560]: vcpu2, guest rIP: 0xffffffff8fc6c854 ignored rdmsr: 0x582
[  359.967590] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990aa0]
[  360.967508] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990ad0]
[  361.967531] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990b00]
[  362.967331] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=11:00.0 address=0x81b990b30]
---
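To compare the 5.3.5 and 5.4.26 dmesg logs without scrolling through hundreds of repeated lines, here is a small Python sketch that counts the three message types seen above: AMD-Vi Completion-Wait timeouts, IOTLB_INV_TIMEOUT events per device, and vfio_bar_restore recoveries per device. The regular expressions are written against the exact message text in this log and are an assumption; other kernel versions may format these messages differently. The function name summarize is hypothetical.

```python
import re
from collections import Counter

# Patterns match the message text as it appears in vfio_5.4.26.log.
WAIT_MSG = "AMD-Vi: Completion-Wait loop timed out"
IOTLB_RE = re.compile(
    r"IOTLB_INV_TIMEOUT device=((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)
BAR_RE = re.compile(r"vfio-pci (\S+): vfio_bar_restore: reset recovery")


def summarize(lines):
    """Count IOMMU timeouts and vfio BAR-restore events in dmesg output."""
    completion_wait = 0
    iotlb = Counter()
    bar_restore = Counter()
    for line in lines:
        if WAIT_MSG in line:
            completion_wait += 1
        m = IOTLB_RE.search(line)
        if m:
            iotlb[m.group(1)] += 1
        m = BAR_RE.search(line)
        if m:
            bar_restore[m.group(1)] += 1
    return {
        "completion_wait": completion_wait,
        "iotlb_inv_timeout": iotlb,
        "bar_restore": bar_restore,
    }
```

Usage: with open("vfio_5.4.26.log") as f: print(summarize(f)). Running it on both log files gives a quick side-by-side of how often the IOMMU timed out waiting on the card.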
The KVM server has run for about 3 years with no problems with the AMD RX Vega 64 card.
I reported the problem in the bug tracker, but the report was removed.

https://bugs.archlinux.org/task/65956

I use this script:
https://pastebin.com/stGn7zi7

Thanks

Last edited by vfio_experte (2020-03-25 16:12:50)
