You are not logged in.

#1 2023-04-22 12:32:38

Whoracle
Member
Registered: 2010-11-02
Posts: 212

[No Solution] Host crash on VFIO/PCI Passthrough

Hi there!

I've been running a libvirt setup with PCI passthrough for the last few years - worked fine so far.
Last week I upgraded to a new setup:
- ASUS ROG Strix B650E-I ITX AM5 Board - latest non-beta bios
- ADM Ryzen 7800X3D
- 64GB DDR5 6200 RAM
- Arch fully updated as of yesterday evening
- Kernel 6.2.11-arch1-1 #1 SMP PREEMPT_DYNAMIC

Now, the Passthrough setup is (or at least used to be) relatively simple:
- Pass through a RTX 3080
- Pass through a USB controller with at least two ports attached
- When no VM is running, the USB controller/s must be usable for the host

Worked fine without a hitch on the earlier setups, been through multiple iterations of hardware in the last years.

Now, with the new setup, the 3080 still works. And I can pass through a USB controller with one port just fine. But as soon as I add a controller that's not the one that works, the whole host just crashes. Like it power cycles.
Nothing in the logs, no error messages that I can find anywhere. Just booting the Guest VM until PCI gets initialized, then POWER CYCLE.

Anyone got an idea on how to even start debugging this? (more info below the code blocks)

My IOMMU Groups wrote:
IOMMU Group 0:
	00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 1:
	00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
IOMMU Group 10:
	00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14dd]
IOMMU Group 11:
	00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
	00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 12:
	00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e0]
	00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e1]
	00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e2]
	00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e3]
	00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e4]
	00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e5]
	00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e6]
	00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e7]
IOMMU Group 13:
	01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)
	01:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
IOMMU Group 14:
	02:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:5405]
IOMMU Group 15:
	03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f4] (rev 01)
IOMMU Group 16:
	04:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
IOMMU Group 17:
	04:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
IOMMU Group 18:
	04:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
IOMMU Group 19:
	04:0a.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	08:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
IOMMU Group 2:
	00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
IOMMU Group 20:
	04:0b.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	09:00.0 Network controller [0280]: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter [14c3:0616]
IOMMU Group 21:
	04:0c.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	0a:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f7] (rev 01)
IOMMU Group 22:
	04:0d.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
	0b:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f6] (rev 01)
IOMMU Group 23:
	0c:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc Device [1344:5405]
IOMMU Group 24:
	0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev cb)
IOMMU Group 25:
	0d:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
IOMMU Group 26:
	0d:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] VanGogh PSP/CCP [1022:1649]
IOMMU Group 27:
	0d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b6]
IOMMU Group 28:
	0d:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b7]
IOMMU Group 29:
	0e:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b8]
IOMMU Group 3:
	00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 4:
	00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
IOMMU Group 5:
	00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
IOMMU Group 6:
	00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 7:
	00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 8:
	00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 9:
	00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14dd]
Relevant USB to IOMMU mappings wrote:
Bus 1 --> 0000:0a:00.0 (IOMMU group 21)
Bus 001 Device 004: ID 0b05:19af ASUSTek Computer, Inc. AURA LED Controller
Bus 001 Device 003: ID 0b05:1a5c ASUSTek Computer, Inc. USB Audio
Bus 001 Device 002: ID 046d:c539 Logitech, Inc. Lightspeed Receiver
Bus 001 Device 006: ID 0489:e0e2 Foxconn / Hon Hai Wireless_Device
Bus 001 Device 005: ID 046d:c545 Logitech, Inc. USB Receiver
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 2 --> 0000:0a:00.0 (IOMMU group 21)
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

Bus 3 --> 0000:0d:00.3 (IOMMU group 27)
Bus 003 Device 006: ID 1397:0506 BEHRINGER International GmbH UMC404 192k
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 4 --> 0000:0d:00.3 (IOMMU group 27)
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

Bus 5 --> 0000:0d:00.4 (IOMMU group 28)
Bus 005 Device 004: ID 18d1:4ee1 Google Inc. Nexus/Pixel Device (MTP)
Bus 005 Device 003: ID 046d:c343 Logitech, Inc. G915 TKL LIGHTSPEED Wireless RGB Mechanical Gaming Keyboard
Bus 005 Device 002: ID 05e3:0610 Genesys Logic, Inc. Hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 6 --> 0000:0d:00.4 (IOMMU group 28)
Bus 006 Device 003: ID 1532:0e05 Razer USA, Ltd Razer Kiyo Pro
Bus 006 Device 002: ID 05e3:0626 Genesys Logic, Inc. Hub
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

Bus 7 --> 0000:0e:00.0 (IOMMU group 29)
Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 8 --> 0000:0e:00.0 (IOMMU group 29)
Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

Now, I can pass through 0000:01:00:0 (RTX 3080) as well as 0000:01:00:1 (3080 HDMI audio device) just fine.
0000:0D:00:3 also works and contains my audio interface.

Every other PCI device I pass through leads to the host crashing. I want to pass through the Bus 006 Device 002: ID 05e3:0626 Genesys Logic, Inc. Hub, which also contains the Bus 006 Device 003: ID 1532:0e05 Razer USA, Ltd Razer Kiyo Pro.

Glad to give any further info that's needed.

Edit:

Domain XML for one of the guest VMs - archlinux. Same happens with a Win10 VM that additionally has an NVME passed through wrote:
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
  <name>VFIO</name>
  <uuid>1bd85e98-1bd3-408a-8110-62cdf6a59129</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://archlinux.org/archlinux/rolling"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">50700288</memory>
  <currentMemory unit="KiB">50700288</currentMemory>
  <vcpu placement="static">10</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu="0" cpuset="2"/>
    <vcpupin vcpu="1" cpuset="10"/>
    <vcpupin vcpu="2" cpuset="3"/>
    <vcpupin vcpu="3" cpuset="11"/>
    <vcpupin vcpu="4" cpuset="4"/>
    <vcpupin vcpu="5" cpuset="12"/>
    <vcpupin vcpu="6" cpuset="5"/>
    <vcpupin vcpu="7" cpuset="13"/>
    <vcpupin vcpu="8" cpuset="6"/>
    <vcpupin vcpu="9" cpuset="14"/>
    <emulatorpin cpuset="0,8"/>
    <iothreadpin iothread="1" cpuset="0,8"/>
  </cputune>
  <os>
    <type arch="x86_64" machine="pc-q35-6.2">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/senec_VARS.fd</nvram>
    <bootmenu enable="no"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode="custom">
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="1234567890ab"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="partial" migratable="on">
    <feature policy="require" name="topoext"/>
  </cpu>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/anthrax/data/virtual/iso/archlinux-2021.11.01-x86_64.iso"/>
      <target dev="sda" bus="sata"/>
      <readonly/>
      <boot order="2"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/home/anthrax/data/virtual/disks/senec.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <boot order="1"/>
      <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x15"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x16"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0x17"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
    </controller>
    <controller type="pci" index="9" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="9" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="10" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="10" port="0x9"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
    </controller>
    <controller type="pci" index="11" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="11" port="0xa"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
    </controller>
    <controller type="pci" index="12" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="12" port="0xb"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x3"/>
    </controller>
    <controller type="pci" index="13" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="13" port="0xc"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x4"/>
    </controller>
    <controller type="pci" index="14" model="pcie-to-pci-bridge">
      <model name="pcie-pci-bridge"/>
      <address type="pci" domain="0x0000" bus="0x0d" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="15" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="15" port="0xd"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x5"/>
    </controller>
    <controller type="pci" index="16" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="16" port="0xe"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x6"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <controller type="scsi" index="0" model="virtio-scsi">
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </controller>
    <filesystem type="mount" accessmode="mapped">
      <source dir="/home/anthrax/data/Downloads"/>
      <target dir="/host/downloads"/>
      <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
    </filesystem>
    <filesystem type="mount" accessmode="mapped">
      <source dir="/home/anthrax/data/Documents"/>
      <target dir="/host/documents"/>
      <address type="pci" domain="0x0000" bus="0x10" slot="0x00" function="0x0"/>
    </filesystem>
    <interface type="direct">
      <mac address="52:54:00:76:7e:02"/>
      <source dev="eno1" mode="bridge"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </interface>
    <serial type="pty">
      <target type="isa-serial" port="0">
        <model name="isa-serial"/>
      </target>
    </serial>
    <console type="pty">
      <target type="serial" port="0"/>
    </console>
    <channel type="unix">
      <target type="virtio" name="org.qemu.guest_agent.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
    <channel type="spicevmc">
      <target type="virtio" name="com.redhat.spice.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="2"/>
    </channel>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <input type="evdev">
      <source dev="/dev/input/by-id/usb-Logitech_USB_Receiver-if02-event-mouse"/>
    </input>
    <input type="evdev">
      <source dev="/dev/input/by-id/usb-Logitech_USB_Receiver-if01-event-kbd" grab="all" repeat="on"/>
    </input>
    <graphics type="vnc" port="-1" autoport="yes">
      <listen type="address"/>
    </graphics>
    <graphics type="spice">
      <listen type="none"/>
      <image compression="off"/>
      <gl enable="no"/>
    </graphics>
    <sound model="ich9">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/>
    </sound>
    <audio id="1" type="pulseaudio" serverName="/run/user/1000/pulse/native"/>
    <video>
      <model type="vga" vram="16384" heads="1" primary="yes"/>
      <address type="pci" domain="0x0000" bus="0x0e" slot="0x01" function="0x0"/>
    </video>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0d" slot="0x00" function="0x3"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x0b" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0d" slot="0x00" function="0x4"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x0c" slot="0x00" function="0x0"/>
    </hostdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="2"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="3"/>
    </redirdev>
    <watchdog model="itco" action="reset"/>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </memballoon>
    <rng model="virtio">
      <backend model="random">/dev/urandom</backend>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </rng>
  </devices>
  <qemu:commandline>
    <qemu:arg value="-audiodev"/>
    <qemu:arg value="pa,id=hda,server=/run/user/1000/pulse/native"/>
  </qemu:commandline>
</domain>

Last edited by Whoracle (2024-06-28 08:26:11)

Offline

#2 2023-05-15 06:45:38

Whoracle
Member
Registered: 2010-11-02
Posts: 212

Re: [No Solution] Host crash on VFIO/PCI Passthrough

Update: I _think_ I found the cause and have no way of working around.

Here's what happened:
I bought another board (Gigabyte B650I Aorus Ultra) and had the same thing happen. Perusing the Manual I noticed that al USB ports on the IO-Shield were attached to the CPU, while all the headers on the board for e.g. Front-Panel USB Ports are attached to the chipset. So, the assumption is that isolating a USB controller that's directly attached to the CPU just hard resets the system, and that the one USB port on the ASUS IO shield that worked just so happened to be connected to the Chipset.

So I bought another NVME, since the Gigabyte board has three slots, and just copied my two VMs over. So, after almost 5 years, no more VFIO for me for the time being. I'll be marking this as [closed] but not [solved].

Offline

Board footer

Powered by FluxBB