#1 2024-12-17 16:19:41

teodor1290
Member
Registered: 2024-12-16
Posts: 1

QEMU unexpectedly closed the monitor, giving segfault in dmesg

Hello! I've been trying to configure a VM with GPU passthrough, following the instructions on the wiki. I have successfully bound all of the necessary devices to the vfio-pci driver. The GPU I want to pass through is the Nvidia RTX 3060.
The relevant system configuration (I've removed parts I thought were irrelevant; apologies if I cut too much):

lspci -k
00:1b.0 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #21 [8086:a3eb] (rev f0)
	Subsystem: ASRock Incorporation Device [1849:a3eb]
	Kernel driver in use: pcieport
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:a393] (rev f0)
	Subsystem: ASRock Incorporation Device [1849:a393]
	Kernel driver in use: pcieport
00:1c.4 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #05 [8086:a394] (rev f0)
	Subsystem: ASRock Incorporation Device [1849:a394]
	Kernel driver in use: pcieport
00:1f.0 ISA bridge: Intel Corporation B460 Chipset LPC/eSPI Controller
	DeviceName: Onboard - Other
	Subsystem: ASRock Incorporation Device a3c8
00:1f.2 Memory controller: Intel Corporation Cannon Lake PCH Power Management Controller
	DeviceName: Onboard - Other
	Subsystem: ASRock Incorporation Device a3a1
00:1f.3 Audio device: Intel Corporation Comet Lake PCH-V cAVS
	DeviceName: Onboard - Sound
	Subsystem: ASRock Incorporation Device 1203
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel, snd_soc_avs, snd_sof_pci_intel_cnl
00:1f.4 SMBus: Intel Corporation Comet Lake PCH-V SMBus Host Controller
	DeviceName: Onboard - Other
	Subsystem: ASRock Incorporation Device a3a3
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] (rev a1)
	Subsystem: ZOTAC International (MCO) Ltd. Device 2438
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, nvidia
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
	Subsystem: ZOTAC International (MCO) Ltd. Device 2438
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
02:00.0 Non-Volatile memory controller: Phison Electronics Corporation E18 PCIe4 NVMe Controller (rev 01)
	Subsystem: Phison Electronics Corporation E18 PCIe4 NVMe Controller
	Kernel driver in use: nvme
	Kernel modules: nvme
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
	Subsystem: ASRock Incorporation Device 8168
	Kernel driver in use: vfio-pci
	Kernel modules: r8169
04:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
	Subsystem: Gigabyte Technology Co., Ltd Device 4072
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau, nvidia_drm, nvidia
04:00.1 Audio device: NVIDIA Corporation GA106 High Definition Audio Controller (rev a1)
	Subsystem: Gigabyte Technology Co., Ltd Device 4072
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel

The IOMMU script provided on the wiki shows that the network controller is in the same IOMMU group as the RTX 3060 I want to pass through, so I bound it to vfio-pci as well:

IOMMU Group 0:
	00:00.0 Host bridge [0600]: Intel Corporation 10th Gen Core Processor Host Bridge/DRAM Registers [8086:9b43] (rev 05)
IOMMU Group 1:
	00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
	01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
	01:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
IOMMU Group 2:
	00:14.0 USB controller [0c03]: Intel Corporation Comet Lake PCH-V USB Controller [8086:a3af]
	00:14.2 Signal processing controller [1180]: Intel Corporation Comet Lake PCH-V Thermal Subsystem [8086:a3b1]
IOMMU Group 3:
	00:16.0 Communication controller [0780]: Intel Corporation Comet Lake PCH-V HECI Controller [8086:a3ba]
IOMMU Group 4:
	00:17.0 SATA controller [0106]: Intel Corporation 400 Series Chipset Family SATA AHCI Controller [8086:a382]
IOMMU Group 5:
	00:1b.0 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #21 [8086:a3eb] (rev f0)
	02:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation E18 PCIe4 NVMe Controller [1987:5018] (rev 01)
IOMMU Group 6:
	00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:a393] (rev f0)
	00:1c.4 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #05 [8086:a394] (rev f0)
	03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
	04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] [10de:2504] (rev a1)
	04:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e] (rev a1)
IOMMU Group 7:
	00:1f.0 ISA bridge [0601]: Intel Corporation B460 Chipset LPC/eSPI Controller [8086:a3c8]
	00:1f.2 Memory controller [0580]: Intel Corporation Cannon Lake PCH Power Management Controller [8086:a3a1]
	00:1f.3 Audio device [0403]: Intel Corporation Comet Lake PCH-V cAVS [8086:a3f0]
	00:1f.4 SMBus [0c05]: Intel Corporation Comet Lake PCH-V SMBus Host Controller [8086:a3a3]
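
To make the grouping easier to check at a glance, here is a minimal sketch that filters a listing in the format above and prints the other devices sharing an IOMMU group with a given PCI address. The function name `group_peers` is hypothetical, and it assumes the listing is piped in on stdin exactly as printed above (an "IOMMU Group N:" header followed by indented device lines).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the PCI addresses that share an IOMMU group
# with the device given as $1. Reads the group listing from stdin.
group_peers() {
    awk -v dev="$1" '
        # Header line, e.g. "IOMMU Group 6:" -> remember the group number
        /^IOMMU Group/ { group = $3; sub(/:$/, "", group); next }
        # Device line: first field is the PCI address
        NF {
            addr[NR] = $1
            grp[NR] = group
            if ($1 == dev) target = group
            last = NR
        }
        END {
            if (target == "") exit 1          # device not found in listing
            for (i = 1; i <= last; i++)
                if (grp[i] == target && addr[i] != dev) print addr[i]
        }
    '
}
```

With the listing above saved to a file, `group_peers 04:00.0 < groups.txt` would list the bridges, the Ethernet controller and the GPU's audio function, which is why all of the non-bridge devices in that group need to be bound to vfio-pci together.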

/etc/mkinitcpio.conf

MODULES=(vfio_pci vfio vfio_iommu_type1)
BINARIES=()
FILES=()
HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block filesystems fsck)

/etc/modprobe.d/vfio.conf

options vfio-pci ids=10de:2504,10de:228e,10ec:8125
softdep nvidia pre: vfio-pci

The host's GTX 1060 only works with the proprietary Nvidia driver, which is installed; the nvidia-open driver is not installed.

/etc/default/grub

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Arch"
GRUB_CMDLINE_LINUX_DEFAULT="vfio-pci.ids=10de:2504,10de:228e,10ec:8125 intel_io>
GRUB_CMDLINE_LINUX=""

Verifying the configuration, as described in the wiki:

# dmesg | grep -i vfio
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=5c07e388-d3fa-459b-9a33-c763392ff7b0 rw vfio-pci.ids=10de:2504,10de:228e,10ec:8125 intel_iommu=on loglevel=3
[    0.054045] Kernel command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=5c07e388-d3fa-459b-9a33-c763392ff7b0 rw vfio-pci.ids=10de:2504,10de:228e,10ec:8125 intel_iommu=on loglevel=3
[    0.855455] VFIO - User Level meta-driver version: 0.3
[    0.863180] vfio-pci 0000:04:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
[    0.863323] vfio_pci: add [10de:2504[ffffffff:ffffffff]] class 0x000000/00000000
[    0.909609] vfio_pci: add [10de:228e[ffffffff:ffffffff]] class 0x000000/00000000
[    0.909973] vfio_pci: add [10ec:8125[ffffffff:ffffffff]] class 0x000000/00000000
[    9.904179] NVRM: GPU 0000:04:00.0 is already bound to vfio-pci.
[   37.413019] vfio-pci 0000:03:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
[   44.634264] vfio-pci 0000:04:00.0: enabling device (0000 -> 0003)
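
Besides grepping dmesg, the driver bound to each device can be read from sysfs. The following is only a sketch: `bound_driver` is a hypothetical helper name, and the optional second argument (an alternate sysfs root) exists purely so the function can be tried against a fake directory tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the kernel driver bound to a PCI device.
# $1: full PCI address including domain, e.g. 0000:04:00.0
# $2: sysfs root (optional, defaults to /sys)
bound_driver() {
    local link="${2:-/sys}/bus/pci/devices/$1/driver"
    # The "driver" entry is a symlink that only exists while a driver is bound
    if [ ! -L "$link" ]; then
        echo "none"
        return 1
    fi
    basename "$(readlink "$link")"
}

# Example: check the three devices from IOMMU group 6
# for dev in 0000:03:00.0 0000:04:00.0 0000:04:00.1; do
#     printf '%s: %s\n' "$dev" "$(bound_driver "$dev")"
# done
```

On the system described here, all three addresses would be expected to report vfio-pci, matching the lspci -k output earlier in the post.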

I have an Intel i7-10700K CPU. It supports virtualisation, which is enabled in the BIOS:

# lscpu | grep VT
Virtualization:                       VT-x

The VM is configured with UEFI firmware and the relevant PCI devices:
asxD0os.png

When I attempt to start it, I only receive this vague error window:
GC2vFcu.png

However, dmesg shows a segfault in QEMU:

[   10.135049] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1
[   10.187163] Console: switching to colour dummy device 80x25
[   10.192526] Bluetooth: hci0: BCM: chip id 63
[   10.193503] Bluetooth: hci0: BCM: features 0x07
[   10.209477] Bluetooth: hci0: BCM20702A
[   10.209481] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
[   10.220683] nvidia 0000:01:00.0: vgaarb: deactivate vga console
[   10.224484] fbcon: nvidia-drmdrmfb (fb0) is primary device
[   10.224968] Bluetooth: hci0: BCM: firmware Patch file not found, tried:
[   10.224970] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0b05-17cb.hcd'
[   10.224971] Bluetooth: hci0: BCM: 'brcm/BCM-0b05-17cb.hcd'
[   10.262631] 8021q: 802.1Q VLAN Support v1.8
[   10.285490] Console: switching to colour frame buffer device 240x67
[   10.296991] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   10.298803] Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   10.298899] Loaded X.509 cert 'wens: 61c038651aabdcf94bd0ac7ff06c7248db18c600'
[   10.299633] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   10.299635] cfg80211: failed to load regulatory.db
[   10.303691] nvidia 0000:01:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
[   32.543717] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   32.685583] u32 classifier
[   32.685585]     Performance counters on
[   32.685585]     input device check on
[   32.685586]     Actions configured
[   33.126341] vfio-pci 0000:03:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
[   41.029576] vfio-pci 0000:04:00.0: enabling device (0000 -> 0003)
[   41.131708] qemu-system-x86[1395]: segfault at b8 ip 0000556b96d204e6 sp 00007ffd78ee3fd0 error 4 in qemu-system-x86_64[5c24e6,556b968cf000+72c000] likely on CPU 0 (core 0, socket 0)
[   41.131719] Code: 2e 01 83 c0 01 89 05 0d cd 2e 01 48 8b 43 40 48 85 c0 74 16 ba 01 00 00 00 f0 0f c1 50 18 81 fa fe ff ff 7f 0f 87 c4 00 00 00 <49> 8b 84 24 b8 00 00 00 48 85 c0 74 55 8b 93 b0 00 00 00 eb 11 0f
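
For anyone reading the segfault line: the "error" field is the x86 page-fault error code, printed in hex, where bit 0 distinguishes a protection violation from a not-present page, bit 1 marks a write, bit 2 marks user mode and bit 4 marks an instruction fetch. A rough sketch of decoding it follows; `decode_pf_error` is a hypothetical name and only these four bits are handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode the "error N" value from a kernel segfault line.
# $1: error code as printed by the kernel (hex, without a 0x prefix)
decode_pf_error() {
    local err=$(( 16#$1 ))
    local mode access cause

    if (( err & 4 )); then mode="user"; else mode="kernel"; fi            # bit 2: U/S
    if (( err & 2 )); then access="write"; else access="read"; fi         # bit 1: W/R
    if (( err & 16 )); then access="instruction fetch"; fi                # bit 4: I/D
    if (( err & 1 )); then cause="protection violation"; else cause="page not present"; fi  # bit 0: P

    echo "${mode}-mode ${access}, ${cause}"
}

decode_pf_error 4
```

For the line above, error 4 decodes to a user-mode read of a page that is not present. Combined with the very low fault address (b8), that pattern is commonly seen when code reads a structure field through a NULL pointer, although the dmesg output alone cannot confirm which structure is involved.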

Removing the PCI host device lets the VM boot without problems.
I have searched the forum and the internet to the best of my ability and asked AI assistants, to no avail. So I'm asking here: how can I solve this and boot the virtual machine with GPU passthrough? Thanks in advance.
