#1 2018-01-13 16:22:08

Gruntz
Member
From: Haskovo, Bulgaria
Registered: 2007-08-31
Posts: 291

[ SOLVED ]Pass-through two GPUs to two VMs

Hello all,

I am trying to get GPU pass-through working. It worked when I had only one spare card, but it broke when I added a second one, and I cannot get it working again.

I have 3 GPUs:
  1) onboard, for boot;
  2) GTX 1060 - for VM1 (pass-through worked before I added the 430 in the other PCIe slot);
  3) GT 430 - for VM2.

I followed the wiki, and with one card it was working.

Here is what I have now:

# List of my 3 video devices:

[user@host ~]# dmesg | grep VGA
[    1.080016] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    1.080016] pci 0000:06:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    1.080017] pci 0000:01:01.0: vgaarb: setting as boot VGA device
[    1.080018] pci 0000:01:01.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    1.779837] fb0: VESA VGA frame buffer device
[    1.924485] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[    7.369888] fb: switching to mgag200drmfb from VESA VGA
[user@host ~]#

# List of DMAR and IOMMU messages

[user@host modprobe.d]$ dmesg | grep -e DMAR -e IOMMU
[    0.000000] ACPI: DMAR 0x00000000BF77E0D0 000138 (v01 AMI    OEMDMAR  00000001 MSFT 00000097)
[    0.000000] DMAR: IOMMU enabled
[    0.030098] DMAR: Host address width 40
[    0.030100] DMAR: DRHD base: 0x000000fbffe000 flags: 0x1
[    0.030109] DMAR: dmar0: reg_base_addr fbffe000 ver 1:0 cap c90780106f0462 ecap f020fe
[    0.030110] DMAR: RMRR base: 0x000000000e5000 end: 0x000000000e8fff
[    0.030111] DMAR: RMRR base: 0x000000bf7ec000 end: 0x000000bf7fffff
[    0.030112] DMAR: ATSR flags: 0x0
[    0.030115] DMAR-IR: IOAPIC id 6 under DRHD base  0xfbffe000 IOMMU 0 { <- I guess that is video card one. Or at least the group}
[    0.030116] DMAR-IR: IOAPIC id 7 under DRHD base  0xfbffe000 IOMMU 0 { <- I guess that is video card two. Or at least the group}
[    0.030466] DMAR-IR: Enabled IRQ remapping in xapic mode
[    1.125782] DMAR: dmar0: Using Queued invalidation
[    1.125975] DMAR: Hardware identity mapping for device 0000:00:00.0
[    1.125978] DMAR: Hardware identity mapping for device 0000:00:01.0
[    1.125981] DMAR: Hardware identity mapping for device 0000:00:03.0
[    1.125983] DMAR: Hardware identity mapping for device 0000:00:05.0
... CUT BECAUSE LAST LINE REPEATS 20-30 TIMES WITH DIFFERENT DEVICE ID ...

# Output of "lspci -nnk -d [device]" for each device. Strangely, the GT 430 is bound to vfio-pci although its IDs are not in /etc/modprobe.d/vfio.conf, while the GTX 1060, whose IDs are in that file, is not.

[user@host ~]# cat check_dev 
lspci -nnk -d 10de:0de1
lspci -nnk -d 10de:0bea
lspci -nnk -d 10de:1c02
lspci -nnk -d 10de:10f1
[user@host ~]# sh check_dev 
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108 [GeForce GT 430] [10de:0de1] (rev a1)
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau
07:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio Controller [10de:0bea] (rev a1)
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
	Subsystem: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02]
	Kernel driver in use: nouveau
	Kernel modules: nouveau
06:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
	Subsystem: NVIDIA Corporation GP106 High Definition Audio Controller [10de:1c02]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
[user@host ~]#

# Here is /etc/modprobe.d/vfio.conf. Only the IDs of the 1060 are listed, yet the 1060 is still not using vfio-pci, while the 430, which is not listed, is.

[kvasilev@darkstar ~]$ cat /etc/modprobe.d/vfio.conf 
options vfio-pci ids=10de:1c02,10de:10f1
[kvasilev@darkstar ~]$ 

# List of IOMMU groups:

[user@host ~]# sh iommu | grep -i nvid
IOMMU Group 17 07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108 [GeForce GT 430] [10de:0de1] (rev a1)
IOMMU Group 17 07:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio Controller [10de:0bea] (rev a1)
IOMMU Group 18 06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
IOMMU Group 18 06:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
[user@host ~]#
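The iommu script itself is not shown in this post. Below is a minimal sketch of what such a script might do, assuming the usual sysfs layout under /sys/kernel/iommu_groups. The function name list_iommu_groups and its optional argument are hypothetical, and it prints only the PCI address, not the lspci description seen above.

```shell
#!/bin/sh
# Print one "IOMMU Group <n> <pci-address>" line per device.
# $1: directory holding the groups (assumed default: the real sysfs path).
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # the glob matched nothing
        group=${dev%/devices/*}        # strip "/devices/<address>"
        group=${group##*/}             # keep only the group number
        echo "IOMMU Group $group ${dev##*/}"
    done
}
```

Each address can then be passed to "lspci -nns <address>" to get the device name and IDs.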

# But only the GT 430's group (17) is in /dev/vfio:

[user@host ~]$ ls -al /dev/vfio/
total 0
drwxr-xr-x  2 root root       80 Jan 13 18:01 .
drwxr-xr-x 19 root root     3460 Jan 13 18:02 ..
crw-------  1 root root 246,   0 Jan 13 18:01 17 { Only group 17 had a vfio device}
crw-rw-rw-  1 root root  10, 196 Jan 13 18:01 vfio
[user@host ~]$ 

# Here is my mkinitcpio:

[user@host ~]$ grep -v -e ^# -e ^$ /etc/mkinitcpio.conf 
MODULES=(vfio vfio_iommu_type1 vfio_pci vfio_virqfd)
BINARIES=()
FILES=()
HOOKS=(base udev autodetect modconf block filesystems keyboard fsck)
[user@host ~]$

# Here is my grub config:

[user@host ~]$ grep -v -e ^# -e ^$ /etc/default/grub    
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Arch"
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on init=/usr/lib/systemd/systemd quiet"
GRUB_CMDLINE_LINUX=""
GRUB_PRELOAD_MODULES="part_gpt part_msdos"
GRUB_TERMINAL_INPUT=console
GRUB_GFXMODE=auto
GRUB_GFXPAYLOAD_LINUX=keep
GRUB_DISABLE_RECOVERY=true
[user@host ~]$

I have tried:
1) adding all 4 IDs on one line in /etc/modprobe.d/vfio.conf;
2) adding a second "options" line in /etc/modprobe.d/vfio.conf;
3) adding only the 430 in /etc/modprobe.d/vfio.conf;
4) adding only the 1060 in /etc/modprobe.d/vfio.conf.
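For reference, attempt 1 corresponds to a single options line with the IDs separated by commas. The sketch below builds that line from a list of vendor:device IDs. The helper name vfio_options_line is made up for illustration.

```shell
#!/bin/sh
# Build an "options vfio-pci ids=..." line from vendor:device IDs.
vfio_options_line() {
    ids=""
    for id in "$@"; do
        ids="${ids:+$ids,}$id"     # add a comma only between entries
    done
    echo "options vfio-pci ids=$ids"
}
```

Called with the four IDs from check_dev (10de:1c02 10de:10f1 10de:0de1 10de:0bea), it prints one line that could be placed in /etc/modprobe.d/vfio.conf.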

I have NOT tried to:
1) remove one of the GPUs from the case;
2) swap the GPUs between the two PCIe slots.

Is there another way to do it? Is it even possible? Has anyone done this?

Any help will be appreciated.

Best regards.

==== New info:
When the system first boots, I get the state shown above.
But if I run "rmmod vfio_pci" and then "modprobe vfio_pci", the devices are bound as they should be:

[user@host ~]# sh check_dev 
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108 [GeForce GT 430] [10de:0de1] (rev a1)
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau
07:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio Controller [10de:0bea] (rev a1)
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
	Subsystem: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02]
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau
06:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
	Subsystem: NVIDIA Corporation GP106 High Definition Audio Controller [10de:1c02]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
[user@host ~]# 

====
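The binding can also be checked without lspci by reading the driver symlink in sysfs. This is only a sketch: the helper name driver_in_use and its optional second argument (an alternative device directory, useful for testing) are made up here.

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:06:00.0
# $2: device directory (assumed default: /sys/bus/pci/devices)
driver_in_use() {
    link=${2:-/sys/bus/pci/devices}/$1/driver
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"   # link target ends in the driver name
    else
        echo "none"
    fi
}
```

For example, "driver_in_use 0000:06:00.0" should print vfio-pci once the GTX 1060 is bound correctly.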

Last edited by Gruntz (2018-01-18 18:30:36)

#2 2018-01-14 13:29:42

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,127

Re: [ SOLVED ]Pass-through two GPUs to two VMs

Please post full outputs (pastebin sites are useful when the output is too large).

The things I miss most are the full lspci -k, dmesg and IOMMU groups output.
Motherboard & processor brand + model would also help.

At the moment I see 2 possible causes, but without complete info investigating them further is near impossible.

Last edited by Lone_Wolf (2018-01-14 13:30:55)


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

#3 2018-01-14 13:52:51

Gruntz
Member
From: Haskovo, Bulgaria
Registered: 2007-08-31
Posts: 291

Re: [ SOLVED ]Pass-through two GPUs to two VMs

Hello,

Thank you for your interest in the problem.

Here are the whole logs:

dmesg : https://pastebin.com/jbESnebs
lspci -k : https://pastebin.com/JMvz0Hfq
IOMMU groups : https://pastebin.com/V7gxnUmd
dmidecode: https://pastebin.com/qQgfdDPD

My system:
Motherboard:  X8DTN+-F [ edited from X8DTN+-F-LR]
CPU: 2 Xeon 6560
Video 1: GTX 1060
Video 2: GT 430

If you need anything else, please ask.

Best regards.

Last edited by Gruntz (2018-01-14 16:06:30)

#4 2018-01-14 14:42:45

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,127

Re: [ SOLVED ]Pass-through two GPUs to two VMs

Wow, that is exotic, quality stuff.
A Supermicro [1] dual-socket server-class motherboard that supports NUMA and has an onboard Matrox G200 video card.
The chipset appears to be manufactured by ALI, while the server manufacturer appears to be Gateway.

Not much time today, so a few quick remarks:

- You don't appear to have set up Intel microcode updates, please verify.

I'm thinking that a difference in how the system works internally with one versus two extra video cards could cause this.
This could mean that other kernel modules or a different load order are needed to get everything working correctly.

Are the NVIDIA cards placed in the PCI-E 2 x16 slots (that provide x8 lanes) shown as slot 4 and slot 6 in the manual?


- after boot take lsmod output
- rmmod vfio-pci , modprobe vfio-pci
- take lsmod output again and compare with the previous one
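The steps above can be sketched as a small comparison helper. The function names module_names and compare_lsmod are made up for illustration. They work on saved lsmod output, so the rmmod/modprobe step itself (which needs root) stays outside the functions.

```shell
#!/bin/sh
# Extract sorted module names from saved lsmod output (skips the header line).
module_names() {
    awk 'NR > 1 { print $1 }' "$1" | sort
}

# Show modules present in only one of two lsmod snapshots.
# "<" lines are only in the first file, ">" lines only in the second.
# Returns 0 when the module sets are identical, like diff.
compare_lsmod() {
    a=$(mktemp)
    b=$(mktemp)
    module_names "$1" > "$a"
    module_names "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

A possible sequence: "lsmod > before.txt", then "rmmod vfio_pci" and "modprobe vfio_pci" as root, then "lsmod > after.txt" and "compare_lsmod before.txt after.txt".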


[1] https://www.supermicro.com/products/mot … Y&LRDIMM=Y


#5 2018-01-14 15:43:06

Gruntz
Member
From: Haskovo, Bulgaria
Registered: 2007-08-31
Posts: 291

Re: [ SOLVED ]Pass-through two GPUs to two VMs

Hello,

### About the microcode:
No, I had not updated the CPU microcode.

I have now done the update, but I am not sure it works. GRUB reported that "intel_ucode.img" had been added to the boot menu, but after rebooting I got:

[kvasilev@darkstar ~]$ sudo dmesg | grep mic
[sudo] password for kvasilev: 
[    0.830114] ACPI: Dynamic OEM Table Load:
[    0.830954] ACPI: Dynamic OEM Table Load:
[    1.731271] microcode: sig=0x206c2, pf=0x1, revision=0xc
[    1.732370] microcode: Microcode Update Driver: v2.2.
[    6.447029] mousedev: PS/2 mouse device common for all mice
[kvasilev@darkstar ~]$ 

### About the modules:
I compared the modules loaded at boot with those loaded after I "rmmod" and "modprobe" vfio_pci, but the count is the same. No extra modules are added.
Modules at boot: https://pastebin.com/uKd3qrAk
Modules after rmmod/modprobe: https://pastebin.com/zgd7iUTa
Both with diff after: https://pastebin.com/LdGEsMVD


### The video card location on the MB slots:
I made a small mistake. My MB is: https://www.supermicro.com/products/mot … cfm?IPMI=Y

1) The GT 430 is in the top full-size PCIe slot, right next to the CPU. I think that is the "master" (first) PCIe x16 slot.
2) The GTX 1060 is located below it. I think that is the "secondary" PCIe x16 slot.

I placed them this way because of spacing issues, and I cannot swap them without cutting part of a heatsink, because the GT 430 is fanless and its heatsink wraps around it.


### One extra thing I discovered:

I have an Arch Linux guest. When I add the GTX 1060 to the guest, it works: I can see the VM console, then I can startx and use the guest VM.

But when I switch to the GT 430, I get no video output while the guest is in text mode. Once I run startx on the guest, I get a picture.

Of course, all of this is after I "rmmod"/"modprobe" vfio_pci.

A table, to make it clearer:

         | text mode | X running |
---------|-----------|-----------|
GTX 1060 | video     | video     |
GT 430   | no video  | video     |

So in parts... it is working... but it needs some more tweaking I guess.

Best regards,

Last edited by Gruntz (2018-01-14 16:16:37)

#6 2018-01-18 18:30:16

Gruntz
Member
From: Haskovo, Bulgaria
Registered: 2007-08-31
Posts: 291

Re: [ SOLVED ]Pass-through two GPUs to two VMs

Hello all,

I solved the issue.

After an update, the kernel image (initramfs) was re-created, and now it works. vfio_pci has the correct IDs.

Apparently it is not enough to create the image once and afterwards just edit /etc/modprobe.d/vfio.conf; the image has to be re-created after every change to that file.

Best regards.

#7 2018-01-23 15:58:04

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,127

Re: [ SOLVED ]Pass-through two GPUs to two VMs

That does make sense, as the files in /etc/modprobe.d are added to the initramfs (when modconf is used in mkinitcpio.conf ).
After changing them the initramfs needs to be recreated, else it will still have the old content of /etc/modprobe.d .
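One way to spot a stale copy is to compare the IDs in the live file with those in a copy taken from the initramfs. The sketch below assumes a single "options vfio-pci ... ids=..." line per file. The helper names vfio_ids and ids_match are made up for illustration.

```shell
#!/bin/sh
# Print the vendor:device IDs from an "options vfio-pci ids=..." line,
# one per line, sorted.
vfio_ids() {
    sed -n 's/^options[[:space:]]\{1,\}vfio[-_]pci[[:space:]].*ids=\([^[:space:]]*\).*/\1/p' "$1" |
        tr ',' '\n' | sort
}

# Succeed only if both files list the same set of IDs.
ids_match() {
    [ "$(vfio_ids "$1")" = "$(vfio_ids "$2")" ]
}
```

After editing /etc/modprobe.d/vfio.conf, the images can be regenerated as root with "mkinitcpio -P". To check what is inside an image, it should be possible to extract it into an empty directory with "lsinitcpio -x /boot/initramfs-linux.img" (the Arch default path, which may differ on your system) and then run "ids_match /etc/modprobe.d/vfio.conf ./etc/modprobe.d/vfio.conf".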

