@Denso
What happens when you use /sys/bus/pci/drivers_probe when in that state where lspci -k shows no driver? ex: echo 0000:01:00.0 > /sys/bus/pci/drivers_probe
I assume you're actually using a v4.1-rc kernel since the ids= option was only introduced post v4.0, right? What does dmesg show? dmesg | grep -i vfio
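A quick way to see what lspci -k is (or isn't) reporting is to resolve the device's sysfs driver link directly. A small sketch; the device path is just an example address:

```shell
# Report which driver (if any) owns a PCI device by resolving the
# sysfs "driver" symlink. Prints "none" when the device is unbound.
bound_driver() {
    local dev="$1"    # e.g. /sys/bus/pci/devices/0000:01:00.0
    if [ -L "$dev/driver" ]; then
        basename "$(readlink "$dev/driver")"
    else
        echo none
    fi
}

# prints the bound driver name (e.g. vfio-pci), or "none" if unbound
bound_driver /sys/bus/pci/devices/0000:01:00.0
```

If this prints "none" before you run drivers_probe and still prints "none" after, no driver considered itself a match for the device.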
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
When using:
echo 0000:09:00.0 > /sys/bus/pci/drivers_probe
Nothing happens: no errors are thrown, no dmesg messages, and there is still no driver attached to the GPU.
I am using 4.1-rc7, compiled last night.
Thanks
Last edited by Denso (2015-06-15 20:41:01)
Offline
Denso wrote:When using:
echo 0000:09:00.0 > /sys/bus/pci/drivers_probe
Nothing happens: no errors are thrown, no dmesg messages, and there is still no driver attached to the GPU.
I am using 4.1-rc7, compiled last night.
Thanks
I was really looking to see if vfio-pci was accepting your ids options via dmesg, not just errors from the above command
Offline
dmesg | grep -i vfio:
[Mon Jun 15 23:36:14 2015] VFIO - User Level meta-driver version: 0.3
[Mon Jun 15 23:36:14 2015] vfio_pci: add [8086:8d20[ffff:ffff]] class 0x000000/00000000
[Mon Jun 15 23:36:14 2015] vfio_pci: add [8086:15a0[ffff:ffff]] class 0x000000/00000000
[Mon Jun 15 23:36:14 2015] vfio_pci: add [1912:0015[ffff:ffff]] class 0x000000/00000000
These are all the devices I'm passing through, minus the GPU/HDMI Audio (NVIDIA) devices. What is confusing is that they get bound to vfio-pci just fine using the vfio-bind script from the OP.
Offline
Denso wrote:dmesg | grep -i vfio:
[Mon Jun 15 23:36:14 2015] VFIO - User Level meta-driver version: 0.3
[Mon Jun 15 23:36:14 2015] vfio_pci: add [8086:8d20[ffff:ffff]] class 0x000000/00000000
[Mon Jun 15 23:36:14 2015] vfio_pci: add [8086:15a0[ffff:ffff]] class 0x000000/00000000
[Mon Jun 15 23:36:14 2015] vfio_pci: add [1912:0015[ffff:ffff]] class 0x000000/00000000
These are all the devices I'm passing through, minus the GPU/HDMI Audio (NVIDIA) devices. What is confusing is that they get bound to vfio-pci just fine using the vfio-bind script from the OP.
We're only getting an Intel audio device, Intel NIC, and Renesas USB3 controller via vfio-pci.ids... why would it claim the nvidia devices?
Offline
aw wrote:... why would it claim the nvidia devices?
Errr... Because I told it to do so?
Here is what modprobe.d looks like for me:
options vfio-pci ids=10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,10de:ffffffff:ffffffff:ffffffff:00040300:ffffffff,1002:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,1002:ffffffff:ffffffff:ffffffff:00040300:ffffffff,8086:8d20,8086:15a0,1912:0015
You see, it claims all the devices except both of my NVIDIA GPUs.
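For reference, each comma-separated ids= entry follows the pattern vendor:device[:subvendor[:subdevice[:class[:class_mask]]]], which matches the "class 0x000000/00000000" defaults visible in the dmesg output above. A rough sketch of how one entry breaks down (the default values shown for omitted fields are an assumption based on that dmesg format):

```shell
# Split one vfio-pci ids= entry into its six fields:
# vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]
# Omitted sub-IDs act as wildcards; class/class_mask default to 0.
explain_ids_entry() {
    local IFS=':'
    set -- $1    # word-split the entry on ':'
    printf 'vendor=%s device=%s subvendor=%s subdevice=%s class=%s class_mask=%s\n' \
        "${1:-}" "${2:-}" "${3:-ffffffff}" "${4:-ffffffff}" "${5:-0}" "${6:-0}"
}

explain_ids_entry 10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff
```

The 00030000/ffff00ff pair above is the "match any VGA-class device" trick: class 0x0300xx masked so only the base class/subclass must match.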
Last edited by Denso (2015-06-15 21:43:01)
Offline
aw wrote:... why would it claim the nvidia devices?
Errr... Because I told it to do so?
Here is what modprobe.d looks like for me:
options vfio-pci ids=10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,10de:ffffffff:ffffffff:ffffffff:00040300:ffffffff,1002:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,1002:ffffffff:ffffffff:ffffffff:00040300:ffffffff,8086:8d20,8086:15a0,1912:0015
You see, it claims all the devices except both of my NVIDIA GPUs.
Your dmesg says those first 4 entries aren't taking effect. Do they work by themselves? What if you add your entries at the beginning? Others have found that you can use multiple lines.
Offline
aw wrote:Your dmesg says those first 4 entries aren't taking effect. Do they work by themselves? What if you add your entries at the beginning? Others have found that you can use multiple lines.
GPUs don't get bound unless I use the vfio-bind script on them. Specifying them in modprobe.d has no effect whatsoever. I also tried breaking them into multiple lines, but the result is the same:
options vfio-pci ids=10de:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,10de:ffffffff:ffffffff:ffffffff:00040300:ffffffff
options vfio-pci ids=1002:ffffffff:ffffffff:ffffffff:00030000:ffff00ff,1002:ffffffff:ffffffff:ffffffff:00040300:ffffffff
options vfio-pci ids=8086:8d20,8086:15a0,1912:0015
vfio-pci ids= claims all the specified devices successfully, except for the NVIDIA devices.
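One thing worth knowing here: modprobe concatenates every matching "options" line in modprobe.d into a single option string for the module, so splitting across lines is equivalent to one long line as far as modprobe is concerned. Whether a repeated ids= parameter then appends or overwrites is up to the module's own parameter handler, which is exactly what the dmesg "vfio_pci: add ..." lines reveal. A rough simulation of the modprobe side:

```shell
# Simulate how modprobe merges multiple "options <module> ..." lines
# from modprobe.d into one option string for that module.
merge_options() {
    local module="$1" merged="" kw mod rest
    while read -r kw mod rest; do
        [ "$kw" = "options" ] && [ "$mod" = "$module" ] && merged="$merged $rest"
    done
    printf '%s\n' "${merged# }"
}

merge_options vfio-pci <<'EOF'
options vfio-pci ids=8086:8d20,8086:15a0
options vfio-pci ids=1912:0015
EOF
```

So the multi-line form hands the module two ids= arguments; if only one set shows up in dmesg, the module saw the merged string but didn't keep both.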
Last edited by Denso (2015-06-15 22:08:22)
Offline
@Denso
Are you rebuilding your initramfs between each of these changes? What shows up in dmesg if you run 'modprobe -r vfio-pci; modprobe vfio-pci' with modprobe.d as in the previous post?
Offline
Up front, I'll admit I haven't kept up to date on this thread since about page 184, but I was curious about something regarding AMD FX CPU series (at least relevant to the octocore models) and how it pertains to IOMMU grouping.
My assumption, from what I vaguely remember reading, is that those with FX CPUs don't need to use the ACS override patch and can use an unpatched kernel without having to worry about IOMMU grouping issues. Do I understand this correctly, or is my recollection incorrect? Or is there more to it that I need to read up on?
Last edited by Myranti (2015-06-16 03:06:06)
Offline
There are all sorts of ways to use an unpatched kernel, read my blog for examples. There's no magic bullet in AMD FX CPUs, other than perhaps the chipsets for them are so old that we've already got quirks for them.
Offline
Hi guys,
Could anybody advise me on how to solve subtle timing issues with an 8.1 guest?
When I'm playing music in foobar, the EQ will stop for a second and the sound will stutter, and when I'm playing games, I still get sudden FPS dropouts.
My hardware:
* FX-8350 CPU
* M5A99FX PRO 2.0 M/B
* Sapphire Vapor-X Radeon R9 290 as passthrough GPU (VFIO mode)
* USB controller with k/b, mouse and Xonar U7 passed to the VM via PCI passthrough
* I switch k/b and mouse between the motherboard USB ports with an ATEN CS22U KVM
What I've already done:
* Using i440FX vChipset at the moment (with OVMF)
* kvm-amd nested pages disabled
* Hyper-V extensions enabled for the VM at install time
* Host CPU (FX-8350) cpufreq set to performance
* Hugepages memory backing enabled, with hugetlbfs
* 6 VCPUs pinned to 6 real cores (0-5 respectively)
Do you mean that you're pinning all 6 vCPUs to the cpuset 0-5, or are you pinning each individual vCPU to a pCPU within the set 0-5, i.e. vCPU0->pCPU0, vCPU1->pCPU1, etc? The latter is recommended for optimal locality. You don't mention if you're using virtio for disk and network within the VM; both are highly recommended and will help to improve overall VM performance. Either as a test or for very VM-specific tuning, you can try using isolcpus= on the host to isolate the pCPUs running vCPUs from the general scheduler. I believe running the host with nohz=off has also been suggested. Using iommu=pt can also avoid IOMMU overhead for host devices and may help marginally.
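For the per-vCPU pinning described above (vCPU0->pCPU0, vCPU1->pCPU1, etc.), the libvirt domain XML uses cputune/vcpupin. A sketch for the 6-vCPU case; the cpuset numbers follow this poster's layout and should be adjusted to your topology:

```xml
<vcpu placement='static'>6</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='1'/>
  <vcpupin vcpu='2' cpuset='2'/>
  <vcpupin vcpu='3' cpuset='3'/>
  <vcpupin vcpu='4' cpuset='4'/>
  <vcpupin vcpu='5' cpuset='5'/>
</cputune>
```

Pinning all vCPUs to the cpuset 0-5 instead would be a single vcpupin-less cpuset attribute on the vcpu element, which lets the scheduler migrate vCPUs within the set and loses cache locality.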
Offline
aw wrote:Do you mean that you're pinning all 6 vCPUs to the cpuset 0-5, or are you pinning each individual vCPU to a pCPU within the set 0-5, i.e. vCPU0->pCPU0, vCPU1->pCPU1, etc? The latter is recommended for optimal locality. You don't mention if you're using virtio for disk and network within the VM; both are highly recommended and will help to improve overall VM performance. Either as a test or for very VM-specific tuning, you can try using isolcpus= on the host to isolate the pCPUs running vCPUs from the general scheduler. I believe running the host with nohz=off has also been suggested. Using iommu=pt can also avoid IOMMU overhead for host devices and may help marginally.
Hi there.
Turns out these were not timing issues, it's my audio card (Xonar U7) that was stuttering all along.
I was able to achieve perfect sound using my monitor's audio output (via HDMI). Though I still get minor FPS dropouts in Witcher 3 this way, it's completely playable now.
I'd really appreciate any advice on how to either a) make my U7 work under the hypervisor or b) get PulseAudio sound working with libvirt (my PA is in user mode right now).
As for your questions, yes, I've configured vCPUs pinned to pCPUs and I'm using virtio for everything (though currently my disk image is in qcow2 format, that's slowing things down considerably with NTFS on top of it).
Offline
@aw
I tried the method with your wrapper script for passing x-vga, but the only thing that happens is that libvirt hangs until it is force-restarted.
Yes, I changed the paths to the right ones.
In between, nothing seems to happen; not even the xml gets parsed.
Last edited by PrinzipDesSees (2015-06-17 01:39:07)
Offline
OK folks, trying to guess what might be wrong with your setup is really not fun. If you want some help, document exactly what you're doing. Prolific use of pastebin is encouraged.
Offline
aw wrote:OK folks, trying to guess what might be wrong with your setup is really not fun. If you want some help, document exactly what you're doing. Prolific use of pastebin is encouraged.
Since I may need to post asking for assistance soon if I can't get past my current hurdle, do you have a format/template in mind we should follow? Not that one size fits all, but I wouldn't want to leave out information, and I figure for certain problems there is at least a base level. (I've been trying to catch up, but I've been busy for the last 150 pages and might have missed changes along the way.)
Offline
aw wrote:OK folks, trying to guess what might be wrong with your setup is really not fun. If you want some help, document exactly what you're doing. Prolific use of pastebin is encouraged.
Since I may need to post asking for assistance soon if I can't get past my current hurdle, do you have a format/template in mind we should follow? Not that one size fits all, but I wouldn't want to leave out information, and I figure for certain problems there is at least a base level. (I've been trying to catch up, but I've been busy for the last 150 pages and might have missed changes along the way.)
Just use common sense, this isn't a bug reporting system, but use the same techniques to report enough information that someone can spot an error or reproduce the problem. If you're trying to use a libvirt wrapper script, show the wrapper script and the xml. [Sorry PrinzipDesSees, I'm not picking on you, we're getting a lot of this] If you've changed libvirt.conf, qemu.conf or anything else related to the VM, report it. Look for errors in dmesg or libvirt log files. The more interest you take in solving your own problems, the more interest I'm going to have in helping you.
Offline
aw wrote:Just use common sense, this isn't a bug reporting system, but use the same techniques to report enough information that someone can spot an error or reproduce the problem.
Reasonable enough. Just wanted to check in case I was going to forget anything. Plus, posting this will give me a reference if I get some slow time at work to search the interwebs for errors I'm getting...
Based on the original post, starting with a stock install, pci-stub in the boot string was used to isolate the cards, and the suggested vfio-bind script was used to put them on the vfio-pci driver (if I don't use the script, they are on pci-stub and qemu complains... mentioning that because I thought I saw that this wasn't supposed to be required anymore).
I am attempting a headless host with the guest taking over the sole video card (so commands are run over ssh on the host).
system info:
Mobo: Asus Sabertooth 990FX/Gen3 R2.0 AM3+ AMD 990FX
Video: Asus HD7770-DC-1GD5-V2 Radeon HD 7770 GHz Edition 1GB
# uname -a
Linux pandora 4.0.5-1-ARCH #1 SMP PREEMPT Sat Jun 6 18:37:49 CEST 2015 x86_64 GNU/Linux
# cat /proc/cmdline
initrd=\initramfs-linux.img root=/dev/sda2 rw quiet splash video=efifb:off pci-stub.ids=1002:683d,1002:aab0
# lspci -nnk
...
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X] [1002:683d]
Subsystem: ASUSTeK Computer Inc. Device [1043:0429]
Kernel driver in use: vfio-pci
Kernel modules: radeon
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series] [1002:aab0]
Subsystem: ASUSTeK Computer Inc. Device [1043:aab0]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
...
Testing is done with one of two commands
qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
-machine type=pc,accel=kvm \
-smp 2,sockets=1,cores=2,threads=1 \
-device vfio-pci,host=03:00.0,x-vga=on -device vfio-pci,host=03:00.1 \
-vga none -nographic \
-usb -usbdevice host:045e:0768 -usbdevice host:3842:2410 \
-cdrom /root/archbang-150516-i686.iso
or
qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
-smp 1,sockets=1,cores=1,threads=1 \
-device vfio-pci,host=03:00.0,x-vga=on -device vfio-pci,host=03:00.1 \
-vga none -nographic -display stdio \
-device virtio-scsi-pci,id=scsi \
-drive file=/root/arch_guest.img,id=disk,format=raw,if=none -device scsi-hd,drive=disk \
-usb -usbdevice host:045e:0768 -usbdevice host:3842:2410
(The two usbdevices should be my mouse and keyboard. Removing them doesn't improve anything, but definitely disables any keyboard support I had.)
Both guests boot and seem to get to the login prompt and then hang. arch_guest.img hangs at the login prompt and won't take any input at all. archbang.iso gets to the message saying that it's reached target Graphical Interface (occasionally not quite that far) and hangs, but I can switch to consoles 2-6 and log in (though I can't launch X from there because it's busy doing its thing).
from an archbang run...
I assume that this in my guest's dmesg log is a good sign that the passthrough worked:
[ 0.290104] vgaarb: setting as boot device: PCI:0000:00:03.0
[ 0.290104] vgaarb: device added: PCI:0000:00:03.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.290104] vgaarb: loaded
[ 0.290104] vgaarb: bridge control possible 0000:00:03.0
[ 0.290152] PCI: Using ACPI for IRQ routing
[ 0.290152] PCI: pci_cache_line_size set to 64 bytes
...
[ 6.487464] radeon 0000:00:03.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[ 6.487467] radeon 0000:00:03.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
For a minute I was thinking it was simply that I needed to figure out the correct combination of "-nographic -display xxxx" or whatever, but I am getting this trace shortly before archbang hangs, and that makes me think there is a problem with my setup that I'm missing.
[ 6.633752] CPU: 1 PID: 437 Comm: systemd-udevd Not tainted 4.0.2-1-ARCH #1
[ 6.633752] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150422_083828-anatol 04/01/2014
[ 6.633752] task: f7353fc0 ti: f738a000 task.ti: f738a000
[ 6.633752] EIP: 0060:[<f8670cf1>] EFLAGS: 00010286 CPU: 1
[ 6.633752] EIP is at drm_pcie_get_speed_cap_mask+0x31/0xe0 [drm]
[ 6.633752] EAX: f5dd6800 EBX: f738bb18 ECX: 00000000 EDX: f738bb18
[ 6.633752] ESI: 00000000 EDI: ee60c000 EBP: f738bab4 ESP: f738ba8c
[ 6.633752] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 6.633752] CR0: 80050033 CR2: 00000020 CR3: 351bd000 CR4: 000007d0
[ 6.633752] Stack:
[ 6.633752] c133b044 f738ba98 c13402f2 000080d0 00004a7c f8b62618 ee4f0000 8e6555bd
[ 6.633752] 00000000 ee4f0000 f738bb60 f8b62639 c133c5ed f7332a0c ee439700 ee4f0000
[ 6.633752] f738bae0 f846d000 00000000 ee60c000 f8bc1e19 f738baf0 c1349d7d 00000000
[ 6.633752] Call Trace:
[ 6.633752] [<c133b044>] ? get_device+0x14/0x30
[ 6.633752] [<c13402f2>] ? klist_class_dev_get+0x12/0x20
[ 6.633752] [<f8b62618>] ? si_dpm_init+0x38/0x11a0 [radeon]
[ 6.633752] [<f8b62639>] si_dpm_init+0x59/0x11a0 [radeon]
[ 6.633752] [<c133c5ed>] ? device_add+0x16d/0x5f0
[ 6.633752] [<c1349d7d>] ? pm_runtime_init+0xcd/0xe0
[ 6.633752] [<c133ca87>] ? device_register+0x17/0x20
[ 6.633752] [<f846b135>] ? hwmon_device_register_with_groups.part.0+0xa5/0x100 [hwmon]
[ 6.633752] [<f846b1e1>] ? hwmon_device_register_with_groups+0x51/0x60 [hwmon]
[ 6.633752] [<f8af6317>] ? radeon_hwmon_init+0x57/0x90 [radeon]
[ 6.633752] [<f8af69aa>] radeon_pm_init+0x49a/0x860 [radeon]
[ 6.633752] [<c1351426>] ? request_firmware+0x36/0x40
[ 6.633752] [<f8b279b3>] si_init+0x263/0xb30 [radeon]
[ 6.633752] [<c1338c69>] ? vga_switcheroo_register_client+0x39/0x50
[ 6.633752] [<f8a8f267>] radeon_device_init+0x957/0xaf0 [radeon]
[ 6.633752] [<f8a8d4e0>] ? cail_reg_read+0x70/0x70 [radeon]
[ 6.633752] [<f8a9173a>] radeon_driver_load_kms+0x8a/0x200 [radeon]
[ 6.633752] [<f866ec7e>] drm_dev_register+0x8e/0xd0 [drm]
[ 6.633752] [<f86712e9>] drm_get_pci_dev+0xa9/0x1b0 [drm]
[ 6.633752] [<c1170564>] ? kmem_cache_alloc_trace+0x1c4/0x200
[ 6.633752] [<f8a8d34e>] ? radeon_pci_probe+0x6e/0xa0 [radeon]
[ 6.633752] [<f8a8d35c>] radeon_pci_probe+0x7c/0xa0 [radeon]
[ 6.633752] [<c128f34f>] pci_device_probe+0x6f/0xd0
[ 6.633752] [<c11ed945>] ? sysfs_create_link+0x25/0x50
[ 6.633752] [<c133f403>] driver_probe_device+0x93/0x3c0
[ 6.633752] [<c133f7e9>] __driver_attach+0x79/0x80
[ 6.633752] [<c133f770>] ? __device_attach+0x40/0x40
[ 6.633752] [<c133d7e7>] bus_for_each_dev+0x57/0xa0
[ 6.633752] [<c133eeee>] driver_attach+0x1e/0x20
[ 6.633752] [<c133f770>] ? __device_attach+0x40/0x40
[ 6.633752] [<c133eb37>] bus_add_driver+0x157/0x240
[ 6.633752] [<f86d4000>] ? 0xf86d4000
[ 6.633752] [<f86d4000>] ? 0xf86d4000
[ 6.633752] [<c133ffbd>] driver_register+0x5d/0xf0
[ 6.633752] [<c128eba3>] __pci_register_driver+0x33/0x40
[ 6.633752] [<f86714cd>] drm_pci_init+0xdd/0x100 [drm]
[ 6.633752] [<f86d4000>] ? 0xf86d4000
[ 6.633752] [<f86d4092>] radeon_init+0x92/0xa7 [radeon]
[ 6.633752] [<c100047a>] do_one_initcall+0xaa/0x200
[ 6.633752] [<f86d4000>] ? 0xf86d4000
[ 6.633752] [<c112cd9d>] ? free_pages_prepare+0x19d/0x2e0
[ 6.633752] [<c1170f65>] ? kfree+0x135/0x140
[ 6.633752] [<c115d238>] ? __vunmap+0xb8/0xf0
[ 6.633752] [<c117041e>] ? kmem_cache_alloc_trace+0x7e/0x200
[ 6.633752] [<c1498f95>] ? do_init_module+0x21/0x198
[ 6.633752] [<c1498f95>] ? do_init_module+0x21/0x198
[ 6.633752] [<c1498fc4>] do_init_module+0x50/0x198
[ 6.633752] [<c10d2883>] load_module+0x1df3/0x23b0
[ 6.633752] [<c10749ee>] ? finish_task_switch+0x4e/0xe0
[ 6.633752] [<c10d2f4f>] SyS_init_module+0x10f/0x170
[ 6.633752] [<c149e357>] sysenter_do_call+0x12/0x12
[ 6.633752] Code: ec 20 3e 8d 74 26 00 c7 02 00 00 00 00 8b 80 04 01 00 00 89 d3 65 8b 0d 14 00 00 00 89 4d f4 31 c9 85 c0 74 16 8b 40 08 8b 70 1c <0f> b7 46 20 66 3d 06 11 74 06 66 3d 66 11 75 18 b8 ea ff ff ff
[ 6.633752] EIP: [<f8670cf1>] drm_pcie_get_speed_cap_mask+0x31/0xe0 [drm] SS:ESP 0068:f738ba8c
[ 6.633752] CR2: 0000000000000020
[ 6.686348] ---[ end trace 7a3848033a378f35 ]---
Full dmesg: http://pastebin.com/RiNrY9sW
I'm looking back over the thread from the beginning to see if I can find anything that might help... I seem to recall some stuff about resetting the card before/after launching qemu (not the eject device talk), but maybe I made that up or maybe it won't help.
Last edited by Blind Tree Frog (2015-06-17 04:11:54)
Offline
Blind Tree Frog wrote:Based on the original post, starting with a stock install, pci-stub in the boot string was used to isolate the cards, and the suggested vfio-bind script was used to put them on the vfio-pci driver (if I don't use the script, they are on pci-stub and qemu complains... mentioning that because I thought I saw that this wasn't supposed to be required anymore).
I don't advocate the vfio-bind script, especially the version that blindly binds any device in a group to vfio-pci regardless of the type. vfio-pci doesn't know how to handle bridges and can actually disable them, preventing access to the downstream devices that you're actually trying to use. Newer kernels will not allow binding vfio-pci to bridges.
That said, if you're not using libvirt, you do need to bind the endpoints you intend to assign to vfio-pci. Perhaps your confusion on this point is that libvirt will do it for you, so for a libvirt-based setup, it's sufficient to get the GPU bound to pci-stub and leave the rest to libvirt.
If I were creating a qemu commandline setup, I'd still use virsh nodedev-detach for binding to vfio-pci rather than the vfio-bind script.
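The virsh route mentioned above takes node device names rather than raw PCI addresses; the name is just the address with separators replaced. A small sketch of the conversion (the address is the Radeon from the post above, as an example):

```shell
# virsh names PCI node devices by replacing ':' and '.' in the
# domain:bus:slot.function address with underscores.
to_nodedev() {
    printf 'pci_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

to_nodedev 0000:03:00.0   # -> pci_0000_03_00_0

# then, on a real host:
#   virsh nodedev-detach   pci_0000_03_00_0   # bind to vfio-pci
#   virsh nodedev-reattach pci_0000_03_00_0   # give it back to the host
```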
Blind Tree Frog wrote:I am attempting a headless host with the guest taking over the sole video card (so commands are run over ssh on the host).
FWIW, the kernel handles the boot VGA ROM differently than secondary ROMs. You actually get the shadow copy that might be modified during the execution of the ROM code. So, somehow getting a copy of the real, untouched ROM and passing it as a file would be encouraged for this scenario.
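The usual way to get that ROM copy is through the device's sysfs rom file, which is gated: you write 1 to enable reads, copy it, and write 0 afterwards. A sketch (paths and the romfile name are examples; ideally dump the ROM from a boot where this card was not the primary VGA device, for the shadow-copy reason given above):

```shell
# Copy a device's expansion ROM out of sysfs. The rom file is gated:
# write 1 to enable reads, copy the contents, then write 0 to disable.
dump_rom() {
    local dev="$1" out="$2"
    echo 1 > "$dev/rom"
    cat "$dev/rom" > "$out"
    echo 0 > "$dev/rom"
}

# e.g.:
#   dump_rom /sys/bus/pci/devices/0000:03:00.0 vga.rom
# then hand the clean copy to the guest:
#   -device vfio-pci,host=03:00.0,x-vga=on,romfile=vga.rom
```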
system info:
Mobo: Asus Sabertooth 990FX/Gen3 R2.0 AM3+ AMD 990FX
Video: Asus HD7770-DC-1GD5-V2 Radeon HD 7770 GHz Edition 1GB
# uname -a
Linux pandora 4.0.5-1-ARCH #1 SMP PREEMPT Sat Jun 6 18:37:49 CEST 2015 x86_64 GNU/Linux
# cat /proc/cmdline
initrd=\initramfs-linux.img root=/dev/sda2 rw quiet splash video=efifb:off pci-stub.ids=1002:683d,1002:aab0
# lspci -nnk
...
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X] [1002:683d]
Subsystem: ASUSTeK Computer Inc. Device [1043:0429]
Kernel driver in use: vfio-pci
Kernel modules: radeon
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series] [1002:aab0]
Subsystem: ASUSTeK Computer Inc. Device [1043:aab0]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
...
Testing is done with one of two commands
qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
-machine type=pc,accel=kvm \
-smp 2,sockets=1,cores=2,threads=1 \
-device vfio-pci,host=03:00.0,x-vga=on -device vfio-pci,host=03:00.1 \
-vga none -nographic \
-usb -usbdevice host:045e:0768 -usbdevice host:3842:2410 \
-cdrom /root/archbang-150516-i686.iso
or
qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
-smp 1,sockets=1,cores=1,threads=1 \
-device vfio-pci,host=03:00.0,x-vga=on -device vfio-pci,host=03:00.1 \
-vga none -nographic -display stdio \
-device virtio-scsi-pci,id=scsi \
-drive file=/root/arch_guest.img,id=disk,format=raw,if=none -device scsi-hd,drive=disk \
-usb -usbdevice host:045e:0768 -usbdevice host:3842:2410
(The two usbdevices should be my mouse and keyboard. Removing them doesn't improve anything, but definitely disables any keyboard support I had.)
Both guests boot and seem to get to the login prompt and then hang. arch_guest.img hangs at the login prompt and won't take any input at all. archbang.iso gets to the message saying that it's reached target Graphical Interface (occasionally not quite that far) and hangs, but I can switch to consoles 2-6 and log in (though I can't launch X from there because it's busy doing its thing).
from an archbang run...
I assume that this in my guest's dmesg log is a good sign that the passthrough worked:
[ 0.290104] vgaarb: setting as boot device: PCI:0000:00:03.0
[ 0.290104] vgaarb: device added: PCI:0000:00:03.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.290104] vgaarb: loaded
[ 0.290104] vgaarb: bridge control possible 0000:00:03.0
[ 0.290152] PCI: Using ACPI for IRQ routing
[ 0.290152] PCI: pci_cache_line_size set to 64 bytes
...
[ 6.487464] radeon 0000:00:03.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[ 6.487467] radeon 0000:00:03.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
For a minute I was thinking it was simply that I needed to figure out the correct combination of "-nographic -display xxxx" or whatever, but I am getting this trace shortly before archbang hangs, and that makes me think there is a problem with my setup that I'm missing.
[ 6.633752] CPU: 1 PID: 437 Comm: systemd-udevd Not tainted 4.0.2-1-ARCH #1
[ 6.633752] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150422_083828-anatol 04/01/2014
[ 6.633752] EIP is at drm_pcie_get_speed_cap_mask+0x31/0xe0 [drm]
[... full trace as quoted above ...]
[ 6.686348] ---[ end trace 7a3848033a378f35 ]---
Full dmesg: http://pastebin.com/RiNrY9sW
I'm looking back over the thread from the beginning to see if I can find anything that might help... I seem to recall some stuff about resetting the card before/after launching qemu (not the eject device talk), but maybe I made that up or maybe it won't help.
You're assigning an AMD card to a 440FX VM running Linux. That's pretty much the one case where I recommend Q35. I've posted patches for this, but I lost track of whether they've all been accepted. The problem is that the radeon driver blindly assumes that a downstream port exists above the device because that's the way it appears on real hardware. It then goes trying to read link speed info on a device that doesn't exist and you get the above oops. You need to switch to Q35, add a root port and put the GPU below the root port.
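The Q35-plus-root-port arrangement described above looks roughly like the following on the qemu command line. This is a sketch: ioh3420 is the PCIe root port device available in QEMU of this era, but treat the exact addr/port/chassis values as examples to adapt:

```shell
# Q35 variant of the earlier command line: an ioh3420 root port hangs
# off pcie.0, and both GPU functions sit below it as one multifunction
# slot (the radeon driver then finds the upstream port it expects).
QEMU_Q35_ARGS="-machine q35,accel=kvm \
 -device ioh3420,id=root.1,bus=pcie.0,addr=1c.0,port=1,chassis=1 \
 -device vfio-pci,host=03:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
 -device vfio-pci,host=03:00.1,bus=root.1,addr=00.1"

printf '%s\n' "$QEMU_Q35_ARGS"
```

These arguments replace the "-machine type=pc" and plain "-device vfio-pci,host=..." pieces of the command lines quoted earlier.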
Offline
Blind Tree Frog wrote:Based on the original post, starting with a stock install, pci-stub in the boot string was used to isolate the cards, and the suggested vfio-bind script was used to put them on the vfio-pci driver (if I don't use the script, they are on pci-stub and qemu complains... mentioning that because I thought I saw that this wasn't supposed to be required anymore).
I don't advocate the vfio-bind script, especially the version that blindly binds any device in a group to vfio-pci regardless of the type. vfio-pci doesn't know how to handle bridges and can actually disable them, preventing access to the downstream devices that you're actually trying to use. Newer kernels will not allow binding vfio-pci to bridges.
That said, if you're not using libvirt, you do need to bind the endpoints you intend to assign to vfio-pci. Perhaps your confusion on this point is that libvirt will do it for you, so for a libvirt-based setup, it's sufficient to get the GPU bound to pci-stub and leave the rest to libvirt.
If I were creating a qemu commandline setup, I'd still use virsh nodedev-deatch for binding to vfio-pci rather than the vfio-bind script.
Switching to virsh was on the list once I got things running. Now that you mention it, I do remember reading in your posts that it was libvirt that handled the vfio-pci binding automatically.
btf wrote:I am attempting a headless host with the guest taking over the sole video card (so commands are run over ssh on the host).
FWIW, the kernel handles the boot VGA ROM differently than secondary ROMs. You actually get the shadow copy that might be modified during the execution of the ROM code. So, somehow getting a copy of the real, untouched ROM and passing it as a file would be encouraged for this scenario.
I was hoping to avoid that, but I feared I might have to look into that at some point.
You're assigning an AMD card to a 440FX VM running Linux. That's pretty much the one case where I recommend Q35. I've posted patches for this, but I lost track of whether they've all been accepted. The problem is that the radeon driver blindly assumes that a downstream port exists above the device because that's the way it appears on real hardware. It then goes trying to read link speed info on a device that doesn't exist and you get the above oops. You need to switch to Q35, add a root port and put the GPU below the root port.
Ah, that's what the trace is complaining about. A cursory search led me to similar errors about a root device not being available, but I wasn't sure whether that applied here, since my PCI root was in a different group.
Thanks.
edit:
Assuming I'm looking at the correct commits in QEMU, I believe I see similar ones, but they're old enough (Apr 13) that I'd expect that, so I'm guessing that either this specific issue didn't get hit or not all of the patches were accepted.
Last edited by Blind Tree Frog (2015-06-17 18:12:13)
Offline
Has anyone actually tried to get GPU passthrough working on a laptop with NVIDIA Optimus? I wonder if it's even possible. If it were, it would be a good incentive to get such a laptop.
Offline
Has anyone actually tried to get GPU passthrough working on a laptop with NVIDIA Optimus? I wonder if it's even possible. If it were, it would be a good incentive to get such a laptop.
I've observed a very weird three-state GPU on some Lenovo laptops:
The NVIDIA GPU was fully on while it was in use.
The NVIDIA GPU dropped into power-save mode, just like an unused card in a "real" PC, once it had been used and released.
The NVIDIA GPU sat in some uninitialized state, where lspci failed to read its info, complaining about "bad headers", when the GPU hadn't been used since system boot.
And then there's the screen-switching problem: some notebooks have their auxiliary video output routed only to Intel, only to NVIDIA, or to both via some weird switch.
Has anyone tried? I can't recall. Is it even possible? There was a guy without Optimus who tried to pass his single and only GPU to the VM, leaving the host headless, and he failed. So I highly doubt that it's possible even in theory. It all depends on the laptop vendor and what their firmware engineers were smoking.
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Offline
You also have to wonder how well the thermal controls work for an integrated environment like a laptop. On a discrete desktop card, you know the GPU fan speed is controlled through the GPU device driver, the rest of the PC knows nothing about it. On a laptop there's often a more holistic approach to thermal management. If I was trying one, I'd be very careful until I gained some confidence that the response to thermal load was working.
The only experience I have is with a pre-Optimus laptop that I could never get past Code 43, probably because it's tied to a model-specific, custom NVIDIA driver (i.e. it doesn't work with drivers downloaded directly from NVIDIA, even on bare metal).
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
Offline
You also have to wonder how well the thermal controls work for an integrated environment like a laptop. On a discrete desktop card, you know the GPU fan speed is controlled through the GPU device driver, the rest of the PC knows nothing about it. On a laptop there's often a more holistic approach to thermal management. If I was trying one, I'd be very careful until I gained some confidence that the response to thermal load was working.
Laptops usually have one fan to cool everything. So I guess if you can control the fan manually, set it to full power and stress-test the hardware.
In the worst case (if the vendor is a total dick) you can always resolder the fan to a SATA power connector or some other known-good voltage, and use a cooling station.
BUT. That fan might not be enough: I've observed some cheap, 2-3-year-old Dells overheat just from the Windows calculator computing "1000000!". A VM will use the hardware more intensely than running stuff bare-metal.
Last edited by Duelist (2015-06-17 20:52:13)
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Offline
Alex, not sure if you got my email or not. VFIO is still not working properly; it's still missing some files. I know that VFIO is built as a module, as evidenced by running "modinfo vfio". Not sure how to proceed.
Offline