Crashes still occur randomly. dmesg -w doesn't show anything relevant when a crash occurs, and neither does the debug log. I'm starting to suspect this might be a hardware lockup.
They seem to occur more often under heavy CPU load but still happen in normal use. I'm without a clue ATM.
Also, I accidentally discovered something interesting and need someone to confirm it. When I compiled the latest linux-mainline 4.0 rc5.1, I forgot to enable the pci-stub module. After shutting down the machine, dmesg showed the "released" devices loading their respective modules. After rebooting I discovered that pci-stub was not enabled, and the devices were working normally before and after running KVM...
It seems that we can grab PCI-E devices at runtime without pci-stub? Is this really safe? I passed through the secondary GPU + audio, the USB controller and the net adapter, and all 4 devices were apparently restored... dmesg from the test:
0000:02:00.0 is the GTX970
0000:04:00.0 is the USB controller
0000:05:00.0 is the NET adapter
[ 115.651756] usbcore: registered new interface driver snd-usb-audio
[ 115.687194] usb 1-14.1: current rate 33186 is different from the runtime rate 16000
[ 115.699927] usb 1-14.1: current rate 33186 is different from the runtime rate 16000
[ 115.724838] usb 1-14.1: current rate 198 is different from the runtime rate 16000
[ 115.732585] usb 1-14.1: current rate 8 is different from the runtime rate 16000
[ 115.734399] usb 1-14.1: 3:1: cannot get min/max values for control 2 (id 3)
[ 300.233368] mce: [Hardware Error]: Machine check events logged
--QEMU started here--
[ 464.991218] VFIO - User Level meta-driver version: 0.3
[ 465.261701] xhci_hcd 0000:04:00.0: remove, state 4
[ 465.261710] usb usb4: USB disconnect, device number 1
[ 465.261891] xhci_hcd 0000:04:00.0: USB bus 4 deregistered
[ 465.261897] xhci_hcd 0000:04:00.0: remove, state 1
[ 465.261901] usb usb3: USB disconnect, device number 1
[ 465.261903] usb 3-1: USB disconnect, device number 2
[ 465.323138] xhci_hcd 0000:04:00.0: USB bus 3 deregistered
[ 465.400432] usbcore: deregistering interface driver ov534
[ 465.499432] tun: Universal TUN/TAP device driver, 1.6
[ 465.499437] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[ 465.500357] device tap0 entered promiscuous mode
[ 465.500534] hostbr0: port 1(tap0) entered listening state
[ 465.500543] hostbr0: port 1(tap0) entered listening state
[ 465.501220] IPv6: ADDRCONF(NETDEV_UP): hostbr0: link is not ready
[ 465.543875] kvm: SMP vm created on host with unstable TSC; guest TSC will not be reliable
[ 467.240268] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1e@0x258
[ 467.240281] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x900
[ 480.529995] hostbr0: port 1(tap0) entered learning state
[ 487.466676] kvm: zapping shadow pages for mmio generation wraparound
[ 495.569983] hostbr0: topology change detected, propagating
[ 495.569991] hostbr0: port 1(tap0) entered forwarding state
[ 495.570023] IPv6: ADDRCONF(NETDEV_CHANGE): hostbr0: link becomes ready
--Shutdown of QEMU--
[ 598.207298] hostbr0: port 1(tap0) entered disabled state
[ 598.207420] device tap0 left promiscuous mode
[ 598.207440] hostbr0: port 1(tap0) entered disabled state
--Nouveau starts spilling--
[ 601.701150] nouveau 0000:02:00.0: fb1: nouveaufb frame buffer device
[ 601.701157] [drm] Initialized nouveau 1.2.1 20120801 for 0000:02:00.0 on minor 1
[ 601.715319] snd_hda_intel 0000:02:00.1: Disabling MSI
[ 601.715336] snd_hda_intel 0000:02:00.1: Handle VGA-switcheroo audio client
[ 601.725718] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 601.725725] r8169 0000:05:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 601.725731] r8169 0000:05:00.0: enabling device (0400 -> 0403)
[ 601.728245] r8169 0000:05:00.0 enp5s0: renamed from eth0
[ 601.758725] xhci_hcd 0000:04:00.0: enabling device (0400 -> 0402)
[ 601.758804] xhci_hcd 0000:04:00.0: xHCI Host Controller
[ 601.758811] xhci_hcd 0000:04:00.0: new USB bus registered, assigned bus number 3
--Usbs connected to controller starts popping--
Hmm, check your CPU for bugs. I have an AMD family 15h CPU, and there's a revision guide for it saying that "hardware lockups may happen because of an interrupt" and stuff like that.
[ 300.233368] mce: [Hardware Error]: Machine check events logged
This is very, very bad. Check what it is. It may be simple overheating, or something worse.
The forum rules prohibit requesting support for distributions other than arch.
I gave up. It was too late.
What I was trying to do.
The reference about VFIO and KVM VGA passthrough.
Tyrewt wrote:Anyone notice the time is always out of sync on their Windows 7 guests? Though configured correctly and using NTP, the time is always off by 3 hours.
This sounds like you're setting the guest RTC to UTC (the default) instead of local time, which is what Windows expects:
<clock offset='localtime'>
...why windows is so lame on this side?...
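To make the arithmetic behind the 3-hour error concrete, here is a minimal sketch. It assumes the host sits in a UTC+3 timezone (a guess based on the size of the error, not something stated in the thread) and that the guest reads the emulated RTC as local time. The function name and the sample date are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def guest_clock_error(rtc_is_utc, utc_offset_hours):
    """Return how far off the guest's idea of UTC is when it treats the RTC as local time."""
    true_utc = datetime(2015, 3, 25, 12, 0, tzinfo=timezone.utc)
    local_tz = timezone(timedelta(hours=utc_offset_hours))

    # Wall-clock value exposed by the emulated RTC (no timezone attached).
    if rtc_is_utc:
        rtc_value = true_utc.replace(tzinfo=None)
    else:
        rtc_value = true_utc.astimezone(local_tz).replace(tzinfo=None)

    # The guest assumes the RTC value is local time and converts it back to UTC.
    guest_utc = rtc_value.replace(tzinfo=local_tz).astimezone(timezone.utc)
    return true_utc - guest_utc


if __name__ == "__main__":
    print("RTC in UTC:       ", guest_clock_error(True, 3))
    print("RTC in local time:", guest_clock_error(False, 3))
```

With the RTC in UTC the guest ends up off by exactly the timezone offset; with the RTC in local time the error disappears, which is what the localtime clock offset is meant to achieve.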
I'm interested in configuring this type of setup, and I hope I'll be building a new computer soon. What parts would you recommend for this type of build?
And check the first post for the huge spreadsheet of stats when deciding which GPUs you want to use. It is a loose list, though, as it doesn't reflect the number of quirks applied on the listed systems.
If you want a system that is 100% working - just copy aw's one;)
I thought AW was using a Dell Venue 8 Pro?
Heh, had to google for that one. My hardware is nothing special, I just live within its limitations and don't try to use multiple slots off the processor root ports. One card off the processor root ports, one card off the PCH root ports, GPUs new enough to support UEFI, and each connected to a small craigslist TV to be able to use the audio function directly.
http://vfio.blogspot.com
Looking for a more open forum to discuss vfio related uses? Try https://www.redhat.com/mailman/listinfo/vfio-users
It would be neat to run KVM on a Dell Venue 8 Pro. Has anyone tried?
I have an AMD A8-6600K APU on an A88X-Pro board with a couple of R9 270s, and it works great, thanks to
options vfio_iommu_type1 disable_hugepages=1
Last edited by tritron4 (2015-03-25 21:16:47)
Hi aw,
I was just curious: for some PCIe root ports that lack ACS support but have vendor verification that peer-to-peer interaction is disabled, the devices were quirked in the kernel:
static const u16 pci_quirk_intel_pch_acs_ids[] = {
/* Ibexpeak PCH */
0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
/* Cougarpoint PCH */
0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
/* Pantherpoint PCH */
0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
/* Lynxpoint-H PCH */
0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
/* Lynxpoint-LP PCH */
0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
/* Wildcat PCH */
0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
/* Patsburg (X79) PCH */
0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
};
Now basically what I'm wondering, is if adding other root port PCI ids to this list would effectively be the same as applying previous ACS override kernel patches and providing override ID through kernel options? Also, would this be the same for adding the ID for the processor's own PCIe root ports, and not just PCH root ports?
I think I've posted something like this in the past, but for many Z97 motherboards, the system looks something like:
http://i.imgur.com/VtAr2ni.png
where the only GPU PCI Express slots are provided by the processor, and none are available from the PCH, so the only option would be to force separation of the processor root port. What are the ramifications of doing this through the existing quirked list, instead of using your previous ACS patches (besides the obvious possible effects of ACS patches in general)?
Last edited by mutiny (2015-03-26 03:14:09)
@mutiny
The ACS override patch is just a quirk, like the code you're pointing to. It just happens to be a dynamic quirk that lets you specify devices via the kernel command line. It's certainly possible to add static entries for processor root ports, but that table is not the right place to do it. That table triggers not only reporting the isolation of the device, but enabling of isolation. So you wouldn't want to trigger that on anything except devices that follow the same programming model.
Have you read my blog post on IOMMU groups? I think that gives a pretty good overview of why groups take the shape they do and the dangers of assuming isolation where there is none. The difference between the in-kernel quirks and the command line version is that the in-kernel quirks have some assurance from the hardware vendor that there is actually isolation. You assume all risk when you patch-in and use the override. The mechanism by which they work is roughly the same though.
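For anyone who wants to see what shape their groups currently take before deciding on an override, here is a minimal sketch. It assumes the kernel exposes groups under /sys/kernel/iommu_groups, with one numbered directory per group and a devices subdirectory inside each. The function name is made up for illustration.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        # IOMMU disabled, or this kernel does not expose the groups here.
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    for group, devices in sorted(list_iommu_groups().items()):
        print("group %d: %s" % (group, " ".join(devices)))
```

If a GPU shares a group with devices you do not want to assign, that is the grouping the override discussion is about. The listing only reports what the kernel believes; it says nothing about whether the hardware really isolates the devices.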
Hi aw,
I followed this article step by step ( https://access.redhat.com/documentation … e-GPU.html ) to pass through a GRID K2 to my Win7 VM and got Code 43 after installing the drivers.
[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# /usr/libexec/qemu-kvm --version
QEMU emulator version 1.5.3 (qemu-kvm-1.5.3-60.el7), Copyright (c) 2003-2008 Fabrice Bellard
[root@localhost ~]# lspci |grep K2
05:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
grub:
linux16 /boot/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=aa5139f6-942e-474e-853c-3d5b2cc49125 ro vconsole.keymap=us crashkernel=auto vconsole.font=latarcyrheb-sun16 rhgb quiet LANG=en_US.UTF-8 intel_iommu=on rdblacklist=nouveau nouveau.modeset=0
[root@localhost ~]# lsmod
Module Size Used by
vhost_net 33961 1
macvtap 18302 1 vhost_net
macvlan 19046 1 macvtap
tun 27183 5 vhost_net
vfio_iommu_type1 17636 2
vfio_pci 36438 2
vfio 20810 8 vfio_iommu_type1,vfio_pci
bnep 19704 2
bluetooth 372662 7 bnep
rfkill 26536 3 bluetooth
fuse 87661 3
ipt_MASQUERADE 12880 3
iptable_nat 13011 1
nf_nat_ipv4 13263 1 iptable_nat
nf_nat 21798 3 ipt_MASQUERADE,nf_nat_ipv4,iptable_nat
nf_conntrack_ipv4 14862 2
nf_defrag_ipv4 12729 1 nf_conntrack_ipv4
xt_conntrack 12760 1
nf_conntrack 101024 6 ipt_MASQUERADE,nf_nat,nf_nat_ipv4,xt_conntrack,iptable_nat,nf_conntrack_ipv4
ipt_REJECT 12541 2
xt_CHECKSUM 12549 1
iptable_mangle 12695 1
ip6table_filter 12815 0
ip6_tables 27025 1 ip6table_filter
iptable_filter 12810 1
ip_tables 27239 3 iptable_filter,iptable_mangle,iptable_nat
ebtable_nat 12807 0
ebtables 30913 1 ebtable_nat
bridge 110196 0
stp 12976 1 bridge
llc 14552 2 stp,bridge
sg 36533 0
iTCO_wdt 13480 0
iTCO_vendor_support 13718 1 iTCO_wdt
dm_mirror 22135 0
dm_region_hash 20862 1 dm_mirror
dm_log 18411 2 dm_region_hash,dm_mirror
dm_mod 102999 2 dm_log,dm_mirror
coretemp 13435 0
kvm_intel 138567 12
kvm 441119 1 kvm_intel
crct10dif_pclmul 14289 0
crc32_pclmul 13113 0
crc32c_intel 22079 0
ghash_clmulni_intel 13259 0
aesni_intel 55624 0
lrw 13286 1 aesni_intel
gf128mul 14951 1 lrw
glue_helper 13990 1 aesni_intel
ablk_helper 13597 1 aesni_intel
cryptd 20359 3 ghash_clmulni_intel,aesni_intel,ablk_helper
serio_raw 13462 0
pcspkr 12718 0
sb_edac 22344 0
edac_core 62330 1 sb_edac
lpc_ich 16977 0
i2c_i801 18135 0
mfd_core 13435 1 lpc_ich
mei_me 18568 0
mei 77872 1 mei_me
ntb 35932 0
shpchp 37032 0
ipmi_si 53257 0
ipmi_msghandler 45306 1 ipmi_si
mperf 12667 0
nfsd 284378 1
auth_rpcgss 59368 1 nfsd
nfs_acl 12837 1 nfsd
lockd 93977 1 nfsd
sunrpc 293453 5 nfsd,auth_rpcgss,lockd,nfs_acl
uinput 17625 0
ext4 528957 1
mbcache 14958 1 ext4
jbd2 98341 1 ext4
sd_mod 45373 3
crc_t10dif 12714 1 sd_mod
crct10dif_common 12595 2 crct10dif_pclmul,crc_t10dif
usb_storage 66305 0
sr_mod 22416 0
cdrom 42556 1 sr_mod
ast 60327 2
syscopyarea 12529 1 ast
sysfillrect 12701 1 ast
sysimgblt 12640 1 ast
i2c_algo_bit 13413 1 ast
drm_kms_helper 52758 1 ast
ttm 83948 1 ast
mlx4_core 223339 0
isci 137622 0
mpt2sas 193927 2
ahci 25819 0
drm 297829 4 ast,ttm,drm_kms_helper
e1000e 258529 0
libahci 32009 1 ahci
libsas 83532 1 isci
libata 219478 3 ahci,libahci,libsas
raid_class 13554 1 mpt2sas
ptp 18933 1 e1000e
i2c_core 40325 5 ast,drm,i2c_i801,drm_kms_helper,i2c_algo_bit
pps_core 19106 1 ptp
scsi_transport_sas 41034 3 isci,mpt2sas,libsas
hi,aw
I followed this article step by step( https://access.redhat.com/documentation … e-GPU.html ) to passthrough Grid K2 to my Win7 vm and got code 43 after installed drivers.
[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# /usr/libexec/qemu-kvm --version
QEMU emulator version 1.5.3 (qemu-kvm-1.5.3-60.el7), Copyright (c) 2003-2008 Fabrice Bellard
[root@localhost ~]# lspci |grep K2
05:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
Looks like you're running RHEL, please file a bugzilla and add me to the cc - alex.williamson@redhat.com
Please include the type of hardware that you're using (Nvidia is rather particular about being on a GRID qualified system), lspci -vvv, domain xml, libvirt log for the domain (/var/log/libvirt/qemu/$DOMAIN.log), and host dmesg. Thanks
@mutiny
The ACS override patch is just a quirk, like the code you're pointing to. It just happens to be a dynamic quirk that lets you specify devices via the kernel command line. It's certainly possible to add static entries for processor root ports, but that table is not the right place to do it. That table triggers not only reporting the isolation of the device, but enabling of isolation. So you wouldn't want to trigger that on anything except devices that follow the same programming model.
Have you read my blog post on IOMMU groups? I think that gives a pretty good overview of why groups take the shape they do and the dangers of assuming isolation where there is none. The difference between the in-kernel quirks and the command line version is that the in-kernel quirks have some assurance from the hardware vendor that there is actually isolation. You assume all risk when you patch-in and use the override. The mechanism by which they work is roughly the same though.
Thank you for the response and information aw.
I've gone ahead and read your blog post again. I do have a better understanding of the overall idea and possible risks. However, if I did want to go ahead and assume that there is some notion of isolation or at least some isolation capabilities, what would be the best approach to force the issue (even hard-coding IDs specific to my system)? Mostly because I am stuck with the hardware I have, and I don't mind experimenting and testing for a while, even if there are risks.
(1) For the Z97 PCH root ports (also a Wildcat Point PCH), if I made the same assumption that this particular chipset also happens to belong to the category of PCH root ports with vendor verification, would the code I referenced be the appropriate place for that?
(2) What would be the best approach for static entries for the Core i7 processor's root port, since it probably does not follow the same model as PCH root ports? Would something like the original ACS patches be more appropriate for the processor root ports?
Thanks for the information and all that you've contributed!
Last edited by mutiny (2015-03-26 04:25:42)
I've gone ahead and read your blog post again. I do have a better understanding of the overall idea and possible risks. However, if I did want to go ahead and assume that there is some notion of isolation or at least some isolation capabilities, what would be the best approach to force the issue (even hard coding IDs specific to my system)? For the Z97 PCH root ports (also a Wildcat Point PCH), if I made the same assumption that this particular chipset also happens to belong to the category of PCH root ports with vendor verification, would the code I posted be the appropriate place for that? And then, what would be the best approach for the Core i7 processor's root port, since it probably does not follow the same model as PCH root ports? Would the original ACS patches be more appropriate for the processor root ports? Thanks for the information and all that you've contributed!
I don't think I'm understanding the first half of your question. You're running on a Z97 PCH, aka Wildcat Point, which is already covered by the quirks you quoted. Why would you need to add IDs to the table for your system for PCH root ports? Do you have different device IDs or is something else below the root port causing undesirable grouping? If you have something in the same family as the PCH devices we currently quirk, it's plausible to think that adding IDs to that table will work correctly, but I can't confirm it. I thought we were already including all the major chipsets though. If we're missing something, let me know and I can ask Intel.
For Core i5/7 or Xeon E3 processor root ports, we have no indication that there's any isolation nor programming information to enable whatever isolation the ports may provide. Intel is not interested in providing quirks for these ports. You could add a fixed quirk for your specific IDs, but I fail to see how that's any better than the ACS override patch and command line control. Are you simply looking for a smaller patch that's more portable between kernel versions? Note that the ACS override patch does allow you to specify specific vendor and device IDs if you want to be more surgical than enabling all downstream ports.
Last edited by aw (2015-03-26 04:38:37)
gah, accidental quote, ignore me
Last edited by aw (2015-03-26 04:38:19)
I don't think I'm understanding the first half of your question. You're running on a Z97 PCH, aka Wildcat Point, which is already covered by the quirks you quoted. Why would you need to add IDs to the table for your system for PCH root ports? Do you have different device IDs or is something else below the root port causing undesirable grouping? If you have something in the same family as the PCH devices we currently quirk, it's plausible to think that adding IDs to that table will work correctly, but I can't confirm it. I thought we were already including all the major chipsets though. If we're missing something, let me know and I can ask Intel.
For Core i5/7 or Xeon E3 processor root ports, we have no indication that there's any isolation nor programming information to enable whatever isolation the ports may provide. Intel is not interested in providing quirks for these ports. You could add a fixed quirk for your specific IDs, but I fail to see how that's any better than the ACS override patch and command line control. Are you simply looking for a smaller patch that's more portable between kernel versions? Note that the ACS override patch does allow you to specify specific vendor and device IDs if you want to be more surgical than enabling all downstream ports.
Yes, you are right, I am asking about the Z97 PCH because my particular IDs don't seem to exist in the current quirked list. The current list under Wildcat PCH in quirks.c all have IDs of 0x9cXX, whereas the PCH IDs on my motherboard are 0x8c90, 0x8c94, 0x8c96, and 0x8c9c. The motherboard is a Gigabyte GA-Z97X-UD5H-BK.
I guess I was assuming there was something in the most recent kernel versions where it would be as easy as adding the processor root ports to a list, like the PCH root port quirks, because as I understand it, without any support from Intel I'll basically be adding these quirks/patches to each kernel update for the rest of time. You are completely correct that I am just looking for something simpler to modify between kernel updates. If the ACS patch still best applies in this situation, then I'll go that route for the Core i7 root ports.
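The comparison against the quirk table can be sketched as follows. The ID ranges are transcribed from the pci_quirk_intel_pch_acs_ids[] table quoted earlier in the thread; the dictionary and function names are made up for illustration and are not part of the kernel.

```python
# Root port device IDs from the pci_quirk_intel_pch_acs_ids[] table quoted above.
# Contiguous runs are written as ranges; Patsburg only lists every second ID.
PCH_ACS_QUIRK_IDS = {
    "Ibexpeak": range(0x3B42, 0x3B52),
    "Cougarpoint": range(0x1C10, 0x1C20),
    "Pantherpoint": range(0x1E10, 0x1E20),
    "Lynxpoint-H": range(0x8C10, 0x8C20),
    "Lynxpoint-LP": range(0x9C10, 0x9C1C),
    "Wildcat": range(0x9C90, 0x9C9C),
    "Patsburg (X79)": range(0x1D10, 0x1D20, 2),
}


def find_quirk_family(device_id):
    """Return the PCH family whose quirk entry covers device_id, or None."""
    for family, ids in PCH_ACS_QUIRK_IDS.items():
        if device_id in ids:
            return family
    return None


if __name__ == "__main__":
    # Root port IDs reported for the GA-Z97X-UD5H-BK in this post.
    for device_id in (0x8C90, 0x8C94, 0x8C96, 0x8C9C):
        print(hex(device_id), "->", find_quirk_family(device_id))
```

None of the four IDs falls inside any listed range, which matches what I see in quirks.c. This only checks membership in the table as quoted here; it says nothing about whether the hardware actually provides isolation.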
@Kingd your iommu doesn't seem to be enabled? cat /proc/cmdline? intel_iommu=on?
or you need ACS patch, you need to bind all devices in same group aka gpu and gpu audio= same group....
Nope IOMMU is activated.
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux root=UUID=... rw quiet intel_iommu=on pci-stub.ids=1002:6798
I haven't tried the ACS patch, but I will give the linux-vfio kernel from the AUR a shot. Maybe that will solve my problem.
//edit: Same error message with the new linux-vfio kernel. Any suggestions?
Here is my QEMU command line, which caused the vfio error:
qemu-system-x86_64 -enable-kvm -m 8024 -cpu host,kvm=off -smp 4 -drive if=pflash,format=raw,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF-pure-efi.fd -drive if=pflash,format=raw,file=/usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd -device vfio-pci,host=01:00.0 -vga none
My discrete graphics card is at 01:00.0 with the ID 1002:6798.
$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X]
$ lspci -n
...
01:00.0 0300: 1002:6798
01:00.1 0403: 1002:aaa0
...
Last edited by Kingd (2015-03-26 08:55:24)
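Following the earlier advice that the GPU and its audio function have to be bound together, here is a small sketch that collects the IDs of every function on one slot from lspci -n output and joins them into a comma-separated list of the kind pci-stub.ids accepts. The function name is made up, and whether the missing 1002:aaa0 entry is the actual cause of the vfio error above is an assumption, not something confirmed in the thread.

```python
def stub_ids_for_slot(lspci_n_output, slot):
    """Collect vendor:device IDs for every function of `slot` (e.g. "01:00")."""
    ids = []
    for line in lspci_n_output.splitlines():
        fields = line.split()
        # Expected shape: "<bus:dev.fn> <class>: <vendor:device> [...]"
        if len(fields) < 3:
            continue
        address, vendor_device = fields[0], fields[2]
        if address.rsplit(".", 1)[0] == slot and vendor_device not in ids:
            ids.append(vendor_device)
    return ",".join(ids)


if __name__ == "__main__":
    sample = """...
01:00.0 0300: 1002:6798
01:00.1 0403: 1002:aaa0
..."""
    print("pci-stub.ids=" + stub_ids_for_slot(sample, "01:00"))
```

For the listing above this yields both the VGA function and the HDMI audio function, rather than only 1002:6798 as on the current kernel command line.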
syshack wrote:hi,aw
I followed this article step by step( https://access.redhat.com/documentation … e-GPU.html ) to passthrough Grid K2 to my Win7 vm and got code 43 after installed drivers.
[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# /usr/libexec/qemu-kvm --version
QEMU emulator version 1.5.3 (qemu-kvm-1.5.3-60.el7), Copyright (c) 2003-2008 Fabrice Bellard
[root@localhost ~]# lspci |grep K2
05:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
Looks like you're running RHEL, please file a bugzilla and add me to the cc - alex.williamson@redhat.com
Please include the type of hardware that you're using (Nvidia is rather particular about being on a GRID qualified system), lspci -vvv, domain xml, libvirt log for the domain (/var/log/libvirt/qemu/$DOMAIN.log), and host dmesg. Thanks
I've filed a bugzilla, please check, thanks. ( https://bugzilla.redhat.com/show_bug.cgi?id=1206006 )
If you need more details, let me know and I will provide them as soon as possible.
I'm sorry, I made a stupid mistake. The failure was caused by a power shortage.
aw wrote:I don't think I'm understanding the first half of your question. You're running on a Z97 PCH, aka Wildcat Point, which is already covered by the quirks you quoted. Why would you need to add IDs to the table for your system for PCH root ports? Do you have different device IDs or is something else below the root port causing undesirable grouping? If you have something in the same family as the PCH devices we currently quirk, it's plausible to think that adding IDs to that table will work correctly, but I can't confirm it. I thought we were already including all the major chipsets though. If we're missing something, let me know and I can ask Intel.
For Core i5/7 or Xeon E3 processor root ports, we have no indication that there's any isolation nor programming information to enable whatever isolation the ports may provide. Intel is not interested in providing quirks for these ports. You could add a fixed quirk for your specific IDs, but I fail to see how that's any better than the ACS override patch and command line control. Are you simply looking for a smaller patch that's more portable between kernel versions? Note that the ACS override patch does allow you to specify specific vendor and device IDs if you want to be more surgical than enabling all downstream ports.
Yes, you are right, I am asking about the Z97 PCH because my particular IDs don't seem to exist in the current quirked list. The current list under Wildcat PCH in quirks.c all have IDs of 0x9cXX, whereas the PCH IDs on my motherboard are 0x8c90, 0x8c94, 0x8c96, and 0x8c9c. The motherboard is a Gigabyte GA-Z97X-UD5H-BK.
I guess I was assuming there was something in the most recent kernel versions where it would be as easy as adding the processor root ports to a list, like the PCH root ports quirks, because as I understand it, without any support from Intel I'll basically be adding these quirks/patches to each kernel udpate for the rest of time. You are completely correct that I am just looking for something simpler to modify between kernel updates. If the ACS patch still best applies in this situation, then I'll go that route for the Core i7 root ports.
Double checking ark, Z97 is actually Lynx Point, which I thought was covered by the two sets of Lynx Point entries we have, but apparently not. The datasheet confirms your device ID values. I'll ask Intel about these. Chances are good that they will follow the same programming model as existing PCH root ports and could be added to the table, but we'll need to see if Intel is willing to do the legwork to verify this.
syshack wrote:i'v filed a bugzia,pls check,thx. ( https://bugzilla.redhat.com/show_bug.cgi?id=1206006)
if need more details,let me know and i will provid as soon as possible.
im sorry ,i make a stupid mistake. the fails caused by power shortage.
Glad you got it working
[ 300.233368] mce: [Hardware Error]: Machine check events logged
This is very, very bad. Check what ever it is. It may be a simple overheat, or something worse.
It always seems to occur at the 300-second mark after boot. I tried to read it using the mcelog program, but I'm not sure how to read the event.
Hmm, check your CPU for bugs. I have AMD family 15h cpu, and there's a revision guide for it saying that "hardware lockups may happen because of an interrupt" and stuff like that.
My CPU is an i7 5820K, a socket 2011 v3 Haswell-E processor. This seems to be the documentation; I didn't find anything particularly relevant about a possible cause of these hangs:
http://www.intel.com/content/dam/www/pu … -vol-1.pdf
Also, I tried to use the kdump mechanism to "capture" the kernel panic, but it seems the computer crashes completely, maybe even before a kernel panic actually occurs? I had a feeling that the reset button was unresponsive for a second or two.
Last edited by Cubex (2015-03-26 14:28:49)
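One way to check whether the machine check really lands at the same point on every boot is to pull the timestamps out of saved dmesg output. A minimal sketch, assuming the log uses the bracketed seconds-since-boot prefix seen in the dmesg excerpt earlier in the thread; the function name is made up for illustration.

```python
import re
from typing import List

# Matches lines such as "[ 300.233368] mce: [Hardware Error]: ..."
_DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def hardware_error_times(dmesg_text: str) -> List[float]:
    """Return the timestamps (seconds since boot) of all '[Hardware Error]' lines."""
    times = []
    for line in dmesg_text.splitlines():
        match = _DMESG_LINE.match(line.strip())
        if match and "[Hardware Error]" in match.group(2):
            times.append(float(match.group(1)))
    return times


if __name__ == "__main__":
    sample = (
        "[ 115.734399] usb 1-14.1: 3:1: cannot get min/max values for control 2 (id 3)\n"
        "[ 300.233368] mce: [Hardware Error]: Machine check events logged\n"
        "[ 464.991218] VFIO - User Level meta-driver version: 0.3\n"
    )
    print(hardware_error_times(sample))
```

Running this over the logs of several boots would show whether the event really recurs near the 300-second mark. It does not decode the event itself; that still needs mcelog.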
Duelist wrote:[ 300.233368] mce: [Hardware Error]: Machine check events logged
This is very, very bad. Check what ever it is. It may be a simple overheat, or something worse.
Seems to occur always in 300 secs mark after boot up, I tried to read using mcelog program but Im not sure how to read the event.
Duelist wrote:Hmm, check your CPU for bugs. I have AMD family 15h cpu, and there's a revision guide for it saying that "hardware lockups may happen because of an interrupt" and stuff like that.
My CPU is i7 5820k, an 2011 v3 socket Haswell-E processor, this seems to be the documentation, I didn't found anything too much relevant about any posible cause of these hangs
http://www.intel.com/content/dam/www/pu … -vol-1.pdf
Also I tried to use kdump mechanism for "capturing" the kernel panic but seems that computer completelly crashes, maybe even before a kernel panic actually occurs? I had a feeling that reset button was unresponsible for a sec or two
https://www.kernel.org/doc/Documentation/sysrq.txt
I've not tried this yet, but I plan to.
Last edited by The_Moves (2015-03-26 14:40:39)
I have an HP server and I am trying to assign a RAID controller. It is being blocked by the DMA write patch. I wonder whether my motherboard controller's PCI ID could be added to this list so that I can override the error I am getting.
Hi aw,
I was just curious, since some PCIe root ports without ACS support but with vendor verification of disabled peer-to-peer interaction, some devices were quirked in the kernel:
static const u16 pci_quirk_intel_pch_acs_ids[] = {
/* Ibexpeak PCH */
0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
/* Cougarpoint PCH */
0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
/* Pantherpoint PCH */
0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
/* Lynxpoint-H PCH */
0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
/* Lynxpoint-LP PCH */
0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
/* Wildcat PCH */
0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
/* Patsburg (X79) PCH */
0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
};
Now basically what I'm wondering, is if adding other root port PCI ids to this list would effectively be the same as applying previous ACS override kernel patches and providing override ID through kernel options? Also, would this be the same for adding the ID for the processor's own PCIe root ports, and not just PCH root ports?
I think I've posted something like this in the past, but for many Z97 motherboards, the system looks something like:
http://i.imgur.com/VtAr2ni.png
where the only GPU pci-express slots are provided by the processor, and none are available from the PCH, so the only option would be to force separation of the processor root port. What are the ramifications of doing this through the existing quirked list, instead of using your previous ACS patches (besides the obvious possible effects of ACS patches in general)
I have HP server and I am trying assign raid controller. It is being blocked by dma write patch. I wonder if I add my motherboard controller pci can be added to this and i can override the error I am getting
I don't know what dma write patch you're talking about, but assigning the integrated raid controller on HP systems is just asking for trouble and will likely be prevented on newer kernels that exclude devices associated with VT-d RMRRs. PS - please quote appropriately and avoid top-posting.
tritron4 wrote:I have HP server and I am trying assign raid controller. It is being blocked by dma write patch. I wonder if I add my motherboard controller pci can be added to this and i can override the error I am getting
I don't know what dma write patch you're talking about, but assigning the integrated raid controller on HP systems is just asking for trouble and will likely be prevented on newer kernels that exclude devices associated with VT-d RMRRs. PS - please quote appropriately and avoid top-posting.
It is a PCIe SAS RAID controller, not a built-in one, and the RMRR is preventing assignment of the RAID card. Is there a way to force assignment of such a card?
Everything was working fine with Xen.
aw wrote:tritron4 wrote:I have HP server and I am trying assign raid controller. It is being blocked by dma write patch. I wonder if I add my motherboard controller pci can be added to this and i can override the error I am getting
I don't know what dma write patch you're talking about, but assigning the integrated raid controller on HP systems is just asking for trouble and will likely be prevented on newer kernels that exclude devices associated with VT-d RMRRs. PS - please quote appropriately and avoid top-posting.
It is pci-e sas raid controller and not build in and Rmrr is preventing assignment of raid card. Is there a way to force assignment of such card ?
Everything was working fine with xen.
As the error message says, contact your vendor. The problems with RMRRs are even worse than ACS, and "working fine" may just mean you were lucky. We'll have a white paper published on this shortly, but in the meantime you can get an understanding of the problems from this thread - https://www.marc.info/?l=kvm&m=142561921430906