
#1 2017-05-09 15:25:23

nannerpussy
Member
Registered: 2017-02-15
Posts: 96

Got my radeonSI + GTX770 cards working together, with lots of errors..

This has been a ridiculous project, mostly to teach myself about the kernel, drivers, and VFIO/passthrough. This GTX760 started throwing horizontal green lines from initialization and BIOS all the way into the OS, so I threw it in a box and put in this crappy AMD Radeon HD7750 (Southern Islands, GCN1). Long story short, I wanted to see if I could get AMD and nVIDIA to play nice together: use the nVIDIA as a discrete GPU for rendering, and display through my fake "on board" Radeon. I haven't gotten to Bumblebee or anything yet. I only just managed to get both cards booting without kernel freezes, on open source drivers, and both reporting as available providers. The initial problem was generic "fifo error" spam during boot, which went away with [c]nouveau.modeset=0[/c], but I have since fixed it without that flag. Now I am getting a ton of errors after mode-setting happens during boot. My BIOS settings are scant: this is a relatively old Gigabyte motherboard with no advanced settings or toggles for the GPU or PCI-e ports.

First, to prove I'm not crazy, here is the output of

xrandr --listproviders

Providers: number : 2
Provider 0: id: 0x80 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 6 outputs: 3 associated providers: 1 name:VERDE @ pci:0000:01:00.0
Provider 1: id: 0x47 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 4 outputs: 4 associated providers: 1 name:nouveau
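
For anyone who wants to compare the two providers side by side, here is a minimal Python sketch that parses the output above. It assumes the line format is exactly as pasted here; the function name parse_providers and the dictionary keys are made up for illustration and are not part of xrandr.

```python
import re

# Sample copied from the xrandr --listproviders output above.
XRANDR_OUTPUT = """\
Providers: number : 2
Provider 0: id: 0x80 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 6 outputs: 3 associated providers: 1 name:VERDE @ pci:0000:01:00.0
Provider 1: id: 0x47 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 4 outputs: 4 associated providers: 1 name:nouveau
"""

# Assumption: every provider line follows the layout seen in the paste above.
PROVIDER_RE = re.compile(
    r"^Provider (?P<index>\d+): id: (?P<id>0x[0-9a-fA-F]+) "
    r"cap: (?P<cap>0x[0-9a-fA-F]+), (?P<caps>.*?) crtcs: (?P<crtcs>\d+) "
    r"outputs: (?P<outputs>\d+) associated providers: (?P<assoc>\d+) "
    r"name:(?P<name>.+)$",
    re.MULTILINE,
)


def parse_providers(text):
    """Return one dict per 'Provider N:' line found in the text."""
    providers = []
    for m in PROVIDER_RE.finditer(text):
        providers.append({
            "index": int(m["index"]),
            "id": int(m["id"], 16),
            "cap": int(m["cap"], 16),
            "capabilities": m["caps"].split(", "),
            "crtcs": int(m["crtcs"]),
            "outputs": int(m["outputs"]),
            "name": m["name"].strip(),
        })
    return providers


if __name__ == "__main__":
    for p in parse_providers(XRANDR_OUTPUT):
        print(p["index"], p["name"], hex(p["cap"]), p["capabilities"])
```

In this paste the only capability difference is that the nouveau provider (cap 0x7) does not list Sink Offload, while the Radeon (cap 0xf) lists all four.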

Here is the juicy part of my dmesg:

[    9.881772] [drm] fb mappable at 0xC05D9000
[    9.881774] [drm] vram apper at 0xC0000000
[    9.881775] [drm] size 8294400
[    9.881777] [drm] fb depth is 24
[    9.881778] [drm]    pitch is 7680
[    9.881999] fbcon: radeondrmfb (fb0) is primary device
[    9.894419] Console: switching to colour frame buffer device 240x67
[    9.905286] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device
[    9.930145] [drm] Initialized radeon 2.49.0 20080528 for 0000:01:00.0 on minor 0
[    9.981504] [drm] amdgpu kernel modesetting enabled.
[   10.137806] nouveau 0000:04:00.0: fb: 2048 MiB GDDR5
[   10.198438] nouveau 0000:04:00.0: DRM: VRAM: 2048 MiB
[   10.198439] nouveau 0000:04:00.0: DRM: GART: 1048576 MiB
[   10.198442] nouveau 0000:04:00.0: DRM: TMDS table version 2.0
[   10.198443] nouveau 0000:04:00.0: DRM: DCB version 4.0
[   10.198445] nouveau 0000:04:00.0: DRM: DCB outp 00: 01000f02 00020030
[   10.198446] nouveau 0000:04:00.0: DRM: DCB outp 01: 02000f00 00000000
[   10.198447] nouveau 0000:04:00.0: DRM: DCB outp 02: 08011f82 00020030
[   10.198448] nouveau 0000:04:00.0: DRM: DCB outp 03: 02822f62 0f420010
[   10.198449] nouveau 0000:04:00.0: DRM: DCB outp 05: 04833fb6 0f420010
[   10.198450] nouveau 0000:04:00.0: DRM: DCB outp 06: 04033f72 00020010
[   10.198452] nouveau 0000:04:00.0: DRM: DCB conn 00: 00001030
[   10.198452] nouveau 0000:04:00.0: DRM: DCB conn 01: 01000131
[   10.198453] nouveau 0000:04:00.0: DRM: DCB conn 02: 00010261
[   10.198454] nouveau 0000:04:00.0: DRM: DCB conn 03: 00020346
[   10.198455] nouveau 0000:04:00.0: DRM: DCB conn 04: 00000460
[   10.221455] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   10.221458] [drm] Driver supports precise vblank timestamp query.
[   10.221636] nouveau 0000:04:00.0: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
[   10.276066] r8169 0000:02:00.0 enp2s0: link up
[   10.276085] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[   10.312231] nouveau 0000:04:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 1 [007fd52000 DRM] subc 0 mthd 0008 data 00000000
[   10.312393] nouveau 0000:04:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 1 [007fd52000 DRM] subc 0 mthd 0000 data 00000000
[   10.312399] nouveau 0000:04:00.0: DRM: MM: using COPY for buffer copies
[   10.312534] nouveau 0000:04:00.0: fifo: PBDMA0: 80004000 [GPPTR SIGNATURE] ch 1 [007fd52000 DRM] subc 0 mthd 0000 data 00000000
[   10.312668] nouveau 0000:04:00.0: fifo: PBDMA0: 80004000 [GPPTR SIGNATURE] ch 1 [007fd52000 DRM] subc 0 mthd 0000 data 00000000
[   10.312833] nouveau 0000:04:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 1 [007fd52000 DRM] subc 0 mthd 0008 data 00000000
[   10.313070] nouveau 0000:04:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 1 [007fd52000 DRM] subc 0 mthd 0008 data 00000000
[   10.313211] nouveau 0000:04:00.0: fifo: read fault at affffff000 engine 07 [HOST0] client 06 [HOST] reason 03 [VA_LIMIT_VIOLATION] on channel 1 [007fd52000 DRM]
[   10.313212] AMD-Vi: Event logged [
[   10.313218] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaaa40 flags=0x0030]
[   10.313219] AMD-Vi: Event logged [
[   10.313222] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaaa50 flags=0x0030]
[   10.313222] AMD-Vi: Event logged [
[   10.313225] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaaa80 flags=0x0030]
[   10.313226] AMD-Vi: Event logged [
[   10.313228] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaaa60 flags=0x0030]
[   10.313368] nouveau 0000:04:00.0: fifo: fifo engine fault on channel 1, recovering...
[   10.379973] [drm] Cannot find any crtc or sizes - going 1024x768
[   10.383434] nouveau 0000:04:00.0: fifo: read fault at 00ff08f000 engine 00 [GR] client 04 [FE] reason 01 [PDE_SIZE] on channel -1 [007f986000 unknown]
[   12.383366] nouveau 0000:04:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1353/gf100_grctx_generate()!
[   12.383514] AMD-Vi: Event logged [
[   12.383519] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaa000 flags=0x0010]
[   12.383560] AMD-Vi: Event logged [
[   12.383563] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaa040 flags=0x0010]
[   14.386642] nouveau 0000:04:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
[   16.386681] nouveau 0000:04:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
[   18.386652] nouveau 0000:04:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
[   20.386658] nouveau 0000:04:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
[   22.386825] nouveau 0000:04:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1366/gf100_grctx_generate()!
[   22.386838] nouveau 0000:04:00.0: gr: failed to construct context
[   22.386870] nouveau 0000:04:00.0: gr: init failed, -16
[   22.394157] nouveau 0000:04:00.0: DRM: allocated 1024x768 fb: 0x60000, bo ffff880225fcf400
[   22.395964] nouveau 0000:04:00.0: fb1: nouveaufb frame buffer device
[   22.396014] [drm] Initialized nouveau 1.3.1 20120801 for 0000:04:00.0 on minor 1
[   22.945214] nouveau 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   22.945219] radeon 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   25.997885] nouveau 0000:04:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:1628/gf100_gr_init_ctxctl()!
[   25.997890] nouveau 0000:04:00.0: gr: 409000 - done 00007200
[   25.997898] nouveau 0000:04:00.0: gr: 409000 - stat 00000000 00027200 00000000 00000000
[   25.997905] nouveau 0000:04:00.0: gr: 409000 - stat 00000000 00000000 0000000c 00000220
[   25.997908] nouveau 0000:04:00.0: gr: 502000 - done 00000340
[   25.997915] nouveau 0000:04:00.0: gr: 502000 - stat 80000000 00006500 00000000 00000000
[   25.997923] nouveau 0000:04:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
[   25.997925] nouveau 0000:04:00.0: gr: 50a000 - done 00000340
[   25.997933] nouveau 0000:04:00.0: gr: 50a000 - stat 80000000 00006500 00000000 00000000
[   25.997940] nouveau 0000:04:00.0: gr: 50a000 - stat 00000000 00000000 00000002 00000000
[   25.997943] nouveau 0000:04:00.0: gr: 512000 - done 00000340
[   25.997950] nouveau 0000:04:00.0: gr: 512000 - stat 80000000 00008e00 00000000 00000000
[   25.997958] nouveau 0000:04:00.0: gr: 512000 - stat 00000000 00000000 00000002 00000000
[   25.997960] nouveau 0000:04:00.0: gr: 51a000 - done 00000340
[   25.997968] nouveau 0000:04:00.0: gr: 51a000 - stat 80000000 00008e00 00000000 00000000
[   25.997975] nouveau 0000:04:00.0: gr: 51a000 - stat 00000000 00000000 00000002 00000000
[   25.997977] nouveau 0000:04:00.0: gr: init failed, -16
[   40.996738] nouveau 0000:04:00.0: Xorg[388]: failed to idle channel 2 [Xorg[388]]
[   43.880856] fuse init (API version 7.26)
[   70.873625] snd_hda_intel 0000:04:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x008f2d00
[   71.883622] snd_hda_intel 0000:04:00.1: azx_get_response timeout, switching to single_cmd mode: last cmd=0x008f2d00

Here is the full dmesg log https://www.dropbox.com/s/6lua6czx4vcrn … g.txt?dl=0

Here's the output of 

lspci | grep -e VGA -e 3D

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO [Radeon HD 7750/8740 / R7 250E]
04:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1)

I know there are a few semi-related topics around here, but this is a very touchy deal. The Wiki has gotten me this far, but I'm wondering if anyone with experience can tell me whether the kernel errors are manageable.

EDIT: Compiled and patched Nouveau and DRM from the official Git repo. Working on eliminating the DRM errors, then going to try manually disabling some of the outputs on the nVIDIA. Also, Bumblebee was a BAD IDEA. The nVIDIA drivers were causing endless loops and lockups.

Last edited by nannerpussy (2017-05-09 23:08:32)


#2 2017-05-10 13:15:01

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: Got my radeonSI + GTX770 cards working together, with lots of errors..

[   10.313228] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaaa60 flags=0x0030]

The address is suspiciously uniform and suspiciously high (around 682 GiB). Maybe the GTX is so broken that it tries to read/write random addresses instead of the ones it was told to use, or maybe a bug in nouveau or the IOMMU driver sent wrong addresses to the GPU. You could try the proprietary driver or a different kernel version just to see what happens. Or see what happens without the IOMMU, if you are not afraid of the GPU possibly writing random bits all over RAM, including data that is about to be written to disk.
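
To show where the size figure comes from, here is a rough Python sketch that pulls the IO_PAGE_FAULT addresses out of dmesg text and converts them to GiB. The sample lines are copied from the log in the first post; the regex and the helper names fault_addresses and to_gib are my own assumptions for illustration, not anything provided by the kernel.

```python
import re

# A few lines copied from the dmesg paste in post #1.
DMESG_SAMPLE = """\
[   10.313218] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaaa40 flags=0x0030]
[   10.313222] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaaa50 flags=0x0030]
[   10.313228] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaaa60 flags=0x0030]
[   12.383519] IO_PAGE_FAULT device=04:00.0 domain=0x000d address=0x000000aaaaaaa000 flags=0x0010]
"""

# Assumption: the fault lines look exactly like the ones pasted in this thread.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT device=(?P<device>\S+) domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+) flags=(?P<flags>0x[0-9a-fA-F]+)"
)


def fault_addresses(text):
    """Return (device, address) pairs for every IO_PAGE_FAULT line."""
    return [(m["device"], int(m["address"], 16)) for m in FAULT_RE.finditer(text)]


def to_gib(address):
    """Convert a byte address to GiB (2**30 bytes)."""
    return address / 2**30


if __name__ == "__main__":
    for device, address in fault_addresses(DMESG_SAMPLE):
        # Hex digit A is binary 1010, so these addresses are mostly
        # an alternating bit pattern, which is why they look so uniform.
        print(f"{device}  {address:#014x}  {to_gib(address):.2f} GiB  {address:#b}")
```

Every faulting address in the sample lands between 682 and 683 GiB, far beyond anything a 2 GiB card on a desktop board should be touching.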

If possible, get some similar GPU for testing or find somebody who has one and uses it with Arch. If you are the only one with such problems, your GTX may just be toast.

Last edited by mich41 (2017-05-10 13:18:51)


#3 2017-05-13 05:52:51

nannerpussy
Member
Registered: 2017-02-15
Posts: 96

Re: Got my radeonSI + GTX770 cards working together, with lots of errors..

ONE BIG GIANT EDIT:

I won. The war is over. The final piece I can't beat into submission is X trying to draw on an imaginary display for the discrete card, which results in this error and some lag when I first run DRI_PRIME=1 to launch an app:

nvc0_screen_create:873 - Error allocating PGRAPH context for M2MF: -16

Not much out there in the Googles about nvc0_screen_create, but there are some bug reports on it with kernel discussion. I am also having trouble capping my refresh rate, which currently sits at 25890 frames per second and some other hilariously odd values, but I'll take it as a win.

Last edited by nannerpussy (2017-05-14 00:21:18)

