I recently got a new laptop with AMD integrated graphics and a dedicated Nvidia card. My desired setup is three monitors: two external screens plus my laptop's built-in screen.
I've managed to get my computer running reasonably well with this setup, but whenever I run a video game, and sometimes when I'm not doing much at all, my two external monitors (which I'm certain are connected to the Nvidia card) freeze. Sometimes this causes Xorg to crash. If it doesn't, I can still use my built-in monitor just fine, but the two external monitors stay frozen until I reboot.
I've since figured out that it's bad form, but when I used the xorg.conf generated by nvidia-xconfig, my built-in display was stuck showing the startx logs, while my two external monitors worked perfectly: I could do whatever I wanted on them, including playing games. I've since deleted xorg.conf and all of the files in xorg.conf.d.
All of this makes me wonder whether I'm actually using my Nvidia card to render anything. I'm not sure if this has anything to do with PRIME render offload. I think it's running, but I don't know how to tell, other than that I've attempted to set it up using xrandr.
I believe I have the correct kernel parameters set:
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet splash nvidia-drm.modeset=1 ibt=off"
Here is a typical Xorg log: https://pastebin.com/ZQ57wMxm
Output of glxinfo -B: https://pastebin.com/7nCMB9XE
Output of sudo journalctl -b: https://pastebin.com/P1MV7tu6
I've done what I can to comb through the logs and read the wiki, but I've reached the limit of my abilities and need some guidance on the next step.
Last edited by 3lbsOfSalt (2024-02-03 06:09:19)
Try removing xf86-video-amdgpu, the modesetting driver you're falling back to in that case is much better tested with a PRIME setup.
And while the fact that your external monitor is connected to the Nvidia card should nudge things in that direction, to be sure what renders where, run your games explicitly with prime-run from the nvidia-prime package.
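A minimal sketch for checking which GPU actually renders, assuming bash and glxinfo (from mesa-utils) are available: it pulls the "OpenGL renderer string" line out of `glxinfo -B`, so the default and prime-run results can be compared side by side. The function name `gl_renderer` is made up for this example.

```shell
# Hypothetical helper: print the OpenGL renderer reported by `glxinfo -B`.
# It reads stdin, so it works with or without prime-run in front.
gl_renderer() {
  awk -F': ' '/OpenGL renderer string/ { print $2 }'
}

# Usage (not run here):
#   glxinfo -B | gl_renderer             # default render GPU
#   prime-run glxinfo -B | gl_renderer   # offloaded to the Nvidia GPU
```

If both lines name the same GPU, the offload is not taking effect.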
Log-wise I see no faults. Are those from such a "breaking" session, or from after the reboot to restore usability? If they're from after a reboot, please post
sudo journalctl -b-1
so we get the journal from a broken boot.
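A sketch of narrowing that journal down, assuming bash: `-b -1` selects the previous boot and `-k` limits output to kernel messages, which is where amdgpu and nvidia driver faults show up. The grep pattern and the function name are illustrative choices, not something from the thread. Reading the system journal needs root (the thread uses sudo) or membership in a group allowed to read it.

```shell
# Hypothetical helper: kernel messages from the previous (broken) boot,
# filtered to lines mentioning the GPU drivers.
prev_boot_gpu_log() {
  journalctl -b -1 -k --no-pager 2>/dev/null | grep -iE 'amdgpu|nvidia|drm'
  return 0  # no matching lines is not an error here
}

# Usage (from a root shell): prev_boot_gpu_log
```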
Are you using a compositor? And does the behaviour change if you suspend/stop/disable it before playing games?
Last edited by V1del (2024-02-02 15:47:34)
Yes, that was from the reboot. Here is a breaking session log: https://pastebin.com/uYg1Dj8U
And wow, running it with prime-run did the trick. The game ran perfectly with that on. So that means that my games are running on the AMD card by default.
What does that mean I need to do? Am I going to have to prepend prime-run to all of my graphics-type applications?
Also, I just saw the last part of your question: I wasn't originally using a compositor. I installed and enabled picom, and there was no difference in performance before and after.
Last edited by 3lbsOfSalt (2024-02-02 16:03:35)
The journal log shows your AMDGPU dying, which might still be related to the necessary copy dance if you were playing on a monitor attached to the Nvidia card while amdgpu was the render card.
Basically, yes. If you're using Proton-based applications in Steam it shouldn't be necessary, since DXVK and VKD3D contain checks that explicitly seek out a dedicated GPU. For native Linux games you should do it, and you can use Steam's launch options to prepend it there. A "shortcut" would be to just start Steam with it, which passes the environment down to its children, but that would mean the Nvidia GPU stays active all the time just for the Steam client.
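For native titles, Steam's per-game Launch Options field accepts `prime-run %command%`, where Steam substitutes `%command%` with the game's own command line. The "shortcut" works because child processes inherit their parent's environment. The bash sketch below demonstrates that inheritance on its own, without Steam or a GPU, using `__NV_PRIME_RENDER_OFFLOAD`, one of the variables prime-run sets. The function name is hypothetical.

```shell
# Launch Options (per game, in Steam):  prime-run %command%
# Whole client instead:                 prime-run steam

# Hypothetical demo: a variable set for a parent process is still visible
# to its grandchild, just as Steam's children inherit prime-run's settings.
show_inherited_offload() {
  __NV_PRIME_RENDER_OFFLOAD=1 bash -c 'bash -c "echo \${__NV_PRIME_RENDER_OFFLOAD:-unset}"'
}

show_inherited_offload   # prints 1
```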
I asked about the compositor because compositors often cause issues here; removing one is unlikely to be the fix if you weren't using one in the first place.
Last edited by V1del (2024-02-02 17:59:31)
NB: amdgpu_drv.so was removed from underneath the running X11 server, so the client errors and Xorg crashes might be gone w/ xf86-video-amdgpu and might only have been triggered by the removal.
They don't show up in the first journal.
I definitely marked this solved prematurely. I had only tried running a 2D game with the prime-run script. The 2D game still works, but as soon as I try to run a 3D game, with or without prime-run, I get major artifacts on my screen and a frozen system, and I have to reboot. I get an interesting error in the journal when that happens: https://pastebin.com/sBL0Dyda
Also, another interesting piece of information: if I run Firefox without prime-run and do some sort of in-browser PDF rendering, like on overleaf.com (a LaTeX editor), the same type of visual artifacts appear, followed by a freeze. If I run Firefox with prime-run that doesn't happen, so I'm guessing it's related to the same issue I was having with the 2D game. Here is a journal from frozen sessions where that specifically happens: https://pastebin.com/CpiwxCA8 (the relevant section seems to be around line 2081, where amdgpu_job_timeout appears).
the client errors and Xorg crashes might be gone w/ xf86-video-amdgpu and might only have been triggered by the removal.
If that's true, then I'll reinstall xf86-video-amdgpu and assess the performance and errors I get with it installed.
Last edited by 3lbsOfSalt (2024-02-03 06:26:46)
Remove / disable supergfxd, make sure you don't have fbdev=1 (also not in some modprobe config!), lose "ibt=off" (it hasn't been required for a year or so) and post an updated Xorg log (this isn't a Wayland session with Xwayland, is it?)
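A sketch for auditing the kernel command line against that advice, assuming bash: it flags `ibt=off` and any parameter ending in `fbdev=1` (such as `nvidia-drm.fbdev=1`). The function name is hypothetical, and the commented commands assume a GRUB setup like the one posted at the top of the thread and a systemd unit named `supergfxd`.

```shell
# Hypothetical check: flag kernel parameters the advice says to drop.
# Pass a command line, or omit the argument to read the running one.
check_cmdline() {
  local cmdline="${1-$(cat /proc/cmdline 2>/dev/null)}" p found=0
  local -a params
  read -r -a params <<< "$cmdline"
  for p in "${params[@]}"; do
    case "$p" in
      ibt=off|*fbdev=1) echo "drop: $p"; found=1 ;;
    esac
  done
  [ "$found" -eq 1 ] || echo "ok"
}

# Related steps (not run here; unit name assumed):
#   sudo systemctl disable --now supergfxd
#   grep -rs fbdev /etc/modprobe.d /usr/lib/modprobe.d
#   edit /etc/default/grub, then: sudo grub-mkconfig -o /boot/grub/grub.cfg
```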
Also, while the error in the first journal isn't related to that at all, you might have to select the Vulkan driver: https://wiki.archlinux.org/title/Vulkan … initialize
Do you have a screenshot of the artifacts?
Remove / disable supergfxd, make sure you don't have fbdev=1 (also not in some modprobe config!), lose "ibt=off" (it hasn't been required for a year or so)
Okay, I did all of that, but when I disabled supergfxd, xrandr stopped recognizing my two external monitors, and any game I tried to run wouldn't load and then crashed (without freezing my computer). So I had to re-enable it to try to get screenshots of the artifacts from the games. Interestingly, when I turned it back on and ran my 3D game (Doom 2016), it worked just fine; I even played 10 minutes of it before deciding it felt stable and closing it. To see whether that was a fluke, I ran a different 3D game, and my computer promptly froze. On the monitor I could still use, I couldn't find anything in journalctl or the Xorg logs about it.
I've also been experiencing significantly more freezes in general, some within a minute of starting my computer. This Xorg log is from one of those (with supergfxd enabled): https://pastebin.com/cnfm9uQn
(this isn't a Wayland session with Xwayland, is it?)
No it's not.
Also, while the error in the first journal isn't related to that at all, you might have to select the Vulkan driver: https://wiki.archlinux.org/title/Vulkan … initialize
I did this; I just set the environment variable in /etc/environment. I did that before trying to capture the artifacts, and I wonder if this is why Doom 2016 worked.
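To see which Vulkan drivers can be selected at all, a small bash sketch that lists the ICD manifest files, assuming the usual location `/usr/share/vulkan/icd.d` (the function name is hypothetical). The path of the Nvidia manifest is the value a driver-selection variable would point at; note that a line in /etc/environment applies to every program in the session, not only to games.

```shell
# Hypothetical helper: list installed Vulkan ICD manifests, one per driver.
list_vk_icds() {
  local dir="${1:-/usr/share/vulkan/icd.d}" f
  for f in "$dir"/*.json; do
    [ -e "$f" ] && echo "$f"   # skip the literal pattern if nothing matched
  done
  return 0
}

# Usage: list_vk_icds
```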
Do you have a screenshot of the artifacts?
I'm sorry for the poor image quality; it's hard to get a good screenshot of the artifacts when the steps that make them appear also freeze my computer quite rapidly: https://imgur.com/a/c8kBuD5
Perhaps this will also be helpful, here is an xorg log with supergfxd disabled: https://pastebin.com/SCtnNXjK
I ran a few games during that session, all of which crashed, but I thought maybe it would add helpful info to the log.
Without supergfxd the Nvidia GPU gets completely ignored; it's probably deactivated by default (most likely by the supergfxd installation).
The Samsung C27F398 and DELL 2007FP are connected there, and of course you also can't use prime-run.
Feb 02 19:27:55 WorthyBlade kernel: RIP: 0010:nv_drm_handle_flip_occurred+0x108/0x210 [nvidia_drm]
from your previous journal actually only occurs when you stop the session.
But to be clear
On the monitor I could still use, I couldn't find anything in journalctl or the Xorg logs about it.
the system doesn't stall entirely, you're just not getting updates on your external displays?
Remove xf86-video-amdgpu again and uninstall supergfxd. You might have to rebuild the initramfs to get rid of the blacklist; check with
modprobe -c | grep nvidia | grep -v alias
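The same pipeline wrapped in a small bash function that also marks lines which would keep the driver from loading automatically. Treating `blacklist` lines and `install ... /bin/false` or `/bin/true` lines as blockers is my assumption about what to look for, and the function name is hypothetical. On Arch the initramfs is regenerated with `mkinitcpio -P`.

```shell
# Hypothetical helper: apply the same filter (nvidia, minus alias lines)
# and flag entries that keep the module from loading.
nvidia_modprobe_report() {
  grep nvidia | grep -v alias | while IFS= read -r line; do
    case "$line" in
      blacklist*|install*/bin/false*|install*/bin/true*) echo "BLOCKS: $line" ;;
      *) echo "$line" ;;
    esac
  done
}

# Usage:
#   modprobe -c | nvidia_modprobe_report
#   sudo mkinitcpio -P    # after changing modprobe.d files, then reboot
```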
Edit: are you running a compositor (e.g. picom)?
Last edited by seth (2024-02-03 21:07:17)
the system doesn't stall entirely, you're just not getting updates on your external displays?
Yes, I should have been clearer on that point. Only the monitors connected through the Nvidia card freeze, although I can (only sometimes) move an application running on those monitors to my built-in monitor and it will still work. Sometimes my built-in monitor completely freezes as well; sometimes I can start applications and sometimes I can't. When I mentioned that I couldn't see anything in my journal, the built-in display was still working and I was able to pull up a new terminal to check the current journal.
might have to rebuild the initramfs to get rid of the blacklist
I didn't realize that I needed to do that after changing kernel module settings. Thank you for catching that; I wouldn't have recognized it.
modprobe -c | grep nvidia | grep -v alias
This was extremely helpful. After uninstalling xf86-video-amdgpu and supergfxd, removing the modprobe file, regenerating the initramfs and rebooting, I checked the output and saw that the Nvidia modules were still blacklisted. I poked around and found that I had bumblebee installed as a dependency of primus, which I must have installed out of frustration, hoping it would fix my problem without reading the documentation. I uninstalled it and primus and regenerated the initramfs again, which removed the Nvidia blacklists.
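When a blacklist survives like this, it helps to find which file defines it and which package owns that file. A bash sketch, assuming the standard modprobe.d directories (the function name is hypothetical); `pacman -Qo` then reports the owning package.

```shell
# Hypothetical helper: print modprobe.d files that blacklist an nvidia module.
find_nvidia_blacklists() {
  local -a dirs=("$@")
  if [ "${#dirs[@]}" -eq 0 ]; then
    dirs=(/etc/modprobe.d /usr/lib/modprobe.d)
  fi
  grep -rlE '^[[:space:]]*blacklist[[:space:]]+nvidia' "${dirs[@]}" 2>/dev/null
  return 0
}

# Usage:
#   find_nvidia_blacklists
#   pacman -Qo /usr/lib/modprobe.d/SOMEFILE.conf   # shows the owning package
```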
It seems really promising now. I've done a bunch of tests, but unfortunately I don't have access to my full setup (with my external monitors), so I won't know for sure how it looks until tomorrow evening. As it stands, I've managed to get everything working the way I want: my games, 3D and 2D, are working well right now. But if I don't prepend prime-run to the launch options, it still breaks everything, which is annoying. Is that just an Nvidia driver thing? I would love not to have to use it with my web browser, but it's something I can live with.
are you running a compositor (e.g. picom)?
No, I removed picom after I saw #4.
I'll post an update tomorrow night, but for now, thank you so much, I learned a ton and made it way farther than I otherwise would have.
if I don't prepend prime-run to the launch options, it still breaks everything, which is annoying. Is that just an Nvidia driver thing?
I guess you still have the Vulkan driver unconditionally exported?
Undo that. If you want to export it at all, do it alongside prime-run (which is just a wrapper script that sets some environment variables, so you can also do it there), but in an ideal "shouldland" world this wouldn't be necessary either.
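Since prime-run is only a wrapper that sets environment variables, a per-command variant with the Vulkan selection added could look like the bash function below. This is a sketch: the first three variables are the ones nvidia-prime's prime-run sets, while `VK_DRIVER_FILES` and the `nvidia_icd.json` path are assumptions to check against the Vulkan wiki page and your `/usr/share/vulkan/icd.d`. The function name is hypothetical.

```shell
# Hypothetical prime-run variant: offload to the Nvidia GPU and pin the
# Vulkan loader to the Nvidia ICD (assumed path), for this command only.
prime_run_vk() {
  __NV_PRIME_RENDER_OFFLOAD=1 \
  __VK_LAYER_NV_optimus=NVIDIA_only \
  __GLX_VENDOR_LIBRARY_NAME=nvidia \
  VK_DRIVER_FILES=/usr/share/vulkan/icd.d/nvidia_icd.json \
    "$@"
}

# Usage: prime_run_vk firefox
```

Unlike a line in /etc/environment, these variables only reach the wrapped command and its children. Steam's Launch Options can't call a shell function, so for that the same lines would have to live in a script on PATH.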
You might get away w/ installing supergfxd again, but I recommend to first establish a fully functional baseline and then expand from there.
I guess you still have the Vulkan driver unconditionally exported?
Undo that. If you want to export it at all, do it alongside prime-run (which is just a wrapper script that sets some environment variables, so you can also do it there), but in an ideal "shouldland" world this wouldn't be necessary either.
I haven't noticed a huge difference with it exported or not, so I'm just leaving it off for now.
As for the update: things are definitely working better now. Anything I run on my built-in monitor will pretty much always work. I can even run some games on my external monitors, but I still get random freezes where one of my screens stops updating, although I can move any application running on that screen to another one and it keeps working. Sometimes I can't get anything to work, not even starting an application, but most times I at least retain control of the mouse on the built-in monitor, even though I can't interact with anything. I can consistently make them freeze by running Doom 2016 on one of my external monitors, while running it on the built-in one works perfectly fine. I've tried several small tweaks to my system to see what happens, but so far the Doom freezes are fully reproducible. Sometimes one of the external monitors freezes for seemingly no reason at all, and it's not always the same one.
I know the wiki says freezes are very hard to debug, but since it mostly happens only with my external monitors, it still seems to be a graphics driver issue. Whenever a freeze happens, I check the journal, the Xorg logs and htop (to see whether I'm using too many resources), but none of them have given me any log messages explaining why my monitors would freeze.
I don't know if it will help, but here's a journal from a boot where it freezes (I just ran doom 2016 on one of my external monitors to make it freeze): https://pastebin.com/Qb0VnrjT
Is there a log location that I'm missing? I can't find anything else to check on the wiki. I check dmesg sometimes, but my impression is that it shows the same things as journalctl -b.
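On the log-location question: for a rootless Xorg the log usually lives under `~/.local/share/xorg/`, and for a root-run server under `/var/log/`. The previous session's log is kept as `Xorg.0.log.old`. A bash sketch that prints only the error `(EE)` and warning `(WW)` lines from whichever of those files exist (the function name is hypothetical).

```shell
# Hypothetical helper: show (EE)/(WW) lines from the Xorg logs given as
# arguments, or from the usual rootless and root locations by default.
xorg_log_problems() {
  local -a logs=("$@")
  local f
  if [ "${#logs[@]}" -eq 0 ]; then
    local d="${XDG_DATA_HOME:-$HOME/.local/share}/xorg"
    logs=("$d/Xorg.0.log" "$d/Xorg.0.log.old" /var/log/Xorg.0.log /var/log/Xorg.0.log.old)
  fi
  for f in "${logs[@]}"; do
    [ -r "$f" ] || continue
    echo "== $f"
    # Timestamped lines only, so the marker legend at the top is skipped.
    grep -E '^\[ *[0-9.]+] \((EE|WW)\)' "$f"
  done
  return 0
}

# Usage: xorg_log_problems
```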
One more note, I did have an experience where I tried removing a screen from xrandr, and then re-adding it, and it updated to its current state, but then remained frozen.
Last edited by 3lbsOfSalt (2024-02-05 05:53:12)
I can consistently make them freeze by running Doom 2016 on one of my external monitors, while running it on the built-in one works perfectly fine.
prime-run or on the AMD chip?
Might be an issue w/ https://wiki.archlinux.org/title/PRIME# … ronization - is that enabled?
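A bash sketch for checking that property per output, assuming `xrandr --prop` lists it as "PRIME Synchronization" under each output, as the PRIME wiki page describes. The function name is hypothetical, and the output name in the comment is only an example.

```shell
# Hypothetical helper: read `xrandr --prop` output on stdin and print each
# output's PRIME Synchronization value.
prime_sync_status() {
  awk '
    /^[^ \t]+ (connected|disconnected)/ { out = $1 }
    /^[ \t]+PRIME Synchronization:/     { print out ": " $3 }
  '
}

# Usage:
#   xrandr --prop | prime_sync_status
#   xrandr --output HDMI-1-0 --set "PRIME Synchronization" 1   # example name
```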
Please post your Xorg log for the freeze; it's more relevant, and there are no errors in the journal.
Also be aware of the second note in https://wiki.archlinux.org/title/PRIME#Reverse_PRIME
Are you using xf86-video-amdgpu again? Better not: the offloading is better tested with the modesetting driver, and AMD+Nvidia is a less common setup.
prime-run or on the AMD chip?
prime-run. If I try to do anything without prime-run, the whole thing just looks like TV static, regardless of which monitor I run it on.
Might be an issue w/ https://wiki.archlinux.org/title/PRIME# … ronization - is that enabled?
Yes, for every monitor except the built in one, which seems consistent with what I understand about it.
Please post your Xorg log for the freeze; it's more relevant, and there are no errors in the journal.
Here is my xorg log for the freeze: https://pastebin.com/xFPFNdnt
Also be aware of the second note in https://wiki.archlinux.org/title/PRIME#Reverse_PRIME
Okay, that's good to be aware of. I don't know that it applies here because my built-in monitor is always on, but I'll watch out for that.
Are you using xf86-video-amdgpu again? Better not: the offloading is better tested with the modesetting driver, and AMD+Nvidia is a less common setup.
No, it's no longer installed on my system. The system works significantly better with the modesetting driver, and the more I read about this issue, the more reasons I see not to use xf86-video-amdgpu specifically.
The eDP-2 is VRR capable @165Hz but
[ 14.171] (==) modeset(0): VariableRefresh: disabled
Does the problem exist if you limit the internal display to 60Hz?
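One way to try that from a running session is xrandr's `--rate` option. The bash helper below is a hypothetical wrapper that pins an output to a rate (60 Hz by default) or switches it back to its preferred mode with `auto`. It assumes the panel offers a 60 Hz mode at its current resolution; eDP-2 is the output name from the log line above.

```shell
# Hypothetical helper: set an output's refresh rate, or "auto" to return to
# the preferred mode. Changes last only for the current X session.
set_refresh() {
  local output="$1" rate="${2:-60}"
  if [ "$rate" = auto ]; then
    xrandr --output "$output" --auto
  else
    xrandr --output "$output" --rate "$rate"
  fi
}

# Usage:
#   set_refresh eDP-2        # 60 Hz
#   set_refresh eDP-2 auto   # back to the preferred mode
```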
Edit: https://wiki.archlinux.org/title/Variab … figuration
Last edited by seth (2024-02-05 23:10:26)
Does the problem exist if you limit the internal display to 60Hz?
If I limit the internal display to 60 Hz, I get a lot of static, only on the internal display, very similar to what happens if I run a game without the prime-run script on any display.
After reading through that, I started experimenting with the VRRTest tool, which seemed to work fine on all of my displays, though it ran better on my internal one than on my external ones. Unfortunately, I couldn't find anything interesting there.
I'm sure you've already seen it if you're bringing up VRR, but with my primary monitor at 144 Hz and my external monitors both at 60 Hz, would it be worth trying to enable AsyncFlipSecondaries? Normally I would just try it and see what happens, but all of my attempts at writing an xorg.conf have done more harm than good, and I don't see what it would have to do with screen freezing, other than that it is always my external monitors that freeze.
You could try. The conflict-free way to do that is an OutputClass section rather than a Device section: an OutputClass section defines what to do when a matching device appears, whereas a Device section tells the Xorg server that this device must be present in exactly that form, or else it fails.
You'd do
Section "OutputClass"
    Identifier "amdgpu"
    Driver "modesetting"
    MatchDriver "amdgpu"
    Option "AsyncFlipSecondaries" "true"
EndSection
but tearing doesn't usually imply freezing, so I wouldn't hold my breath.