#26 2018-02-08 14:59:09

seth
Member
Registered: 2012-09-03
Posts: 8,940

Re: Terrible performance regression with Nvidia 390.25 driver

You're aware that the environment applies per process, i.e. you can set it for kwin and unset it (or use a different value) for other processes?
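The per-process scoping can be demonstrated with any child process; a minimal sketch using a plain `sh` child in place of kwin:

```shell
# Setting a variable on the command line scopes it to that one process:
__GL_YIELD="USLEEP" sh -c 'echo "child sees: $__GL_YIELD"'

# The parent shell, and anything else started from it, never sees it:
echo "parent sees: ${__GL_YIELD:-unset}"
```

The same mechanism lets you start the compositor with the variable set while leaving every other program's environment untouched.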

Offline

#27 2018-02-08 15:11:33

loqs
Member
Registered: 2014-03-06
Posts: 6,386

Re: Terrible performance regression with Nvidia 390.25 driver

PTI is not needed or used on AMD systems such as Tom B's.  You can check with

cat /sys/devices/system/cpu/vulnerabilities/meltdown

Offline

#28 2018-02-08 15:19:47

kokoko3k
Member
Registered: 2008-11-14
Posts: 1,762

Re: Terrible performance regression with Nvidia 390.25 driver

seth wrote:

You're aware that the environment applies per process, ie. you can set it for kwin and unset it (or use a different value) for other processes?

[ot]
Yes, but vsync at the driver level seems to work better than the one provided by kwin.
Also, I can switch it on and off with a shortcut instead of opening the kwin compositor configuration.
[/ot]

Offline

#29 2018-02-08 15:57:44

Tom B
Member
Registered: 2014-01-15
Posts: 144
Website

Re: Terrible performance regression with Nvidia 390.25 driver

loqs wrote:

PTI is not needed or used on AMD systems such as Tom B's.  You can check with

cat /sys/devices/system/cpu/vulnerabilities/meltdown

I know it's off topic but does the kernel detect this and enable PTI only when it's needed or do I still need to specifically set the nopti kernel parameter?

kokoko3k wrote:

AFAIK, page table isolation was introduced in 4.14, so nopti has been effective since then.
__GL_YIELD="USLEEP" has proven to be a performance killer in several games in the past (Borderlands 2 dropped from 120fps to 20 or 30), so I just removed it and relied on ForceFullCompositionPipeline for vsync.

Also off topic, but I had some strange effects with ForceFullCompositionPipeline in some games; from memory, WoW and Skyrim both showed high framerates but felt jerky. Sorry, it's difficult to describe, but there was a noticeable difference. Changing from ForceFullCompositionPipeline to __GL_YIELD="USLEEP" fixed the issue for me. It was 3 months ago when I set it, so newer drivers may behave differently.

Last edited by Tom B (2018-02-08 16:00:37)

Online

#30 2018-02-08 16:37:25

loqs
Member
Registered: 2014-03-06
Posts: 6,386

Re: Terrible performance regression with Nvidia 390.25 driver

Tom B wrote:
loqs wrote:

PTI is not needed or used on AMD systems such as Tom B's.  You can check with

cat /sys/devices/system/cpu/vulnerabilities/meltdown

I know it's off topic but does the kernel detect this and enable PTI only when it's needed or do I still need to specifically set the nopti kernel parameter?

See the command above. It should be autodetecting that the system is AMD and not using PTI unless it is forced on by a kernel parameter.
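A sketch of that check, with a fallback for kernels too old to expose the sysfs file (the helper name and the parameterised path are illustrative; by default it reads the real location):

```shell
# Report the kernel's own Meltdown/PTI verdict from sysfs.
# On AMD this typically reads "Not affected", meaning PTI stays off
# without any nopti parameter; on Intel it reads "Mitigation: PTI".
meltdown_status() {
    file="${1:-/sys/devices/system/cpu/vulnerabilities/meltdown}"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown (kernel does not report vulnerability status)"
    fi
}

meltdown_status
```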

Offline

#31 2018-02-08 17:39:12

seth
Member
Registered: 2012-09-03
Posts: 8,940

Re: Terrible performance regression with Nvidia 390.25 driver

kokoko3k wrote:

[ot]
Yes, but vsync at driver level seems to work better than the one provided by kwin.
[/ot]

Whether you use the nvidia-settings override, the __GL_SYNC_TO_VBLANK environment variable or some client switch: it's the same glSwapInterval call, so this assumption is rather ... errr .... esoteric ;-)

Offline

#32 2018-02-08 17:51:06

rob-tech
Member
Registered: 2018-01-03
Posts: 26

Re: Terrible performance regression with Nvidia 390.25 driver

I tested with and without the USLEEP parameter before I rolled back the driver, and apart from the tearing, everything in Chromium was as laggy as with USLEEP. If this parameter kills your gaming performance, then you should apply it as the wiki mentions (only to kwin itself, via a desktop startup script); then there is no performance penalty for other software and native vsync works perfectly.
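A sketch of that startup-script approach: install a small login script that respawns only the compositor with the variable set. The file location is an assumption (Plasma runs executables from ~/.config/autostart-scripts/ at login), and the helper name is hypothetical; adapt both to your desktop.

```shell
# Hypothetical helper: install a login script that restarts KWin with
# __GL_YIELD set for the compositor alone, so other programs never see it.
write_kwin_usleep_script() {
    dir="${1:-$HOME/.config/autostart-scripts}"   # assumed Plasma location
    mkdir -p "$dir"
    cat > "$dir/kwin-usleep.sh" <<'EOF'
#!/bin/sh
# Respawn the compositor with the variable scoped to it only.
__GL_YIELD="USLEEP" kwin_x11 --replace &
EOF
    chmod +x "$dir/kwin-usleep.sh"
}
```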

Forcing FullCompositionPipeline is only likely to break some software that doesn't expect this behaviour, and it also introduces micro stutter for me. Also keep in mind that KWin and the KDE desktop effects ran smoothly whether I used USLEEP or not on 390.25. I noticed the driver regressions in Chromium first after upgrading, since that is the most commonly used GPU-accelerated software on my system, and USLEEP has no influence on it here. After this I rolled back to 387. There are also clearly other regressions, with people suffering from flickering and artifacting on GNOME, MATE and Cinnamon, across multiple generations of cards from Kepler to Pascal.

KPTI has been enabled since kernel 4.14.11 with no issues, and KPTI is not even enabled on some of the AMD systems that suffer in this thread.

From my end all is pointing to an Nvidia driver defect.

Last edited by rob-tech (2018-02-08 18:54:39)

Offline

#33 2018-02-08 18:55:59

loqs
Member
Registered: 2014-03-06
Posts: 6,386

Re: Terrible performance regression with Nvidia 390.25 driver

Does the same issue occur using linux-lts and nvidia-lts?

Last edited by loqs (2018-02-08 18:56:20)

Offline

#34 2018-02-08 20:11:37

phunni
Member
From: Bristol, UK
Registered: 2003-08-13
Posts: 727

Re: Terrible performance regression with Nvidia 390.25 driver

I've also been having these issues.  GTX 1070.  I've downgraded, which seems to have fixed the issues with Chromium and general desktop "snappiness" (although that's obviously completely subjective). Steam, however, won't start at all now, although I'm guessing that's because I didn't downgrade the nvidia lib32 packages, and I seem to recall they have to be the same version as the 64-bit packages or something like that.  I'm going to upgrade again and then give gavinhungry's solution a go.

Offline

#35 2018-02-08 20:44:37

loqs
Member
Registered: 2014-03-06
Posts: 6,386

Re: Terrible performance regression with Nvidia 390.25 driver

Does this patch make any difference to 390.25 under 4.15?

diff --git a/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/conftest.sh b/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/conftest.sh
index 292d7da..5f254e1 100755
--- a/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/conftest.sh
+++ b/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/conftest.sh
@@ -2123,6 +2123,7 @@ compile_test() {
             #endif
             #include <drm/drm_atomic.h>
             #include <drm/drm_atomic_helper.h>
+            #include <linux/version.h>
             #if !defined(CONFIG_DRM) && !defined(CONFIG_DRM_MODULE)
             #error DRM not enabled
             #endif
@@ -2146,8 +2147,12 @@ compile_test() {
                 /* 2014-12-18 88a48e297b3a3bac6022c03babfb038f1a886cea */
                 i = DRIVER_ATOMIC;
 
+                #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
                 /* 2015-04-10 df63b9994eaf942afcdb946d27a28661d7dfbf2a */
                 for_each_crtc_in_state(s, c, cs, i) { }
+                #else
+                for_each_new_crtc_in_state(s, c, cs, i) {}
+                #endif
 
                 /* 2015-05-18 036ef5733ba433760a3512bb5f7a155946e2df05 */
                 a = offsetof(struct drm_mode_config_funcs, atomic_state_alloc);
diff --git a/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-connector.c b/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-connector.c
index cf16b6f..a66ae5a 100644
--- a/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-connector.c
+++ b/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-connector.c
@@ -33,6 +33,7 @@
 
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
+#include <linux/version.h>
 
 static void nv_drm_connector_destroy(struct drm_connector *connector)
 {
@@ -87,7 +88,11 @@ static enum drm_connector_status __nv_drm_connector_detect_internal(
             break;
         }
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
         encoder = drm_encoder_find(dev, connector->encoder_ids[i]);
+#else
+        encoder = drm_encoder_find(dev, NULL, connector->encoder_ids[i]);
+#endif
 
         if (encoder == NULL) {
             BUG_ON(encoder != NULL);
diff --git a/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-crtc.c b/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-crtc.c
index b54128a..d820dc2 100644
--- a/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-crtc.c
+++ b/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-crtc.c
@@ -37,6 +37,7 @@
 
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
+#include <linux/version.h>
 
 static const u32 nv_default_supported_plane_drm_formats[] = {
     DRM_FORMAT_ARGB1555,
@@ -141,7 +142,11 @@ static int nv_drm_plane_atomic_check(struct drm_plane *plane,
         goto done;
     }
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
     for_each_crtc_in_state(plane_state->state, crtc, crtc_state, i) {
+#else
+    for_each_new_crtc_in_state(plane_state->state, crtc, crtc_state, i) {
+#endif
         struct nv_drm_crtc_state *nv_crtc_state = to_nv_crtc_state(crtc_state);
         struct NvKmsKapiHeadRequestedConfig *head_req_config =
             &nv_crtc_state->req_config;
@@ -365,7 +370,11 @@ static int nv_drm_crtc_atomic_check(struct drm_crtc *crtc,
 
         req_config->flags.displaysChanged = NV_TRUE;
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
         for_each_connector_in_state(crtc_state->state,
+#else
+        for_each_new_connector_in_state(crtc_state->state,
+#endif
                                     connector, connector_state, j) {
             if (connector_state->crtc != crtc) {
                 continue;
@@ -613,7 +622,11 @@ int nv_drm_get_crtc_crc32_ioctl(struct drm_device *dev,
         goto done;
     }
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
     crtc = drm_crtc_find(dev, params->crtc_id);
+#else
+    crtc = drm_crtc_find(dev, NULL, params->crtc_id);
+#endif
     if (!crtc) {
         ret = -ENOENT;
         goto done;
diff --git a/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-modeset.c b/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-modeset.c
index da15d89..91f64ea 100644
--- a/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-modeset.c
+++ b/NVIDIA-Linux-x86_64-390.25-no-compat32/kernel/nvidia-drm/nvidia-drm-modeset.c
@@ -33,6 +33,7 @@
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_crtc.h>
+#include <linux/version.h>
 
 struct nv_drm_atomic_state {
     struct NvKmsKapiRequestedModeSetConfig config;
@@ -110,7 +111,11 @@ nv_drm_atomic_apply_modeset_config(struct drm_device *dev,
     memset(requested_config, 0, sizeof(*requested_config));
 
     /* Loop over affected crtcs and construct NvKmsKapiRequestedModeSetConfig */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
     for_each_crtc_in_state(state, crtc, crtc_state, i) {
+#else
+    for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+#endif
         /*
          * When commiting a state, the new state is already stored in
          * crtc->state. When checking a proposed state, the proposed state is
@@ -178,7 +183,11 @@ void nv_drm_atomic_helper_commit_tail(struct drm_atomic_state *state)
          nv_drm_write_combine_flush();
     }
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
     for_each_crtc_in_state(state, crtc, crtc_state, i) {
+#else
+    for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+#endif
         struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
         struct nv_drm_crtc_state *nv_crtc_state = to_nv_crtc_state(crtc->state);
         struct nv_drm_flip *nv_flip = nv_crtc_state->nv_flip;
@@ -282,7 +291,11 @@ static void nv_drm_atomic_commit_task_callback(struct work_struct *work)
             ret);
     }
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
     for_each_crtc_in_state(state, crtc, crtc_state, i) {
+#else
+    for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+#endif
         struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
 
         if (wait_event_timeout(
@@ -351,7 +364,11 @@ static int nv_drm_atomic_commit_internal(
          * condition between two/more nvKms->applyModeSetConfig() on single
          * crtc and generate flip events in correct order.
          */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
         for_each_crtc_in_state(state, crtc, crtc_state, i) {
+#else
+        for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+#endif
             struct nv_drm_device *nv_dev = to_nv_device(dev);
             struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
 
@@ -372,7 +389,11 @@ static int nv_drm_atomic_commit_internal(
             }
         }
     } else {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
         for_each_crtc_in_state(state, crtc, crtc_state, i) {
+#else
+        for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+#endif
             struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
 
             if (atomic_read(&nv_crtc->has_pending_commit) ||
@@ -388,7 +409,11 @@ static int nv_drm_atomic_commit_internal(
      * flip events.
      */
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
     for_each_crtc_in_state(state, crtc, crtc_state, i) {
+#else
+    for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+#endif
         struct nv_drm_crtc *nv_crtc = to_nv_crtc(crtc);
 
         atomic_set(&nv_crtc->has_pending_commit, true);

Edit:
cleaned up accidental deletion of blank line

Last edited by loqs (2018-02-08 21:12:00)

Offline

#36 2018-02-10 00:43:37

rob-tech
Member
Registered: 2018-01-03
Posts: 26

Re: Terrible performance regression with Nvidia 390.25 driver

Well, I guess the patched nvidia 387 with the 4.15 kernel was good while it lasted: after today's 4.15.2 update the system broke with flashing text, and even Xorg would not start. I had to arch-chroot from the Arch install media in order to install linux-lts. I also had to remove the pacman IgnorePkg entries and perform a full system upgrade with the corresponding nvidia-lts driver (dependencies were not satisfied, and it was easier to restore the system this way than to search the archive for the old nvidia packages and operate with no GUI). After mkinitcpio was run and GRUB was updated, the system would boot correctly with both the 4.15 and 4.14 kernels, so the patched driver no longer works with the newest kernel and the performance problems are back.

On the bright side at least I can use Windows 10 Pro in the meantime and now I know how to recover the system from the worst possible regressions. I'll revisit this when nvidia updates the driver.

Offline

#37 2018-02-10 00:47:00

loqs
Member
Registered: 2014-03-06
Posts: 6,386

Re: Terrible performance regression with Nvidia 390.25 driver

When you build a custom module you need to rebuild it whenever the kernel ABI changes, so if the official nvidia package has been rebuilt, as it was for linux 4.15.2-2, you need to do a rebuild as well.
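The rule can be expressed as a tiny check: a module only loads on the kernel release it was built against, so any change in `uname -r` means the patched package must be rebuilt too. The function name and the example release string below are illustrative:

```shell
# Does the custom module need rebuilding? It does whenever the running
# kernel release differs from the one the module was built against.
needs_rebuild() {
    built_for="$1"              # recorded at build time, e.g. via uname -r
    [ "$built_for" != "$(uname -r)" ]
}

if needs_rebuild "4.15.1-2-ARCH"; then   # hypothetical recorded release
    echo "kernel changed: rebuild the nvidia package (e.g. makepkg -srf)"
fi
```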

Offline

#38 2018-02-10 01:57:56

rob-tech
Member
Registered: 2018-01-03
Posts: 26

Re: Terrible performance regression with Nvidia 390.25 driver

You are right, my mistake. Since this is not a dkms module, does this mean that I have to rebuild a new pacman package from the PKGBUILD each time there is a kernel upgrade?

Offline

#39 2018-02-10 02:24:14

cirrus9
Member
Registered: 2016-04-15
Posts: 34

Re: Terrible performance regression with Nvidia 390.25 driver

The latest updates (nvidia 390.25-8 and linux 4.15.2-2) are working without any problems for me so far, so you might want to try them, rob-tech.

Offline

#40 2018-02-10 10:13:15

matte3560
Member
Registered: 2018-02-08
Posts: 4

Re: Terrible performance regression with Nvidia 390.25 driver

I'm still seeing worse performance with nvidia 390.25-8 on linux 4.15.2-2 compared to nvidia 387.34 with the patch posted on page 1.

Offline

#41 2018-02-10 12:28:05

loqs
Member
Registered: 2014-03-06
Posts: 6,386

Re: Terrible performance regression with Nvidia 390.25 driver

Can you try 390.25-9, please?

Offline

#42 2018-02-10 12:35:34

matte3560
Member
Registered: 2018-02-08
Posts: 4

Re: Terrible performance regression with Nvidia 390.25 driver

390.25-9 made no difference for me. I'm still seeing worse performance, and nv_flush_cache is sitting at 20-30% in perf top when I have a busy browser window open.

Offline

#43 2018-02-10 16:55:49

Tom B
Member
Registered: 2014-01-15
Posts: 144
Website

Re: Terrible performance regression with Nvidia 390.25 driver

Unfortunately 390.25-9 makes no difference for me either. Since it affects Mint and Ubuntu as well according to the threads over at nvidia devtalk, I'm not sure it's something that can be fixed by anyone other than nvidia.

Online

#44 2018-02-10 17:05:01

loqs
Member
Registered: 2014-03-06
Posts: 6,386

Re: Terrible performance regression with Nvidia 390.25 driver

Tom B wrote:

Unfortunately 390.25-9 makes no difference for me either. Since it affects Mint and Ubuntu as well according to the threads over at nvidia devtalk, I'm not sure it's something that can be fixed by anyone other than nvidia.

Looking at the threads over on the nvidia forum, only one affected user seems to have posted the output from nvidia-bug-report. That system, running 390.25, is not affected by the issue.

Offline

#45 2018-02-10 17:13:01

Tom B
Member
Registered: 2014-01-15
Posts: 144
Website

Re: Terrible performance regression with Nvidia 390.25 driver

I've uploaded my nvidia-bug-report.log.gz here: https://r.je/nvidia-bug-report.log.gz I'll paste the link on the nvidia forums as well.

Online

#46 2018-02-10 17:22:26

rob-tech
Member
Registered: 2018-01-03
Posts: 26

Re: Terrible performance regression with Nvidia 390.25 driver

No improvement here for me either on the latest packages. I'll upload my bug report to nvidia when I have a moment, as I'm certain only they can fix it.

Offline

#47 2018-02-11 17:17:55

hrkristian
Member
Registered: 2013-06-28
Posts: 34

Re: Terrible performance regression with Nvidia 390.25 driver

How did this even make it out of the testing repos when the regression was discussed there? Boggles the mind.

Has anyone else been met with issues following a downgrade to 387.xx, by the way?

Offline

#48 2018-02-11 17:35:02

Omar007
Member
Registered: 2015-04-09
Posts: 265

Re: Terrible performance regression with Nvidia 390.25 driver

hrkristian wrote:

How did this even make it out of testing repos when the regression is discussed there?

Not everyone is affected by this. I've been on 390.25 since it hit the testing repos and haven't had or noticed any problem other than the missing /sys/class/drm/card0-* entries.
And that only prevented me from running/selecting the GNOME on Wayland session, which I wasn't using anyway since, even when I can select it, it still falls back to llvmpipe (which is a whole other problem that existed long before 390.25: https://bugs.archlinux.org/task/53284)

Last edited by Omar007 (2018-02-12 14:06:53)

Offline

#49 2018-02-11 18:57:11

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 5,705

Re: Terrible performance regression with Nvidia 390.25 driver

I'm also largely unaffected from what I can tell (on an Nvidia Titan X). Another thing that might come into play: if you used a kernel earlier than 4.15.2-2, there were a lot of config options missing that do contribute to performance, so it might have been a kernel-config-related slowdown. Though I see that a few people report having the issue with these versions as well, so your mileage may vary.

It might also be helpful if this can be traced back to a certain configuration; e.g. the changelog mentions changes regarding the CompositionPipeline, so if you used that option, check whether disabling it improves performance so that the culprit can be narrowed down.

Last edited by V1del (2018-02-11 18:58:58)

Offline

#50 2018-02-11 23:29:56

opacalumen
Member
Registered: 2018-02-11
Posts: 1

Re: Terrible performance regression with Nvidia 390.25 driver

Removing the ForceFullCompositionPipeline option from my conf file in /etc/X11/xorg.conf.d seems to have dramatically helped for me.

These are the lines I commented out:

#    Option	   "metamodes" "nvidia-auto-select +0+0 { ForceFullCompositionPipeline = On }"
#    Option	   "AllowIndirectGLXProtocol" "off"
#    Option	   "TripleBuffer" "on"

Offline

Powered by FluxBB