After an upgrade of the system and kernel (custom kernel 4.15), I can no longer boot into my system: black screen. Using the stock Arch kernel 4.14.15 works, though.
I thought this might be related to the harfbuzz problem, but even after downgrading harfbuzz, the problem remains.
Does anyone know what may cause this?
thanks
gummo
Last edited by gen2arch (2018-02-03 05:12:36)
gen2arch wrote: Does anyone know what may cause this?
Yeah: your custom kernel. As you've told us absolutely nothing about what you've customized in that kernel, there's really nothing else that could be done here.
Please modify your title to indicate the problem is with your own kernel.
EDIT: I just realized you downgraded to a different version of the stock kernel. The obvious test would be to check whether the stock build of 4.15 works or not.
Last edited by Trilby (2018-02-02 12:26:54)
"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" - Richard Stallman
gen2arch wrote: (custom kernel 4.15)
I have a custom kernel 4.15 too, with Plasma 5 (5.12 beta, but that's not important), and everything works fine; the other kernels (linux, linux-pf, linux-zen) work fine as well. So the problem is probably in how you built your custom kernel.
gen2arch wrote: Does anyone know what may cause this?
Yeah: your custom kernel. As you've told us absolutely nothing about what you've customized in that kernel, there's really nothing else that could be done here.
Please modify your title to indicate the problem is with your own kernel.
EDIT: I just realized you downgraded to a different version of the stock kernel. The obvious test would be to check whether the stock build of 4.15 works or not.
I check out the kernel via asp (asp export linux); I apply graysky's patch (https://github.com/graysky2/kernel_gcc_patch) for better processor support, and I use the modprobed-db mechanism and the localmodconfig build target to build the kernel.
Nothing in this setup has changed recently, so the breakage is somehow related to 4.15.
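For reference, the workflow described above can be sketched as shell commands (a rough sketch only; the patch filename and the modprobed.db path are assumptions, so adjust to your local setup):

```shell
# Rough sketch of the custom-kernel workflow from the post above.
# Guarded so it is safe to run on systems without the Arch tooling.
if command -v asp >/dev/null 2>&1; then
    asp export linux || true    # check out the stock linux package files
fi
# Inside the unpacked kernel source tree one would then, roughly:
#   patch -p1 < enable_additional_cpu_optimizations_for_gcc.patch   # graysky's patch
#   make LSMOD="$HOME/.config/modprobed.db" localmodconfig          # trim config via modprobed-db
```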
All this also seems to be related to nvidia(-dkms) and gcc, as noted in other threads that seem to hit a similar problem; see
https://bbs.archlinux.org/viewtopic.php?id=234112 and
https://bbs.archlinux.org/viewtopic.php?id=234067
I somehow managed to get my custom kernel running again by enabling the testing repo, which pulled in a newer nvidia driver (390.25).
Unfortunately I cannot give more precise info on what the original problem was or why it works now.
thanks
gummo
Did you check the DKMS output from the build of nvidia 387.34 on 4.15? I would have expected it to fail without a patch.
You should be able to see in the journal from the bad boots that the nvidia modules were not loaded. If you installed the kernel using pacman, I would also expect the DKMS failure to be recorded there.
You would have needed something like the following to make 387.34 work with 4.15:
diff --git a/kernel/nvidia-modeset/nvidia-modeset-linux.c b/kernel/nvidia-modeset/nvidia-modeset-linux.c
index edeb152..cd0ce2b 100644
--- a/kernel/nvidia-modeset/nvidia-modeset-linux.c
+++ b/kernel/nvidia-modeset/nvidia-modeset-linux.c
@@ -21,6 +21,7 @@
#include <linux/random.h>
#include <linux/file.h>
#include <linux/list.h>
+#include <linux/version.h>
#include "nvstatus.h"
@@ -566,9 +567,17 @@ static void nvkms_queue_work(nv_kthread_q_t *q, nv_kthread_q_item_t *q_item)
WARN_ON(!ret);
}
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
static void nvkms_timer_callback(unsigned long arg)
+#else
+static void nvkms_timer_callback(struct timer_list * t)
+#endif
{
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
struct nvkms_timer_t *timer = (struct nvkms_timer_t *) arg;
+#else
+ struct nvkms_timer_t *timer = from_timer(timer, t, kernel_timer);
+#endif
/* In softirq context, so schedule nvkms_kthread_q_callback(). */
nvkms_queue_work(&nvkms_kthread_q, &timer->nv_kthread_q_item);
@@ -606,10 +615,16 @@ nvkms_init_timer(struct nvkms_timer_t *timer, nvkms_timer_proc_t *proc,
timer->kernel_timer_created = NV_FALSE;
nvkms_queue_work(&nvkms_kthread_q, &timer->nv_kthread_q_item);
} else {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
init_timer(&timer->kernel_timer);
+#else
+ timer_setup(&timer->kernel_timer,nvkms_timer_callback,0);
+#endif
timer->kernel_timer_created = NV_TRUE;
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
timer->kernel_timer.function = nvkms_timer_callback;
timer->kernel_timer.data = (unsigned long) timer;
+#endif
mod_timer(&timer->kernel_timer, jiffies + NVKMS_USECS_TO_JIFFIES(usec));
}
spin_unlock_irqrestore(&nvkms_timers.lock, flags);
diff --git a/kernel/nvidia/nv.c b/kernel/nvidia/nv.c
index ad5091b..a469bf9 100644
--- a/kernel/nvidia/nv.c
+++ b/kernel/nvidia/nv.c
@@ -320,7 +320,11 @@ static irqreturn_t nvidia_isr (int, void *, struct pt_regs *);
#else
static irqreturn_t nvidia_isr (int, void *);
#endif
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
static void nvidia_rc_timer (unsigned long);
+#else
+static void nvidia_rc_timer (struct timer_list *);
+#endif
static int nvidia_ctl_open (struct inode *, struct file *);
static int nvidia_ctl_close (struct inode *, struct file *);
@@ -2472,10 +2476,18 @@ nvidia_isr_bh_unlocked(
static void
nvidia_rc_timer(
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
unsigned long data
+#else
+ struct timer_list * t
+#endif
)
{
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
nv_linux_state_t *nvl = (nv_linux_state_t *) data;
+#else
+ nv_linux_state_t *nvl = from_timer(nvl, t, rc_timer);
+#endif
nv_state_t *nv = NV_STATE_PTR(nvl);
nvidia_stack_t *sp = nvl->sp[NV_DEV_STACK_TIMER];
@@ -3386,9 +3398,13 @@ int NV_API_CALL nv_start_rc_timer(
return -1;
nv_printf(NV_DBG_INFO, "NVRM: initializing rc timer\n");
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
init_timer(&nvl->rc_timer);
nvl->rc_timer.function = nvidia_rc_timer;
nvl->rc_timer.data = (unsigned long) nvl;
+#else
+ timer_setup(&nvl->rc_timer,nvidia_rc_timer,0);
+#endif
nv->rc_timer_enabled = 1;
mod_timer(&nvl->rc_timer, jiffies + HZ); /* set our timeout for 1 second */
nv_printf(NV_DBG_INFO, "NVRM: rc timer initialized\n");
Last edited by loqs (2018-02-03 08:44:37)
loqs wrote: Did you check the DKMS output from the build of nvidia 387.34 on 4.15? I would have expected it to fail without a patch. You would have needed something like the following to make 387.34 work with 4.15: [patch snipped]
Thanks loqs! That is really valuable info!
Good to know that the combination of 387.34 and kernel 4.15 was in fact doomed to fail; this narrows down the problem.
There was in fact an error message saying (I paraphrase): "Error! There is no instance of nvidia XX for kernel XX located in the DKMS tree." But in the log file that dkms told me to look at, the error was said to be something completely different, namely a compiler mismatch between gcc 7.2.1 (used to compile the running kernel) and gcc 7.3 (used for the actual DKMS compilation of the modules).
But I am almost 100% sure that this was not the case, as I also compiled 4.15 with gcc 7.3!
On the other hand, in spite of this error, the kernel seems to have been built nevertheless.
And I'm not sure whether this compiler mismatch was the original cause of the whole problem.
thanks
gummo
EDIT: You are right: looking at the bad boots via journalctl --list-boots and journalctl -b, I see that the nvidia module wasn't even loaded.
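For anyone following along, the journal inspection mentioned in the EDIT looks roughly like this (a sketch; requires systemd, and boot -1 is just an example of a bad boot's index):

```shell
# List recorded boots (0 = current, -1 = previous, ...), then search the
# kernel messages of a previous boot for nvidia module activity.
if command -v journalctl >/dev/null 2>&1; then
    journalctl --list-boots || true
    journalctl -b -1 -k 2>/dev/null | grep -i nvidia || true
fi
```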
Last edited by gen2arch (2018-02-03 12:50:26)
Were multiple kernels installed at the time, such as linux-lts 4.14.16? In such a case, the 4.15 module build fails and writes a log, and then the 4.14.16 module build fails and overwrites the previous log.
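One way to check this (a sketch; the nvidia/387.34 version is taken from this thread, and the kernel version placeholder is an assumption for your system):

```shell
# Show the DKMS state of every module/kernel pair, then rebuild against one
# specific kernel so its build log is not clobbered by a later kernel's run.
if command -v dkms >/dev/null 2>&1; then
    dkms status || true
    # dkms build nvidia/387.34 -k <4.15-kernel-version>   # usually needs root
fi
```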
loqs wrote: Were multiple kernels installed at the time, such as linux-lts 4.14.16? In such a case, the 4.15 module build fails and writes a log, and then the 4.14.16 module build fails and overwrites the previous log.
I see.
Yes, absolutely: I have three kernels installed at the same time: my custom kernel, the stock Arch kernel, and the stock Arch LTS kernel.
thanks
gummo
I have the same issue with nouveau, after upgrading to 4.15 this morning.
How could it have the same cause, when nouveau is a built-in (in-tree) module? If it were the same problem, then the same solution would resolve it.