Can confirm that the latest linux-4.18.3 kernel didn't help. It actually seems to be worse. Now when I boot with "nosmp" I only see one CPU (used to be able to see both) and NetworkManager doesn't connect. I was once able to actually boot with the new kernel, but another bug with nouveau blanked my screen. On the next reboot it was back to the stuck systemd. I downgraded to the last pre-4.18 kernel I have in my cache and everything is back to normal.
With so many people having this issue -- strange that we still don't know the root cause...
Offline
romstor and the journal has nothing for any of the failed boots?
If someone wants to bisect between 4.18 and 4.17 that should find the cause.
Offline
Yesterday I tried to boot linux-4.18.3 without "quiet" and an additional kernel cmdline boot parameter "earlyprintk=vga".
There was a message about rcu_preempt and some sort of race. If I'm not mistaken it's related to real-time ... and the RCU subsystem of the kernel (?)
Today I can't reproduce the exact message I saw yesterday, but thought I'd post here anyway.
By reproduce I mean with linux-4.18.3 with the normal and the fallback initramfs.
The normal initramfs will not boot and stalls without printing additional messages, while the fallback just boots normally - both without the quiet option.
If I may: I thought about using an additional boot parameter to get systemd to print more helpful insight into what's going on, taken from:
*) https://www.freedesktop.org/software/sy … -line.html
*) https://www.kernel.org/doc/html/latest/ … eters.html
Maybe "debug" or "--log-level=debug" shows more output. I'll test it and report back, unless someone beats me to it.
Reporting back: additional kernel boot parameter "earlyprintk=vga" and "debug" did the trick.
I got stuck, but was able to get a more verbose output of why the stalling is happening.
"INFO: rcu_preempt detected stalls on CPUs/tasks:
...
sending NMI from CPU1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at acpi_processor_ffh_cstate_enter
rcu_sched kthread starved for 55296 jiffies!
RCU grace-period kthread stack dump
Call Trace ..."
What's strange is, I got to see the stack dumps / traces only once with "earlyprintk=vga debug" (without quiet). Repeatedly saw less verbose output after the second boot and I tried it several times. If I just use "debug", then linux-4.18.3 boots up fine in debug mode (several times).
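For anyone who wants to try the same combination, here is a minimal sketch of the command-line change described above: drop "quiet" and append "earlyprintk=vga debug". The function name debug_cmdline and the sample command line are made up for illustration. It only rewrites a string, so the result still has to be put into your bootloader entry by hand.

```shell
#!/bin/sh
# debug_cmdline "<kernel cmdline>"
# Hypothetical helper: removes the "quiet" word and appends the two
# parameters mentioned in this post. Nothing is written to disk.
debug_cmdline() {
    out=""
    # $1 is left unquoted on purpose so it is split into words
    for word in $1; do
        # skip "quiet", keep every other parameter in its original order
        [ "$word" = "quiet" ] && continue
        out="${out:+$out }$word"
    done
    printf '%s\n' "${out:+$out }earlyprintk=vga debug"
}

# Example with an invented command line:
debug_cmdline "root=/dev/sda1 rw quiet"
```

The example call should print the same command line with "quiet" removed and the two extra parameters at the end.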
I wouldn't mind opening my first upstream kernel bug report, but I don't know what to provide and how to reproduce it with meaningful output.
Can someone confirm that linux-4.18.3 with the normal initramfs with "earlyprintk=vga debug" (without quiet) produces any relevant output which can be used to report it upstream?
Last edited by NiceGuy (2018-08-21 17:43:36)
Offline
https://www.kernel.org/doc/html/latest/ … -bugs.html covers reporting bugs. You could try the http://vger.kernel.org/vger-lists.html#linux-acpi list or the main http://vger.kernel.org/vger-lists.html#linux-kernel list.
Without a bisection or full backtrace it may not get much attention or be actionable.
Offline
romstor and the journal has nothing for any of the failed boots?
I have the same error messages as OP. I couldn't recover those from journalctl or /dev/vcs1 and only have screenshots. There was a post earlier in this thread with a link to the kernel commit. I searched the diffs for "smp" and there were clearly changes in the code that were under that feature flag. This is as far as I can go in terms of figuring this out, but the issue definitely stems from kernel changes.
Last edited by romstor (2018-08-21 18:04:02)
Offline
I booted with "debug" and without "quiet" and got a "Call Trace", although I don't know what that means. I took pictures, and I also believe that bisecting would help the kernel developers very much. I will try to do this, but I don't promise that I will have a result in the next few days (and building on my Core 2 Duo isn't really fun). If anyone else wants to bisect this too, you are welcome.
Offline
Before reporting it upstream, could everyone affected check their journals: is there anything recorded from the failed boots?
Giving upstream the dmesg from a failed boot will almost certainly help upstream locate the cause of the issue.
My root filesystem (like any other filesystem) isn't mounted at the point where my system stops booting, so the dmesg output isn't written to disk. I don't know if it is possible in principle to build a custom initramfs and use something like /bin/sh from the initramfs as init (not instead of systemd later, really just at the beginning), and then do some steps manually to write a log to disk and then do the "usual" booting steps from the shell to get to the point where the problem occurs. I believe this requires much knowledge about how the initramfs and the boot process works. If someone understands what I am thinking of and knows how to do this, feel free to post this.
Offline
Makepkg#Parallel_compilation will help a bit on 2 cores if you have not already enabled it.
If you do two bisects a day I expect it will take one to two weeks assuming no mistakes are made. Please post if you need any help.
Please also see post #46 which may provide upstream enough to start with.
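The one-to-two-week estimate above can be checked with a bit of arithmetic: each bisect step halves the remaining range, so the number of builds is about log2 of the commit count. Below is a rough sketch. The figure of 14000 commits between 4.17 and 4.18 is an assumption, not a measured value, and bisect_steps is a made-up helper name. In a kernel checkout, "git rev-list --count v4.17..v4.18" gives the real count.

```shell
#!/bin/sh
# bisect_steps N: print how many halvings it takes to narrow N commits to one.
bisect_steps() {
    remaining=$1
    steps=0
    while [ "$remaining" -gt 1 ]; do
        # halve the range, rounding up
        remaining=$(( (remaining + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

commits=14000   # ASSUMED commit count between v4.17 and v4.18, not measured
per_day=2       # two bisect steps a day, as in the post above
steps=$(bisect_steps "$commits")
days=$(( (steps + per_day - 1) / per_day ))
echo "about $steps builds, roughly $days days at $per_day builds per day"
```

With the assumed count this comes to about a week if every build is usable. Skipped or repeated steps push it towards the upper end of the estimate.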
Offline
I started bisecting, but I am unsure if I am doing it on the right git repository. Should I be doing the linux-git package from the AUR, the one from git.archlinux.org, or directly from the kernel source at kernel.org? Or does it not matter? I am currently doing it from the linux.git on the git.archlinux.org site.
Offline
It probably does not matter, as the Arch-specific patches are applied last, and as git commits as well. So if you land on one of those, do not report it upstream.
Are you following Bisecting_bugs_with_Git or doing a manual kernel build?
Edit:
What did you use as the good and the bad commits and did you use the config from 4.17 or 4.18?
Last edited by loqs (2018-08-22 15:46:38)
Offline
I haven't started "git bisect", but I can confirm that v4.17 is indeed "good" and I found that v4.18-rc1 is "bad", so I will start with these.
I am using the repository at kernel.org, using "make localmodconfig" to speed up the build process and doing the "traditional compilation" without the ABS.
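In case it helps others follow along, this is roughly the sequence I mean, as a sketch only. The helper names run, bisect_begin and build_step and the DRY_RUN switch are made up for this post. By default the script only prints the commands. Set DRY_RUN=0 inside a kernel.org checkout to actually run them.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# bisect_begin GOOD BAD: start a bisection between a known-good and a known-bad ref.
bisect_begin() {
    run git bisect start
    run git bisect good "$1"
    run git bisect bad "$2"
}

# build_step: configure for the currently loaded modules, then build on two cores.
build_step() {
    run make localmodconfig
    run make -j2
}

bisect_begin v4.17 v4.18-rc1
build_step
```

After each build the kernel still has to be installed and booted by hand, and the result reported with "git bisect good" or "git bisect bad" before the next build.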
Offline
I am doing a manual kernel build. For the good commit, I used the tag "v4.17.14-arch1", because that was the version of the last working kernel from pacman, and I used "v4.18.1-arch1" as the bad commit, because that was the version number of a non-working kernel from pacman. For the config, I have been regenerating it with "make localmodconfig", but I think I might start over and just use the one from 4.17.
Offline
Upgraded to linux 4.18.3.arch1-1 and linux-firmware 20180815.f1b95fe-1. It worked at first, then after a reboot the same problem. Reverting back to 4.17.14.arch1-1 and linux-firmware 20180717.8d69bab-1.
Offline
Upgraded to linux 4.18.3.arch1-1 and linux-firmware 20180815.f1b95fe-1. It worked at first, then after a reboot the same problem. Reverting back to 4.17.14.arch1-1 and linux-firmware 20180717.8d69bab-1.
If I understand correctly you installed 4.18.3 and it worked until reboot because the new kernel would not be in use until reboot.
Offline
@loqs
No idea. After I upgraded, it did work when I rebooted just to apply all my updates, but today after a cold start it didn't work.
Offline
@loqs
No idea. After I upgraded, it did work when I rebooted just to apply all my updates, but today after a cold start it didn't work.
Yeah I have had this too. Reboots seem to work sometimes, but I can't reliably reproduce it.
Offline
Same problem on Dell E6500 and kernel 4.18.3
Offline
I've been having the same problem with a Core 2 Duo T6500 and kernel 4.18.3. So far I've been able to boot using initramfs-fallback as suggested in this post.
Apparently, a 4.18.4 update was pushed to the repos recently. Can anyone confirm whether it is safe to update and whether it will overwrite my current initramfs-fallback image?
Offline
@Flacko
4.18.4 doesn't work for me, reverted back to 4.17.14
Offline
4.18.4 doesn't work for me either, but it hasn't made things worse. If you are happy to keep booting the fallback then go for it.
Offline
For anyone else running a git bisect on the kernel, ccache can help speed up build times. It caches the compiler output, so if there are any files that have not been changed between commits, it simply copies the cached result instead of recompiling. The last kernel I built completed in 10 minutes instead of the usual 37 minutes for me, so hopefully I will have a result soon.
Edit: According to this article, you may have to set the CONFIG_LOCALVERSION_AUTO flag to "n" in the .config file in order for Linux kernel compiles to actually hit the ccache cache.
Last edited by nic3-14159 (2018-08-23 21:43:18)
Offline
Well, I'm impressed with how my lil' T9600 is holding up using only one core (best ten beans I ever spent on eBay) because it's not exactly running like a turtle, but I am pegging it at 100% every now and then. I guess most applications don't optimize for both cores anyway so the general feel is about the same unless I'm doing something heavy. I hope we can figure out which module is breaking Core 2 systems.
Offline
I am running a Duo Core. 4.18.4 doesn't work for me, nor does fallback. Since this problem has arisen, 5 out of 6 reboot attempts fail. Usually I get through on the 5th or 6th one. I think I'll just leave the machine running until it's clear a solution has been found.
_________________________
Asus X200CA Notebook
Offline
I am running a Duo Core. 4.18.4 doesn't work for me, nor does fallback. Since this problem has arisen, 5 out of 6 reboot attempts fail. Usually I get through on the 5th or 6th one. I think I'll just leave the machine running until it's clear a solution has been found.
That's the spirit!
Seriously though if it doesn't boot even with fallback I would roll back at that point...
Offline
Edit: According to this article, you may have to set the CONFIG_LOCALVERSION_AUTO flag to "n" in the .config file in order for Linux kernel compiles to actually hit the ccache cache.
Thanks for pointing to ccache! Unfortunately this config option didn't help to reduce the compile time here, but according to this blog post one has to set the kernel build timestamp like this
KBUILD_BUILD_TIMESTAMP='' make -j2
in order to make ccache actually useful for repeated kernel builds. I haven't tested it yet, but I will use both tweaks from now on because I believe both are necessary.
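Here is a small sketch of how the first tweak could be scripted between bisect steps. The function name disable_localversion_auto is made up, and it assumes the option is currently written as CONFIG_LOCALVERSION_AUTO=y in the .config. The kernel tree also ships scripts/config, which can make the same change.

```shell
#!/bin/sh
# disable_localversion_auto FILE
# Hypothetical helper: rewrites CONFIG_LOCALVERSION_AUTO=y in a kernel
# .config to the "is not set" form and leaves every other line untouched.
disable_localversion_auto() {
    sed 's/^CONFIG_LOCALVERSION_AUTO=y$/# CONFIG_LOCALVERSION_AUTO is not set/' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Intended use inside a kernel tree (not run here):
#   disable_localversion_auto .config
#   KBUILD_BUILD_TIMESTAMP='' make -j2
```

Whether ccache is picked up at all depends on how it is set up on your system, for example through a compiler wrapper in PATH or by passing CC="ccache gcc" to make. That part is my assumption and is not covered by the two sources above.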
Offline