@gregfrankenstein: the only thing I would consider missing is the request from #46: a dmesg from a boot of the hanging configuration, if someone can get one.
The arch package maintainers tend to wait on upstream and there now seems to be a boot time workaround of specifying the clocksource.
I have almost finished bisecting and it looks like the commit nic3-14159 found is really the one to blame, I can see that I am near that commit with "git bisect visualize".
I believe that the kernel maintainers will ask if the issue is still present in v4.19-rc1 (or whatever the first rc-release will be called), which I believe will be released in a few hours if everything goes as usual. I am going to test it (probably not today, but tomorrow), and if the problem is still present, I will also test it with the commit reverted.
@loqs: I haven't managed to get a dmesg because it stalls very early in the boot process. When I have finished bisecting, I will try to get a rescue shell, but from my current understanding, booting with "break=postmount" will not work because no filesystem is mounted at the point where it stalls. Maybe I am not explaining the problem clearly. I will summarize in a later post what information I am able to give.
@nic3-14159: If you are able to get a dmesg, what did you do to achieve that?
Thank you mates,
my output from this
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
was
tsc hpet acpi_pm
tsc did NOT work
hpet works
acpi_pm works
So I've added just "clocksource=hpet" to GRUB_CMDLINE_LINUX, and now it detects both cores and works fine.
Would it be better to use acpi_pm?
How can I know which one (hpet or acpi_pm) performs better on my laptop? Which test can I run to find out?
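For anyone applying the same workaround on a GRUB system, here is a minimal sketch of appending the parameter to GRUB_CMDLINE_LINUX without adding it twice. The function name add_grub_param is hypothetical; it assumes GNU sed (for -i) and a single GRUB_CMDLINE_LINUX="..." line with double quotes, and it does not handle "|", "&" or "\" inside the parameter.

```shell
#!/bin/sh
# Hypothetical helper: append one kernel parameter to GRUB_CMDLINE_LINUX in a
# GRUB defaults file (normally /etc/default/grub), unless it is already there.
# Assumes GNU sed (-i) and a line of the form GRUB_CMDLINE_LINUX="...".
add_grub_param() {
    file=$1
    param=$2

    # No GRUB_CMDLINE_LINUX line yet: append one and stop.
    if ! grep -q '^GRUB_CMDLINE_LINUX=' "$file"; then
        printf 'GRUB_CMDLINE_LINUX="%s"\n' "$param" >> "$file"
        return 0
    fi

    # Current value between the double quotes (GRUB_CMDLINE_LINUX_DEFAULT
    # does not match, because "=" must follow LINUX directly).
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/\1/p' "$file")

    # Parameter already present as a whole word: nothing to do.
    case " $current " in
        *" $param "*) return 0 ;;
    esac

    new="${current:+$current }$param"
    # "|" is the sed delimiter so values containing "/" are safe;
    # "|", "&" and "\" in the parameter are not handled.
    sed -i "s|^GRUB_CMDLINE_LINUX=\".*\"\$|GRUB_CMDLINE_LINUX=\"$new\"|" "$file"
}
```

As root: add_grub_param /etc/default/grub clocksource=hpet, then grub-mkconfig -o /boot/grub/grub.cfg and reboot. Checking for the whole word keeps repeated runs from stacking duplicate entries.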
The best is tsc if it works, hpet is slower and acpi_pm is the worst of the three.
https://access.redhat.com/documentation … mestamping
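On the hpet versus acpi_pm question: rather than rebooting for each candidate, the running clocksource can be switched through sysfs (as root, writing a name listed in available_clocksource into current_clocksource), and the same workload can then be timed under each. A minimal sketch; the function name try_clocksource and its optional directory argument (there only so it can be tried against a scratch directory) are hypothetical. It does not replace the boot parameter, since the hang with tsc happens during boot.

```shell
#!/bin/sh
# Hypothetical helper: switch the running clocksource via sysfs (needs root).
# Usage: try_clocksource <name> [sysfs_dir]
# The optional second argument defaults to the real sysfs directory.
try_clocksource() {
    name=$1
    dir=${2:-/sys/devices/system/clocksource/clocksource0}
    [ -n "$name" ] || return 1

    # Only accept names the kernel lists as available.
    case " $(cat "$dir/available_clocksource") " in
        *" $name "*) ;;
        *) echo "clocksource '$name' is not available" >&2; return 1 ;;
    esac

    echo "$name" > "$dir/current_clocksource" || return 1

    # Read the value back to confirm the switch actually took effect.
    active=$(cat "$dir/current_clocksource")
    echo "current clocksource: $active"
    [ "$active" = "$name" ]
}
```

For example, run try_clocksource acpi_pm and your workload, then try_clocksource hpet and the same workload again.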
| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |
Thank you progandy!! hpet is going to stay set until the issue gets solved.
4.18.5.arch1-1 doesn't work for me :-(
I had the same problem on an older HP Compaq desktop with Antergos. I installed the kernel, and then I was able to boot it.
@nic3-14159: If you are able to get a dmesg, what did you do to achieve that?
You have to increase the verbosity of the kernel boot and then log it.
netconsole might work, but somehow I think this problem occurs so early that only a serial console can capture the output.
Edit: If you are lucky, then the relevant data is visible on the frozen boot screen and you can manually copy it without setting up a second system to log the data.
Last edited by progandy (2018-08-26 23:42:16)
@loqs: Would this be enough to report it upstream?
I used "earlyprintk=vga debug break=postmount" (without quiet) as kernel boot parameters
and it got me the following output, but it stalls so early in the boot process that there is no dmesg, no log of any sort, and no /new_root mount or any other mount at all (I did not use netconsole etc.).
It takes about 20 - 30 seconds of stalling with the above mentioned kernel boot parameters before more output happens.
---
INFO: rcu_preempt detected stalls on CPUs/tasks:
o0-...!: (0 ticks this GP) idle=608/0/0 softirq=21/21 fqs=0 last_accelerate: e800/e800, non-lazy_posted: 549, ..
o(detected by 1, t=18220 jiffies, g=- 284, c=- 285, q=236)
Sending NMI from CPU1 to CPUs0:
NMI backtrace for CPU0 skipped: idling at acpi_processor_ffh_cstate_enter+0x67/0xb0
rcu_preempt kthread starved for 18220 jiffies! g18446744073709551332 c18446744073709551332 f0x0 RCU_GP_WAIT_FQS(3) -> state=0x402 -> cpu=0
RCU grace-period kthread stack dump:
rcu_preempt I 0 10 2 0x80000000
Call Trace:
? __schedule+0x29b/0x8b0
schedule+0x32/0x90
schedule_timeout+0x1d1/0x4a0
? collect_expired_timers+0xa0/0xa0
rcu_gp_kthread+0x43e/0x950
? synchronize_rcu_expedited+0x30/0x30
kthread+0x112/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x35/0x40
---
The "boot_delay=[in Milliseconds to delay each printk]" kernel boot parameter might further help to get more output or capture it from the beginning.
Last edited by NiceGuy (2018-08-27 07:46:53)
I checked the photos I was able to take of the frozen boot screen, and I can confirm that the call trace section @NiceGuy posted is the same as what I got. The part before it had some different numbers, but the messages had the same structure.
Then I will report it upstream to the kernel bug tracker later today.
Thanks for the confirmation @nic3-14159.
Last edited by NiceGuy (2018-08-27 08:31:04)
Then I will report it upstream to the kernel bug tracker later today.
The mailing list might be faster than https://bugzilla.kernel.org/buglist.cgi … _based_on= but the subsystem does not appear to require use of the mailing list.
I just filed this bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=200957
My bisection led to the same first bad commit that @nic3-14159 found, and I also found that 4.19-rc1 is affected.
I just filed this bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=200957
My bisection led to the same first bad commit that @nic3-14159 found, and I also found that 4.19-rc1 is affected.
It took me longer than I thought: I had a busy day and struggled with kernel Bugzilla issues that kept me from finishing and uploading.
We therefore now have another bug report open as a duplicate:
https://bugzilla.kernel.org/show_bug.cgi?id=200959
Additionally, I self-built kernel 4.17.19, which has no issues and boots just fine (based on the default Archlinux kernel 4.17.14 config).
Anyway, hopefully this gets fixed soon. :-)
Last edited by NiceGuy (2018-08-28 17:08:38)
Does anyone know if this is affecting any other distributions? I've looked around on Google, but I haven't found any mention of it.
Does anyone know if this is affecting any other distributions? I've looked around on Google, but I haven't found any mention of it.
Are there any non-rolling distributions shipping with kernel 4.18 yet? I guess a rolling distribution on core 2 is not too common. By the way, my i3 machine is still on 4.17.12, I have yet to reboot for the kernel update to complete.
Last edited by progandy (2018-08-28 17:41:27)
Does anyone know if this is affecting any other distributions?
I took a look at Fedora's Bugzilla a few days ago, but couldn't find anything like this. I have a parallel installation of Debian testing (with kernels from unstable) and tested the 4.18-rc4 kernel from experimental, but I only booted it once and had no problem. Maybe I have to boot it more often to reproduce the issue under Debian, as other users here did, but the 4.18 kernel is not even in unstable yet, so very few Debian users could have been hit by this.
Are there any non-rolling distributions shipping with kernel 4.18 yet?
According to Fedora's package tracker, the 4.18 kernel has not yet been packaged for their latest release, 28, but it will certainly be packaged soon because, as far as I know, they follow the stable releases. And I believe that Ubuntu will have the 4.18 kernel in their next release, 18.10. So, considering only well-known non-rolling distributions, Fedora users will be the first who could hit this problem.
Phoronix has us covered with an article. \o/
https://phoronix.com/scan.php?page=news … -CPU-Issue
That should help us to bring awareness to this issue for other distributions as well.
I'm still a bit confused - from all the information I'm seeing, this seems to be consistent among core2 users. But I'm running a Lenovo X200 core2 and have rebooted into the 4.18 kernel countless times without a single issue.
"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman
Arch Linux News still has nothing about this issue
I'm still a bit confused - from all the information I'm seeing, this seems to be consistent among core2 users. But I'm running a Lenovo X200 core2 and have rebooted into the 4.18 kernel countless times without a single issue.
What are the command line options for linux in your boot manager? And what is the output of the command "nproc"?
Last edited by nic3-14159 (2018-08-29 01:09:30)
Arch Linux News still has nothing about this issue
Sort of a head scratcher to me as well.
It seems like relevant information. Perhaps because it only affects a relatively small group of users with older hardware, it doesn't meet the criteria for an entry there.
_________________________
Asus X200CA Notebook
peng wrote: Arch Linux News still has nothing about this issue
Sort of a head scratcher to me as well.
It seems like relevant information. Perhaps because it only affects a relatively small group of users with older hardware, it doesn't meet the criteria for an entry there.
It doesn't really surprise me. The front page is usually stuff that affects every user of a package or library and requires manual intervention during a pacman update. An upstream kernel regression on very specific hardware may not fit with that.
What are the command line options for linux in your boot manager? And what is the output of the command "nproc"?
$ cat /proc/cmdline
BOOT_IMAGE=../vmlinuz-linux root=/dev/sda2 rw quiet acpi_osi=Linux initrd=../intel-ucode.img,../initramfs-linux.img
$ nproc
2
nic3-14159 wrote: What are the command line options for linux in your boot manager? And what is the output of the command "nproc"?
$ cat /proc/cmdline
BOOT_IMAGE=../vmlinuz-linux root=/dev/sda2 rw quiet acpi_osi=Linux initrd=../intel-ucode.img,../initramfs-linux.img
$ nproc
2
Hmm... Nothing I know of looks like it would affect anything, but I guess if your system works, it works.
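A command line like the one quoted above can also be checked from a script, for example to confirm whether a clocksource= override or nosmp is active before comparing results between machines. A minimal sketch; kparam_value is a hypothetical name, and it splits on whitespace only, so the kernel's handling of quoted values is ignored.

```shell
#!/bin/sh
# Hypothetical helper: look up a parameter in a kernel command line string.
# Prints the value for name=value, an empty line for a bare flag such as
# "quiet", and exits with status 1 if the parameter is absent.
# Splits on whitespace only; quoted values are not handled.
kparam_value() (
    set -f  # no glob expansion of the words (function body runs in a subshell)
    for word in $1; do
        case $word in
            "$2") echo; exit 0 ;;
            "$2"=*) echo "${word#*=}"; exit 0 ;;
        esac
    done
    exit 1
)
```

Typical use: kparam_value "$(cat /proc/cmdline)" clocksource || echo "no clocksource override", next to nproc to see how many CPUs came up.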
I can boot kernel 4.18.5 if I add nosmp to the boot parameters; however, my laptop then recognizes only 1 CPU when running nproc. I also get a message about my BIOS being broken (libreboot), but I think I have been getting this message since switching to libreboot. When I tried the acpi=off boot parameter, the laptop boots, but I cannot log in because the keyboard stops working.
Last edited by MrLinuxFish (2018-08-29 07:08:02)