Peter Zijlstra decided that it is best to just revert the "bad" patch, but with additional comments; see here:
https://lkml.org/lkml/2018/9/5/246
He didn't ask for testing, and I also think that there is no need to test this because the patch just reverts to the behaviour the kernel had before the 4.18 releases. And I remember at least one person here has already tested a kernel with the "bad" commit reverted.
The patch Peter Zijlstra made did not fix this issue for me. I haven't used patches before, so this was all new to me.
I tried applying the patch to linux-mainline 4.19-rc2 from the AUR by placing the patch file in the same directory as the PKGBUILD, adding it to the list of source files and updating the checksums. I thought this was enough, since the PKGBUILD already has a patch-applying section in prepare(). I even checked the patched file (kernel/time/clocksource.c), and the patch seems to have been applied (work_struct is replaced by kthread_work, for example).
But the computer stalled just like with linux 4.18. I haven't tried applying the patch on 4.18.
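If you want to double-check the "patch seems to have been applied" step in a scriptable way, a marker search over the extracted file is enough. Below is a minimal Python sketch, assuming you choose the marker strings yourself (here `kthread_work`, the identifier mentioned above); `missing_markers` and `check_patched_file` are hypothetical helpers, not part of makepkg. Finding the marker only proves the text is in the source tree, not that the kernel you actually booted was built from it.

```python
from pathlib import Path


def missing_markers(source_text: str, markers: list[str]) -> list[str]:
    """Return the markers that do NOT occur in source_text (empty list = all found)."""
    return [m for m in markers if m not in source_text]


def check_patched_file(path: Path, markers: list[str]) -> list[str]:
    """Read an extracted source file and report which markers are missing.

    The path is whatever makepkg extracted for your package (hypothetical here);
    errors="replace" keeps odd bytes from aborting the read.
    """
    text = path.read_text(encoding="utf-8", errors="replace")
    return missing_markers(text, markers)
```

For example, `check_patched_file(Path("<your src dir>/kernel/time/clocksource.c"), ["kthread_work"])` returns an empty list when the marker is present; adjust the path to your own build tree.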
He didn't ask for testing, and I also think that there is no need to test this because the patch just reverts to the behaviour the kernel had before the 4.18 releases. And I remember at least one person here has already tested a kernel with the "bad" commit reverted.
Indeed, I successfully compiled and tested the latest 4.17.19 kernel, which is now EOL (end of life - no more stable releases for 4.17), based on the default Arch kernel config. No issues with 4.17.19. No need to revert the "bad" commit.
I also was able to successfully compile and test a 4.18.5 kernel with the "bad" commit reverted and had no boot stalling issues (based on the default Arch kernel config).
AFAICS, the latest 4.18 stable series release, 4.18.6, has none of Peter Zijlstra's patches incorporated.
So, if you install and test it, make sure you have a parallel kernel version ready to be used instead. The alternative is to boot affected kernels with additional kernel boot parameters as a workaround.
Last edited by NiceGuy (2018-09-06 20:50:04)
I built a kernel package from the original "4.18.5.arch1-1" PKGBUILD and included the revert patch.
I rebuilt it in a "systemd-nspawn" machine, in a clean chroot with "makechrootpkg". It boots fine without issues on my hardware.
What can I do if I don't know how to apply a kernel patch?
What can I do if I don't know how to apply a kernel patch?
Switch to the lts kernel
https://ugjka.net
paru > yay | vesktop > discord
pacman -S spotify-launcher
mount /dev/disk/by-...
stoelpi wrote:What can I do if I don't know how to apply a kernel patch?
Switch to the lts kernel
Or boot with tsc=unstable or clocksource=hpet until it is fixed.
https://wiki.archlinux.org/index.php/Kernel_parameters
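After rebooting with one of these parameters, you can check that it actually reached the kernel and which clocksource ended up in use. A minimal Python sketch, assuming a Linux system with the standard `/proc/cmdline` and `/sys/devices/system/clocksource/clocksource0/` files; the helper names are made up for this example. With the workaround active you would expect `current` to read `hpet`.

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")
WORKAROUND_PARAMS = ("tsc=unstable", "clocksource=hpet")


def workaround_params(cmdline: str) -> list[str]:
    """Return which of the two workaround parameters appear on a kernel command line."""
    tokens = cmdline.split()
    return [p for p in WORKAROUND_PARAMS if p in tokens]


def read_clock_state() -> dict:
    """Read the running kernel's command line and clocksource state (Linux only)."""
    return {
        "cmdline_workarounds": workaround_params(Path("/proc/cmdline").read_text()),
        "current": (CLOCKSOURCE_DIR / "current_clocksource").read_text().strip(),
        "available": (CLOCKSOURCE_DIR / "available_clocksource").read_text().split(),
    }
```

Calling `read_clock_state()` right after boot shows both whether the parameter was passed and what the kernel settled on.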
Last edited by progandy (2018-09-06 18:01:11)
| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |
Don't tsc=unstable or clocksource=hpet have caveats?
Not in this case. The kernel first chooses tsc as the clocksource. This is fine on most modern systems, but not on older CPUs like the Core 2. Linux then detects that the TSC is unstable and tries to switch to hpet as the clocksource. During this switch the kernel locks up and stalls. If you tell the kernel that the TSC is unstable before it boots, it chooses hpet directly and the bug is avoided.
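The selection logic described above can be illustrated with a toy model: the kernel prefers the highest-rated clocksource that is not marked unstable, and on the affected machines the TSC gets marked unstable at runtime, forcing a switch. This is a simplified Python sketch, not kernel code; the ratings are illustrative values chosen so that tsc > hpet > acpi_pm, and the model simply assumes the TSC is always found unstable (the Core 2 case).

```python
from dataclasses import dataclass


@dataclass
class Clocksource:
    name: str
    rating: int            # higher rating = preferred
    unstable: bool = False


def select_clocksource(sources: list[Clocksource]) -> Clocksource:
    """Pick the highest-rated clocksource that is not marked unstable."""
    usable = [cs for cs in sources if not cs.unstable]
    if not usable:
        raise RuntimeError("no usable clocksource")
    return max(usable, key=lambda cs: cs.rating)


def boot(tsc_unstable_param: bool) -> list[str]:
    """Toy boot sequence on affected hardware; returns clocksource events in order."""
    # Illustrative ratings only, ordered tsc > hpet > acpi_pm.
    sources = [Clocksource("tsc", 300), Clocksource("hpet", 250), Clocksource("acpi_pm", 200)]
    events = []
    if tsc_unstable_param:
        # tsc=unstable: the TSC is marked unstable before any selection happens.
        sources[0].unstable = True
    current = select_clocksource(sources)
    events.append(f"select {current.name}")
    if current.name == "tsc":
        # Without the parameter, the model assumes the watchdog later finds the
        # TSC unstable, so a runtime switch to the next-best source is needed.
        current.unstable = True
        events.append(f"switch to {select_clocksource(sources).name}")
    return events
```

`boot(tsc_unstable_param=True)` yields only `["select hpet"]`, while `boot(tsc_unstable_param=False)` yields `["select tsc", "switch to hpet"]`; that second, runtime switch is the step where the affected kernels stall.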
The good news is that the patch to fix our issue has been pulled by Thomas Gleixner: https://lkml.org/lkml/2018/9/6/402
What remains is that Linus gets a pull request and commits the fixes into his tree (hopefully in git kernel 4.19-rc3), and only after that will Greg KH pick it up and back-port it to the 4.18 stable series. With a bit of luck, maybe in 4.18.7 or 4.18.8.
Maybe if we ask politely, and our Arch kernel developers aren't too busy doing something else important, we could request that they back-port the patch earlier, but this also means Linus has to accept and commit it into his tree first.
We're definitely getting closer to having this fixed once and for all.
Good job, everyone involved!
Last edited by NiceGuy (2018-09-06 21:01:13)
Maybe if we ask politely and our Arch kernel developers aren't too busy doing something else important, we could try and request that the patch gets back-ported earlier by them, but this also means Linus has to accept and commit it into his tree first.
No, don't do that, be patient please.
NiceGuy wrote:Maybe if we ask politely and our Arch kernel developers aren't too busy doing something else important, we could try and request that the patch gets back-ported earlier by them, but this also means Linus has to accept and commit it into his tree first.
No, don't do that, be patient please.
Relax!
It has already been reported by someone who wasn't aware of this forum thread: https://bugs.archlinux.org/task/59945
If you had fully understood what I wrote, you would have known what I meant by: 'Linus has to accept, pull and commit it into his tree first'.
Stable release 4.18.7 will not contain the fix, as 4.18.7-rc1 still lacks any kernel/time/clocksource.c changes.
[Edited to add]
Announcement of 4.18.7-rc1: https://lkml.org/lkml/2018/9/7/1271
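Checking whether an -rc touches a given file, as done above for kernel/time/clocksource.c, can be scripted by scanning the diff headers of the patch text. A minimal Python sketch, assuming the patch uses git-style `diff --git a/... b/...` headers; downloading the -rc patch is left out, and `touched_files`/`touches` are hypothetical helpers.

```python
import re

# Matches git-style per-file headers such as:
#   diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")


def touched_files(patch_text: str) -> set[str]:
    """Collect the file paths named in the diff headers of a patch."""
    files = set()
    for line in patch_text.splitlines():
        match = DIFF_HEADER.match(line)
        if match:
            files.update(match.groups())
    return files


def touches(patch_text: str, path: str) -> bool:
    """True if the patch contains changes to the given path."""
    return path in touched_files(patch_text)
```

Run against the combined -rc patch text, `touches(text, "kernel/time/clocksource.c")` returning False matches the observation above.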
Last edited by NiceGuy (2018-09-08 13:39:29)
Linus accepted, pulled and committed the fix into his tree: https://git.kernel.org/pub/scm/linux/ke … c926ae7fca
Our early boot stalling issue comes to an end, at least for now, with mainline kernel 4.19-rc3.
Shouldn't take long for Greg KH to pick it up and back-port it to a 4.18 stable release.
Woohoo! What an effort it takes if you really want something to get fixed upstream and pulled by Linus.
Neither of these patches works for me. I tried the revert patch on top of 4.18.5.arch1-1, following the guide on the wiki. The patch was applied and the package built without errors, but it just does not boot. So the patch does not work for me.
Adding the kernel parameters clocksource=hpet or tsc=unstable doesn't work either. (Yes, I should have tried this first.)
I am booting my system using EFI (systemd-boot). Could this have an effect?
Linux 4.17 was working fine for me. Now I'm using the LTS kernel as a workaround, but I'm afraid because 4.19 is becoming the next LTS kernel.
@GoatWarning bisect between 4.17 and 4.18 and find the cause. As far as I am aware 4.19 has not been announced as the next LTS kernel.
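For anyone who hasn't used bisecting before: `git bisect` is a binary search over the commits between a known-good release (4.17) and a known-bad one (4.18). You mark each tested build with `git bisect good` or `git bisect bad`, and git picks the next commit to try. The Python sketch below shows only the underlying idea on a simplified, linear history; real git history is a graph, which `git bisect` handles and this sketch does not. `first_bad` and the `is_bad` callback are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit in a linear, oldest-first history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that once
    a commit is bad, every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # One is_bad() call stands for "build this commit, boot it, see if it stalls".
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]
```

Each `is_bad` call is one build-boot-check round, so even a range of around 10,000 commits needs only about 14 rounds.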
@GoatWarning: are Intel Core 2 mainboards capable of having a UEFI implementation?
Some things that come to mind from testing this issue with more than two dozen reboots and kernel boot parameter variations:
1) if you add "debug" with and without "clocksource=hpet" or "tsc=unstable",
2) if you don't add anything other than your usual kernel boot parameters,
For 1 and 2: are you able to boot successfully at least once if you try ~10 times (not more than 10 times)?
3) I know you posted your instructions before, but are you certain the patch has been successfully applied in your own kernel building process?
4) Tomorrow, you could test mainline kernel 4.19-rc3, maybe there is an additional fix in there, which might help you.
4.19 is in the list of Longterm kernel releases. I have never used git-bisect, I'm not sure I can, but I'll try.
HP 6730b laptop has UEFI and my CPU is "Intel(R) Core(TM)2 Duo CPU P8600" (info taken from lscpu)
I am pretty sure that the patch was applied. I checked the file (src/archlinux-linux/kernel/time/clocksource.c) and it has the changes. There is also a file called clocksource.c.orig, which is the original file without the patch.
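Since the tree keeps clocksource.c.orig next to the patched clocksource.c, you can also compare the two directly and see how many lines changed. A minimal Python sketch using the standard library's `difflib.SequenceMatcher`; `diff_summary` and `compare_files` are hypothetical helper names. A non-zero result only shows the files differ; comparing the counts with the patch's own diffstat is a rough sanity check, since line-matching heuristics may pair lines differently.

```python
import difflib
from pathlib import Path


def diff_summary(original: str, patched: str) -> tuple[int, int]:
    """Return (added, removed) line counts between two versions of a file."""
    a_lines = original.splitlines()
    b_lines = patched.splitlines()
    # autojunk=False: don't treat frequent lines (blank lines, braces) as junk in long files.
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def compare_files(orig_path: Path, patched_path: Path) -> tuple[int, int]:
    """Diff e.g. clocksource.c.orig against clocksource.c."""
    return diff_summary(
        orig_path.read_text(errors="replace"),
        patched_path.read_text(errors="replace"),
    )
```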
I tried booting with debug and clocksource=hpet and it never boots (tried ~10 times). I should note that even the fallback image does not boot. I'm now learning how to use git-bisect, I think that's the way to go.
Last edited by GoatWarning (2018-09-10 05:32:50)
I just finished bisecting (using guide at ArchWiki for AUR package linux-git) and this is the result:
84c8b58ed3addf17d3beb2e5037b001ffa65c5ef is the first bad commit
I just finished bisecting (using guide at ArchWiki for AUR package linux-git) and this is the result:
84c8b58ed3addf17d3beb2e5037b001ffa65c5ef is the first bad commit
That looks like an unrelated issue to the one in this thread - you should either open a new thread or report upstream.
As the patch has not been included in the 4.18.8 review, someone might want to ask on the linux-stable mailing list whether it is being considered for inclusion in linux-stable.
It might be that the missing Cc to linux-stable on https://git.kernel.org/pub/scm/linux/ke … fd434adb3a caused it to be overlooked.
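The stable maintainers usually pick up fixes that carry a `Cc: stable@vger.kernel.org` line in the commit message (the tagging route described in the kernel's stable-kernel-rules document). A minimal Python sketch that checks a commit message for such a line, assuming you've fetched the message yourself, e.g. with `git log -1 --format=%B <commit>`; the regex is a simplification and `has_stable_cc` is a hypothetical helper.

```python
import re

# Simplified: any "Cc:" line mentioning the stable list, with or without
# angle brackets or a trailing "# 4.18" version note.
STABLE_CC = re.compile(r"^Cc:\s*.*stable@vger\.kernel\.org", re.IGNORECASE | re.MULTILINE)


def has_stable_cc(commit_message: str) -> bool:
    """True if the commit message carries a Cc line for the stable list."""
    return STABLE_CC.search(commit_message) is not None
```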
Done!
Wonderful news, everyone!
The patch has landed in Greg KH's stable queue and therefore 4.18.9 finally includes it!
I received a confirmation e-mail today from Greg:
"This is a note to let you know that I've just added the patch titled
clocksource: Revert "Remove kthread"
to the 4.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
clocksource-revert-remove-kthread.patch
and it can be found in the queue-4.18 subdirectory."
I would like to thank everyone who helped solve this problem, especially those who took the time to bisect and submit their findings.
@nic3-14159 @NiceGuy @loqs @progandy @viktorj and all others.
Much appreciated
Thank you