Continuous ACPI errors after kernel update resulting in high CPU usage

danirybe · 2024-02-23 16:34:14

My system info:

Laptop model: ASUS VivoBook D540NV-GQ065T

OS: Arch Linux x86_64

Kernel: 6.6.14-2-lts

WM: sway

CPU: Intel Pentium N420 (4) @ 2.500GHz

GPU1: Intel Apollo Lake [HD Graphics 505]

GPU2: NVIDIA GeForce 920MX

Problem:

After updating the kernel to 6.6.17.1-lts, I noticed high CPU temperature and lag. Running htop showed that journald was using 30-60% of the CPU.

Running journalctl -f gives these lines over and over again:

Feb 19 21:09:12 danirybe kernel: ACPI Error: Could not disable RealTimeClock events (20230628/evxfevnt-243)

Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 08, disabling event (20230628/evgpe-839)

Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0A, disabling event (20230628/evgpe-839)

Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0B, disabling event (20230628/evgpe-839)

Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PM_Timer (0), disabling (20230628/evevent-255)

Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PowerButton (2), disabling (20230628/evevent-255)

Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - SleepButton (3), disabling (20230628/evevent-255)
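
To see which messages dominate the flood, the journal lines can be grouped and counted. A minimal sketch, assuming the short journal format shown above (timestamp, hostname, then a "kernel: " prefix); the function name summarize_acpi_errors is made up for illustration:

```shell
# Hypothetical helper: read journal lines on stdin, keep only ACPI errors,
# drop everything up to and including "kernel: ", and count identical messages.
summarize_acpi_errors() {
    grep 'ACPI Error:' \
        | sed 's/^.*kernel: //' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Typical use (kernel messages of the current boot):
#   journalctl -k -b | summarize_acpi_errors
```

The most frequent message ends up on the first line, which makes it easier to say whether one GPE or all of them are firing.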

What I've tried:

1: adding the following parameters to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running sudo grub-mkconfig -o /boot/grub/grub.cfg:

pci=off disables my keyboard and produces the following error on boot:

ERROR: device `UUID=4fe5af1f-a57c-4ae3-996b-3ec9891b7529` not found. Skipping fsck.
mount: /new_root: can't find UUID=4fe5af1f-a57c-4ae3-996b-3ec9891b7529 on real boot.
You are now being dropped into an emergency shell.
sh: can't access tty; job control turned off
[rootfs ~]#

pci=nomsi: no effect.

pci=noacpi: no errors, but the keyboard doesn't work, so I can't log in.

pci=nommconf: the boot never finishes.

2: switching to the newest version of the default linux kernel (not lts). Same problem.

The only thing that worked was returning to linux-lts 6.6.14-2

Is there anything I can do or am I better off not updating for now?
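
For anyone repeating the parameter tests from step 1, the edit to /etc/default/grub can be scripted. This is only a sketch: add_kernel_param is a hypothetical helper, and it assumes GNU sed and a double-quoted, single-line GRUB_CMDLINE_LINUX_DEFAULT value.

```shell
# Hypothetical helper: append one kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of a grub defaults file.
# Assumes GNU sed, a double-quoted single-line value, and no '|', '&'
# or backslashes in the value (they would confuse the sed expression).
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}

    # Bail out if the file has no such line to edit.
    grep -q '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file" || return 1

    # Current value without the surrounding quotes.
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file")

    # Already present: nothing to do.
    case " $current " in
        *" $param "*) return 0 ;;
    esac

    if [ -n "$current" ]; then
        new="$current $param"
    else
        new=$param
    fi
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$new\"|" "$file"
}

# Typical use (as root), followed by regenerating the config:
#   add_kernel_param pci=nomsi
#   grub-mkconfig -o /boot/grub/grub.cfg
```

For a one-off test it is safer to press e in the GRUB menu and append the parameter to the linux line for that boot only, since an option like pci=off can leave the system unbootable.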

seth · 2024-02-23 16:53:56

https://bbs.archlinux.org/viewtopic.php?id=292747

loqs · 2024-02-23 18:48:24

With 6.6.14-2 are the messages from your first code block present at all? Have you tried linux-lts-6.6.15-1-x86_64.pkg.tar.zst or linux-lts-6.6.16-1-x86_64.pkg.tar.zst?

danirybe · 2024-02-24 07:31:21

loqs, to answer your first question: no, it's errors all the time, or no errors at all. To answer your second question, linux-lts-6.6.15-1 and linux-lts-6.6.16-1 produce the same problem.

seth · 2024-02-24 08:05:46

Just to be absolutely sure, "uname -a" suggests you're booting the updated kernel when this happens?
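
One way to make that check explicit is to compare the running release string with the installed package version. A rough sketch, assuming the linux-lts naming seen in this thread (package version 6.6.14-2, release string 6.6.14-2-lts); lts_kernel_matches is a hypothetical helper, and the "-lts" suffix rule applies to the linux-lts package only:

```shell
# Hypothetical helper: succeed if the running kernel release corresponds
# to the installed linux-lts package.
#   $1: release string, e.g. the output of `uname -r`
#   $2: package line, e.g. the output of `pacman -Q linux-lts`
lts_kernel_matches() {
    running=$1
    installed=${2#* }          # strip the package name, keep the version
    [ "$running" = "${installed}-lts" ]
}

# Typical use:
#   lts_kernel_matches "$(uname -r)" "$(pacman -Q linux-lts)" \
#       && echo "running the installed kernel" \
#       || echo "running a different kernel than the installed package"
```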

JambonII · 2024-02-24 08:41:55

@danirybe

I've got the same problem with a similar laptop. The only way I have found to "fix" it is to stay on "linux-6.7.arch3-1-x86_64" and "linux-headers-6.7.arch3-1-x86_64".

danirybe · 2024-02-24 08:47:23

seth wrote:

Just to be absolutely sure, "uname -a" suggests you're booting the updated kernel when this happens?

Yes. On a side note, I'll be unavailable for approximately 3 hours from the time of this reply.

loqs · 2024-02-24 09:52:16

@seth see anything in the 6.6.15 ChangeLog other than 484514580275321fe711af120bf39a17e3d7b313 worth a test revert?
Edit:
linux-lts 6.6.15 with 484514580275321fe711af120bf39a17e3d7b313 reverted and pkgrel increased to 1.1
https://drive.google.com/file/d/1Oj1kYK … sp=sharing linux-lts-6.6.15-1.1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1vEBIn_ … sp=sharing linux-lts-headers-6.6.15-1.1-x86_64.pkg.tar.zst

Last edited by loqs (2024-02-24 12:51:17)

seth · 2024-02-24 13:30:26

The "problem" is that all affected systems we've seen seem to use newer intel chips

Something might have re-introduced https://lore.kernel.org/linux-acpi/2019 … intel.com/
If the patched kernels, kindly provided by loqs, do not fix it, try to add "pci=hpiosize=0" (this will likely not get you a graphical target, but might prevent the acpi notification storm)

danirybe · 2024-02-24 13:55:25

loqs wrote:

@seth see anything in the 6.6.15 ChangeLog other than 484514580275321fe711af120bf39a17e3d7b313 worth a test revert?
Edit:
linux-lts 6.6.15 with 484514580275321fe711af120bf39a17e3d7b313 reverted and pkgrel increased to 1.1
https://drive.google.com/file/d/1Oj1kYK … sp=sharing linux-lts-6.6.15-1.1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1vEBIn_ … sp=sharing linux-lts-headers-6.6.15-1.1-x86_64.pkg.tar.zst

loqs, thank you kindly for your time and effort in trying to pin down this issue, but unfortunately your patches didn't work for me. The problem persists.

danirybe · 2024-02-24 13:56:15

seth wrote:

The "problem" is that all affected systems we've seen seem to use newer intel chips
Something might have re-introduced https://lore.kernel.org/linux-acpi/2019 … intel.com/
If the patched kernels, kindly provided by loqs, do not fix it, try to add "pci=hpiosize=0" (this will likely not get you a graphical target, but might prevent the acpi notification storm)

I've tried adding your suggested flag, seth, but it didn't seem to have any effect.

Last edited by danirybe (2024-02-24 13:56:38)

danirybe · 2024-02-24 13:59:06

Seeing as the problem seems to have something to do with my hardware, maybe it also has something to do with the fact that my laptop battery doesn't work properly? The laptop turns off immediately after I unplug the charging cord, even after replacing the battery with a new one.

seth · 2024-02-24 14:04:10

Seems the unsupported battery cannot provide enough voltage, but does the problem exist when not pulling the charger?

Can you get us a journal and "lsmod" output from a 6.6.14 boot ?

sudo journalctl -b | curl -F 'file=@-' 0x0.st

danirybe · 2024-02-24 14:07:14

seth wrote:

Seems the unsupported battery cannot provide enough voltage, but does the problem exist when not pulling the charger?
Can you get us a journal and "lsmod" output from a 6.6.14 boot ?
sudo journalctl -b | curl -F 'file=@-' 0x0.st

sure: http://0x0.st/H5UQ.txt, http://0x0.st/H5Uh.txt

Last edited by danirybe (2024-02-24 14:08:51)

seth · 2024-02-24 14:32:00

"pci=nocrs module_blacklist=nouveau clocksource=tsc initcall_blacklist=simpledrm_platform_driver_init"

@loqs, the addressed patch was part of an entire series fumbling around w/ the RTC
If we blame the RTC (the OP doesn't ultimately settle on HPET) it might be worthwhile to revert them all

commit 49a76c08bcfc2260af9bf9975e9f770c6f46102a
commit 9d20185601a030266e222bd7a513984f28170e18
commit d2d8ceb748346dc1e957f19b65a75532b146a9a9
commit 905d9e1c69b25578520bcbbff3a8f7356ec3b33c
commit 484514580275321fe711af120bf39a17e3d7b313

loqs · 2024-02-24 16:00:58

linux-lts 6.6.15 pkgrel increased to 1.2 with:

  git revert -n 49a76c08bcfc2260af9bf9975e9f770c6f46102a # rtc: Extend timeout for waiting for UIP to clear to 1s
  git revert -n 9d20185601a030266e222bd7a513984f28170e18 # rtc: Add support for configuring the UIP timeout for RTC reads
  git revert -n d2d8ceb748346dc1e957f19b65a75532b146a9a9 # rtc: mc146818-lib: Adjust failure return code for mc146818_get_time()
  git revert -n 905d9e1c69b25578520bcbbff3a8f7356ec3b33c # rtc: Adjust failure return code for cmos_set_alarm()
  git revert -n 484514580275321fe711af120bf39a17e3d7b313 # rtc: cmos: Use ACPI alarm for non-Intel x86 systems too

https://drive.google.com/file/d/1TLxJuA … sp=sharing linux-lts-6.6.15-1.2-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1vf67wV … sp=sharing linux-lts-headers-6.6.15-1.2-x86_64.pkg.tar.zst

danirybe · 2024-02-24 17:56:56

loqs wrote:

linux-lts 6.6.15 pkgrel increased to 1.2 with:

  git revert -n 49a76c08bcfc2260af9bf9975e9f770c6f46102a # rtc: Extend timeout for waiting for UIP to clear to 1s
  git revert -n 9d20185601a030266e222bd7a513984f28170e18 # rtc: Add support for configuring the UIP timeout for RTC reads
  git revert -n d2d8ceb748346dc1e957f19b65a75532b146a9a9 # rtc: mc146818-lib: Adjust failure return code for mc146818_get_time()
  git revert -n 905d9e1c69b25578520bcbbff3a8f7356ec3b33c # rtc: Adjust failure return code for cmos_set_alarm()
  git revert -n 484514580275321fe711af120bf39a17e3d7b313 # rtc: cmos: Use ACPI alarm for non-Intel x86 systems too

https://drive.google.com/file/d/1TLxJuA … sp=sharing linux-lts-6.6.15-1.2-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1vf67wV … sp=sharing linux-lts-headers-6.6.15-1.2-x86_64.pkg.tar.zst

Thanks again for your work, but these patches also don't fix the problem. I'll try reverting some commits as well tomorrow, maybe I'll be able to find something.

loqs · 2024-02-24 18:39:35

$ git bisect start
status: waiting for both good and bad commits
$ git bisect bad v6.6.15
status: waiting for good commit(s), bad commit known
$ git bisect good v6.6.14
Bisecting: 165 revisions left to test after this (roughly 7 steps)
[14bafd198066480568967e5fa445ce3a7bbbad98] cifs: after disabling multichannel, mark tcon for reconnect

Linux bisection 6.6.14 to 6.6.15: the folder contains the first bisection point, already built, and the PKGBUILD source for that build if you want to work off that.
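
Each bisection step comes down to booting the build for the commit git checked out and telling git whether the error flood is present. A minimal sketch of that decision, assuming the GPE message from the first post is a reliable marker (danirybe reported errors either all the time or not at all); bisect_verdict is a made-up name:

```shell
# Hypothetical helper: print "bad" if the kernel log on stdin contains the
# GPE error message described in this thread, "good" otherwise.
bisect_verdict() {
    # grep -c prints 0 when nothing matches, so count is always a number.
    count=$(grep -c 'ACPI Error: No handler or method for GPE')
    if [ "$count" -gt 0 ]; then
        echo bad
    else
        echo good
    fi
}

# Typical use after booting a test build, from inside the kernel source tree:
#   git bisect "$(journalctl -k -b | bisect_verdict)"
```

git then checks out the next commit to build, and the cycle repeats until it names the first bad commit.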

seth · 2024-02-24 21:48:59

The only suspect below that seems commit ecabe8cd456d3bf81e92c53b074732f3140f170d

Above it I kinda blame any of
commit 847e1eb30e269a094da046c08273abe3f3361cf2
commit c9c63d6a45414e90d1059bbb6e6e335e7355f1f7
commit e791a345fa73276643e860b59cb5c5054e5013b5
commit 26e85f7b0a16a284acd1d181a9869dccf1d5ca90
commit 1bd81374bc2fa7ae98824b4eb9ac3e97c29d6971
commit 0232a19a0e215e1630ea821fd83fcf8d2de39b21

ace6fb9da63e3071e9be9f68a2c895032b5d0775 would be the last before those.
But tbh, w/o any data from the compromised system, that's just blind guessing - could be almost anything and saves only one bisection step

danirybe · 2024-02-26 18:24:44

@loqs, @seth

Update:

After doing some bisectin' this is the result:

847e1eb30e269a094da046c08273abe3f3361cf2 is the first bad commit
commit 847e1eb30e269a094da046c08273abe3f3361cf2
Author: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Date:   Mon Jan 8 15:20:58 2024 +0900

    platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
    
    commit 5913320eb0b3ec88158cfcb0fa5e996bf4ef681b upstream.
    
    p2sb_bar() unhides P2SB device to get resources from the device. It
    guards the operation by locking pci_rescan_remove_lock so that parallel
    rescans do not find the P2SB device. However, this lock causes deadlock
    when PCI bus rescan is triggered by /sys/bus/pci/rescan. The rescan
    locks pci_rescan_remove_lock and probes PCI devices. When PCI devices
    call p2sb_bar() during probe, it locks pci_rescan_remove_lock again.
    Hence the deadlock.
    
    To avoid the deadlock, do not lock pci_rescan_remove_lock in p2sb_bar().
    Instead, do the lock at fs_initcall. Introduce p2sb_cache_resources()
    for fs_initcall which gets and caches the P2SB resources. At p2sb_bar(),
    refer the cache and return to the caller.
    
    Before operating the device at P2SB DEVFN for resource cache, check
    that its device class is PCI_CLASS_MEMORY_OTHER 0x0580 that PCH
    specifications define. This avoids unexpected operation to other devices
    at the same DEVFN.
    
    Link: https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/
    Fixes: 9745fb07474f ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support")
    Cc: stable@vger.kernel.org
    Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Link: https://lore.kernel.org/r/20240108062059.3583028-2-shinichiro.kawasaki@wdc.com
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Tested-by Klara Modin <klarasmodin@gmail.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/platform/x86/p2sb.c | 180 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 139 insertions(+), 41 deletions(-)

seth · 2024-02-26 22:00:58

seth, literally wrote:

Above it I kinda blame any of
commit 847e1eb30e269a094da046c08273abe3f3361cf2

https://bugzilla.kernel.org/enter_bug.cgi

From the reason given in https://lore.kernel.org/linux-pci/6xb24 … 3uik5ef6j/ (synthetic tests out of a compromised state, leading to maybe just a cosmetic issue) this should probably be reverted asap.

loqs · 2024-02-26 23:09:40

Or reply to https://lore.kernel.org/all/20240129170 … ation.org/ (instructions on how to reply are in the link).
Edit:
https://bugzilla.kernel.org/show_bug.cgi?id=218531

Last edited by loqs (2024-02-27 14:17:46)

colada · 2024-03-12 06:08:42

Same ASUS D540 here. As far as I can tell, there is no workaround for this issue other than downgrading to the last working kernel? Slightly off-topic, but I've lost it and barely understand how to compile it manually. I wasted a few days trying to build the kernel on this machine with no luck; there is certainly not enough RAM to build with the default Arch config. I moved the storage device to a better machine and did compile and install a kernel there, but it failed to boot due to (as I understand it) a broken initramfs or similar. Now I'm at a loss and don't understand at all: have you fixed it? What can I do to run Arch on this machine? Can you explain what conclusion you reached?

seth · 2024-03-12 08:13:12

https://bugzilla.kernel.org/show_bug.cgi?id=218531#c41 - apply the patch or wait until the upstream fix hits the repos.

Alternatively downgrade to (reportedly) 6.6.14
https://archive.archlinux.org/packages/ … kg.tar.zst
https://archive.archlinux.org/packages/ … kg.tar.zst
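
If the older packages are still in the local pacman cache, the downgrade does not need a download. A sketch under that assumption; find_cached_pkg is a hypothetical helper, and /var/cache/pacman/pkg is pacman's default cache directory:

```shell
# Hypothetical helper: print the path of a cached package file for the
# given package name and version (pkgver-pkgrel), or fail if none exists.
find_cached_pkg() {
    name=$1
    version=$2
    cache=${3:-/var/cache/pacman/pkg}
    for f in "$cache/$name-$version"-*.pkg.tar.zst; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Typical use:
#   sudo pacman -U "$(find_cached_pkg linux-lts 6.6.14-2)" \
#                  "$(find_cached_pkg linux-lts-headers 6.6.14-2)"
```

To keep the next full upgrade from pulling in the affected kernel again, IgnorePkg = linux-lts linux-lts-headers can be set in /etc/pacman.conf until the fix reaches the repos.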

colada wrote:

there is certainly not enough RAM to make build

The kernel doesn't require all that much.

loqs · 2024-03-12 08:24:03

seth wrote:

https://bugzilla.kernel.org/show_bug.cgi?id=218531#c41 - apply the patch or wait until the upstream fix hits the repos.

https://git.kernel.org/pub/scm/linux/ke … 433f7f8f06 is in the 6.8 build now in testing.
