#1 2020-04-28 08:59:54

lamargo
Member
Registered: 2020-04-26
Posts: 6

[SOLVED] Random Kernel Panics on suspend/resume on Dell Latitude 7490

Hi everyone,

I bought a used Dell Latitude 7490/0KP0FT a while ago. For the first few months it worked flawlessly, but after a kernel update from a late 4.x to an early 5.x release (unfortunately I do not have the exact versions) I started getting random kernel panics while the system was running: the display would freeze, and after a few seconds the Caps Lock key would start to blink, indicating a kernel panic. I spent some time debugging, but eventually resorted to using an older LTS kernel, which worked fine for a while.

About a week ago I started getting kernel panics again, but this time fairly consistently when I suspended or resumed the system (both seem to trigger the issue sometimes). The freeze happens roughly every fifth suspend/resume cycle.
When it occurs on suspend, the display turns off but the power LED stays on, and after a while Caps Lock starts blinking. When it occurs during resume, the display does not turn on at all, and Caps Lock starts blinking.

I have the latest intel-ucode package installed (ucode version 2019-10-03).

The problem occurs with all kernel versions that I tested:

  • an ancient 4.19.80-2-lts

  • the current Arch kernel (5.6.7-arch1-1 as of now)

  • the most recent mainline kernel, that I compiled myself (5.7.0-rc2)

This puzzles me, as it seems to indicate that the problem is independent of the kernel version.

Since the problem only occurs during suspend operations, my guess is that it has to do with power management.
Workarounds I've tried:

  • Updating to the latest BIOS (1.13.1 11/08/2019)

  • Limiting the C-states of the CPU with the kernel parameters "processor.max_cstate=1 intel_idle.max_cstate=0 idle=poll"

  • Disabling Intel SpeedStep and Turbo Boost in the BIOS

  • Using the software-only s2idle instead of a full S3 suspend

However, the error recurred at least once with each of them.

I logged kernel messages via netconsole, and was able to capture the following:

[45611.653983] kauditd_printk_skb: 17 callbacks suppressed
[45611.653986] audit: type=1130 audit(1588023884.683:364): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=i3lock@marco comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[45611.669115] PM: suspend entry (s2idle)
[45611.679592] Filesystems sync: 0.010 seconds
[45611.686633] Freezing user space processes ... (elapsed 0.002 seconds) done.
[45611.689458] OOM killer disabled.
[45611.689460] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[45611.690541] printk: Suspending console(s) (use no_console_suspend to debug)
[76323.487821] done.
[76323.492497] audit: type=1130 audit(1588054596.330:365): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=shadow comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76323.497199] audit: type=1701 audit(1588054596.333:366): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=395 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" sig=6 res=1
[76323.565564] thermal thermal_zone7: failed to read out thermal zone (-61)
[76323.597401] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
[76323.597484] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
[76323.598277] audit: type=1131 audit(1588054596.436:367): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[76323.607919] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
[76323.609494] systemd[1]: Stopping Flush Journal to Persistent Storage...
[76323.610785] PM: suspend exit
[76323.613808] systemd[1]: systemd-suspend.service: Succeeded.
[76323.614263] systemd[1]: Finished Suspend.
[76323.614804] systemd[1]: Stopped target Sleep.
[76323.614917] systemd[1]: Reached target Suspend.
[76323.615111] systemd[1]: Stopped target Suspend.
[76323.615353] audit: type=1130 audit(1588054596.453:368): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76323.615357] audit: type=1131 audit(1588054596.453:369): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76323.661787] systemd-coredump[107968]: #3  0x0000556f74dad161 n/a (systemd-journald + 0x19161)
[76323.661791] systemd-coredump[107968]: #4  0x0000556f74dae0f3 n/a (systemd-journald + 0x1a0f3)
[76323.661795] systemd-coredump[107968]: #5  0x00007f562b0795d8 n/a (libsystemd-shared-245.so + 0x7b5d8)
[76323.661799] systemd-coredump[107968]: #6  0x00007f562b07988b sd_event_dispatch (libsystemd-shared-245.so + 0x7b88b)
[76323.661802] systemd-coredump[107968]: #7  0x00007f562b07b311 sd_event_run (libsystemd-shared-245.so + 0x7d311)
[76323.632105] systemd[1]: systemd-journal-flush.service: Succeeded.
[76323.632554] systemd[1]: Stopped Flush Journal to Persistent Storage.
[76323.632967] systemd[1]: Stopped Journal Service.
[76323.634994] systemd[1]: Starting Journal Service...
[76323.635155] audit: type=1131 audit(1588054596.470:370): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journal-flush comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76323.635160] audit: type=1130 audit(1588054596.470:371): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76323.635165] audit: type=1131 audit(1588054596.470:372): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76323.635172] audit: type=1334 audit(1588054596.473:374): prog-id=4 op=UNLOAD
[76323.650671] systemd[1]: shadow.service: Succeeded.
[76323.661755] systemd-coredump[107968]: Process 395 (systemd-journal) of user 0 dumped core.
[76323.661766] systemd-coredump[107968]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.a9df153f6779423cba0a83b2033f9200.395.1588054596000000000000.lz4
[76323.661771] systemd-coredump[107968]: Stack trace of thread 395:
[76323.661775] systemd-coredump[107968]: #0  0x00007f562b06aa82 journal_file_append_object (libsystemd-shared-245.so + 0x6ca82)
[76323.661780] systemd-coredump[107968]: #1  0x00007f562b06e6cd n/a (libsystemd-shared-245.so + 0x706cd)
[76323.661784] systemd-coredump[107968]: #2  0x00007f562b06f314 journal_file_append_entry (libsystemd-shared-245.so + 0x71314)
[76323.661806] systemd-coredump[107968]: #8  0x0000556f74d9a1f6 n/a (systemd-journald + 0x61f6)
[76323.661810] systemd-coredump[107968]: #9  0x00007f562b288023 __libc_start_main (libc.so.6 + 0x27023)
[76323.661814] systemd-coredump[107968]: #10 0x0000556f74d9a91e n/a (systemd-journald + 0x691e)
[76323.809481] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[76323.809643] systemd[1]: Condition check resulted in Store a System Token in an EFI Variable being skipped.
[76323.809753] systemd[1]: Condition check resulted in First Boot Wizard being skipped.
[76323.809876] systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
[76324.171312] systemd-journald[107999]: File /var/log/journal/c7ceb37f70cc4042849caa4b9a50fbe1/system.journal corrupted or uncleanly shut down, renaming and replacing.
[76324.239835] systemd[1]: Started Journal Service.
[76324.257428] systemd-journald[107999]: Received client request to flush runtime journal.
[76329.219251] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[76329.219321] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[76329.246155] kauditd_printk_skb: 8 callbacks suppressed
[76329.246156] audit: type=1334 audit(1588054602.083:383): prog-id=51 op=LOAD
[76335.128721] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
[76336.218905] Shutting down cpus with NMI
[76336.229816] Kernel Offset: 0x12600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[76336.229822] Rebooting in 30 seconds..
[76367.239199] ACPI MEMORY or I/O RESET_REG.

Unfortunately, that is not much to go on. I suspect the systemd-coredump messages are just a consequence of a previous unclean shutdown, so only the last few lines are of interest.
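In the capture above, some lines appear out of order: for example, coredump frames #3 to #7 show up before frames #0 to #2, and before lines carrying earlier timestamps. A minimal Python sketch that re-sorts a saved netconsole capture by its bracketed printk timestamps, assuming the capture was saved as a text file with one message per line and that the timestamps reflect the true order; the script name sort_netconsole.py is hypothetical.

```python
import re
import sys

# printk timestamp prefix, e.g. "[76323.661775] " or "[    1.349857] "
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def sort_by_timestamp(lines):
    """Return captured log lines ordered by their printk timestamp.

    A line without a timestamp inherits the timestamp of the line before it,
    so it normally stays right after that line. The original position is used
    as a tie-breaker, so lines with equal timestamps keep their received order.
    """
    keyed = []
    last_seen = 0.0
    for position, line in enumerate(lines):
        match = TIMESTAMP.match(line)
        if match:
            last_seen = float(match.group(1))
        keyed.append((last_seen, position, line))
    keyed.sort()
    return [line for _, _, line in keyed]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python sort_netconsole.py capture.log > sorted.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as capture:
        sys.stdout.writelines(sort_by_timestamp(capture.readlines()))
```

Sorting on (timestamp, position) rather than on the timestamp alone keeps the result deterministic when several messages share a timestamp, as many early-boot lines do.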

Some more system information:

dmesg | grep -i "error\|exception\|warning"
[    1.349857] RAS: Correctable Errors collector initialized.
[    2.377295] i8042: Warning: Keylock active
[    9.912612] random: 2 urandom warning(s) missed due to ratelimiting
[   10.040223] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
[   11.072428] ACPI Warning: \_SB.IETM._TRT: Return Package has no elements (empty) (20180810/nsprepkg-94)
[   11.081115] ACPI Warning: SystemMemory range 0x00000000FE028000-0x00000000FE0281FF conflicts with OpRegion 0x00000000FE028000-0x00000000FE028207 (\_SB.PCI0.GEXP.BAR0) (20180810/utaddress-204)
[   11.096256] intel-lpss: probe of INT3446:00 failed with error -16
[   11.517008] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2

The ACPI warnings and the intel-lpss probe failure especially stand out to me, but I am not sure what to make of them.

dmesg | grep -i acpi
[    0.000000] BIOS-e820: [mem 0x00000000b915c000-0x00000000b915cfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000ca328000-0x00000000ca36ffff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000ca370000-0x00000000cacd1fff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x00000000b915c000-0x00000000b915cfff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x00000000ca328000-0x00000000ca36ffff] ACPI data
[    0.000000] reserve setup_data: [mem 0x00000000ca370000-0x00000000cacd1fff] ACPI NVS
[    0.000000] efi:  ACPI=0xca336000  ACPI 2.0=0xca336000  SMBIOS=0xf0000  SMBIOS 3.0=0xf0020  ESRT=0xcb0e0018  MEMATTR=0xc5f21298  TPMEventLog=0xb9250018
[    0.015081] ACPI: Early table checksum verification disabled
[    0.015085] ACPI: RSDP 0x00000000CA336000 000024 (v02 DELL  )
[    0.015089] ACPI: XSDT 0x00000000CA3360C0 000104 (v01 DELL   CBX3     01072009 AMI  00010013)
[    0.015096] ACPI: FACP 0x00000000CA360B60 00010C (v05 DELL   CBX3     01072009 AMI  00010013)
[    0.015103] ACPI: DSDT 0x00000000CA336258 02A908 (v02 DELL   CBX3     01072009 INTL 20160422)
[    0.015108] ACPI: FACS 0x00000000CACCF180 000040
[    0.015111] ACPI: APIC 0x00000000CA360C70 0000BC (v03 DELL   CBX3     01072009 AMI  00010013)
[    0.015115] ACPI: FPDT 0x00000000CA360D30 000044 (v01 DELL   CBX3     01072009 AMI  00010013)
[    0.015118] ACPI: FIDT 0x00000000CA360D78 0000AC (v01 DELL   CBX3     01072009 AMI  00010013)
[    0.015122] ACPI: MCFG 0x00000000CA360E28 00003C (v01 DELL   CBX3     01072009 MSFT 00000097)
[    0.015126] ACPI: HPET 0x00000000CA360E68 000038 (v01 DELL   CBX3     01072009 AMI. 0005000B)
[    0.015129] ACPI: SSDT 0x00000000CA360EA0 000359 (v01 SataRe SataTabl 00001000 INTL 20160422)
[    0.015133] ACPI: SSDT 0x00000000CA361200 0012E0 (v02 SaSsdt SaSsdt   00003000 INTL 20160422)
[    0.015137] ACPI: HPET 0x00000000CA3624E0 000038 (v01 INTEL  KBL-ULT  00000001 MSFT 0000005F)
[    0.015141] ACPI: SSDT 0x00000000CA362518 000BEE (v02 INTEL  xh_OEMBD 00000000 INTL 20160422)
[    0.015145] ACPI: UEFI 0x00000000CA363108 000042 (v01                 00000000      00000000)
[    0.015149] ACPI: SSDT 0x00000000CA363150 0017AE (v02 CpuRef CpuSsdt  00003000 INTL 20160422)
[    0.015153] ACPI: LPIT 0x00000000CA364900 000094 (v01 INTEL  KBL-ULT  00000000 MSFT 0000005F)
[    0.015156] ACPI: SSDT 0x00000000CA364998 000161 (v02 INTEL  HdaDsp   00000000 INTL 20160422)
[    0.015160] ACPI: SSDT 0x00000000CA364B00 00029F (v02 INTEL  sensrhub 00000000 INTL 20160422)
[    0.015164] ACPI: SSDT 0x00000000CA364DA0 003002 (v02 INTEL  PtidDevc 00001000 INTL 20160422)
[    0.015168] ACPI: SSDT 0x00000000CA367DA8 000517 (v02 INTEL  TbtTypeC 00000000 INTL 20160422)
[    0.015171] ACPI: DBGP 0x00000000CA3682C0 000034 (v01 INTEL           00000002 MSFT 0000005F)
[    0.015175] ACPI: DBG2 0x00000000CA3682F8 000054 (v00 INTEL           00000002 MSFT 0000005F)
[    0.015179] ACPI: SSDT 0x00000000CA368350 0007DD (v02 INTEL  UsbCTabl 00001000 INTL 20160422)
[    0.015182] ACPI: SSDT 0x00000000CA368B30 00531B (v02 DptfTa DptfTabl 00001000 INTL 20160422)
[    0.015186] ACPI: MSDM 0x00000000CA36DE50 000055 (v03 DELL   CBX3     06222004 AMI  00010013)
[    0.015190] ACPI: SLIC 0x00000000CA36DEA8 000176 (v03 DELL   CBX3     01072009 MSFT 00010013)
[    0.015193] ACPI: NHLT 0x00000000CA36E020 00002D (v00 INTEL  EDK2     00000002      01000013)
[    0.015197] ACPI: TPM2 0x00000000CA36E050 000034 (v03 DELL   CBX3     00000001 AMI  00000000)
[    0.015201] ACPI: ASF! 0x00000000CA36E088 0000A0 (v32 INTEL   HCG     00000001 TFSM 000F4240)
[    0.015205] ACPI: DMAR 0x00000000CA36E128 000138 (v01 INTEL  EDK2     00000001 INTL 00000001)
[    0.015208] ACPI: BGRT 0x00000000CA36E260 000038 (v00                 01072009 AMI  00010013)
[    0.015219] ACPI: Local APIC address 0xfee00000
[    0.138135] ACPI: PM-Timer IO Port: 0x1808
[    0.138137] ACPI: Local APIC address 0xfee00000
[    0.138145] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[    0.138146] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[    0.138146] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[    0.138147] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
[    0.138148] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
[    0.138149] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
[    0.138149] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
[    0.138150] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
[    0.138180] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.138182] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.138183] ACPI: IRQ0 used by override.
[    0.138185] ACPI: IRQ9 used by override.
[    0.138187] Using ACPI (MADT) for SMP configuration information
[    0.138189] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    0.394492] ACPI: Core revision 20180810
[    0.486486] PM: Registering ACPI NVS region [mem 0xb915c000-0xb915cfff] (4096 bytes)
[    0.486486] PM: Registering ACPI NVS region [mem 0xca370000-0xcacd1fff] (9838592 bytes)
[    0.486486] ACPI: bus type PCI registered
[    0.486486] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.486486] ACPI: Added _OSI(Module Device)
[    0.486486] ACPI: Added _OSI(Processor Device)
[    0.486486] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.486486] ACPI: Added _OSI(Processor Aggregator Device)
[    0.486486] ACPI: Added _OSI(Linux-Dell-Video)
[    0.486486] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    0.555041] ACPI: 11 ACPI AML tables successfully acquired and loaded
[    0.575516] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[    0.586536] ACPI: Dynamic OEM Table Load:
[    0.586536] ACPI: SSDT 0xFFFF9AED1B811000 00058B (v02 PmRef  Cpu0Ist  00003000 INTL 20160422)
[    0.586541] ACPI: \_PR_.PR00: _OSC native thermal LVT Acked
[    0.586801] ACPI: Dynamic OEM Table Load:
[    0.586810] ACPI: SSDT 0xFFFF9AED1B537400 0003FF (v02 PmRef  Cpu0Cst  00003001 INTL 20160422)
[    0.587611] ACPI: Dynamic OEM Table Load:
[    0.587617] ACPI: SSDT 0xFFFF9AED1B8A70C0 0000BA (v02 PmRef  Cpu0Hwp  00003000 INTL 20160422)
[    0.588275] ACPI: Dynamic OEM Table Load:
[    0.588282] ACPI: SSDT 0xFFFF9AED1B8C2800 000628 (v02 PmRef  HwpLvt   00003000 INTL 20160422)
[    0.588768] ACPI: Dynamic OEM Table Load:
[    0.588768] ACPI: SSDT 0xFFFF9AED1B915000 000D14 (v02 PmRef  ApIst    00003000 INTL 20160422)
[    0.588768] ACPI: Dynamic OEM Table Load:
[    0.588768] ACPI: SSDT 0xFFFF9AED1B4A3400 000317 (v02 PmRef  ApHwp    00003000 INTL 20160422)
[    0.595347] ACPI: Dynamic OEM Table Load:
[    0.595354] ACPI: SSDT 0xFFFF9AED1B537800 00030A (v02 PmRef  ApCst    00003000 INTL 20160422)
[    0.599788] ACPI: EC: EC started
[    0.599789] ACPI: EC: interrupt blocked
[    0.607138] ACPI: \_SB_.PCI0.LPCB.ECDV: Used as first EC
[    0.607138] ACPI: \_SB_.PCI0.LPCB.ECDV: GPE=0x6e, EC_CMD/EC_SC=0x934, EC_DATA=0x930
[    0.607138] ACPI: \_SB_.PCI0.LPCB.ECDV: Used as boot DSDT EC to handle transactions
[    0.607138] ACPI: Interpreter enabled
[    0.607138] ACPI: (supports S0 S3 S4 S5)
[    0.607138] ACPI: Using IOAPIC for interrupt routing
[    0.607153] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.608240] ACPI: Enabled 7 GPEs in block 00 to 7F
[    0.625770] ACPI: Power Resource [WRST] (on)
[    0.661301] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7e])
[    0.661309] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.666356] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME AER PCIeCapability LTR]
[    0.691363] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.691456] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *10 11 12 14 15)
[    0.691542] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.691629] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.691714] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.691800] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.691885] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.691971] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.699652] ACPI: EC: interrupt unblocked
[    0.699661] ACPI: EC: event unblocked
[    0.699670] ACPI: \_SB_.PCI0.LPCB.ECDV: GPE=0x6e, EC_CMD/EC_SC=0x934, EC_DATA=0x930
[    0.699672] ACPI: \_SB_.PCI0.LPCB.ECDV: Used as boot DSDT EC to handle transactions and events
[    0.699777] ACPI: bus type USB registered
[    0.699795] PCI: Using ACPI for IRQ routing
[    0.727692] pnp: PnP ACPI init
[    0.728746] system 00:00: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.728911] pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.728971] system 00:02: Plug and Play ACPI device, IDs INT3f0d PNP0c02 (active)
[    0.729093] pnp 00:03: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.729502] system 00:04: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.729578] system 00:05: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.730084] system 00:06: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.732168] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.740878] pnp: PnP ACPI: found 8 devices
[    0.746810] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.269713] DMAR: ACPI device "device:6a" under DMAR at fed91000 as 00:15.0
[    1.269717] DMAR: ACPI device "device:6b" under DMAR at fed91000 as 00:15.1
[    1.269720] DMAR: ACPI device "device:6c" under DMAR at fed91000 as 00:15.2
[    1.269722] DMAR: ACPI device "device:6d" under DMAR at fed91000 as 00:15.3
[    1.331480] ACPI: Lid Switch [LID0]
[    1.331582] ACPI: Power Button [PBTN]
[    1.331635] ACPI: Sleep Button [SBTN]
[    1.331686] ACPI: Power Button [PWRF]
[    1.336380] ACPI: Thermal Zone [THM] (25 C)
[   11.072428] ACPI Warning: \_SB.IETM._TRT: Return Package has no elements (empty) (20180810/nsprepkg-94)
[   11.073016] ACPI: AC Adapter [AC] (on-line)
[   11.081115] ACPI Warning: SystemMemory range 0x00000000FE028000-0x00000000FE0281FF conflicts with OpRegion 0x00000000FE028000-0x00000000FE028207 (\_SB.PCI0.GEXP.BAR0) (20180810/utaddress-204)
[   11.081121] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   11.131337] acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[   11.142873] battery: ACPI: Battery Slot [BAT0] (battery present)
[   11.145016] acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[   11.868516] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)

I am kind of at a loss here. Not being able to suspend severely limits the usefulness of a laptop.

Does anyone have an idea what I could do to suspend without kernel panics, or to further debug the issue? Has anyone had a similar issue before, or even the same machine?
I am grateful for any ideas.

Thanks in advance!

Last edited by lamargo (2020-05-03 17:08:34)

#2 2020-04-28 17:39:35

d_fajardo
Member
Registered: 2017-07-28
Posts: 1,364

Have you considered this might be a hardware problem? I would run MemTest to check RAM and CPU registers for errors as a start.

#3 2020-04-29 08:33:59

lamargo
Member
Registered: 2020-04-26
Posts: 6

Thank you for your response!
Following your suggestion, I let memtest86 run overnight, but it came back clean with no errors.

I had kind of expected that, though, since the system runs stably once it is up. The error seems to occur only during transitions to and from suspend.

#4 2020-04-29 10:56:36

d_fajardo
Member
Registered: 2017-07-28
Posts: 1,364

ACPI Warning: SystemMemory range 0x00000000FE028000-0x00000000FE0281FF conflicts with OpRegion 0x00000000FE028000-0x00000000FE028207 (\_SB.PCI0.GEXP.BAR0) (20180810/utaddress-204)

It was this warning that made me think it might be a good idea to check your memory, but that turned out not to be the issue.
I would look at power management and systemd next. Are you running any DM?

I suspect the systemd-coredump issues are just due to a previous unclean shutdown

It could be that systemd is the issue. It seems systemd enters suspend OK:

systemd-suspend.service: Succeeded.

But then you get coredumps afterwards?
And this error in your logs could be a lead as well:

 Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler

On a brief internet search, I found an answer here.

#5 2020-04-29 18:19:02

lamargo
Member
Registered: 2020-04-26
Posts: 6

I am suspecting power management as well. And since the problem persists with different Kernel versions, I agree that the issue could have to do with systemd.
I am not sure it is a good idea, but I could try downgrading systemd to verify this.

Are you running any DM?

Yes, I am running an LVM on top of a LUKS volume.

But then you get coredumps afterwards?

The stack trace contains 

[76323.661775] systemd-coredump[107968]: #0  0x00007f562b06aa82 journal_file_append_object (libsystemd-shared-245.so + 0x6ca82)

so journald seems to crash because of a corrupt log file. Right before that, the log shows

[76323.597484] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
[76323.598277] audit: type=1131 audit(1588054596.436:367): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[76323.607919] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.

So just before the kernel panic, journald was killed, which might have caused the corruption. However, this never seems to have happened during any of the previous crashes, so it might be a false trail as well ...

On a brief internet search, I found an answer here.

Thank you for the link.
They suggest the max_cstate=1 kernel parameter. I already tried processor.max_cstate=1 intel_idle.max_cstate=0 idle=poll (not sure if the processor. prefix is needed here), as well as deactivating C-states completely in the UEFI; unfortunately, neither changed anything.
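On the question of the processor. prefix: as far as I know, the two parameters target different cpuidle drivers. The kernel parameter documentation describes intel_idle.max_cstate=0 as disabling intel_idle and falling back to acpi_idle, and processor.max_cstate is the limit that acpi_idle honors, so in the combination above both prefixes matter. A minimal sketch that reports which driver is active, assuming the standard sysfs file /sys/devices/system/cpu/cpuidle/current_driver; the mapping table and function names are my own summary, not an official API.

```python
from pathlib import Path
from typing import Optional

# Standard sysfs file naming the active cpuidle driver.
CPUIDLE_DRIVER = Path("/sys/devices/system/cpu/cpuidle/current_driver")

# My own summary: which kernel parameter caps C-state depth for each driver.
LIMITING_PARAMETER = {
    "intel_idle": "intel_idle.max_cstate",
    "acpi_idle": "processor.max_cstate",
}


def relevant_parameter(driver: str) -> Optional[str]:
    """Return the C-state limit parameter for a driver name, or None."""
    return LIMITING_PARAMETER.get(driver.strip())


def main() -> None:
    try:
        driver = CPUIDLE_DRIVER.read_text().strip()
    except OSError as err:
        print(f"could not read {CPUIDLE_DRIVER}: {err}")
        return
    parameter = relevant_parameter(driver)
    if parameter is None:
        print(f"cpuidle driver {driver!r}: no known max_cstate parameter")
    else:
        print(f"cpuidle driver {driver!r}: limit it with {parameter}")


if __name__ == "__main__":
    main()
```

Running it once with and once without intel_idle.max_cstate=0 should show the driver switching from intel_idle to acpi_idle, if the fallback behaves as documented.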

#6 2020-05-01 20:41:18

PopeXXIII
Member
Registered: 2013-11-13
Posts: 10

I'm experiencing the same issue on my Latitude 7490. Dell ePSA diagnostics all reported back that no issues were detected. It definitely seems like it is sleep/resume/cstate related. If I keep the laptop plugged in and have sleep disabled, it usually runs fine for several days.

Let me know if I can provide any logs or details that could be helpful.

#7 2020-05-01 21:23:28

lamargo
Member
Registered: 2020-04-26
Posts: 6

Wow that is actually really interesting to hear! I was starting to think I was the only one with this problem.

Yes, it's the exact same thing for me: if I don't suspend the system, it runs stably. So far I've had good experiences with hibernation, but it's a pain to always do that instead of a quick suspend, so it's not really a long-term solution.

How long have you been experiencing the panics? Did it start after some software update, BIOS update or was it always like this?
Logs would be super helpful! Do you have any dmesg or journald entries from the panic event? As I described in my original post, my system didn't really log any details that would help us.

I did some more investigation:

  • booted Ubuntu live image (19.10 and 20.04): same problem

  • booted Void Linux live image (which is not systemd based): seemed to work fine
    --> This seems to indicate it is a software issue or at least triggered by some specific software

  • rolled back all packages (as described here) to a date when the system was running ok: same problem
    --> This does not make any sense to me and seems to contradict the hypothesis that it is software-related

#8 2020-05-02 06:33:12

d_fajardo
Member
Registered: 2017-07-28
Posts: 1,364

It does seem to point to hardware. Perhaps check the BIOS again and see if anything is amiss there, e.g. in the power management settings (S1, S2 states, etc.).

#9 2020-05-02 12:18:12

davze
Member
Registered: 2020-05-02
Posts: 2

Try these kernel parameters. Kaby Lake processors seem to have problems with C-states on recent kernels.  More Information

acpi_enforce_resources=lax i915.enable_dc=0 intel_idle.max_cstate=1

or

acpi_enforce_resources=lax i915.enable_psr=0

Both seem to work for me atm. Limiting the C-states is, of course, killing the battery...

Last edited by davze (2020-05-02 12:48:14)

#10 2020-05-02 17:44:59

lamargo
Member
Registered: 2020-04-26
Posts: 6

Thank you @davze, this actually seems to work for me as well!

FYI, it appears that acpi_enforce_resources=lax i915.enable_dc=0 alone (without intel_idle.max_cstate=1) works too; however, acpi_enforce_resources=lax i915.enable_psr=0 does not work on my machine.
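When comparing combinations like these, it helps to confirm after each reboot that the intended parameters actually reached the kernel, since a typo in the bootloader entry silently drops them. A minimal sketch that compares an expected set against /proc/cmdline; the function names and the default parameter string are hypothetical, and it assumes no quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Dashes and underscores in parameter names are treated as equivalent, as
    the kernel does. Everything after "--" is meant for init, so parsing stops
    there. Quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        if token == "--":
            break
        name, sep, value = token.partition("=")
        params[name.replace("-", "_")] = value if sep else None
    return params


def missing_parameters(expected, cmdline):
    """Return the expected parameters that are absent or set to another value."""
    active = parse_cmdline(cmdline)
    missing = []
    for name, value in parse_cmdline(expected).items():
        if name not in active or active[name] != value:
            missing.append(name if value is None else f"{name}={value}")
    return missing


if __name__ == "__main__":
    wanted = "acpi_enforce_resources=lax i915.enable_dc=0"  # hypothetical target set
    try:
        current = Path("/proc/cmdline").read_text()
    except OSError as err:
        print(f"cannot read /proc/cmdline: {err}")
    else:
        absent = missing_parameters(wanted, current)
        print("all parameters active" if not absent else "missing: " + " ".join(absent))
```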

#11 2020-05-02 18:59:07

davze
Member
Registered: 2020-05-02
Posts: 2

Out of curiosity... Do you have the problem when suspending manually or only when the system suspends after a few minutes of idling?
I'm asking because on my Latitude 7490 I only had problems if the system suspends automatically. The problem was gone when I disabled Screen Energy Saving (KDE).

https://docs.kde.org/trunk5/en/kde-work … energy.png

Mod note: replaced oversized image with link -- V1del

Last edited by V1del (2020-05-11 20:59:11)

#12 2020-05-02 19:50:04

lamargo
Member
Registered: 2020-04-26
Posts: 6

For me it happens with both. I, too, disabled autosuspend because of it, but manual suspends trigger it just as well.

Last edited by lamargo (2020-05-02 19:50:36)

#13 2020-05-07 14:17:14

PopeXXIII
Member
Registered: 2013-11-13
Posts: 10

davze wrote:

Try those kernel parameters. Kaby Lake processors seem to have problems with C-States on recent kernels.  More Information

Yeah, as detailed in that link, I feel like my experiences with Intel graphics and (now) C-states have been poor for a while. I get Intel graphics DRM hangs on occasion (that issue was worse around Nov-Dec, with whatever kernel version was current then). I guess I'll think about disabling the power management items mentioned there, but life is crazy with teleworking and a toddler at home. I'll eventually get around to it...

Last edited by PopeXXIII (2020-05-07 14:18:41)

#14 2020-05-11 20:36:12

jackrandom
Member
Registered: 2014-02-05
Posts: 1

I've been having the exact same experience since the end of last year, also with a Latitude 7490 on the latest BIOS. I've tried multiple kernels, and the latest ones were always affected the most, although it happened at least once on the 4.19 LTS kernel.

After endless hours of debugging, I installed Windows because I wanted to make sure it isn't hardware-related. All ePSA tests were always successful, as were multiple runs of MemTest86.

Interestingly enough, I was able to produce similar kernel panics under Windows 10 (= BSOD, WHEA_UNCORRECTABLE_ERROR). I've also noticed that the freezes practically always occurred on battery power rather than when plugged in. So I suspected something related to C-states and/or the GPU for a while as well.

Since I came across this article and followed its instructions (which, of course, was a pain, because Windows reinstalls the same faulty drivers again and again), the problems have been gone and I have not had a single freeze. I'm aware this isn't a Windows support forum; I'm only pointing to it because I now believe that 1) these issues are related and 2) faulty hardware isn't their cause.

I may now try to reinstall Arch Linux and boot with the kernel parameters given above, and can hopefully use this laptop again. I have had to use an old ThinkPad from work for weeks now, because I couldn't get a panic-free Latitude experience. Thanks, everyone, for shedding some light on the matter and giving me hope. I'll report back.
