#26 2018-05-18 09:35:48

pkejr
Member
Registered: 2018-04-30
Posts: 20

Re: [Solved] System freeze randomly

Hi,
I am very frustrated because I haven't had a single freeze in more than a week and I don't know why.

 11:22:37 up 8 days, 20:28,  2 users,  load average: 0.68, 0.61, 0.54

One thing I did (although I haven't rebooted or restarted X since) is add this to a file in /etc/X11/xorg.conf.d:

Section "Device"
    Identifier      "Intel Graphics"
    Driver          "intel"
    Option          "DRI" "false"
    Option          "AccelMethod"   "uxa"
    Option          "NoAccel"       "True"
EndSection

I got this information from here: http://www.thinkwiki.org/wiki/Category:X250
But I don't think it actually got applied, unless Xorg reloads the files in /etc/X11/xorg.conf.d from time to time?
I tried commenting it out to see if that was the thing that fixed it.
But yeah, as I said, I'm very frustrated, because I'm quite sure the things I tried before (disabling TLP, disabling swap, and updating the microcode) didn't fix it, and I'm pretty sure this Xorg config has not been applied (no reboot or X restart).
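
For what it's worth, Xorg reads the files in /etc/X11/xorg.conf.d only when the server starts, so a snippet added during a running session should not take effect until X is restarted. One rough way to check whether an option was picked up is to search the server log for it. The sketch below assumes that applied options show up in the log as `Option "Name"` lines and that the log lives at /var/log/Xorg.0.log; `option_in_log` is a made-up helper name, not an Xorg tool.

```shell
#!/bin/sh
# Sketch: report whether an Xorg option name appears in a server log.
# option_in_log is a hypothetical helper, not part of Xorg.

option_in_log() {
    # $1 = path to an Xorg log, $2 = option name, e.g. NoAccel
    # Option names are matched case-insensitively.
    grep -qi "Option \"$2\"" "$1"
}

# Example (the log path is an assumption; adjust for your setup):
#   option_in_log /var/log/Xorg.0.log NoAccel && echo "NoAccel was applied"
```

If the option is missing from the log of the running session, the config most likely was not in effect during that uptime.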

Last edited by pkejr (2018-05-18 09:37:45)

#27 2018-05-19 03:42:12

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412

Re: [Solved] System freeze randomly

I too have been plagued by this problem to the point where my laptop is almost unusable. Here are some full journals from recent occurrences: one, two, and three, plus a /var/log/Xorg.0.log and a /var/log/Xorg.0.log.old from a boot after a freeze.

I'm unsure whether I should open my own thread, but the initial description in this thread is spot on and exactly what I experience. I run ext4 on LVM on LUKS. I have a Dell Precision 5510 and it has a buggy ACPI; I'm not sure if that's the root cause:

$ journalctl -b -p err | ag ACPI
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.XHC.RHUB.HS11], AE_NOT_FOUND (20180105/dswload-211)
May 18 18:30:23 archlinux kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20180105/psobject-252)
May 18 18:30:23 archlinux kernel: ACPI Error: AE_NOT_FOUND, (SSDT:xh_rvp10) while loading table (20180105/tbxfload-228)
May 18 18:30:23 archlinux kernel: ACPI Error: 1 table load failures, 10 successful (20180105/tbxfload-246)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.PEG0.PEGP.TDGC], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PG00._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN00._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN00._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN01._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN01._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN02._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN02._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN03._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN03._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN04._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FNCL, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.FN04._ON, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.TZ00._TMP, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.TZ00._TMP, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.TZ01._TMP, AE_NOT_FOUND (20180105/psparse-550)
May 18 18:30:23 archlinux kernel: ACPI BIOS Error (bug): Failure looking up [\_SB.PCI0.LPCB.HEC.ECAV], AE_NOT_FOUND (20180105/psargs-364)
May 18 18:30:23 archlinux kernel: ACPI Error: Method parse/execution failed \_TZ.TZ01._TMP, AE_NOT_FOUND (20180105/psparse-550)
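
Most of those lines repeat, so they can be easier to read as counts per distinct message. A minimal sketch, assuming the journal line format shown above (timestamp, host, then `kernel: `); the function name `summarize_acpi` is only for illustration.

```shell
#!/bin/sh
# Sketch: collapse repeated kernel ACPI messages into "count message" lines.
# summarize_acpi is a hypothetical helper name.

summarize_acpi() {
    # Keep only the text after "kernel: " on lines whose message starts
    # with ACPI, then count identical messages, most frequent first.
    sed -n 's/^.*kernel: \(ACPI.*\)$/\1/p' | sort | uniq -c | sort -rn
}

# Example:
#   journalctl -b -p err | summarize_acpi
```

This only groups byte-identical messages, which is enough here because the repeated failures differ only in the ACPI path they name.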

My system:

$ uname -a
Linux 5510 4.16.9-1-ARCH #1 SMP PREEMPT Thu May 17 02:10:09 UTC 2018 x86_64 GNU/Linux
$ lspci -nnv
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1910] (rev 07)
        Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [1028:06e5]
        Flags: bus master, fast devsel, latency 0
        Capabilities: <access denied>
        Kernel driver in use: skl_uncore

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06) (prog-if 00 [VGA controller])
        Subsystem: Dell HD Graphics 530 [1028:06e5]
        Flags: bus master, fast devsel, latency 0, IRQ 123
        Memory at db000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 90000000 (64-bit, prefetchable) [size=256M]
        I/O ports at f000 [size=64]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915

00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 07)
        Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [1028:06e5]
        Flags: fast devsel, IRQ 16
        Memory at dcc20000 (64-bit, non-prefetchable) [size=32K]
        Capabilities: <access denied>
        Kernel driver in use: proc_thermal
        Kernel modules: processor_thermal_device

00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller [8086:a12f] (rev 31) (prog-if 30 [XHCI])
        Subsystem: Dell Sunrise Point-H USB 3.0 xHCI Controller [1028:06e5]
        Flags: bus master, medium devsel, latency 0, IRQ 125
        Memory at dcc10000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci

00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-H Thermal subsystem [8086:a131] (rev 31)
        Subsystem: Dell Sunrise Point-H Thermal subsystem [1028:06e5]
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at dcc39000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: intel_pch_thermal
        Kernel modules: intel_pch_thermal

00:15.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-H Serial IO I2C Controller #0 [8086:a160] (rev 31)
        Subsystem: Dell Sunrise Point-H Serial IO I2C Controller [1028:06e5]
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at dcc38000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: intel-lpss
        Kernel modules: intel_lpss_pci

00:15.1 Signal processing controller [1180]: Intel Corporation Sunrise Point-H Serial IO I2C Controller #1 [8086:a161] (rev 31)
        Subsystem: Dell Sunrise Point-H Serial IO I2C Controller [1028:06e5]
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at dcc37000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: intel-lpss
        Kernel modules: intel_lpss_pci

00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-H CSME HECI #1 [8086:a13a] (rev 31)
        Subsystem: Dell Sunrise Point-H CSME HECI [1028:06e5]
        Flags: bus master, fast devsel, latency 0, IRQ 130
        Memory at dcc36000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: mei_me
        Kernel modules: mei_me

00:16.3 Serial controller [0700]: Intel Corporation Sunrise Point-H KT Redirection [8086:a13d] (rev 31) (prog-if 02 [16550])
        Subsystem: Dell Sunrise Point-H KT Redirection [1028:06e5]
        Flags: 66MHz, fast devsel, IRQ 19
        I/O ports at f0a0 [size=8]
        Memory at dcc35000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: serial

00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] [8086:a102] (rev 31) (prog-if 01 [AHCI 1.0])
        Subsystem: Dell Sunrise Point-H SATA controller [AHCI mode] [1028:06e5]
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 126
        Memory at dcc30000 (32-bit, non-prefetchable) [size=8K]
        Memory at dcc34000 (32-bit, non-prefetchable) [size=256]
        I/O ports at f090 [size=8]
        I/O ports at f080 [size=4]
        I/O ports at f060 [size=32]
        Memory at dcc33000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: <access denied>
        Kernel driver in use: ahci
        Kernel modules: ahci

00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #1 [8086:a110] (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: dcb00000-dcbfffff [size=1M]
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:1c.1 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #2 [8086:a111] (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 0000e000-0000efff [size=4K]
        Memory behind bridge: dc000000-dc9fffff [size=10M]
        Prefetchable memory behind bridge: 00000000c2100000-00000000c2afffff [size=10M]
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #9 [8086:a118] (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: dca00000-dcafffff [size=1M]
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:1d.4 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #13 [8086:a11c] (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: None
        Prefetchable memory behind bridge: None
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:1d.6 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #15 [8086:a11e] (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Bus: primary=00, secondary=05, subordinate=3d, sec-latency=0
        I/O behind bridge: 00002000-00002fff [size=4K]
        Memory behind bridge: c4000000-da0fffff [size=353M]
        Prefetchable memory behind bridge: 00000000a0000000-00000000c1ffffff [size=544M]
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-H LPC Controller [8086:a150] (rev 31)
        Subsystem: Dell Sunrise Point-H LPC Controller [1028:06e5]
        Flags: bus master, medium devsel, latency 0

00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-H PMC [8086:a121] (rev 31)
        Subsystem: Dell Sunrise Point-H PMC [1028:06e5]
        Flags: fast devsel
        Memory at dcc2c000 (32-bit, non-prefetchable) [disabled] [size=16K]

00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-H HD Audio [8086:a170] (rev 31) (prog-if 80)
        Subsystem: Dell Sunrise Point-H HD Audio [1028:06e5]
        Flags: bus master, fast devsel, latency 32, IRQ 131
        Memory at dcc28000 (64-bit, non-prefetchable) [size=16K]
        Memory at dcc00000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-H SMBus [8086:a123] (rev 31)
        Subsystem: Dell Sunrise Point-H SMBus [1028:06e5]
        Flags: medium devsel, IRQ 16
        Memory at dcc32000 (64-bit, non-prefetchable) [size=256]
        I/O ports at f040 [size=32]
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801

01:00.0 Network controller [0280]: Intel Corporation Wireless 8260 [8086:24f3] (rev 3a)
        Subsystem: Intel Corporation Wireless 8260 [8086:0050]
        Flags: bus master, fast devsel, latency 0, IRQ 132
        Memory at dcb00000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: <access denied>
        Kernel driver in use: iwlwifi
        Kernel modules: iwlwifi

02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)
        Subsystem: Dell RTS525A PCI Express Card Reader [1028:06e5]
        Flags: bus master, fast devsel, latency 0, IRQ 124
        Memory at dc000000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: rtsx_pci
        Kernel modules: rtsx_pci

03:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804] (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a801]
        Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
        Memory at dca00000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: nvme

My mkinitcpio.conf

MODULES=(intel_agp i915)
FILES=(/etc/modprobe.d/psmouse.conf /etc/modprobe.d/itco_wdt.conf /etc/modprobe.d/i915.conf)
HOOKS=(base systemd autodetect keyboard sd-vconsole modconf block sd-encrypt sd-lvm2 filesystems fsck)

The only module option I have explicitly enabled that may be an issue is for i915:

  1. $ cat /etc/modprobe.d/i915.conf 
    options i915 enable_guc=1
  2. $ journalctl -b | ag taint
    May 18 22:09:53 archlinux kernel: Setting dangerous option enable_guc - tainting kernel
    May 18 22:10:00 archlinux systemd[1]: System is tainted: var-run-bad
    May 18 22:10:01 5510 kernel: CPU: 1 PID: 441 Comm: systemd-backlig Tainted: G     U           4.16.9-1-ARCH #1
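
To see which value the driver actually ended up with, the parameters of a loaded module can be read back from sysfs. A small sketch; `module_param` is a hypothetical helper, its third argument exists only so it can be pointed at a test directory, and reading i915 parameters may need root.

```shell
#!/bin/sh
# Sketch: print the current value of a loaded kernel module's parameter.
# module_param is a made-up helper name.

module_param() {
    # $1 = module, $2 = parameter, $3 = sysfs module dir (default /sys/module)
    root=${3:-/sys/module}
    file="$root/$1/parameters/$2"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "cannot read $file" >&2
        return 1
    fi
}

# Example:
#   module_param i915 enable_guc
```

Comparing the value before and after removing the line from /etc/modprobe.d/i915.conf (and rebuilding the initramfs, since the file is listed in FILES) would show whether the option is really gone.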

I've tried everything I can think of including the following:

  1. Making sure kernel parameters are as sparse and sane as possible

    $ cat /proc/cmdline 
    initrd=\intel-ucode.img initrd=\initramfs-linux.img rw rd.luks.uuid=145628bb-0138-4b8b-bc94-2d041c756539 rd.luks.name=145628bb-0138-4b8b-bc94-2d041c756539=lvm root=/dev/lvmvg/root quiet
  2. Disabling swap because of issues with LVM swap

    1. $ cat /etc/fstab
      # Static information about the filesystems.
      # See fstab(5) for details.
      
      # <file system> <dir> <type> <options> <dump> <pass>
      # /dev/mapper/lvmvg-root
      UUID=584e8363-4e34-43fa-9ce1-aa71aa05ee24       /               ext4            rw,relatime,data=ordered            0 1
      
      # /dev/sdb1
      UUID=2F98-463D          /boot           vfat            rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro    0 2
      
      # /dev/mapper/lvmvg-var
      UUID=25ba1aa0-afca-499e-833e-36ed09591444       /var            ext4            rw,relatime,data=ordered            0 2
      
      # /dev/mapper/lvmvg-home
      UUID=36fd8e5e-1099-4559-83cb-b8e9a641d6b7       /home           ext4            rw,relatime,data=ordered            0 2
      
      # /dev/mapper/lvmvg-swap
      #UUID=c5ad34f4-4b2f-46be-9836-f3b1b87571d4      none            swap            defaults,discard        0 0
    2. $  free -m
                    total        used        free      shared  buff/cache   available
      Mem:           7828        1085        5142         394        1600        6114
      Swap:             0           0           0
  3. Trying different DEs (Plasma and i3), still happens.

  4. Trying different kernels, but also happens on zen and lts.

  5. I did change from modesetting back to the intel driver in the hope that modesetting was the issue, but I still get the freezes/kernel panics.

    1. $ cat /etc/X11/xorg.conf.d/20-intel.conf 
      Section "Device"
          Identifier  "Intel Graphics"
          Driver      "intel"
          Option      "DRI"          "3"   # DRI3 is now default 
          Option      "AccelMethod"  "sna" # default
          #Option      "AccelMethod"  "uxa" # fallback
          Option      "TearFree"      "true"  # SNA may cause tearing,  enable "TearFree" to fix
      EndSection
  6. I've stopped any powertop settings from being applied (no TLP etc.)

    $ systemctl list-dependencies 
    default.target
    ● ├─sddm.service
    ● └─multi-user.target
    ●   ├─dbus.service
    ●   ├─lm_sensors.service
    ●   ├─man-db.timer
    ●   ├─systemd-ask-password-wall.path
    ●   ├─systemd-logind.service
    ●   ├─systemd-networkd.service
    ●   ├─systemd-resolved.service
    ●   ├─systemd-user-sessions.service
    ●   ├─updatedb.timer
    ●   ├─wpa_supplicant@wifi0.service
    ●   ├─basic.target
    ●   │ ├─-.mount
    ●   │ ├─tmp.mount
    ●   │ ├─var.mount
    ●   │ ├─paths.target
    ●   │ ├─slices.target
    ●   │ │ ├─-.slice
    ●   │ │ └─system.slice
    ●   │ ├─sockets.target
    ●   │ │ ├─dbus.socket
    ●   │ │ ├─dm-event.socket
    ●   │ │ ├─dnscrypt-proxy.socket
    ●   │ │ ├─systemd-coredump.socket
    ●   │ │ ├─systemd-initctl.socket
    ●   │ │ ├─systemd-journald-audit.socket
    ●   │ │ ├─systemd-journald-dev-log.socket
    ●   │ │ ├─systemd-journald.socket
    ●   │ │ ├─systemd-networkd.socket
    ●   │ │ ├─systemd-udevd-control.socket
    ●   │ │ └─systemd-udevd-kernel.socket
    ●   │ ├─sysinit.target
    ●   │ │ ├─dev-hugepages.mount

Additionally, the system often freezes almost instantly after I enter my LUKS password. It actually just happened two times in a row (neither leaves any logs, as it happens before journald is up):

  1. First time the hang was on "A start job is running for dev-lvmvg-root.device (7s / 1min 30s)";

  2. The second time on "A start job is running for Cryptography Setup for LVM (8s / no limit)".

There doesn't appear to be any consistent behavior as to when the system freezes; though there are some actions that seem to be anecdotally linked.

  1. Attempting to use bash auto-complete on a command that has a lot of potential matches to return (for example, trying to auto complete "systemctl status"+[tab to autocomplete] )

  2. Starting the Falkon web browser

+++++++edit++++++++++++
After much pain and gnashing of teeth, my issues appear to be related to IOMMU. In my case, these are the kernel parameters that allow me to at least use my system without it hard-freezing every 1-2 minutes. Don't get me wrong, it still freezes, but I'm now able to go much longer between freezes.

iommu=off intel_iommu=off acpi_osi=Linux pci=nocrs pcie_aspm=off
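
After a reboot, /proc/cmdline shows whether parameters like these actually reached the kernel, which is worth checking because a typo in the boot entry can easily go unnoticed. A sketch of one way to test for a single parameter; `has_param` is a made-up helper that matches whole space-separated words only.

```shell
#!/bin/sh
# Sketch: test whether a kernel command line contains an exact parameter.
# has_param is a hypothetical helper name.

has_param() {
    # $1 = command line text, $2 = parameter such as intel_iommu=off
    # Padding both ends with spaces lets the first and last words match too.
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   has_param "$(cat /proc/cmdline)" iommu=off && echo "iommu=off is active"
```

Matching whole words means `iommu=off` is not mistaken for `intel_iommu=off`, which matters when both are being toggled during testing.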

I also turned off Intel VT-x in my laptop's BIOS/UEFI settings.

$ journalctl -b |ag kvm:
May 20 16:34:55 5510 kernel: kvm: disabled by bios
May 20 16:34:55 5510 kernel: kvm: disabled by bios

Last edited by CarbonChauvinist (2018-05-21 01:03:36)


"the wind-blown way, wanna win? don't play"

#28 2018-05-21 01:10:08

qrwteyrutiyoup
Member
From: Canada
Registered: 2017-12-26
Posts: 17

Re: [Solved] System freeze randomly

Out of curiosity, have you people tried out linux-lts (currently 4.14.x) instead of the regular linux (currently 4.16.x) to see if this issue is also present?

#29 2018-05-21 03:27:09

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412

Re: [Solved] System freeze randomly

@qrwteyrutiyoup, yes - in my case it also happens with LTS. If I boot LTS with "iommu=off" as well, I can at least run htop for a few minutes at a time without locking up.

But, for instance, even with "iommu=off" LTS just locked up again when I was trying to auto-complete in bash (Konsole). "systemctl status system"+[tab] resulted in a kernel panic (hard freeze, can't switch TTYs, and blinking capslock - I have to long-hold the power button to reboot).


#30 2018-05-21 17:24:41

qrwteyrutiyoup
Member
From: Canada
Registered: 2017-12-26
Posts: 17

Re: [Solved] System freeze randomly

CarbonChauvinist wrote:

@qrwteyrutiyoup, yes - in my case it also happens with LTS. If I boot LTS with "iommu=off" as well, I can at least run htop for a few minutes at a time without locking up.

But, for instance, even with "iommu=off" LTS just locked up again when I was trying to auto-complete in bash (Konsole). "systemctl status system"+[tab] resulted in a kernel panic (hard freeze, can't switch TTYs, and blinking capslock - I have to long-hold the power button to reboot).


I see. I asked because I have been experiencing something similar recently: a system freeze plus the last 1-2 seconds of audio looping. I haven't used this laptop much for a few months, so I'm guessing it's something recent, as I certainly didn't experience this when I used it heavily in the past.

I currently have

intel_iommu=igfx_off

in my boot params, and right now I am testing LTS. No freezes yet (a few hours already; with linux, I was having almost one freeze per hour on average).

You may have a different issue, as yours seems to trigger very quickly. Were you able to collect logs via netconsole/serial console or something like that, to help diagnose the issue when it panics? I am going to try netconsole in a couple days, and hopefully, it can provide helpful info.
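
For anyone who wants to try the same thing: netconsole sends kernel messages over UDP to another machine, which helps when a freeze happens before anything reaches the disk. Its module parameter has the form `src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`. The sketch below only builds that string; every address, the interface name, and the helper name `netconsole_arg` are placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch: build a netconsole= parameter string from its parts.
# netconsole_arg is a hypothetical helper; all example values are placeholders.

netconsole_arg() {
    # $1 source port   $2 source IP   $3 interface
    # $4 target port   $5 target IP   $6 target MAC address
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example (as root, on the machine that freezes):
#   modprobe netconsole "$(netconsole_arg 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"
# On the receiving machine, something like `nc -u -l 6666` listens for the
# messages; the exact flags differ between netcat variants.
```

The same string can also go on the kernel command line, which would be needed to catch messages from early in boot.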

#31 2018-05-21 20:29:58

The Loko
Member
From: Spain
Registered: 2014-07-23
Posts: 100

Re: [Solved] System freeze randomly

I also have the same issue, on a desktop PC with an i5-2200. I'm not sure when it first happened, but it doesn't happen every day, maybe weekly or so, and usually within the first hours after powering on. I have noticed that sometimes the USB power is cut (USB devices are powered off) and sometimes it isn't. I can't find anything relevant in the logs.
My laptop works fine without any issues.

Last edited by The Loko (2018-05-21 20:30:41)

#32 2018-05-22 19:18:26

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412

Re: [Solved] System freeze randomly

qrwteyrutiyoup wrote:

I currently have

intel_iommu=igfx_off

in my boot params, and right now I am testing LTS. No freezes yet (a few hours already; with linux, I was having almost one freeze per hour on average).

You may have a different issue, as yours seems to trigger very quickly. Were you able to collect logs via netconsole/serial console or something like that, to help diagnose the issue when it panics? I am going to try netconsole in a couple days, and hopefully, it can provide helpful info.

Thanks, I'm going to have to try out netconsole or something for further debugging; as it currently stands my laptop is unusable with Arch which is heartbreaking and incredibly frustrating. I'll try netconsole and will read through the Boot Problems as well and try and get some more insight this weekend.

My gut feeling is that this has something to do with the buggy ACPI on my laptop (I'm on the latest version, 1.7.0) and/or APIC issues. I'm unable to even boot with "acpi=off", though, as I get no keyboard response and am unable to enter my LUKS password; booting with "acpi=ht" does help, but I will still invariably get a freeze at some point. Additionally, "apic=off" seemed to really help. I've tried "lapic=off", but that fubars the processor count and only shows one processor core instead of the four cores I have (I have the i5-6440HQ, which is quad-core but without HT).

"irqfixup" also may be helping??? But again, I'm unsure and am really just stabbing in the dark trying different options. The combinations are too many to really get a hold on what's actually helping or what may be making things worse.

In general IOMMU, ACPI, and APIC kernel options are all the ones that seem to have some positive effect on this debilitating problem for me. But, I'm kinda over just trying random settings and never really getting anywhere.


#33 2018-05-23 15:44:00

pkejr
Member
Registered: 2018-04-30
Posts: 20

Re: [Solved] System freeze randomly

Hello,

So after 13 days with no freeze, I decided to reboot my computer. And freezes are back. I have no idea what I did to stop them earlier...

#34 2018-05-25 22:29:28

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412

Re: [Solved] System freeze randomly

So, in my case at least, I think it's related to c-state transitions; this thread Random freezes - Intel sums up what may be happening to my box.

Basically adding

intel_idle.max_cstate=2

to my kernel params allows me to use my laptop without any freezes. As soon as I increment the max_cstate any higher than 2 the freezes return.
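
To see which idle states the kernel exposes and whether any of them are disabled, the cpuidle entries in sysfs can be listed. A minimal sketch, assuming the usual /sys/devices/system/cpu layout; `list_cstates` is a hypothetical helper, and its optional argument exists only so it can be pointed at another CPU or a test directory.

```shell
#!/bin/sh
# Sketch: list the cpuidle states of one CPU with their disable flags.
# list_cstates is a made-up helper name.

list_cstates() {
    # $1 = cpuidle directory of one CPU (default: cpu0)
    dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for state in "$dir"/state*; do
        # Skip entries that lack the expected files (or an unmatched glob).
        [ -r "$state/name" ] && [ -r "$state/disable" ] || continue
        read -r name < "$state/name"
        read -r disabled < "$state/disable"
        printf '%s %s disable=%s\n' "${state##*/}" "$name" "$disabled"
    done
}

# Example:
#   list_cstates
#   cat /sys/module/intel_idle/parameters/max_cstate
```

With intel_idle.max_cstate=2 on the command line, the deeper states should simply be absent from the list rather than shown as disabled.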

It's funny because this is said to be an issue with Baytrail, but to my knowledge I'm Skylake

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i5-6440HQ CPU @ 2.60GHz
Stepping:            3
CPU MHz:             818.802
CPU max MHz:         3500.0000
CPU min MHz:         800.0000
BogoMIPS:            5186.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

This works in that I can actually use my box again, but the battery life is much reduced, so it's not ideal at all and I don't really consider it a long term solution. I need to be able to get back to higher cstates.

Last edited by CarbonChauvinist (2018-05-25 22:30:14)


#35 2018-05-26 06:27:49

HaCeMei
Member
Registered: 2013-03-24
Posts: 37

Re: [Solved] System freeze randomly

CarbonChauvinist wrote:

It's funny because this is said to be an issue with Baytrail, but to my knowledge I'm Skylake

There are reports on Broadwell, too.

This works in that I can actually use my box again, but the battery life is much reduced, so it's not ideal at all and I don't really consider it a long term solution. I need to be able to get back to higher cstates.

There is a script that disables only the C6 state, which seems to help:

https://bugzilla.kernel.org/show_bug.cgi?id=109051#c434

Edit: Today I had another freeze with the script, so I returned to the boot option.

Last edited by HaCeMei (2018-05-26 21:25:46)


No new thing under the sun

#36 2018-05-29 00:52:05

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412
Website

Re: [Solved] System freeze randomly

@HaCeMei, did you revise the script for your system? It was specific to Baytrail systems. I tried it out for a bit after tailoring to my system and discovered that C3 was one of the c-states that, if enabled, caused the freezes. I got really good results with just disabling C3.

cat scripts/c3off_formal.sh 
#!/bin/sh

#title:       c3off_formal.sh
#description: Disables all C3 core states
#original_author:      Wolfgang Reimer <linuxball (at) gmail.com>
#date:        2016014
#version:     1.0    
#usage:       sudo <path>/c3off_formal.sh
#notes:       Intended as test script to verify whether erratum VLP52 (see
#             [1]) is the root cause for kernel bug 109051 (see [2]). In order
#             for this to work you must _NOT_ use boot parameter
#             intel_idle.max_cstate=<number>.
#
# [1] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-n3520-j2850-celeron-n2920-n2820-n2815-n2806-j1850-j1750-spec-update.pdf
# [2] https://bugzilla.kernel.org/show_bug.cgi?id=109051

# Disable ($1 == 1) or enable ($1 == 0) the current idle state, if not yet done.
# Expects to be called from inside a cpuidle/stateN directory, with $name and
# $cpu already set by the loop below.
disable() {
        local action
        read disabled <disable
        test "$disabled" = "$1" && return
        echo "$1" >disable || return
        action=ENABLED; test "$1" = 0 || action=DISABLED
        printf "%-8s state %7s for %s.\n" "$action" "$name" "$cpu"
}

# Iterate through every idle state of each core and disable all C3 states.
# The deeper states are left commented out so they can be tested one at a time.
# Note: cpu[0-3]* matches cpu0..cpu3 (plus cpu10..cpu39 if present); widen the
# glob on machines with more logical CPUs.
cd /sys/devices/system/cpu
for cpu in cpu[0-3]*; do
        for dir in $cpu/cpuidle/state*; do
                cd "$dir"
                read name <name
                case $name in
                C3) disable 1;;
                #C6) disable 1;;
                #C7s) disable 1;;
                #C8) disable 1;;
                #C9) disable 1;;
                #C10) disable 1;;
                esac
                cd ../../..
        done
done

However, I would still get freezes. Some more google-fu about Skylake power management issues with Linux led me to try disabling Intel pstates. Passing the following kernel parameter

intel_pstate=disable

seems to have solved the problem for me currently. I tried incrementally going through the stages of disabling the intel_pstate driver (intel_pstate=no_hwp and intel_pstate=passive), but both would freeze eventually. So far completely disabling Intel pstates and falling back on acpi-cpufreq has worked.

You can verify which scaling driver is being used by:

$ sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver 
acpi-cpufreq
acpi-cpufreq
acpi-cpufreq
acpi-cpufreq
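On a box with many logical CPUs the per-CPU listing above gets long. A minimal sketch that collapses it into one line per driver follows; the function name scaling_drivers and its optional directory argument are my own, not part of any tool, and the argument only exists so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Print each distinct cpufreq scaling driver once, with the number of CPUs using it.
# $1 (optional, hypothetical helper argument): sysfs CPU directory to scan.
scaling_drivers() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_driver; do
        # Skip CPUs without a cpufreq directory (or an unmatched glob).
        [ -r "$f" ] && cat "$f"
    done | sort | uniq -c | awk '{ print $2 ": " $1 " CPU(s)" }'
}
```

Calling scaling_drivers with no argument reads the real sysfs tree; no root access should be needed, since scaling_driver is normally world-readable.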

According to Intel_pstate CPU Performance Scaling Driver there isn't a real performance difference between intel_pstate and acpi-cpufreq.

I'll continue to test this out, but so far the results are really promising, as I still have all the power-saving benefits of the higher cstates since I'm no longer limiting max_cstate.

Just to be clear in case this is of use to anyone else here's my intel card specs:

$ sudo lspci -nnv -s 00:02.0
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06) (prog-if 00 [VGA controller])
        Subsystem: Dell HD Graphics 530 [1028:06e5]
        Flags: bus master, fast devsel, latency 0, IRQ 123
        Memory at db000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 90000000 (64-bit, prefetchable) [size=256M]
        I/O ports at f000 [size=64]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [100] Process Address Space ID (PASID)
        Capabilities: [200] Address Translation Service (ATS)
        Capabilities: [300] Page Request Interface (PRI)
        Kernel driver in use: i915
        Kernel modules: i915

My current kernel:

$ uname -a
Linux 5510 4.16.11-1-ARCH #1 SMP PREEMPT Tue May 22 21:40:27 UTC 2018 x86_64 GNU/Linux

And my current kernel cmdline:

$ cat /proc/cmdline 
initrd=\intel-ucode.img initrd=\initramfs-linux.img rw rd.luks.uuid=145628bb-0138-4b8b-bc94-2d041c756539 rd.luks.name=145628bb-0138-4b8b-bc94-2d041c756539=lvm root=/dev/lvmvg/root quiet acpi_osi=! acpi_osi="!Windows 2012" acpi_backlight=vendor intel_pstate=disable
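When testing several boot parameters in a row it is easy to lose track of which one the running kernel was actually booted with. A small sketch for checking that is below; cmdline_has is a hypothetical helper name, and the split on spaces is naive, so it will not handle quoted values that contain spaces (such as the acpi_osi entry above).

```shell
#!/bin/sh
# Succeed if the exact parameter $1 appears on the kernel command line.
# $2 (optional, hypothetical helper argument): file to read instead of /proc/cmdline.
cmdline_has() {
    file=${2:-/proc/cmdline}
    # One parameter per line, then match the whole line as a fixed string.
    tr ' ' '\n' <"$file" | grep -qxF -- "$1"
}
```

For example, `cmdline_has intel_pstate=disable && echo "intel_pstate is disabled on the command line"`.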

I am loading the latest HuC and GuC firmware:

$ journalctl -b | ag "HuC|GuC"
May 28 20:24:33 archlinux kernel: [drm] HuC: Loaded firmware i915/skl_huc_ver01_07_1398.bin (version 1.7)
May 28 20:24:33 archlinux kernel: [drm] GuC: Loaded firmware i915/skl_guc_ver9_33.bin (version 9.33)
May 28 20:24:33 archlinux kernel: i915 0000:00:02.0: GuC firmware version 9.33
May 28 20:24:33 archlinux kernel: i915 0000:00:02.0: GuC submission enabled
May 28 20:24:33 archlinux kernel: i915 0000:00:02.0: HuC enabled

Last edited by CarbonChauvinist (2018-05-29 01:06:24)


"the wind-blown way, wanna win? don't play"

Offline

#37 2018-05-30 12:35:44

pkejr
Member
Registered: 2018-04-30
Posts: 20

Re: [Solved] System freeze randomly

I tried with the suggestions you did (adding intel_pstate=disable) but my computer froze after a few hours.

Offline

#38 2018-05-30 15:26:54

HaCeMei
Member
Registered: 2013-03-24
Posts: 37

Re: [Solved] System freeze randomly

CarbonChauvinist wrote:

@HaCeMei, did you revise the script for your system? It was specific to Baytrail systems.

I happen to have a Baytrail processor (Celeron N2940).


No new thing under the sun

Offline

#39 2018-05-30 23:26:22

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412
Website

Re: [Solved] System freeze randomly

@pkejr - welp, the only other thing I can suggest is checking the current parameter values of your i915 module.

$ sudo systool -m i915 -av

Then checking what parameters are available on your hardware

$ sudo modinfo -p i915

Start with the values on your box that are not at their defaults, and look at the power-saving ones too. With any luck you'll zero in on something. If you're doing early KMS, remember that you have to add your /etc/modprobe.d/i915.conf to the FILES array in mkinitcpio.conf.

Maybe start looking at these specifically?

$ sudo modinfo -p i915 | ag "_dc|power_well"
enable_dc:Enable power-saving display C-states. (-1=auto [default]; 0=disable; 1=up to DC5; 2=up to DC6) (int)
disable_power_well:Disable display power wells when possible (-1=auto [default], 0=power wells always on, 1=power wells disabled when possible) (int)
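As a rough sketch of how one of these could be tried persistently: the helper below only builds the "options" line for a modprobe.d file. The function name i915_options_line is made up for illustration, and the values used in the example (enable_dc=0, disable_power_well=0) are just the "disable" and "always on" settings listed in the modinfo output above, not a known fix.

```shell
#!/bin/sh
# Build a modprobe.d "options" line for i915 from name=value arguments.
# Hypothetical helper; it only prints the line and does not write any file.
i915_options_line() {
    printf 'options i915'
    for kv in "$@"; do
        printf ' %s' "$kv"
    done
    printf '\n'
}
```

Something like `i915_options_line enable_dc=0 disable_power_well=0 | sudo tee /etc/modprobe.d/i915.conf` would then write the file; with early KMS the initramfs has to be regenerated afterwards for the change to take effect.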

Otherwise you'll need to work your way through Kernel Parameters to see if you can find anything that may help - I don't envy you as I know how frustrating this is.

@HaCeMei; I see, well you may have to stick with the kernel boot option then. Have you checked all the available cstates on your box? What's the output of

$ sudo ls /sys/devices/system/cpu/cpu*/cpuidle

and

$ sudo cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name

I'm just thankful I was finally able to fix my freezing (I've even enabled TLP without issue). It had been happening for a long while (since kernel 4.9, or maybe 4.11?), and recently it was getting so frustrating I didn't even want to boot my box.


"the wind-blown way, wanna win? don't play"

Offline

#40 2018-06-13 08:23:19

tommykr
Member
From: Poland
Registered: 2009-03-15
Posts: 51

Re: [Solved] System freeze randomly

pkejr wrote:

Otherwise if I don't get any answer from you ArchLinux users, then I will have to say goodbye to Arch and use another (more stable) distribution.

I am afraid that changing Linux distribution will not change anything. I have the same problem as you, and random freezes occurred on Arch Linux, openSUSE, Manjaro, Debian...

Offline

#41 2018-06-13 11:10:12

pkejr
Member
Registered: 2018-04-30
Posts: 20

Re: [Solved] System freeze randomly

Then that's unfortunate. I could not find any solution yet.

Offline

#42 2018-06-15 19:10:36

CYI
Member
Registered: 2018-06-15
Posts: 1

Re: [Solved] System freeze randomly

Hi,

I have got nearly the same issues with my Dell Inspiron 15 7577.
Random freezes with no other chance than rebooting the system with the power button (rebooting/shutting down normally doesn't work at all), screen freeze, repeating the same few seconds audio over and over again...

Adding "i915.semaphores=1" to the boot parameters (/etc/default/grub -> GRUB_CMDLINE_LINUX_DEFAULT="... i915.semaphores=1") works for me.

https://wiki.archlinux.org/index.php/In … tel_driver

Offline

#43 2018-06-15 22:03:22

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412
Website

Re: [Solved] System freeze randomly

@CYI, what's your processor and video card? What kernel are you running? Are you using modesetting or xf86-video-intel? What's your /etc/X11/xorg.conf.d/*.conf file(s)?

More generally, if running modinfo on a module does not show a parameter as being available - can you still modify said parameter?

I wonder because I don't think semaphores is an available parameter anymore on the current i915 module in 4.16.x:

$ sudo modinfo -p i915
modeset:Use kernel modesetting [KMS] (0=disable, 1=on, -1=force vga console preference [default]) (int)
panel_ignore_lid:Override lid status (0=autodetect, 1=autodetect disabled [default], -1=force lid closed, -2=force lid open) (int)
enable_dc:Enable power-saving display C-states. (-1=auto [default]; 0=disable; 1=up to DC5; 2=up to DC6) (int)
enable_fbc:Enable frame buffer compression for power savings (default: -1 (use per-chip default)) (int)
lvds_channel_mode:Specify LVDS channel mode (0=probe BIOS [default], 1=single-channel, 2=dual-channel) (int)
panel_use_ssc:Use Spread Spectrum Clock with panels [LVDS/eDP] (default: auto from VBT) (int)
vbt_sdvo_panel_type:Override/Ignore selection of SDVO panel mode in the VBT (-2=ignore, -1=auto [default], index in VBT BIOS table) (int)
reset:Attempt GPU resets (0=disabled, 1=full gpu reset, 2=engine reset [default]) (int)
vbt_firmware:Load VBT from specified file under /lib/firmware (charp)
error_capture:Record the GPU state following a hang. This information in /sys/class/drm/card<N>/error is vital for triaging and debugging hangs. (bool)
enable_hangcheck:Periodically check GPU activity for detecting hangs. WARNING: Disabling this can cause system wide hangs. (default: true) (bool)
enable_ppgtt:Override PPGTT usage. (-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full with extended address space) (int)
enable_psr:Enable PSR (0=disabled, 1=enabled - link mode chosen per-platform, 2=force link-standby mode, 3=force link-off mode) Default: -1 (use per-chip default) (int)
alpha_support:Enable alpha quality driver support for latest hardware. See also CONFIG_DRM_I915_ALPHA_SUPPORT. (bool)
disable_power_well:Disable display power wells when possible (-1=auto [default], 0=power wells always on, 1=power wells disabled when possible) (int)
enable_ips:Enable IPS (default: true) (int)
fastboot:Try to skip unnecessary mode sets at boot time (default: false) (bool)
prefault_disable:Disable page prefaulting for pread/pwrite/reloc (default:false). For developers only. (bool)
load_detect_test:Force-enable the VGA load detect code for testing (default:false). For developers only. (bool)
force_reset_modeset_test:Force a modeset during gpu reset for testing (default:false). For developers only. (bool)
invert_brightness:Invert backlight brightness (-1 force normal, 0 machine defaults, 1 force inversion), please report PCI device ID, subsystem vendor and subsystem device ID to dri-devel@lists.freedesktop.org, if your machine needs it. It will then be included in an upcoming module version. (int)
disable_display:Disable display (default: false) (bool)
enable_cmd_parser:Enable command parsing (true=enabled [default], false=disabled) (bool)
mmio_debug:Enable the MMIO debug code for the first N failures (default: off). This may negatively affect performance. (int)
verbose_state_checks:Enable verbose logs (ie. WARN_ON()) in case of unexpected hw state conditions. (bool)
nuclear_pageflip:Force enable atomic functionality on platforms that don't have full support yet. (bool)
edp_vswing:Ignore/Override vswing pre-emph table selection from VBT (0=use value from vbt [default], 1=low power swing(200mV),2=default swing(400mV)) (int)
enable_guc:Enable GuC load for GuC submission and/or HuC load. Required functionality can be selected using bitmask values. (-1=auto, 0=disable [default], 1=GuC submission, 2=HuC load) (int)
guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)
guc_firmware_path:GuC firmware path to use instead of the default one (charp)
huc_firmware_path:HuC firmware path to use instead of the default one (charp)
enable_dp_mst:Enable multi-stream transport (MST) for new DisplayPort sinks. (default: true) (bool)
inject_load_failure:Force an error after a number of failure check points (0:disabled (default), N:force failure at the Nth failure check point) (uint)
enable_dpcd_backlight:Enable support for DPCD backlight control (default:false) (bool)
enable_gvt:Enable support for Intel GVT-g graphics virtualization host support(default:false) (bool)

$ sudo modinfo -p i915 | ag -i semaphore
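The same check can be wrapped in a tiny helper that gives a yes/no answer instead of empty output. This is only a sketch; has_param is a hypothetical name, and it relies on each line of modinfo -p output starting with the parameter name followed by a colon, as in the listing above.

```shell
#!/bin/sh
# Succeed if parameter $1 is listed in "modinfo -p" style output read from stdin.
# Each input line is expected to look like "name:description (type)".
has_param() {
    grep -q "^$1:"
}
```

Usage would be along the lines of `modinfo -p i915 | has_param semaphores && echo available || echo "not available"`. For the module that is currently loaded, the files under /sys/module/i915/parameters/ should show the same set of names.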

"the wind-blown way, wanna win? don't play"

Offline

#44 2018-06-15 22:31:38

loqs
Member
Registered: 2014-03-06
Posts: 17,171

Re: [Solved] System freeze randomly

@CarbonChauvinist the semaphore parameter was dropped for 4.16 https://git.kernel.org/pub/scm/linux/ke … 06f811587b

Offline

#45 2018-06-15 23:32:50

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412
Website

Re: [Solved] System freeze randomly

@loqs, I see, thanks for that.

Having disabled the broken semaphores on Sandybridge, there is no need
for a modparam any more, so remove it in favour of a simple
HAS_LEGACY_SEMAPHORES() guard.

If I understand the commit description quoted above correctly, @CYI either has a Sandybridge system (unlikely), or the i915.semaphores=1 kernel parameter is actually not doing anything and something else has fixed his issue. I suppose he could be running the LTS kernel though.


"the wind-blown way, wanna win? don't play"

Offline

#46 2018-06-19 11:05:55

Ellypsis
Member
From: QC, Canada
Registered: 2013-01-31
Posts: 6
Website

Re: [Solved] System freeze randomly

Hi,

I also have this problem on a Thinkpad X1 Carbon 3rd Gen.
I tried the Linux and Linux-CK kernels, and with both I had to "hard reboot" after a freeze.

Now I boot Linux LTS, and there are no more freezes. But if the problem comes from the kernel, the freezes will come back once the LTS kernel catches up with whatever the current kernel includes...

And like you, I don't know where to investigate, as there is no log at all.

I will continue to follow this thread.

Offline

#47 2018-06-19 11:12:16

pkejr
Member
Registered: 2018-04-30
Posts: 20

Re: [Solved] System freeze randomly

I have switched to LTS too, but I still have the freezes. Maybe it's less frequent, but I can't tell.
What's sure is that I can't rely on my laptop.

Offline

#48 2018-06-19 12:02:32

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412
Website

Re: [Solved] System freeze randomly

My suspicion is that it's an issue with a c-state transition on your box @pkejr. Have you tried limiting the max c-state via kernel command line? I went back through this thread and can't see you saying you tried this.

intel_idle.max_cstate=	[KNL,HW,ACPI,X86]
			0	disables intel_idle and fall back on acpi_idle.
			1 to 9	specify maximum depth of C-state.

My suggestion is to start with intel_idle.max_cstate=0, which completely disables intel_idle (and also dramatically increases power consumption, btw). If you're able to run with that for a while with no freezes, then increment the max c-state one step at a time until your box freezes again. The last value that ran without freezing is the one to keep: you can either continue to boot with the max c-state limited to that number, or you can try to selectively disable the specific c-states that may be troublesome.

What's the output of

$ ls -ld /sys/devices/system/cpu/cpu0/cpuidle/state*

and

$ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name
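The two commands above can be combined into a single listing that also shows whether each state is currently disabled. A minimal sketch, assuming the usual cpuidle sysfs layout with a name and a disable file in every stateN directory; list_cstates and its optional directory argument are hypothetical names of mine.

```shell
#!/bin/sh
# List every cpuidle state of one CPU with its name and "disable" flag.
# $1 (optional, hypothetical helper argument): cpuidle directory to scan.
list_cstates() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for state in "$dir"/state[0-9]*; do
        # Skip anything that is not a readable state directory.
        [ -r "$state/name" ] || continue
        read -r name <"$state/name"
        read -r disabled <"$state/disable"
        printf '%s %s disable=%s\n' "${state##*/}" "$name" "$disabled"
    done
}
```

Running list_cstates before and after a script like c3off_formal.sh is a quick way to confirm that the intended state really was switched off.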

"the wind-blown way, wanna win? don't play"

Offline

#49 2018-06-19 13:08:08

mkbodanu4
Member
Registered: 2016-10-31
Posts: 6

Re: [Solved] System freeze randomly

Hi!
I have a similar issue, but on a desktop build, which confuses me a lot.

Computer Information:
CPU: Intel Core i5-7600k @ 4.6GHz (3.8 GHz Cache, 1.26V, Integrated Video Disabled)
GPU: NVidia GTX 1050 Ti (three monitors connected)
RAM: 2 x 8GB DDR4 2666 (OC from 2400), 17-17-17-39 (Memtest86 report no errors in 20 tests)
SSD: 64Gb Toshiba for / (SMART shows good status)
HDD: 1Tb for /home and /srv (SMART shows good status too)

The first freeze was 3 weeks ago; the keyboard and mouse did not respond to anything. After that there were no such issues for a while, but in the last 3 days I have been getting a lot of freezes.
There are no errors in the journal or in the Xorg log.
If a video is playing in the browser, it repeats the last 2 seconds, but this does not happen with Spotify.

But today the PC froze while I was working in PhpStorm; two of the 3 screens went black, and the last lines in the journal are:

чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: #
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: # A fatal error has been detected by the Java Runtime Environment:
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: #
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: #  SIGSEGV (0xb) at pc=0x00007ffc1412ca57, pid=4932, tid=0x00007fd1fc3c9700
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: #
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: # JRE version: OpenJDK Runtime Environment (8.0_152-b39) (build 1.8.0_152-release-1136-b39)
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: # Java VM: OpenJDK 64-Bit Server VM (25.152-b39 mixed mode linux-amd64 compressed oops)
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: # Problematic frame:
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: # C  [linux-vdso.so.1+0xa57]
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: #
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: # Core dump written. Default location: /home/mkbodanu4/core or core.4932
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: #
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: # An error report file with more information is saved as:
чер 19 15:24:02 mkwork3 jetbrains-phpstorm.desktop[4877]: # /home/mkbodanu4/java_error_in_PHPSTORM_4932.log

File

/home/mkbodanu4/java_error_in_PHPSTORM_4932.log

was not created.

I tried disabling overclocking, tested the RAM and all disks, disabled all energy settings in UEFI Setup and installed linux-lts; nothing helped.

So please, how can I debug this on a desktop PC?

P.S. Sorry for my bad English :)

Offline

#50 2018-06-22 06:24:03

pkejr
Member
Registered: 2018-04-30
Posts: 20

Re: [Solved] System freeze randomly

Hi again,
I have now been using my laptop as usual for around 60 hours with the cstates disabled (intel_idle.max_cstate=0).
It looks like that might be it; thanks CarbonChauvinist for reminding me to try that (I thought I had tried ...)
Now, because it's falling back to acpi_idle, I get some ACPI errors, but they don't seem to be fatal because I haven't had any freeze yet.

[71843.696211] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20170728/psargs-364)
[71843.696216] ACPI Error: Method parse/execution failed \_PR.CPU3._CST, AE_NOT_FOUND (20170728/psparse-550)

(This message appears at least once an hour, so it's filling my dmesg logs)

If I get some more time, I will play with the intel_idle.max_cstate option (increment it and see whether it freezes or not).

Thanks again! (Will mark as Solved if I don't get any freeze for at least 2 weeks).

Last edited by pkejr (2018-06-22 06:25:02)

Offline
