
#1 2016-04-15 19:06:07

mayhape
Member
Registered: 2015-10-09
Posts: 8

NMI Watchdog Soft lockup Linux 4.5.0-1 SMP x86_64

Hi all

For the past couple of days I've been getting intermittent soft lockups that require restarting my computer.

journalctl reports the lockup as happening in GLRenderThread (when it happened today, the PC was idle).

Apr 13 14:32:13 pc-main kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [GLRenderThread:3265]
Apr 13 14:32:41 pc-main kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [GLRenderThread:3265]
Apr 13 14:33:09 pc-main kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [GLRenderThread:3265]
Apr 13 14:33:37 pc-main kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [GLRenderThread:3265]
Apr 13 14:34:05 pc-main kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [GLRenderThread:3265]
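If you want to see at a glance which task the watchdog keeps blaming (GLRenderThread here; QSGRenderThread and Xorg come up later in the thread), a small script can pull the CPU, duration, task name and PID out of these lines. This is a minimal Python sketch: the regex is written against the message format shown above, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [GLRenderThread:3265]
# The task name may itself contain ':' (e.g. kworker/0:1), so the PID is taken
# as the digits after the LAST ':' before the closing bracket.
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return (cpu, seconds, comm, pid) tuples for every soft-lockup line found."""
    hits = []
    for line in lines:
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            hits.append((int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])))
    return hits


def summarize(hits):
    """Count reports per (comm, pid) to show which task keeps getting blamed."""
    return Counter((comm, pid) for _cpu, _secs, comm, pid in hits)
```

Feed it the kernel messages from the boot that locked up, for example the output of `journalctl -k -b -1` split into lines, then look at `summarize(...).most_common()`.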

I'm using the open-source ATI driver:

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: X.Org (0x1002)
    Device: AMD TAHITI (DRM 2.43.0, LLVM 3.7.1) (0x6798)
    Version: 11.2.0
    Accelerated: yes
    Video memory: 3072MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.1
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0, LLVM 3.7.1)
OpenGL core profile version string: 4.1 (Core Profile) Mesa 11.2.0
OpenGL core profile shading language version string: 4.10

Anyone else experiencing similar issues?

Edit: It seems I'm not the only one; I just saw: https://bbs.archlinux.org/viewtopic.php?id=211402

Last edited by mayhape (2016-04-16 14:50:53)


#2 2016-04-17 19:41:01

kinghol
Member
Registered: 2014-02-28
Posts: 9

Re: NMI Watchdog Soft lockup Linux 4.5.0-1 SMP x86_64

Same problem here with arch-lts.
Graphics: first-gen Intel + ATI 5580HD
Using the open-source drivers


#3 2016-04-20 12:37:18

sinatosk
Member
Registered: 2010-11-28
Posts: 107

Re: NMI Watchdog Soft lockup Linux 4.5.0-1 SMP x86_64

It happens to me too, and I'm using LTS 4.4.7 with the NVIDIA drivers.

I use SDDM as my display manager, which in turn launches KDE Plasma, and every time I shut down SDDM or shut down my system, I see:

NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [QSGRenderThread:xxxx]

Last edited by sinatosk (2016-04-20 12:38:30)


#4 2016-04-27 00:23:57

hexadecagram
Member
Registered: 2011-05-20
Posts: 61

Re: NMI Watchdog Soft lockup Linux 4.5.0-1 SMP x86_64

4.4.8-1-lts here. I am seeing similar log messages that consistently point to a lockup, and I have to power-cycle every time my monitors come back from sleep.

I tried adding "nmi_watchdog=0" to my kernel options and it didn't seem to have any effect. I have not tried the sysctl.
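One possible reason nmi_watchdog=0 appears to do nothing: as far as I understand the kernel's sysctl documentation, nmi_watchdog only controls the NMI-based hard-lockup detector, while the "BUG: soft lockup" messages come from the separate soft-lockup detector (kernel.soft_watchdog on kernels that expose it). The same documentation puts the soft-lockup threshold at 2 * watchdog_thresh seconds, i.e. 20 s with the default of 10, which fits the "stuck for 22s" lines. The Python sketch below just reads those knobs; the helper names are hypothetical, and knobs your kernel does not expose are reported as None.

```python
from pathlib import Path

# Lockup-detector sysctls under /proc/sys/kernel. Not every kernel exposes all
# of them, so unreadable or missing files are reported as None.
KNOBS = ("watchdog", "nmi_watchdog", "soft_watchdog", "watchdog_thresh")


def read_watchdog_settings(proc_dir="/proc/sys/kernel"):
    """Return {knob: int or None} for each lockup-detector sysctl."""
    settings = {}
    for name in KNOBS:
        try:
            settings[name] = int((Path(proc_dir) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def soft_lockup_threshold(watchdog_thresh):
    """Soft-lockup threshold in seconds, documented as 2 * watchdog_thresh."""
    return 2 * watchdog_thresh
```

On a running system, `read_watchdog_settings()` shows the current values; `soft_lockup_threshold(10)` gives the 20-second default.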

Here are a few links that turned up via Google:

Here are the relevant lines from my journalctl -b (emphasis mine):

Apr 26 16:24:29 hephaistos /usr/lib/gdm/gdm-x-session[1699]: (II) NVIDIA(0): Setting mode "NULL"
Apr 26 16:24:29 hephaistos kernel: usb 3-4.4: new high-speed USB device number 33 using xhci_hcd
Apr 26 16:24:29 hephaistos kernel: snd_hda_codec_hdmi hdaudioC1D0: HDMI: invalid ELD data byte 51
Apr 26 16:24:29 hephaistos kernel: snd_hda_codec_hdmi hdaudioC1D0: HDMI: invalid ELD data byte 0
Apr 26 16:24:29 hephaistos kernel: hub 3-4.4:1.0: USB hub found
Apr 26 16:24:29 hephaistos kernel: hub 3-4.4:1.0: 2 ports detected
Apr 26 16:24:30 hephaistos root[5782]: ACPI group/action undefined: jack/lineout / LINEOUT
Apr 26 16:24:30 hephaistos root[5784]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Apr 26 16:24:31 hephaistos kernel: snd_hda_codec_hdmi hdaudioC1D0: HDMI: invalid ELD data byte 0
Apr 26 16:24:34 hephaistos rtkit-daemon[1556]: Supervising 3 threads of 1 processes of 1 users.
Apr 26 16:24:34 hephaistos rtkit-daemon[1556]: Successfully made thread 5786 of process 1772 (/usr/bin/pulseaudio) owned by '1000' RT at priority 5.
Apr 26 16:24:34 hephaistos rtkit-daemon[1556]: Supervising 4 threads of 1 processes of 1 users.
Apr 26 16:24:52 hephaistos kernel: usb 3-4.4: USB disconnect, device number 33
Apr 26 16:24:55 hephaistos kernel: usb 3-4.4: new high-speed USB device number 34 using xhci_hcd
Apr 26 16:24:55 hephaistos kernel: hub 3-4.4:1.0: USB hub found
Apr 26 16:24:55 hephaistos kernel: hub 3-4.4:1.0: 2 ports detected
Apr 26 16:24:56 hephaistos kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [Xorg:1701]

A few questions to consider:

  • Are you using xfce4? I am. I was using Enlightenment until yesterday and wasn't experiencing this issue.

  • Which display manager are you using? I am using gdm.

  • Are you using multiple monitors? In my case, unplugging and re-plugging the monitor does not revive my system. Also, if I leave the monitors asleep for a while, the system gradually becomes unresponsive (it eventually panics, and systemctl reboot fails), and every 10-15 seconds or so I hear a periodic "click" from monitor #3 (the one on my right), even though monitor #2 (center) is configured as the default sink in PulseAudio.

  • Are you using DisplayPort? I am (all 3 monitors). Interestingly, my notebook with a GTX 980M does not have this issue; I use it in dual-display mode with both the laptop screen and an external monitor connected via HDMI.

I have disabled sleep mode on my monitors (Settings Manager > Power Manager > Display > Handle display power management = unset); hopefully that keeps things stable until I can investigate further.

Last edited by hexadecagram (2016-04-27 05:00:08)


#5 2016-04-27 05:19:51

hexadecagram
Member
Registered: 2011-05-20
Posts: 61

Re: NMI Watchdog Soft lockup Linux 4.5.0-1 SMP x86_64

I had some time tonight to toy with gdm a bit to see if it was the source of the error. My conclusion is that it is not.

First a few more details about my configuration:

  • Left (DP-2): ASUS PB287Q

  • Center (DP-0): ASUS MG28UQ

  • Right (DP-4): ASUS PB287Q

Here are some observations:

  • Despite having configured xfce4 to be power-management-free (above), gdm sleeps all 3 monitors on idle, and waking them produces no NMI watchdog messages. I went through the sleep-wake cycle 10 times and didn't see a single one. I did, however, see the "invalid ELD data byte" message a few times.

  • A strange side effect is that sometimes all 3 monitors wake, and sometimes only 2 do: DP-2 and DP-0.

  • To further confuse things, the virtual desktop occasionally mixes up whether DP-2 is placed to the left or the right of DP-0.

I'm curious to know whether waking other DMs results in soft lockups. Any takers?

Last edited by hexadecagram (2016-04-27 06:43:55)


#6 2016-07-24 09:17:52

hexadecagram
Member
Registered: 2011-05-20
Posts: 61

Re: NMI Watchdog Soft lockup Linux 4.5.0-1 SMP x86_64

After a few months of dealing with this, I think I have nailed down the cause on my own machine: the NVIDIA driver.

The solution (for me) was:

  • add the following kernel options to /etc/default/grub (note that if you use a different bootloader, "64K\\\$0" will likely need fewer or more backslashes, depending on how its configuration file is parsed):

    memmap=64K\\\$0 memory_corruption_check=0 nvidia-drm.modeset=1
  • add the following to the MODULES line in /etc/mkinitcpio.conf:

    nvidia nvidia_modeset nvidia_uvm nvidia_drm
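Why three backslashes: the value goes through two parsers. Assuming grub-mkconfig sources /etc/default/grub with a POSIX shell and copies your options (e.g. the GRUB_CMDLINE_LINUX_DEFAULT string) unquoted onto the linux line of grub.cfg, the literal $ must first be escaped for GRUB's own script parser (\$), and then each of those characters escaped again inside the shell's double quotes (\\\$). A minimal Python sketch of those two steps; the function names are illustrative only.

```python
def escape_for_grub_script(arg):
    # GRUB's script language expands $var, so a literal "$" (and any literal
    # backslash) must be backslash-escaped on the linux line of grub.cfg.
    return arg.replace("\\", "\\\\").replace("$", "\\$")


def escape_for_shell_double_quotes(text):
    # /etc/default/grub is sourced by a shell; inside "..." the characters
    # \ $ " and ` must be backslash-escaped to stay literal.
    return "".join("\\" + ch if ch in '\\$"`' else ch for ch in text)


wanted = "memmap=64K$0"                                       # what the kernel should receive
in_grub_cfg = escape_for_grub_script(wanted)                  # memmap=64K\$0
in_default_grub = escape_for_shell_double_quotes(in_grub_cfg)  # memmap=64K\\\$0
print(in_default_grub)
```

A bootloader whose config is read by only one of these parsers (or neither) would need correspondingly fewer backslashes.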

Now my monitors no longer lock up and I can safely re-enable power management with no problems.

I'm not 100% sure that memmap needs to be set or that the memory corruption check needs to be disabled, but it seems to help in my case. For more information, see this post.

The NVIDIA kernel options and ramdisk configuration are documented here.
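After rebooting, you can confirm that the options and modules actually took effect by looking at /proc/cmdline (which should show memmap=64K$0 with the escaping gone) and /proc/modules (which lists loaded modules by name, with underscores, e.g. nvidia_drm). A rough Python sketch; the expected values are copied from the steps above and the helper names are hypothetical.

```python
from pathlib import Path

# Values taken from the GRUB options and mkinitcpio MODULES line above.
EXPECTED_PARAMS = ("memmap=64K$0", "memory_corruption_check=0", "nvidia-drm.modeset=1")
EXPECTED_MODULES = ("nvidia", "nvidia_modeset", "nvidia_uvm", "nvidia_drm")


def missing_params(cmdline, expected=EXPECTED_PARAMS):
    """Return expected kernel parameters that are absent from the command line."""
    present = set(cmdline.split())
    return [p for p in expected if p not in present]


def loaded_modules(proc_modules_text):
    """The first field of each /proc/modules line is the module name."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(proc_modules_text, expected=EXPECTED_MODULES):
    """Return expected modules that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [m for m in expected if m not in loaded]


def check(cmdline_path="/proc/cmdline", modules_path="/proc/modules"):
    return {
        "missing_params": missing_params(Path(cmdline_path).read_text()),
        "missing_modules": missing_modules(Path(modules_path).read_text()),
    }
```

On the booted machine, `print(check())` should show two empty lists if everything was picked up.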

Another thing you may want to experiment with is setting the

nmi_watchdog=0

kernel option and add the following to /etc/modprobe.d/nmi_watchdog.conf:

blacklist iTCO_wdt
blacklist iTCO_vendor_support

See this thread for more details.
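Keep in mind that a modprobe.d blacklist only stops a module from being auto-loaded through its aliases; it can still be loaded explicitly or as a dependency of another module, so it is worth checking /proc/modules after a reboot. A small Python sketch that reads the blacklist lines (one directive per line, lines starting with # ignored, as described in modprobe.d(5)) and reports which of those modules are still loaded; the function names are made up for this example.

```python
from pathlib import Path


def blacklisted_modules(conf_text):
    """Return module names from 'blacklist <name>' lines; '#' lines are comments."""
    names = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "blacklist" and len(fields) >= 2:
            names.append(fields[1])
    return names


def still_loaded(blacklist, proc_modules_text):
    """Return the blacklisted modules that nevertheless appear in /proc/modules."""
    loaded = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    # Module names treat '-' and '_' as equivalent; /proc/modules uses '_'.
    return [m for m in blacklist if m.replace("-", "_") in loaded]


def check_blacklist(conf_path="/etc/modprobe.d/nmi_watchdog.conf",
                    modules_path="/proc/modules"):
    conf = Path(conf_path).read_text()
    return still_loaded(blacklisted_modules(conf), Path(modules_path).read_text())
```

An empty list from `check_blacklist()` means neither iTCO module made it into the running kernel.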

I would recommend having a good, long look at the ATI wiki page for starters. In my case it turned out that I really should have RT-whole-FM.

Also, are you seeing kernel panics along with these errors? I was. The panic output will help you determine what is causing the error, so be sure to read it carefully and try to understand it.

HTH

Last edited by hexadecagram (2016-07-24 09:40:10)

