#1 2018-01-28 13:40:19

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Soft lockup - CPU stuck on ethernet plug in

Hi everyone,

I am running Arch on an older MacBook Pro (5,5, mid 2009), and have been running into problems with regard to ethernet connections.
Whenever I plug in an ethernet cable after the boot splash, or if I boot up with a cable connected, my system becomes unresponsive and sporadically shows soft or hard lockup messages of the sort:

watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd-journal:314]
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

though without any sort of backtrace.
This started a few months ago and I have not been able to figure out what is happening.

Since the MacBook is a bit older, my first guess would have been a hardware failure somewhere around the ethernet controller, but everything works fine in Mac OS X, as well as in the Arch live system. I therefore set up Arch again from scratch, but the problem reappeared almost immediately.

The ethernet chip in question is

00:0a.0 Ethernet controller: NVIDIA Corporation MCP79 Ethernet (rev b1)

Does anybody have an idea what could be causing this, or how I could find out more about the source of the problem?
I'd really appreciate any help.

Offline

#2 2018-01-28 15:31:09

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,424

Re: Soft lockup - CPU stuck on ethernet plug in

Do you have any power-saving utilities like powertop, TLP, laptop-mode-tools or similar running? If so, you might have to blacklist the ethernet controller explicitly.

Offline

#3 2018-01-28 16:32:33

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

No, at least not that I'm aware of. I had tried tlp for a short while, but it caused other problems, so I removed it again.

Offline

#4 2018-01-28 16:57:27

seth
Member
Registered: 2012-09-03
Posts: 49,974

Re: Soft lockup - CPU stuck on ethernet plug in

systemctl list-unit-files --state=enabled

Online

#5 2018-01-28 17:04:01

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

UNIT FILE        STATE  
autovt@.service  enabled
dhcpcd.service   enabled
getty@.service   enabled
mbpfan.service   enabled
remote-fs.target enabled

5 unit files listed.

EDIT: mbpfan does not seem to be the culprit, since nothing changes when it is disabled.

Last edited by Hausdorff (2018-01-28 17:11:52)

Offline

#6 2018-01-28 17:21:31

seth
Member
Registered: 2012-09-03
Posts: 49,974

Re: Soft lockup - CPU stuck on ethernet plug in

If anything it's dhcpcd - I had rather expected something like NetworkManager ... :-(

Online

#7 2018-01-28 17:35:27

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

Thank you, you are right, it is dhcpcd. When it's disabled, my system doesn't lock up anymore. Though for some reason, I can then also start dhcpcd for the ethernet interface manually without a crash...

Offline

#8 2018-01-28 17:49:38

seth
Member
Registered: 2012-09-03
Posts: 49,974

Re: Soft lockup - CPU stuck on ethernet plug in

Unplug the cable, run "dhcpcd -d" directly in an interactive shell, replug the cable and -maybe- see what causes the trouble.

Online

#9 2018-01-28 18:31:24

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

Nothing in particular seems to jump out. This is a full connect and disconnect.

kernel: forcedeth 0000:00:0a.0 enp0s10: link up
dhcpcd[6434]: enp0s10: carrier acquired
dhcpcd[6434]: enp0s10: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' CARRIER
dhcpcd[6434]: enp0s10: IAID [...]
dhcpcd[6434]: enp0s10: adding address [...]
dhcpcd[6434]: enp0s10: pltime infinity, vltime infinity
dhcpcd[6434]: enp0s10: delaying IPv6 router solicitation for 0.4 seconds
dhcpcd[6434]: enp0s10: delaying IPv4 for 0.3 seconds
dhcpcd[6434]: enp0s10: reading lease `/var/lib/dhcpcd/dhcpcd-enp0s10.lease'
dhcpcd[6434]: enp0s10: rebinding lease of 192.168.178.105
dhcpcd[6434]: enp0s10: sending REQUEST (xid 0x37b33c3f), next in 3.8 seconds
dhcpcd[6434]: enp0s10: acknowledged 192.168.178.105 from 192.168.178.1
dhcpcd[6434]: enp0s10: probing address 192.168.178.105/24
dhcpcd[6434]: enp0s10: probing for 192.168.178.105
dhcpcd[6434]: enp0s10: ARP probing 192.168.178.105 (1 of 3), next in 1.3 seconds
dhcpcd[6434]: enp0s10: soliciting an IPv6 router
dhcpcd[6434]: enp0s10: delaying Router Solicitation for LL address
dhcpcd[6434]: enp0s10: sending Router Solicitation
dhcpcd[6434]: enp0s10: Router Advertisement from [...]
dhcpcd[6434]: enp0s10: adding address [...]/64
dhcpcd[6434]: enp0s10: pltime 3600 seconds, vltime 7200 seconds
dhcpcd[6434]: enp0s10: adding route to [...]/64
dhcpcd[6434]: enp0s10: adding default route via [...]
dhcpcd[6434]: enp0s10: waiting for Router Advertisement DAD to complete
dhcpcd[6434]: enp0s10: requesting DHCPv6 information
dhcpcd[6434]: enp0s10: delaying INFORM6 (xid 0x471739), next in 0.1 seconds
dhcpcd[6434]: enp0s10: broadcasting INFORM6 (xid 0x471739), next in 0.9 seconds
dhcpcd[6434]: enp0s10: REPLY6 received from [...]
dhcpcd[6434]: enp0s10: refresh in 86400 seconds
dhcpcd[6434]: enp0s10: writing lease `/var/lib/dhcpcd/dhcpcd-enp0s10.lease6'
dhcpcd[6434]: enp0s10: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' INFORM6
dhcpcd[6434]: enp0s10: ARP probing 192.168.178.105 (2 of 3), next in 1.6 seconds
dhcpcd[6434]: enp0s10: Router Advertisement DAD completed
dhcpcd[6434]: enp0s10: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' ROUTERADVERT
dhcpcd[6434]: enp0s10: ARP probing 192.168.178.105 (3 of 3), next in 2.0 seconds
dhcpcd[6434]: enp0s10: DAD completed for 192.168.178.105
dhcpcd[6434]: enp0s10: leased 192.168.178.105 for 864000 seconds
dhcpcd[6434]: enp0s10: renew in 432000 seconds, rebind in 756000 seconds
dhcpcd[6434]: enp0s10: writing lease `/var/lib/dhcpcd/dhcpcd-enp0s10.lease'
dhcpcd[6434]: enp0s10: adding IP address 192.168.178.105/24 broadcast 192.168.178.255
dhcpcd[6434]: enp0s10: adding route to 192.168.178.0/24
dhcpcd[6434]: enp0s10: adding default route via 192.168.178.1
dhcpcd[6434]: enp0s10: ARP announcing 192.168.178.105 (1 of 2), next in 2.0 seconds
dhcpcd[6434]: enp0s10: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' BOUND
dhcpcd[6434]: enp0s10: ARP announcing 192.168.178.105 (2 of 2)
kernel: forcedeth 0000:00:0a.0 enp0s10: link down
dhcpcd[6434]: enp0s10: carrier lost
dhcpcd[6434]: enp0s10: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
dhcpcd[6434]: enp0s10: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' EXPIRE6
dhcpcd[6434]: enp0s10: deleting address [...]/64
dhcpcd[6434]: enp0s10: deleting default route via [...]
dhcpcd[6434]: enp0s10: deleting route to [...]/64
dhcpcd[6434]: enp0s10: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' ROUTERADVERT
dhcpcd[6434]: enp0s10: deleting address [...]
dhcpcd[6434]: enp0s10: deleting default route via 192.168.178.1
dhcpcd[6434]: enp0s10: deleting route to 192.168.178.0/24
dhcpcd[6434]: enp0s10: deleting IP address 192.168.178.105/24
dhcpcd[6434]: enp0s10: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' EXPIRE

Running dhcpcd manually with the same options as in dhcpcd.service (i.e. /usr/bin/dhcpcd -q -b) is also not a problem.
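
One way to compare a working run with a run that locks up is to reduce each debug log to the sequence of hook reasons dhcpcd executed. This is only a sketch: the function name hook_reasons and the log file name are made up for illustration, and the pattern only matches lines shaped like the "executing ... dhcpcd-run-hooks' REASON" lines above.

```shell
# Print the hook reasons (CARRIER, BOUND, ...) in the order dhcpcd ran them.
# Reads a saved "dhcpcd -d" log on stdin.
hook_reasons() {
    # The "." before the space stands for the closing quote after the hook path.
    sed -n 's/.*dhcpcd-run-hooks. \([A-Z0-9]*\)$/\1/p'
}

# Usage (dhcpcd-debug.log is a hypothetical file holding the output above):
#   hook_reasons < dhcpcd-debug.log
```

If a run that ends in a lockup stops after a particular reason, the hook scripts executed for that reason would be the next place to look.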

Offline

#10 2018-01-28 18:57:28

seth
Member
Registered: 2012-09-03
Posts: 49,974

Re: Soft lockup - CPU stuck on ethernet plug in

Does a restarted service still cause the lockup?

Online

#11 2018-01-28 19:21:02

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

Yes, and it gets better: even after stopping the service and dhcpcd fully shutting down, I still get the lockup :|

Offline

#12 2018-01-28 19:25:07

seth
Member
Registered: 2012-09-03
Posts: 49,974

Re: Soft lockup - CPU stuck on ethernet plug in

So dhcpcd has been a red herring?!

What module controls the chip?

lspci -vs 00:0a.0

Online

#13 2018-01-28 19:26:47

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

00:0a.0 Ethernet controller: NVIDIA Corporation MCP79 Ethernet (rev b1)
        Subsystem: NVIDIA Corporation Apple iMac 9,1
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 26
        Memory at d3486000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 21e0 [size=8]
        Memory at d3489000 (32-bit, non-prefetchable) [size=256]
        Memory at d3489300 (32-bit, non-prefetchable) [size=16]
        Capabilities: [44] Power Management version 2
        Capabilities: [50] MSI: Enable+ Count=1/16 Maskable+ 64bit+
        Kernel driver in use: forcedeth
        Kernel modules: forcedeth
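
For scripting the same check, the driver name can be pulled out of the lspci output. A minimal sketch, assuming the output format shown above; the function name driver_in_use is made up for illustration.

```shell
# Print the driver named on the "Kernel driver in use:" line of
# "lspci -v" output read from stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage:
#   lspci -vs 00:0a.0 | driver_in_use
# The same information is available from sysfs without lspci:
#   basename "$(readlink -f /sys/bus/pci/devices/0000:00:0a.0/driver)"
```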

Offline

#14 2018-01-28 19:31:27

seth
Member
Registered: 2012-09-03
Posts: 49,974

Re: Soft lockup - CPU stuck on ethernet plug in

Ok, let's check the used parameters

systool -vm forcedeth

Online

#15 2018-01-28 19:35:03

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

Thank you for putting so much effort into this :)

Module = "forcedeth"

  Attributes:
    coresize            = "73728"
    initsize            = "0"
    initstate           = "live"
    refcnt              = "0"
    srcversion          = "1772C864C116D8F4FB2A3F3"
    taint               = ""
    uevent              = <store method only>

  Sections:
    .bss                = "0xffffffffc082d7c0"
    .data.unlikely      = "0xffffffffc082d450"
    .data               = "0xffffffffc082d0c0"
    .exit.text          = "0xffffffffc0827f22"
    .gnu.linkonce.this_module= "0xffffffffc082d480"
    .init.text          = "0xffffffffc06f5000"
    .note.gnu.build-id  = "0xffffffffc0828000"
    .orc_unwind         = "0xffffffffc082aeec"
    .orc_unwind_ip      = "0xffffffffc082a088"
    .parainstructions   = "0xffffffffc0828878"
    .rodata.str1.1      = "0xffffffffc08280b8"
    .rodata.str1.8      = "0xffffffffc08282f0"
    .rodata             = "0xffffffffc08288c0"
    .smp_locks          = "0xffffffffc0828024"
    .strtab             = "0xffffffffc06f80a0"
    .symtab             = "0xffffffffc06f6000"
    .text               = "0xffffffffc081e000"
    __bug_table         = "0xffffffffc082d1d8"
    __jump_table        = "0xffffffffc082d000"
    __mcount_loc        = "0xffffffffc082c488"
    __param             = "0xffffffffc0829f20"
    __verbose           = "0xffffffffc082d2c8"

Offline

#16 2018-01-28 19:42:40

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: Soft lockup - CPU stuck on ethernet plug in

Assuming the default parameters are used

module_param(max_interrupt_work, int, 0);
MODULE_PARM_DESC(max_interrupt_work, "forcedeth maximum events handled per interrupt");
module_param(optimization_mode, int, 0);
MODULE_PARM_DESC(optimization_mode, "In throughput mode (0), every tx & rx packet will generate an interrupt. In CPU mode (1), interrupts are controlled by a timer. In dynamic mode (2), the mode toggles between throughput and CPU mode based on network load.");
module_param(poll_interval, int, 0);
MODULE_PARM_DESC(poll_interval, "Interval determines how frequent timer interrupt is generated by [(time_in_micro_secs * 100) / (2^10)]. Min is 0 and Max is 65535.");
module_param(msi, int, 0);
MODULE_PARM_DESC(msi, "MSI interrupts are enabled by setting to 1 and disabled by setting to 0.");
module_param(msix, int, 0);
MODULE_PARM_DESC(msix, "MSIX interrupts are enabled by setting to 1 and disabled by setting to 0.");
module_param(dma_64bit, int, 0);
MODULE_PARM_DESC(dma_64bit, "High DMA is enabled by setting to 1 and disabled by setting to 0.");
module_param(phy_cross, int, 0);
MODULE_PARM_DESC(phy_cross, "Phy crossover detection for Realtek 8201 phy is enabled by setting to 1 and disabled by setting to 0.");
module_param(phy_power_down, int, 0);
MODULE_PARM_DESC(phy_power_down, "Power down phy and disable link when interface is down (1), or leave phy powered up (0).");
module_param(debug_tx_timeout, bool, 0);
MODULE_PARM_DESC(debug_tx_timeout,
"Dump tx related registers and ring when tx_timeout happens");

Edit:
https://git.kernel.org/pub/scm/linux/ke … net/nvidia last commit that is in stable is d99356797a8f3abaa57e13c5d1f50e4392eca037 2017-10-18
Edit2:
What kernel version is the Arch live system you tested on using?

Last edited by loqs (2018-01-28 19:51:52)

Offline

#17 2018-01-28 19:52:09

seth
Member
Registered: 2012-09-03
Posts: 49,974

Re: Soft lockup - CPU stuck on ethernet plug in

Let's see.
Pass "forcedeth.msi=1 forcedeth.msix=0" to the kernel (or reload the module w/ those parameters, but I'm not sure this is sufficient once you ran into the issue)
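
To confirm that the parameters actually reached the kernel after a reboot, the running kernel's command line can be checked. A small sketch; the function name has_kernel_param is made up for illustration, and the second argument exists only so the check can be tried against an arbitrary string.

```shell
# Succeed if parameter $1 appears as a whole word on the kernel command line.
# $2 is optional and defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_kernel_param forcedeth.msi=1 && echo "msi=1 was passed"
# Reloading the module with the same settings instead of rebooting:
#   modprobe -r forcedeth && modprobe forcedeth msi=1 msix=0
```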

Edit: https://git.kernel.org/pub/scm/linux/ke … 54d220ea92 looks terribly suspicious.

Last edited by seth (2018-01-28 19:55:26)

Online

#18 2018-01-28 20:22:15

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

@loqs: the Arch live system is using 4.13.12, but I don't think that is the reason, since I encountered the problem with both a newer and an older kernel version.

@seth: with those kernel parameters I run into the same problem (it locks up), though the systool output changes:

  Sections:
    .bss                = "0xffffffffc04017c0"
    .data.unlikely      = "0xffffffffc0401450"
    .data               = "0xffffffffc04010c0"
    .exit.text          = "0xffffffffc03fbf22"
    .gnu.linkonce.this_module= "0xffffffffc0401480"
    .init.text          = "0xffffffffc0405000"
    .note.gnu.build-id  = "0xffffffffc03fc000"
    .orc_unwind         = "0xffffffffc03feeec"
    .orc_unwind_ip      = "0xffffffffc03fe088"
    .parainstructions   = "0xffffffffc03fc878"
    .rodata.str1.1      = "0xffffffffc03fc0b8"
    .rodata.str1.8      = "0xffffffffc03fc2f0"
    .rodata             = "0xffffffffc03fc8c0"
    .smp_locks          = "0xffffffffc03fc024"
    .strtab             = "0xffffffffc04080a0"
    .symtab             = "0xffffffffc0406000"
    .text               = "0xffffffffc03f2000"
    __bug_table         = "0xffffffffc04011d8"
    __jump_table        = "0xffffffffc0401000"
    __mcount_loc        = "0xffffffffc0400488"
    __param             = "0xffffffffc03fdf20"
    __verbose           = "0xffffffffc04012c8"
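
A note on reading this output: the section addresses only show where the module was loaded, so they can differ between boots regardless of parameters. Neither systool listing has a Parameters section, which is consistent with the driver declaring its parameters with a sysfs permission of 0 (the third argument of module_param in the lines loqs quoted); such parameters get no entry under /sys/module/forcedeth/parameters. The sketch below lists the parameters declared that way from a driver source file; the function name hidden_params is made up for illustration.

```shell
# Print the names of module parameters declared with permission 0,
# i.e. parameters that are not exported through sysfs.
# Reads C source on stdin.
hidden_params() {
    sed -n 's/^module_param(\([a-z0-9_]*\), *[a-z]*, *0);$/\1/p'
}

# Usage, from a kernel source tree:
#   hidden_params < drivers/net/ethernet/nvidia/forcedeth.c
# The parameters a module accepts can also be listed with:
#   modinfo -p forcedeth
```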

Offline

#19 2018-01-28 20:31:08

seth
Member
Registered: 2012-09-03
Posts: 49,974

Re: Soft lockup - CPU stuck on ethernet plug in

Just to be sure: when you disable the dhcpcd service (which should not be running in the archiso, or is it?) - does the issue then ever come up?
What if you run the service in the live iso?
Did you try the lts kernel?

Online

#20 2018-01-28 20:34:44

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: Soft lockup - CPU stuck on ethernet plug in

https://www.kernel.org/doc/Documentatio … chdogs.txt If I understand the documentation correctly, there should be a backtrace if you get a lockup.
You could also look at the boot parameter nmi_watchdog https://www.kernel.org/doc/Documentatio … meters.txt
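
As a rough illustration of the thresholds in that documentation: the soft lockup threshold is twice kernel.watchdog_thresh (10 seconds by default), which fits the "stuck for 23s" message in the first post. The function name below is made up for illustration, and whether the extra backtrace settings in the comments produce a trace on this machine is untested.

```shell
# Print the soft lockup threshold in seconds, which is 2 * watchdog_thresh.
# $1 is optional and defaults to the value the running kernel uses.
soft_lockup_threshold() {
    thresh=${1-$(cat /proc/sys/kernel/watchdog_thresh)}
    echo $(( 2 * thresh ))
}

# Usage:
#   soft_lockup_threshold
# Asking the kernel to dump backtraces of all CPUs on a lockup (as root):
#   sysctl kernel.softlockup_all_cpu_backtrace=1
#   sysctl kernel.hardlockup_all_cpu_backtrace=1
```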

Offline

#21 2018-01-28 21:51:15

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

Okay, so everything works as expected in the LTS kernel (should've thought of trying that myself).

In the vanilla kernel, with the dhcpcd service disabled, what seems to determine whether the lockup occurs is the shell from which I start dhcpcd. If it's from inside X via a terminal emulator, there is no problem, but if it's from the login shell, it locks up.
On the Arch iso I didn't run into any problems, though I believe there is a dhcpcd service running as well.

As for the backtrace: Yes, I was expecting those as well, but I haven't been able to get my system to print one. It always only shows the soft and hard lockup messages directly on screen, and for some reason, nothing is being written to the journal.
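
If the journal is stored persistently, the kernel messages of the boot that locked up can be searched after a reboot. This is a sketch under that assumption; the function name lockup_lines is made up for illustration, and a hard hang may still prevent the last messages from ever reaching disk.

```shell
# Keep only the watchdog lockup lines from kernel log text read on stdin.
lockup_lines() {
    grep -E 'soft lockup|hard LOCKUP'
}

# Usage, for the previous boot's kernel messages:
#   journalctl -k -b -1 | lockup_lines
# or for the current boot:
#   dmesg | lockup_lines
```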

Offline

#22 2018-01-28 22:09:14

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: Soft lockup - CPU stuck on ethernet plug in

The backtrace should be in dmesg, and by default it should also be in the journal. I would check dmesg anyway.
In linux-lts, the last commit to the driver is de55558dc4e6562197bf0ea0fe249cbd7ccebae5 (2016-02-25). Stable has 9 commits on top of that, including the one seth highlighted.
It might be worth doing a kernel bisect restricted to drivers/net/ethernet/nvidia to see if that finds the cause.
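
A path-limited bisect could be set up roughly as follows. The helper only prints the commands so they can be reviewed before running them in a kernel git tree; the function name bisect_plan and the v4.9/v4.14 tags in the usage line are assumptions for illustration, not versions confirmed in this thread.

```shell
# Print the git commands for a bisect restricted to one path.
# $1 = known-good revision, $2 = known-bad revision, $3 = path
bisect_plan() {
    # How many commits touch the path between the two revisions.
    printf 'git rev-list --count %s..%s -- %s\n' "$1" "$2" "$3"
    # git bisect start takes the bad revision first, then the good one.
    printf 'git bisect start %s %s -- %s\n' "$2" "$1" "$3"
}

# Usage:
#   bisect_plan v4.9 v4.14 drivers/net/ethernet/nvidia
# After each test boot, mark the result with "git bisect good" or
# "git bisect bad" until git names the first bad commit.
```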

Offline

#23 2018-01-28 23:53:22

Hausdorff
Member
Registered: 2017-06-12
Posts: 11

Re: Soft lockup - CPU stuck on ethernet plug in

I'll do that. Thanks for all the help, you guys are great :)
I'll report back when I know more.

Offline

#24 2018-01-30 17:10:28

SSTC
Member
From: Denmark
Registered: 2009-06-23
Posts: 15

Re: Soft lockup - CPU stuck on ethernet plug in

I have something that looks like this, I am running linux-lts, and I am using systemd-networkd:

[412904.895658] NMI watchdog: Watchdog detected hard LOCKUP on cpu 16
Modules linked in: vhost_net vhost macvtap macvlan tun devlink w83795 w83627ehf hwmon_vid jc42 mgag200 ttm intel_powerclamp drm_kms_helper coretemp drm kvm_intel kvm syscopyarea sysfillrect sysimgblt ipmi_ssif irqbypass fb_sys_fops input_leds led_class joydev igb mousedev intel_cstate bridge psmouse ptp iTCO_wdt pps_core evdev pcspkr mac_hid iTCO_vendor_support i2c_algo_bit gpio_ich ipmi_si fjes stp llc ipmi_msghandler button ioatdma i7core_edac acpi_cpufreq edac_core shpchp lpc_ich i5500_temp i2c_i801 dca tpm_tis i2c_smbus tpm_tis_core tpm sch_fq_codel ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache sd_mod hid_generic usbhid hid uhci_hcd ehci_pci ata_generic serio_raw pata_acpi mpt3sas raid_class atkbd ata_piix ehci_hcd libps2 libata scsi_transport_
[412904.908938] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G             L  4.9.78-1-lts #1
[412904.908939] Hardware name: Supermicro X8DTH-i/6/iF/6F/X8DTH, BIOS 2.0a    09/29/2010
[412904.908939] task: ffff880331b80d00 task.stack: ffffc90003208000
[412904.908940] RIP: 0010:[<ffffffff81604bbc>]  [<ffffffff81604bbc>] intel_idle+0x9c/0x110
[412904.908941] RSP: 0018:ffffc9000320be48  EFLAGS: 00000046
[412904.908941] RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
[412904.908942] RDX: 0000000000000000 RSI: ffffffff81aa4da0 RDI: 0000000000000010
[412904.908943] RBP: ffffc9000320be68 R08: cccccccccccccccd R09: 000000000000afa3
[412904.908943] R10: 0000000000000018 R11: 000000000000aa1c R12: 0000000000000003
[412904.908944] R13: 0000000000000004 R14: 0000000000000020 R15: 0000000000000000
[412904.908944] FS:  0000000000000000(0000) GS:ffff880333c80000(0000) knlGS:0000000000000000
[412904.908945] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[412904.908946] CR2: 00007fd5e0420000 CR3: 0000000001a08000 CR4: 0000000000022670
[412904.908946] Stack:
[412904.908947]  ffffffff81aa4da0 ffff880333c9fc00 0000000000000004 ffffffff81aa4f38
[412904.908948]  ffffc9000320beb8 ffffffff814b8934 ffff880333c9fc00 7735940000038345
[412904.908948]  000177868fa79e2d 0000000000000010 ffffffff81b08850 ffffffff8192b0d6
[412904.908949] Call Trace:
[412904.908949]  [<ffffffff814b8934>] cpuidle_enter_state+0x74/0x2d0
[412904.908950]  [<ffffffff814b8bc7>] cpuidle_enter+0x17/0x20
[412904.908950]  [<ffffffff810c2bd3>] call_cpuidle+0x23/0x40
[412904.908951]  [<ffffffff810c2e4f>] cpu_startup_entry+0x15f/0x240
[412904.908952]  [<ffffffff8105032f>] start_secondary+0x16f/0x1b0
[412904.908953] Code: 00 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 00 fb 00 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 0b b9 01 00 00 00 4c 89 f0 0f 01 c9 <65> 48 8b 04 25 00 fb 00 00 f0 80 60 02 df 0f ae f0 48 8b 00 a8
systemctl list-unit-files --state=enabled
UNIT FILE                             STATE  
autovt@.service                       enabled
dbus-org.freedesktop.network1.service enabled
getty@.service                        enabled
libvirt-guests.service                enabled
libvirtd.service                      enabled
lm_sensors.service                    enabled
systemd-networkd-wait-online.service  enabled
systemd-networkd.service              enabled
sshd.socket                           enabled
systemd-networkd.socket               enabled
virtlockd.socket                      enabled
virtlogd.socket                       enabled
remote-fs.target                      enabled

Last edited by SSTC (2018-01-30 17:11:53)

Offline

#25 2018-01-30 18:57:19

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: Soft lockup - CPU stuck on ethernet plug in

@SSTC that issue does not seem to have much in common with this one: a different kernel network driver module, a different userspace network manager, and it occurs on linux-lts, where the original issue does not.

Offline
