#1 2024-04-12 15:15:48

lior
Member
Registered: 2021-02-13
Posts: 8

[SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

After upgrading my packages today, my machine started acting weird.
It started with my Bluetooth keyboard not working. I tried connecting a USB one, but it didn't work either. Weirdly enough, my mouse kept working and the GUI wasn't frozen.
Rebooting didn't help, but I noticed that the hangs only start after the machine finishes booting.
That gave me enough time to look at the systemd journal logs.

kernel: e1000e 0000:00:1f.6 ___: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
NetworkManager[649]: <info>  [1712930066.1043] device (___): carrier: link connected
NetworkManager[649]: <info>  [1712930066.1049] device (___): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
NetworkManager[649]: <info>  [1712930066.1061] policy: auto-activating connection '___' (____________________________________)
NetworkManager[649]: <info>  [1712930066.1071] device (___): Activation: starting connection '___' (____________________________________)
NetworkManager[649]: <info>  [1712930066.1073] device (___): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
NetworkManager[649]: <info>  [1712930066.1078] manager: NetworkManager state is now CONNECTING
NetworkManager[649]: <info>  [1712930066.2009] device (___): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
NetworkManager[649]: <info>  [1712930066.2380] device (___): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
NetworkManager[649]: <info>  [1712930066.2381] dhcp4 (___): activation: beginning transaction (timeout in 45 seconds)
dbus-daemon[590]: [system] Activating via systemd: service name='org.freedesktop.resolve1' unit='dbus-org.freedesktop.resolve1.service' requested by ':1.7' (uid=0 pid=649 comm="/usr/bin/NetworkManager --no-daemon")
dbus-daemon[590]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.resolve1.service': Unit dbus-org.freedesktop.resolve1.service not found.
dnsmasq[814]: reading /etc/resolv.conf
dnsmasq[814]: using nameserver _______#53
dnsmasq[814]: using nameserver _______#53
kernel: e1000e 0000:00:1f.6 ___: NIC Link is Up 100 Mbps Half Duplex, Flow Control: None
kernel: BUG: scheduling while atomic: kworker/2:1/119/0x00000002
kernel: Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp xt_CT bridge stp llc ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security xt_tcpudp ip6table_filter ip6_tables iptable_filter uhid cmac algif_hash algif_skcipher af_alg bnep pktcdvd intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 psmouse aesni_intel snd_soc_avs eeepc_wmi serio_raw crypto_simd asus_wmi atkbd cryptd sparse_keymap snd_soc_hda_codec libps2 iTCO_wdt nls_iso8859_1 vivaldi_fmap platform_profile nvidia_drm(POE) rapl snd_hda_ext_core snd_hda_codec_realtek i8042 vfat intel_pmc_bxt
kernel:  intel_cstate nvidia_uvm(POE) serio nvidia_modeset(POE) mei_hdcp mei_pxp iTCO_vendor_support ee1004 fat wmi_bmof mxm_wmi snd_hda_codec_generic snd_soc_core ledtrig_audio snd_hda_codec_hdmi snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi btusb btrtl snd_hda_codec btintel snd_hda_core btbcm btmtk snd_hwdep snd_pcm intel_uncore snd_timer bluetooth snd i2c_i801 pcspkr video mei_me ecdh_generic mousedev joydev rfkill e1000e soundcore mei i2c_smbus wmi acpi_pad mac_hid nvidia(POE) crypto_user fuse loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_mod crc32c_intel sr_mod xhci_pci cdrom xhci_pci_renesas
kernel: CPU: 2 PID: 119 Comm: kworker/2:1 Tainted: P           OE      6.6.26-1-lts #1 c2323a06065870edd4b1ca8a9606d2671b8a266e
kernel: Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 0703 08/27/2015
kernel: Workqueue: events e1000_watchdog_task [e1000e]
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x47/0x60
kernel:  __schedule_bug+0x56/0x70
kernel:  __schedule+0x103c/0x1410
kernel:  ? ttwu_do_activate+0x64/0x220
kernel:  schedule+0x5e/0xd0
kernel:  schedule_hrtimeout_range_clock+0xbe/0x140
kernel:  ? __pfx_hrtimer_wakeup+0x10/0x10
kernel:  usleep_range_state+0x64/0x90
kernel:  e1000e_read_phy_reg_mdic+0x87/0x280 [e1000e f2907a1571d8ec926b8ffd7cee99fb02552e6e9e]
kernel:  e1000e_update_stats+0x513/0x730 [e1000e f2907a1571d8ec926b8ffd7cee99fb02552e6e9e]
kernel:  e1000_watchdog_task+0xe1/0xab0 [e1000e f2907a1571d8ec926b8ffd7cee99fb02552e6e9e]
kernel:  process_one_work+0x178/0x350
kernel:  worker_thread+0x30f/0x450
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xe5/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
NetworkManager[649]: <info>  [1712930072.2026] device (___): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/2:3:278]
kernel: Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp xt_CT bridge stp llc ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security xt_tcpudp ip6table_filter ip6_tables iptable_filter uhid cmac algif_hash algif_skcipher af_alg bnep pktcdvd intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 psmouse aesni_intel snd_soc_avs eeepc_wmi serio_raw crypto_simd asus_wmi atkbd cryptd sparse_keymap snd_soc_hda_codec libps2 iTCO_wdt nls_iso8859_1 vivaldi_fmap platform_profile nvidia_drm(POE) rapl snd_hda_ext_core snd_hda_codec_realtek i8042 vfat intel_pmc_bxt
kernel:  intel_cstate nvidia_uvm(POE) serio nvidia_modeset(POE) mei_hdcp mei_pxp iTCO_vendor_support ee1004 fat wmi_bmof mxm_wmi snd_hda_codec_generic snd_soc_core ledtrig_audio snd_hda_codec_hdmi snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi btusb btrtl snd_hda_codec btintel snd_hda_core btbcm btmtk snd_hwdep snd_pcm intel_uncore snd_timer bluetooth snd i2c_i801 pcspkr video mei_me ecdh_generic mousedev joydev rfkill e1000e soundcore mei i2c_smbus wmi acpi_pad mac_hid nvidia(POE) crypto_user fuse loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_mod crc32c_intel sr_mod xhci_pci cdrom xhci_pci_renesas
kernel: CPU: 2 PID: 278 Comm: kworker/2:3 Tainted: P        W  OE      6.6.26-1-lts #1 c2323a06065870edd4b1ca8a9606d2671b8a266e
kernel: Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 0703 08/27/2015
kernel: Workqueue: events linkwatch_event
kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2e0
kernel: Code: 77 7f f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 5b 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 65 48 ff 05 a3 c0 c5
kernel: RSP: 0018:ffffc90001c57bc0 EFLAGS: 00000202
kernel: RAX: 0000000000000001 RBX: ffff888116b77428 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888116b77428
kernel: RBP: ffff8881344b0138 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: ffff8881344b0200 R11: 0000000000000010 R12: ffff888116b77428
kernel: R13: ffff8881344b0000 R14: 0000000000000000 R15: 0000000000000000
kernel: FS:  0000000000000000(0000) GS:ffff888446480000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00005685751bf000 CR3: 00000002dea20005 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  <IRQ>
kernel:  ? watchdog_timer_fn+0x1b8/0x220
kernel:  ? __pfx_watchdog_timer_fn+0x10/0x10
kernel:  ? __hrtimer_run_queues+0x10f/0x2b0
kernel:  ? hrtimer_interrupt+0xf8/0x230
kernel:  ? __sysvec_apic_timer_interrupt+0x4d/0x140
kernel:  ? sysvec_apic_timer_interrupt+0x6d/0x90
kernel:  </IRQ>
kernel:  <TASK>
kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
kernel:  ? native_queued_spin_lock_slowpath+0x6e/0x2e0
kernel:  _raw_spin_lock+0x29/0x30
kernel:  e1000e_get_stats64+0x22/0x120 [e1000e f2907a1571d8ec926b8ffd7cee99fb02552e6e9e]
kernel:  dev_get_stats+0x60/0x110
kernel:  rtnl_fill_stats+0x3b/0x130
kernel:  rtnl_fill_ifinfo+0x868/0x1530
kernel:  rtmsg_ifinfo_build_skb+0xae/0x120
kernel:  rtmsg_ifinfo+0x3c/0x90
kernel:  netdev_state_change+0x89/0x90
kernel:  linkwatch_do_dev+0x49/0x60
kernel:  __linkwatch_run_queue+0xe1/0x260
kernel:  linkwatch_event+0x31/0x40
kernel:  process_one_work+0x178/0x350
kernel:  worker_thread+0x30f/0x450
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xe5/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
kernel: usb 1-12: USB disconnect, device number 3

So it looks like it is related to the e1000e (Ethernet) driver.
It happens in both the regular kernel and the LTS one.

Disconnecting my Ethernet cable stopped the hangs from recurring.
After downgrading the kernel packages (including the header packages), the issue stopped.
linux (6.8.5.arch1-1 -> 6.8.1.arch1-1)
linux-lts (6.6.26-1 -> 6.6.22-1)

I went through the stable Linux git tree (the one used for LTS) to check whether any commits stand out.
Maybe this one is related to the issue? https://git.kernel.org/pub/scm/linux/ke … 6a61e486a1
It touches e1000e_read_phy_reg_mdic, which is in the stack trace.

Last edited by lior (2024-05-18 18:00:11)

#2 2024-04-12 15:33:13

loqs
Member
Registered: 2014-03-06
Posts: 17,907

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

1d16cd91cd319d5bf6230c8493feb56a61e486a1 was added in 6.8.5, so if you downgrade to 6.8.4, can you still reproduce the issue? You can obtain that version from the ALA (Arch Linux Archive). If 6.8.4 does not have the issue, would it help if I built 6.8.5.arch1-1 with 1d16cd91cd319d5bf6230c8493feb56a61e486a1 reverted?

#3 2024-04-12 15:55:07

lior
Member
Registered: 2021-02-13
Posts: 8

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

I installed linux 6.8.4 from the archive; the issue does not reproduce in that version.

Last edited by lior (2024-04-12 15:55:34)

#4 2024-04-12 16:47:09

loqs
Member
Registered: 2014-03-06
Posts: 17,907

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

6.8.5.arch1-1 with 1d16cd91cd319d5bf6230c8493feb56a61e486a1 and 1f4b78e04e886a73f6d4f8e308904d91ec087a06 reverted:
https://drive.google.com/file/d/1u3fppv … sp=sharing linux-6.8.5.arch1-1.1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1nbNnS4 … sp=sharing linux-headers-6.8.5.arch1-1.1-x86_64.pkg.tar.zst

1d16cd91cd319d5bf623 would not revert cleanly without first reverting 1f4b78e04e886a73f6d4f8e308904d91ec087a06.

#5 2024-04-13 10:12:47

lior
Member
Registered: 2021-02-13
Posts: 8

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

From what I understand from the stack trace, the issue is that a sleeping function was called inside a critical section, i.e. while a spinlock was held ("scheduling while atomic").
e1000e_get_stats64 takes a lock (https://git.kernel.org/pub/scm/linux/ke … 86a1#n5974),
while e1000e_read_phy_reg_mdic calls a sleep function (https://git.kernel.org/pub/scm/linux/ke … 486a1#n264; it's one of the calls there, not sure which).
I assume that e1000e_get_stats64 was called by NetworkManager during network initialization.
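To make the mechanism concrete, here is a small userspace model of this bug class. It is a hypothetical sketch, not e1000e or kernel code: `preempt_count`, `model_spin_lock`, `model_usleep_range`, `model_udelay`, and the `read_phy_*` helpers are invented stand-ins. It assumes the kernel behaviour that holding a spinlock makes the context atomic, and that a sleeping delay (like `usleep_range()`) calls the scheduler, which in atomic context produces "BUG: scheduling while atomic". A busy-wait delay (like the kernel's `udelay()`) never schedules, so it is the usual choice when a caller may hold a spinlock.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of the bug class, NOT kernel code.
 * In the kernel, spin_lock() disables preemption (raising the preempt
 * count), and a sleeping call must not be made until the lock is dropped.
 */
static int preempt_count = 0; /* > 0 means "atomic context" in this model */
static int bug_reports = 0;   /* number of "scheduling while atomic" hits */

static void model_spin_lock(void) { preempt_count++; }

static void model_spin_unlock(void)
{
    assert(preempt_count > 0); /* unlocking without a lock is a model error */
    preempt_count--;
}

/* Sleeping delay: would call schedule(), so it is illegal while atomic. */
static void model_usleep_range(unsigned min_us, unsigned max_us)
{
    (void)min_us;
    (void)max_us;
    if (preempt_count > 0) {
        bug_reports++;
        printf("BUG: scheduling while atomic (preempt_count=%d)\n",
               preempt_count);
    }
}

/* Busy-wait delay: spins without scheduling, so it is legal while atomic. */
static void model_udelay(unsigned us)
{
    (void)us;
}

/* Hypothetical PHY read that waits for the hardware with a sleeping delay. */
static void read_phy_sleeping(void) { model_usleep_range(50, 60); }

/* The same hypothetical PHY read, waiting with a busy-wait delay instead. */
static void read_phy_busywait(void) { model_udelay(50); }

/* Stats update that reads the PHY while holding the (modelled) stats lock. */
static void update_stats_locked(void (*read_phy)(void))
{
    model_spin_lock();
    read_phy();
    model_spin_unlock();
}
```

In this model, `update_stats_locked(read_phy_sleeping)` prints one "BUG" line and increments `bug_reports`, while `update_stats_locked(read_phy_busywait)` and a bare `read_phy_sleeping()` outside the lock report nothing. The trade-off is that a busy-wait burns CPU for the whole delay, so it only suits short waits.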

If I have some free time, I'll try to git bisect this myself, though I hope someone else fixes the bug before I have to resort to that.

There is a discussion on the kernel mailing list about recent e1000e changes; it could be that https://git.kernel.org/pub/scm/linux/ke … 91ec087a06 was the culprit:
https://lore.kernel.org/all/20240413092 … l.com/T/#u

If any Intel folks reading this thread want to try to reproduce this issue: I forgot to mention that my CPU is an "Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz", the stack trace lists my motherboard as "Z170 PRO GAMING, BIOS 0703 08/27/2015", and lspci lists my Ethernet controller as "Intel Corporation Ethernet Connection (2) I219-V (rev 31)".

Last edited by lior (2024-04-13 10:14:05)

#6 2024-04-13 11:39:24

loqs
Member
Registered: 2014-03-06
Posts: 17,907

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

Did reverting both 1d16cd91cd319d5bf6230c8493feb56a61e486a1 and 1f4b78e04e886a73f6d4f8e308904d91ec087a06 resolve the issue?

#7 2024-04-18 19:56:13

fatbotgw
Member
From: Deck 36
Registered: 2013-09-10
Posts: 3

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

loqs wrote:

Did reverting both 1d16cd91cd319d5bf6230c8493feb56a61e486a1 and 1f4b78e04e886a73f6d4f8e308904d91ec087a06 resolve the issue?

Not the OP, but I am having the exact same issue with any kernel past 6.8.4 and an e1000 Ethernet adapter. I tried 6.8.5 with the patches reverted and have not had the issue.

#8 2024-04-18 20:15:19

zersaa
Member
From: Pskov, Russia
Registered: 2009-02-09
Posts: 29

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

#9 2024-04-19 13:16:33

loqs
Member
Registered: 2014-03-06
Posts: 17,907

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

If you revert only 1d16cd91cd319d5bf6230c8493feb56a61e486a1, is the issue still reproducible? Both commits are by the same author, so I suggest adding that author to the CC list of the kernel Bugzilla report, marking it as a regression, and mentioning both commits.
Does the proposed fix from the bug report resolve the issue?
https://drive.google.com/file/d/1_xrZd9 … sp=sharing linux-6.8.7.arch1-1.2-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1HjFlig … sp=sharing linux-headers-6.8.7.arch1-1.2-x86_64.pkg.tar.zst

Last edited by loqs (2024-04-19 17:02:04)

Offline

#10 2024-04-19 20:16:32

fatbotgw
Member
From: Deck 36
Registered: 2013-09-10
Posts: 3

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

loqs wrote:

If you revert only 1d16cd91cd319d5bf6230c8493feb56a61e486a1, is the issue still reproducible? Both commits are by the same author, so I suggest adding that author to the CC list of the kernel Bugzilla report, marking it as a regression, and mentioning both commits.
Does the proposed fix from the bug report resolve the issue?
https://drive.google.com/file/d/1_xrZd9 … sp=sharing linux-6.8.7.arch1-1.2-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1HjFlig … sp=sharing linux-headers-6.8.7.arch1-1.2-x86_64.pkg.tar.zst

I am writing this from the linked kernel and am not having the issue.  The proposed patch appears to have fixed the problem.

#11 2024-05-01 16:00:29

indianahorst
Member
Registered: 2008-08-23
Posts: 129

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

Could this really annoying bug be fixed in the near future? It's a pain in the ass to have absolutely standard hardware and not be able to upgrade to the current kernel version.
The bug is still not fixed in linux 6.8.8.arch1-1.

Last edited by indianahorst (2024-05-01 16:08:09)

#12 2024-05-01 18:06:52

loqs
Member
Registered: 2014-03-06
Posts: 17,907

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

indianahorst wrote:

Could this really annoying bug be fixed in the near future? It's a pain in the ass to have absolutely standard hardware and not be able to upgrade to the current kernel version.

Open an issue on Arch's GitLab instance and ask for the fix to be applied ahead of it making its way to upstream stable?
Edit:
As far as I can determine, the patch cannot make its way to upstream stable because it has not been accepted by the upstream subsystem maintainer: https://lore.kernel.org/intel-wired-lan … huis.info/

Last edited by loqs (2024-05-01 22:25:13)

#13 2024-05-02 09:56:56

karlheinz
Member
Registered: 2024-05-01
Posts: 1

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

I have a similar problem (driver e1000e, no wired Ethernet when the computer wakes up). Downgrading to kernel 6.8.4 solves the problem. I compiled kernel 6.8.8 with the proposed fix (https://patchwork.ozlabs.org/project/in … intel.com/) and the problem is no longer present.

#14 2024-05-18 17:59:47

lior
Member
Registered: 2021-02-13
Posts: 8

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

Looks like the issue was resolved in 6.9.1.arch1-1 and linux-lts 6.6.31-1

#15 2024-05-20 11:53:36

indianahorst
Member
Registered: 2008-08-23
Posts: 129

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

lior wrote:

Looks like the issue was resolved in 6.9.1.arch1-1 and linux-lts 6.6.31-1

Nope. Problem isn't fixed in 6.9.1.arch1-1.

#16 2024-05-20 12:53:28

loqs
Member
Registered: 2014-03-06
Posts: 17,907

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

indianahorst wrote:
lior wrote:

Looks like the issue was resolved in 6.9.1.arch1-1 and linux-lts 6.6.31-1

Nope. Problem isn't fixed in 6.9.1.arch1-1.

6.9.1 contains https://github.com/torvalds/linux/commi … cbd3621778

#17 2024-05-20 18:46:14

fatbotgw
Member
From: Deck 36
Registered: 2013-09-10
Posts: 3

Re: [SOLVED] e1000 hangs in linux 6.8.5.arch1-1 and linux-lts 6.6.26-1

The issue is resolved for me with the 6.9.1.arch1-1 kernel. The commit also states that it closes bug #218740, which was referenced in post #8 of this thread. That makes me believe this is the "final" fix for this issue. If you are still having the issue, it may be better to open a new bug report.
