
#1 2024-12-10 11:55:58

hertz
Member
Registered: 2024-12-10
Posts: 12

Random mt7921e driver crashes leading to complete system freeze

Greetings everyone,

This is the first time I've written in a technical forum; this problem is driving me crazy.
Since I bought this system a year ago, I've been experiencing these crashes in most sessions.
It's an Asus TUF A15 FA507XI. I've already seen that there are a lot of problems with these WIFI cards, and at this point I'm thinking of switching to a card with better Linux support.
The crashes seem random, and almost every time they occur the system simply freezes: the last second of audio loops, the caps-lock indicator blinks, and I can't even move the cursor. The only solution is a hard reboot with the power button.
The same is needed in cases where it doesn't freeze: I manage to start a normal reboot, but it still hangs on reboot with a black screen.
Curiously, at least in the session right after a crash, it doesn't seem to crash.
It's dual-booted with W11, which I haven't used in months; it's not even bootable now without its EFI partition.
Logs from the most recent freeze:

dic 10 10:25:27 A15 kernel: mt7921e 0000:04:00.0: Message 00020003 (seq 15) timeout
dic 10 10:25:28 A15 kernel: mt7921e 0000:04:00.0: driver own failed
dic 10 10:25:29 A15 kernel: mt7921e 0000:04:00.0: Timeout for driver own
dic 10 10:25:30 A15 kernel: mt7921e 0000:04:00.0: driver own failed
dic 10 10:25:31 A15 kernel: mt7921e 0000:04:00.0: Timeout for driver own
dic 10 10:25:37 A15 kernel: mt7921e 0000:04:00.0: Message 00002ced (seq 1) timeout
dic 10 10:25:38 A15 kernel: mt7921e 0000:04:00.0: driver own failed
dic 10 10:25:39 A15 kernel: mt7921e 0000:04:00.0: Timeout for driver own
dic 10 10:25:40 A15 kernel: mt7921e 0000:04:00.0: driver own failed
dic 10 10:25:41 A15 kernel: mt7921e 0000:04:00.0: Timeout for driver own
dic 10 10:25:48 A15 kernel: mt7921e 0000:04:00.0: Message 00020003 (seq 2) timeout
dic 10 10:25:54 A15 kernel: mt7921e 0000:04:00.0: Message 00002ced (seq 3) timeout
dic 10 10:25:55 A15 kernel: mt7921e 0000:04:00.0: driver own failed
dic 10 10:25:55 A15 kernel: ------------[ cut here ]------------
dic 10 10:25:55 A15 kernel: refcount_t: underflow; use-after-free.
dic 10 10:25:55 A15 kernel: WARNING: CPU: 7 PID: 783 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
dic 10 10:25:55 A15 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device xt_tcpudp xt_con>
dic 10 10:25:55 A15 kernel:  snd_pcm_dmaengine snd_hda_codec snd_rpl_pci_acp6x uvcvideo mac80211 snd_acp_pci snd>
dic 10 10:25:55 A15 kernel:  drm_suballoc_helper crypto_simd vivaldi_fmap drm_buddy cryptd drm_display_helper nv>
dic 10 10:25:55 A15 kernel: CPU: 7 UID: 0 PID: 783 Comm: kworker/u64:17 Tainted: G           OE      6.12.3-arch>
dic 10 10:25:55 A15 kernel: Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
dic 10 10:25:55 A15 kernel: Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Gaming A15 FA507XI_FA507XI/FA507XI, BI>
dic 10 10:25:55 A15 kernel: Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
dic 10 10:25:55 A15 kernel: RIP: 0010:refcount_warn_saturate+0xbe/0x110
dic 10 10:25:55 A15 kernel: Code: 01 01 e8 15 37 9f ff 0f 0b e9 49 7c a3 00 80 3d 50 ae b7 01 00 75 85 48 c7 c7 >
dic 10 10:25:55 A15 kernel: RSP: 0018:ffffbcb8c25dbd00 EFLAGS: 00010282
dic 10 10:25:55 A15 kernel: RAX: 0000000000000000 RBX: ffff9de50a0006e8 RCX: 0000000000000027
dic 10 10:25:55 A15 kernel: RDX: ffff9de82e7a18c8 RSI: 0000000000000001 RDI: ffff9de82e7a18c0
dic 10 10:25:55 A15 kernel: RBP: 00000000ffffffff R08: 0000000000000000 R09: ffffbcb8c25dbb80
dic 10 10:25:55 A15 kernel: R10: ffffffffa5cb54a8 R11: 0000000000000003 R12: 0000000000000001
dic 10 10:25:55 A15 kernel: R13: 0000000000000001 R14: ffff9de504d50000 R15: ffff9de50a0006e8
dic 10 10:25:55 A15 kernel: FS:  0000000000000000(0000) GS:ffff9de82e780000(0000) knlGS:0000000000000000
dic 10 10:25:55 A15 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
dic 10 10:25:55 A15 kernel: CR2: 0000735800919000 CR3: 0000000355222000 CR4: 0000000000f50ef0
dic 10 10:25:55 A15 kernel: PKRU: 55555554
dic 10 10:25:55 A15 kernel: Call Trace:
dic 10 10:25:55 A15 kernel:  <TASK>
dic 10 10:25:55 A15 kernel:  ? refcount_warn_saturate+0xbe/0x110
dic 10 10:25:55 A15 kernel:  ? __warn.cold+0x93/0xf6
dic 10 10:25:55 A15 kernel:  ? refcount_warn_saturate+0xbe/0x110
dic 10 10:25:55 A15 kernel:  ? report_bug+0xff/0x140
dic 10 10:25:55 A15 kernel:  ? handle_bug+0x58/0x90
dic 10 10:25:55 A15 kernel:  ? exc_invalid_op+0x17/0x70
dic 10 10:25:55 A15 kernel:  ? asm_exc_invalid_op+0x1a/0x20
dic 10 10:25:55 A15 kernel:  ? refcount_warn_saturate+0xbe/0x110
dic 10 10:25:55 A15 kernel:  mt76_queue_tx_complete+0x24/0x50 [mt76 76eca38781d45987826792671a4b9dc5bb2f6abc]
dic 10 10:25:55 A15 kernel:  mt76_dma_tx_cleanup+0x1e0/0x2e0 [mt76 76eca38781d45987826792671a4b9dc5bb2f6abc]
dic 10 10:25:55 A15 kernel:  mt792x_wpdma_reset+0x87/0x1e0 [mt792x_lib a868f11f53ed9255b905f0e66cad4a82d8191613]
dic 10 10:25:55 A15 kernel:  mt7921e_mac_reset+0x134/0x310 [mt7921e e9ae0b279e3b13a010cfbbe72317dc12f2225823]
dic 10 10:25:55 A15 kernel:  mt7921_mac_reset_work+0x9d/0x180 [mt7921_common 9dc27b2e0815560ffd32177b461e896e206>
dic 10 10:25:55 A15 kernel:  process_one_work+0x17b/0x330
dic 10 10:25:55 A15 kernel:  worker_thread+0x2ce/0x3f0
dic 10 10:25:55 A15 kernel:  ? __pfx_worker_thread+0x10/0x10
dic 10 10:25:55 A15 kernel:  kthread+0xcf/0x100
dic 10 10:25:55 A15 kernel:  ? __pfx_kthread+0x10/0x10
dic 10 10:25:55 A15 kernel:  ret_from_fork+0x31/0x50
dic 10 10:25:55 A15 kernel:  ? __pfx_kthread+0x10/0x10
dic 10 10:25:55 A15 kernel:  ret_from_fork_asm+0x1a/0x30
dic 10 10:25:55 A15 kernel:  </TASK>
dic 10 10:25:55 A15 kernel: ---[ end trace 0000000000000000 ]---
dic 10 10:25:56 A15 kernel: mt7921e 0000:04:00.0: Timeout for driver own
dic 10 10:25:57 A15 kernel: mt7921e 0000:04:00.0: driver own failed
dic 10 10:25:58 A15 kernel: mt7921e 0000:04:00.0: Timeout for driver own
dic 10 10:25:58 A15 kernel: mt7921e 0000:04:00.0: chip reset failed
dic 10 10:26:01 A15 kernel: mt7921e 0000:04:00.0: Message 00020001 (seq 4) timeout
dic 10 10:26:02 A15 kernel: mt7921e 0000:04:00.0: driver own failed

Note that a trace like the one in the log above is not always produced. Often only the "driver own" failure messages are logged, for hours, without a freeze; only the wifi breaks.
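Since the failures often show up only as repeated "driver own" messages, a tally of those lines per boot can tell whether a session was affected even when it didn't freeze. The following is a minimal bash sketch; the function name `count_mt7921e_errors` is hypothetical, and the patterns are simply message texts copied from the log above. It reads journal text on stdin, e.g. `journalctl -k -b -1 --no-pager | count_mt7921e_errors` for the kernel messages of the previous boot.

```shell
#!/usr/bin/env bash
# Tally the mt7921e failure messages quoted in this thread (hypothetical helper).
# Reads journal text on stdin and prints one "pattern: count" line per pattern.
count_mt7921e_errors() {
  local log pattern n
  log=$(cat)   # buffer stdin so every pattern is counted over the same text
  for pattern in "driver own failed" "Timeout for driver own" \
                 "chip reset failed" "refcount_t: underflow"; do
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that 0
    n=$(printf '%s\n' "$log" | grep -cF -- "$pattern" || true)
    printf '%s: %s\n' "$pattern" "$n"
  done
}
```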
I've never used Bluetooth, so I can't say anything about it.
What I've tried so far:

The most similar report found: https://bbs.archlinux.org/viewtopic.php?id=292150
Thanks.


#2 2024-12-10 18:30:39

jonno2002
Member
Registered: 2016-11-21
Posts: 735

Re: Random mt7921e driver crashes leading to complete system freeze

Have a read of this thread: https://bbs.archlinux.org/viewtopic.php?id=292150
It looks like he fixed it by disabling fast start in Windows and removing the laptop battery for a while to reset things.


#3 2024-12-11 04:21:55

ReDress
Member
From: Nairobi
Registered: 2024-11-30
Posts: 96

Re: Random mt7921e driver crashes leading to complete system freeze

hertz wrote:

Greetings everyone,

Curious thing I noticed is that at least for the next session seems it doesn't crash.

Yeah, the kernel crash is leaving behind some device state which somehow keeps the crash from happening the next time.


#4 2024-12-12 09:27:21

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

jonno2002 wrote:

Have a read of this thread: https://bbs.archlinux.org/viewtopic.php?id=292150
It looks like he fixed it by disabling fast start in Windows and removing the laptop battery for a while to reset things.

Yeah, I had already found it but didn't think it would work. I tried it anyway yesterday, and just now, in the first session, it froze after 20 minutes of uptime...
It seems to be some serious low-level stuff; I've never found anyone with similar crashes.
Even seth doesn't seem to be able to help me!

Is changing the card the only valid solution left at this point?
Could you guys recommend one? The classic Intel AX210? Unfortunately, Intel doesn't seem to want to support AMD CPUs for now (BE200), and I don't know how good the Qualcomm alternative, the QCNCM865, is.
Thank you all!


#5 2024-12-12 10:25:59

xerxes_
Member
Registered: 2018-04-29
Posts: 846

Re: Random mt7921e driver crashes leading to complete system freeze

You may try adding the kernel parameter 'mt7921e.disable_aspm=Y' or 'mt7921e.disable_aspm=1'.
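Besides the kernel command line, the same option can be set in a modprobe.d file as `options mt7921e disable_aspm=1`. To confirm what the loaded module actually picked up, the parameter can usually be read back from sysfs. A minimal bash sketch follows; the function name is hypothetical, and it assumes the parameter is exposed (readable) under `/sys/module/mt7921e/parameters/`.

```shell
#!/usr/bin/env bash
# Report what the mt7921e "disable_aspm" module parameter is currently set to.
# Assumption: the parameter is readable under /sys/module/mt7921e/parameters/;
# another path can be passed as $1 (handy for testing).
aspm_param_status() {
  local param_file=${1:-/sys/module/mt7921e/parameters/disable_aspm}
  if [ ! -r "$param_file" ]; then
    echo "cannot read $param_file (module not loaded?)"
    return 1
  fi
  # Boolean module parameters read back as Y or N in sysfs.
  case "$(cat "$param_file")" in
    Y) echo "disable_aspm=Y: ASPM disabled by the parameter" ;;
    N) echo "disable_aspm=N: ASPM not disabled by the parameter" ;;
    *) echo "unexpected value in $param_file" ;;
  esac
}
```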

You may also try something like this: https://bbs.archlinux.org/viewtopic.php … 4#p1978614 : blacklist mt76 and mt7601u, then modprobe mt7921e.


#6 2024-12-12 12:43:38

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

The OP wrote:

Disabled ASPM as indicated in https://wiki.archlinux.org/title/Networ … ess#mt7921, even if not directly related.

Even seth doesn't seem to be able to help me!

You cannot summon me, this way or any other. Mortal fool.


The kernel backtrace shows the state after the NIC has already timed out.
Please post your complete system journal for the boot:

sudo journalctl -b | curl -F 'file=@-' 0x0.st

so we can hopefully get an idea how it arrived there.
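Since a freeze means the interesting part is usually in the boot before the current one, `journalctl -b -1` selects the previous boot. To zoom in on how the card got into that state, a sketch like the one below prints the lines leading up to the first failure; the helper name is hypothetical and GNU grep is assumed. Usage: `journalctl -b -1 --no-pager | lead_up_to_failure 100`. It complements posting the complete journal, since the trigger may be further back.

```shell
#!/usr/bin/env bash
# Print the journal lines leading up to, and including, the first
# "driver own failed" message found on stdin (hypothetical helper).
# $1 = how many preceding lines to keep (default 50).
lead_up_to_failure() {
  local context=${1:-50}
  # -m 1 stops after the first match; -B prints that many lines before it.
  grep -m 1 -B "$context" -F -- "driver own failed"
}
```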

FWIW

It's dual booted with W11, not used for months and anyway it's not even bootable now without its EFI partition.

Fast boot and secure boot are disabled.

The correct term is "fast start" (fast boot is a BIOS/UEFI feature and not relevant here)  - see the 3rd link below for details.


#7 2024-12-12 17:57:59

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

xerxes_ wrote:

You may try adding the kernel parameter 'mt7921e.disable_aspm=Y' or 'mt7921e.disable_aspm=1'.

Isn't that the same as https://wiki.archlinux.org/title/Networ … ess#mt7921? According to the journal, it seems to be disabled. I'll give blacklisting a try though, thanks.

seth wrote:

The correct term is "fast start" (fast boot is a BIOS/UEFI feature and not relevant here)  - see the 3rd link below for details.

Yes, I understood the difference, as you also explained here: https://bbs.archlinux.org/viewtopic.php?id=292150. Is it still relevant even if I won't boot Windows again after the battery reset?

Here's the output you asked for: http://0x0.st/XFXS.txt
And here's the one from the latest crash, again (not even an hour ago...): http://0x0.st/XFXe.txt
I don't see anything special from the driver, but maybe I'm wrong.


#8 2024-12-12 18:50:18

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

Is it still relevant even if I won't boot Windows again after the battery reset?

If Windows is currently hibernating (explicitly or disguised as fast start), it's technically still running, can boot itself, make changes etc. while you're sleeping.
So: yes.

dic 12 17:54:24 A15 NetworkManager[799]: <info>  [1734022464.8437] dhcp4 (wlan0): state changed new lease, address=192.168.1.154, acd pending
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.0333] dhcp4 (wlan0): state changed new lease, address=192.168.1.154
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.0339] policy: set 'HertZ-Mod' (wlan0) as default for IPv4 routing and DNS
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.0599] device (wlan0): state change: ip-config -> ip-check (reason 'none', managed-type: 'full')
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.0611] device (wlan0): state change: ip-check -> secondaries (reason 'none', managed-type: 'full')
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.0611] device (wlan0): state change: secondaries -> activated (reason 'none', managed-type: 'full')
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.0612] manager: NetworkManager state is now CONNECTED_SITE
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.0613] device (wlan0): Activation: successful, device activated.
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.1708] manager: NetworkManager state is now CONNECTED_GLOBAL
dic 12 17:54:25 A15 NetworkManager[799]: <info>  [1734022465.9945] manager: startup complete
dic 12 17:54:29 A15 systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1154] manager: (proton0): new Tun device (/org/freedesktop/NetworkManager/Devices/5)
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1232] device (proton0): state change: unmanaged -> unavailable (reason 'connection-assumed', managed-type: 'external')
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1237] device (proton0): state change: unavailable -> disconnected (reason 'connection-assumed', managed-type: 'external')
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1245] device (proton0): Activation: starting connection 'proton0' (c2b4718c-ad9f-4a41-9997-9e97ddcfd0e8)
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1246] device (proton0): state change: disconnected -> prepare (reason 'none', managed-type: 'external')
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1248] device (proton0): state change: prepare -> config (reason 'none', managed-type: 'external')
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1249] device (proton0): state change: config -> ip-config (reason 'none', managed-type: 'external')
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1250] device (proton0): state change: ip-config -> ip-check (reason 'none', managed-type: 'external')
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1894] device (proton0): state change: ip-check -> secondaries (reason 'none', managed-type: 'external')
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1896] device (proton0): state change: secondaries -> activated (reason 'none', managed-type: 'external')
dic 12 17:54:31 A15 NetworkManager[799]: <info>  [1734022471.1901] device (proton0): Activation: successful, device activated.
dic 12 17:54:34 A15 NetworkManager[799]: <warn>  [1734022474.8063] ndisc[0x55dc4cf2a9b0,"wlan0"]: solicit: failure sending router solicitation: Operation not permitted (1)
dic 12 17:54:41 A15 systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
dic 12 17:59:45 A15 NetworkManager[799]: <info>  [1734022785.8067] manager: NetworkManager state is now CONNECTED_SITE
dic 12 17:59:45 A15 systemd[1]: Starting Network Manager Script Dispatcher Service...
dic 12 17:59:45 A15 systemd[1]: Started Network Manager Script Dispatcher Service.
dic 12 17:59:55 A15 systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
dic 12 18:01:53 A15 root[8945]: ACPI action undefined: PNP0C0A:00
dic 12 18:01:58 A15 kernel: mt7921e 0000:04:00.0: driver own failed

Does this only happen if you run the proton VPN over the wifi NIC?
(The other journal has it on the wired NIC)

If so, have you tried lowering the MTU of the proton0 device (to e.g. 1280)?


#9 2024-12-12 20:26:04

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

seth wrote:

If Windows is currently hibernating (explicitly or disguised as fast start), it's technically still running, can boot itself, make changes etc. while you're sleeping.
So: yes.

Is this even legal...

seth wrote:

Does this only happen if you run the proton VPN over the wifi NIC?
(The other journal has it on the wired NIC)

If so, have you tried lowering the MTU of the proton0 device (to e.g. 1280)?

Absolutely not; actually the opposite, but maybe that's because I mostly use it wired.
I've never changed the MTU.


#10 2024-12-13 09:19:02

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

Do you have journals w/ mt7921e bailing w/o running the proton VPN on it (or at all)?

I've never changed the MTU.
ip l; sudo ip l set proton0 mtu 1280; ip l

VPNs typically need a lower MTU because they add some overhead - not sure whether that would knock out the mt7921e module, though (typically your traffic just breaks).
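The thread doesn't say which protocol proton0 runs, so as an illustration assume WireGuard-style encapsulation: every inner packet gains an outer IP header, a UDP header (8 bytes) and 32 bytes of WireGuard data header plus authentication tag. The arithmetic, as a small bash sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Largest inner (tunnel) MTU that fits inside the outer link MTU,
# given the per-packet encapsulation overhead in bytes.
inner_mtu() {
  local outer=$1 overhead=$2
  echo $(( outer - overhead ))
}

# Assumed WireGuard overhead: outer IP header + UDP (8) + WireGuard header/tag (32).
wg_overhead_v4=$(( 20 + 8 + 32 ))   # outer IPv4 header is 20 bytes -> 60
wg_overhead_v6=$(( 40 + 8 + 32 ))   # outer IPv6 header is 40 bytes -> 80

echo "over IPv4: $(inner_mtu 1500 "$wg_overhead_v4")"
echo "over IPv6: $(inner_mtu 1500 "$wg_overhead_v6")"
```

With a 1500-byte link MTU this gives 1440 over IPv4 and 1420 over IPv6 (1420 is also the default MTU of a Linux WireGuard interface). 1280 leaves extra headroom and is the minimum MTU IPv6 requires, which makes it a conservative test value.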


#11 2024-12-13 16:17:56

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

seth wrote:

Do you have journals w/ mt7921e bailing w/o running the proton VPN on it (or at all)?

Unfortunately not, it's always on. Do you think it would be useful to test without it?

seth wrote:

(typically your traffic just breaks)

Yeah, that happened rarely, but nothing as serious as this problem.

Anyway, I lowered the MTU but still had a freeze later in the same session, though not immediately after the driver crash. I know this because I've been watching the journal in real time (journalctl -f) for a month now; the last logs before a freeze are not always saved for the next session, so I'm not sure everything is in there, since I didn't have them on screen at the moment of the freeze: http://0x0.st/XFAR.txt
When the driver crashed, the connection stopped as well (the VPN, I think). I just reconnected to the VPN and managed to use it for a couple of minutes before it died. I thought I had escaped the freeze, as sometimes happens...
I really can't pin down the exact cause.
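One likely reason the last lines go missing: systemd-journald only syncs its files to disk periodically (`SyncIntervalSec=` in journald.conf, 5 minutes by default), and syncs immediately only after messages of priority CRIT, ALERT or EMERG. The mt7921e errors and the kernel WARNING are less severe than that, so they can still be in memory when the machine locks up. A shorter interval can be set with a drop-in under /etc/systemd/journald.conf.d/; the bash sketch below writes one (the helper and file names are hypothetical, and 10s is an arbitrary choice), after which `systemctl restart systemd-journald` applies it. It costs more disk writes and can't help if the kernel hangs before journald gets to run.

```shell
#!/usr/bin/env bash
# Write a journald drop-in that shortens the periodic sync interval (hypothetical helper).
# $1 = drop-in directory (default: the system location), $2 = interval (default 10s).
write_journald_sync_dropin() {
  local dir=${1:-/etc/systemd/journald.conf.d}
  local interval=${2:-10s}
  mkdir -p "$dir"
  printf '[Journal]\nSyncIntervalSec=%s\n' "$interval" > "$dir/sync-interval.conf"
  echo "$dir/sync-interval.conf"
}
```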


#12 2024-12-13 16:24:37

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

Sanity checks:

1. Why are enp3s0 and wlan0 in use at the same time?
Are they connected to the same switch/AP? (They're in the same /24 segment.)
Why does the wifi not shutdown when you plug the rj45 connection?

2. Do you get the same behavior w/ wpa_supplicant instead of iwd?
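On point 1: when two NICs hold addresses in the same /24, the host ends up with two routes into one network, and the switch/AP sees two interfaces of the same host. Checking whether two addresses share a prefix is simple mask arithmetic; the bash sketch below (hypothetical helper names) compares the network parts. 192.168.1.154 is wlan0's lease from the journal above; the wired address used in the usage example is made up.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address (e.g. 192.168.1.154) to a 32-bit integer.
ipv4_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit 0) if two IPv4 addresses are in the same network for the given
# prefix length ($3, default 24).
same_subnet() {
  local ip1 ip2 prefix=${3:-24} mask
  ip1=$(ipv4_to_int "$1")
  ip2=$(ipv4_to_int "$2")
  # /24 -> 0xFFFFFF00; bash arithmetic is 64-bit, so trim back to 32 bits.
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip1 & mask )) -eq $(( ip2 & mask )) ]
}
```

Usage: `same_subnet 192.168.1.154 192.168.1.20 && echo "same /24"`.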


#13 2024-12-13 18:01:53

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

1. I noticed that too, but thought it was automatic behavior, or at least that wlan gets deprioritized. They're connected to the same switch/AP.
I'm not knowledgeable enough about this.

2. Yes, I was using it months ago.


#14 2024-12-13 18:56:52

seth
Member
Registered: 2012-09-03
Posts: 60,813


#15 2024-12-14 00:27:54

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

Too bad I didn't know this.
Still, once I'm on wifi the crashes will occur again, right? If the driver is the cause.
Thanks a lot!


#16 2024-12-14 00:38:54

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

Maybe, maybe not - it's not clear why the driver starts to act up, but it's possible that it's because of how the router/switch handles the dual connection.
So even if this is a bug in the mt7921 module, you might be able to side-step it by avoiding this condition.
(You can also just try to make the mt7921 module crash w/o ever plugging the ethernet cable)


#17 2024-12-15 17:16:38

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

It didn't work this time either. The connection switching works: wlan is unavailable when wired, and rfkill also shows it as soft-blocked. Still, the driver crashed (though without freezing the system), I guess because the module is still loaded into the kernel and that's enough to make it crash (?).

I've also tried the following suggestion, blacklisting mt76 and mt7601u, without success. I don't really understand the reasoning behind it, since mt7921e keeps working.

xerxes_ wrote:

You may also try something like this: https://bbs.archlinux.org/viewtopic.php … 4#p1978614 : blacklist mt76 and mt7601u, then modprobe mt7921e.
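On why the blacklist doesn't stop mt7921e: a `blacklist` line in modprobe.d only makes modprobe ignore a module's aliases, i.e. it prevents automatic loading when matching hardware shows up. A module that another module depends on is still loaded, and mt7921e depends on mt76 (mt76 functions appear in the trace in #1, called from mt7921e's reset path). The users of a loaded module are listed in the fourth field of /proc/modules; a bash sketch with a hypothetical helper name to print them:

```shell
#!/usr/bin/env bash
# Print the modules that use (depend on) a given module (hypothetical helper).
# Input is text in /proc/modules format on stdin: field 4 is a comma-separated
# list of users with a trailing comma, or "-" when there are none.
module_users() {
  awk -v m="$1" '$1 == m { print $4 }' | tr ',' '\n' | sed -e '/^-$/d' -e '/^$/d'
}
```

Usage: `module_users mt76 < /proc/modules`. If mt7921e appears in the output, it is what keeps mt76 loaded regardless of the blacklist.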


#18 2024-12-15 20:27:50

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

I guess because the module is still loaded into the kernel and that's enough to make it crash (?).

The module won't crash when it's unloaded - you could try unloading it (and reloading it when needed) as an extension to the ethernet toggle dispatcher.

It's of course a kludge, but if it allows stable usage…
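A sketch of that kludge, assuming the wired interface is enp3s0 as in the journals above and that NetworkManager's dispatcher service is enabled. NetworkManager runs root-owned, executable scripts in /etc/NetworkManager/dispatcher.d/ with the interface name as the first argument and the action (up, down, ...) as the second. The file name, the WIRED_IF/MODPROBE/NMCLI variables and the function name are hypothetical; the MODPROBE/NMCLI overrides only exist so the logic can be dry-run with `echo`.

```shell
#!/usr/bin/env bash
# Sketch of a NetworkManager dispatcher script (hypothetical name, e.g.
# /etc/NetworkManager/dispatcher.d/90-wired-unloads-mt7921e, root-owned, executable).
# NetworkManager passes $1 = interface name and $2 = action.
WIRED_IF=${WIRED_IF:-enp3s0}     # wired NIC name as seen in this thread's journals
MODPROBE=${MODPROBE:-modprobe}   # overridable, e.g. "echo modprobe" for a dry run
NMCLI=${NMCLI:-nmcli}

handle_event() {
  local iface=$1 action=$2
  # Ignore events for any interface other than the wired one.
  [ "$iface" = "$WIRED_IF" ] || return 0
  # $NMCLI and $MODPROBE are left unquoted on purpose so "echo nmcli" splits into words.
  case "$action" in
    up)
      # Cable plugged in: switch the wifi radio off, then unload the driver.
      $NMCLI radio wifi off
      $MODPROBE -r mt7921e
      ;;
    down)
      # Cable unplugged: load the driver again, then switch the radio back on.
      $MODPROBE mt7921e
      $NMCLI radio wifi on
      ;;
  esac
}

# Only act when NetworkManager actually passed an interface and an action.
if [ "$#" -ge 2 ]; then
  handle_event "$1" "$2"
fi
```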


#19 2024-12-15 23:08:24

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

So we have come to the end of the line...
I think I'll replace the card anyway, to solve this for good.

As far as I know, the AX210 has the best Linux support, doesn't it? I don't trust switching to wifi 7 yet.


#20 2024-12-16 08:42:22

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

There's no such thing as "best"; pretty much every driver has its ups and downs. Intel recently had a lot of problems w/ power saving.
Just make sure it's not Broadcom. There's no "best", but there certainly is a worst choice for Linux wifi, and that's Broadcom, mostly because the driver situation (various open drivers and the closed-source one) is a complete mess.

Did you try un/loading the module w/ the dispatcher script?


#21 2024-12-21 05:03:01

ReDress
Member
From: Nairobi
Registered: 2024-11-30
Posts: 96

Re: Random mt7921e driver crashes leading to complete system freeze

hertz wrote:

Greetings everyone,


The most similar report found: https://bbs.archlinux.org/viewtopic.php?id=292150
Thanks.

It doesn't look like an RCU bug, which would have made this very interesting. Consistent with that, this driver doesn't even seem to use RCU at all.

RCU has some of the simplest kernel interfaces out there, but bugs seem to happen all the same, for some strange reason.

Last edited by ReDress (2024-12-21 05:05:13)


#22 2024-12-28 10:55:56

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

OK, I now have enough uptime to say that the extended dispatcher script solves the problem when wired.
I tried unplugging ethernet, and the driver crashed in the same session.

Should I also report the bug upstream, to the driver developers, at this point?
I don't know what else I might try, or whether it's just a local problem on my side.


#23 2024-12-28 20:23:01

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

mt7921e problems are all over the place by now: https://bbs.archlinux.org/viewtopic.php … 3#p2216153 and some open bugs on https://bugzilla.kernel.org/buglist.cgi … k-wireless
The cause is actually in the BT component, so

Never used Bluetooth, so can't say anything about it.

you might wanna try to deactivate that.


#24 2024-12-30 16:36:22

hertz
Member
Registered: 2024-12-10
Posts: 12

Re: Random mt7921e driver crashes leading to complete system freeze

Nice collection for this driver; I've been seeing more and more reports on it too.
Glad to at least know the cause. I already soft-blocked Bluetooth a while ago, and I actually never installed bluez. I'm going to blacklist the module then, btmtk/btusb I guess, and all the others for now.
Hoping for a fix; otherwise I'll simply consider replacing the card, staying away from MediaTek and Broadcom.
Should I mark the topic as solved? lol
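For the Bluetooth side, a modprobe.d drop-in is the usual way to keep the modules from auto-loading. On combo cards like this the Bluetooth part is typically attached over USB, so btusb is the module that binds to it, and on recent kernels btmtk is loaded as a dependency of btusb; the btusb line is therefore the one that matters. Note that it affects every USB Bluetooth adapter, not just this one. The bash sketch below writes such a file (hypothetical helper and file name; pass another directory to try it out); it takes effect from the next boot.

```shell
#!/usr/bin/env bash
# Write a modprobe.d drop-in that keeps the Bluetooth modules from auto-loading.
# $1 = target directory (default /etc/modprobe.d); the file name is hypothetical.
write_bt_blacklist() {
  local dir=${1:-/etc/modprobe.d}
  local file="$dir/blacklist-bluetooth.conf"
  mkdir -p "$dir"
  printf '%s\n' \
    "# Bluetooth is unused on this machine" \
    "blacklist btusb" \
    "blacklist btmtk" > "$file"
  echo "$file"
}
```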

Thanks again seth, I learned new stuff! Congrats on your 60k posts too!!


#25 2024-12-31 08:57:01

seth
Member
Registered: 2012-09-03
Posts: 60,813

Re: Random mt7921e driver crashes leading to complete system freeze

Should I mark the topic as solved?

If disabling the (unneeded) BT indeed addresses it, yes - so others will know that there's no task left, but maybe a solution to find.

Congrats on your 60k posts too!!

Yeah… I guess that has happened. Let's see whether I can make it wrap before the board shuts down.

