[solved] kernel 5.15.2 - general protection fault

hamelg · 2021-11-18 22:18:07

Two days ago, I upgraded to linux 5.15.2 from 5.14.16.
Tonight, my system semi-hung and I had to push the reset button.
In the journal, I see many kernel errors during the semi-hang.
Has anyone else seen this issue?

kernel: general protection fault, probably for non-canonical address 0xf7ff9bdac3bf8f38: 0000 [#1] PREEMPT SM>
kernel: CPU: 8 PID: 5454 Comm: DOM Worker Tainted: P           OE     5.15.2-arch1-1 #1 e3bfbeb633edc604ba956>
kernel: Hardware name: Micro-Star International Co., Ltd MS-7B86/B450 GAMING PLUS (MS-7B86), BIOS 1.E0 06/11/>
kernel: RIP: 0010:kmem_cache_alloc+0x10a/0x320
kernel: Code: 5e 48 8b 51 08 48 8b 01 48 83 79 10 00 48 89 04 24 0f 84 b2 01 00 00 48 85 c0 0f 84 a9 01 00 00>
kernel: RSP: 0018:ffffa70002ebfbe8 EFLAGS: 00010282
kernel: RAX: f7ff9bdac3bf8f08 RBX: 0000000000000068 RCX: f7ff9bdac3bf8f38
kernel: RDX: 0000000007791408 RSI: 0000000000000000 RDI: 0000000000030b60
kernel: RBP: ffff9bd940206000 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9bdb50fb9900
kernel: R13: 0000000000408d40 R14: ffffffffa11813fb R15: 0000000000408d40
kernel: FS:  00007f90f41ff640(0000) GS:ffff9bdc4ea00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f90f50fc000 CR3: 000000034d074000 CR4: 00000000003506e0
kernel: Call Trace:
kernel:  alloc_buffer_head+0x1b/0x80
kernel:  alloc_page_buffers+0x9e/0x150
kernel:  create_empty_buffers+0x19/0x110
kernel:  ext4_block_write_begin+0x34c/0x430 [ext4 d8e45ae5e5e63c17f79829492c61819e8ba40ecf]
kernel:  ? ext4_da_release_space+0x120/0x120 [ext4 d8e45ae5e5e63c17f79829492c61819e8ba40ecf]
kernel:  ext4_da_write_begin+0x119/0x2f0 [ext4 d8e45ae5e5e63c17f79829492c61819e8ba40ecf]
kernel:  generic_perform_write+0xd0/0x220
kernel:  ext4_buffered_write_iter+0xa7/0x190 [ext4 d8e45ae5e5e63c17f79829492c61819e8ba40ecf]
kernel:  new_sync_write+0x15c/0x200
kernel:  vfs_write+0x203/0x2a0
kernel:  ksys_write+0x67/0xf0
kernel:  do_syscall_64+0x5c/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f913aa986ff
kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 69 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b>
kernel: RSP: 002b:00007f90f41fa9f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 00007f913aa986b0 RCX: 00007f913aa986ff
kernel: RDX: 00000000003310bb RSI: 00007f8fc6400000 RDI: 000000000000009d
kernel: RBP: 00007f90f41faa20 R08: 0000000000000000 R09: 00007f90fb93c000
kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007f90ec105f68
kernel: R13: 0000000000000003 R14: 00007f90f41fa960 R15: 0000000000000000
kernel: Modules linked in: rpcsec_gss_krb5 tun nf_conntrack_netlink nfnetlink ipt_REJECT nf_reject_ipv4 rpcrd>
kernel:  cryptd usbhid pcspkr mc i2c_piix4 k10temp rng_core rapl soundcore libphy parport_pc parport gpio_amd>
kernel: ---[ end trace dea28a3638fe36a7 ]---
kernel: RIP: 0010:kmem_cache_alloc+0x10a/0x320
kernel: Code: 5e 48 8b 51 08 48 8b 01 48 83 79 10 00 48 89 04 24 0f 84 b2 01 00 00 48 85 c0 0f 84 a9 01 00 00>
kernel: RSP: 0018:ffffa70002ebfbe8 EFLAGS: 00010282
kernel: RAX: f7ff9bdac3bf8f08 RBX: 0000000000000068 RCX: f7ff9bdac3bf8f38
kernel: RDX: 0000000007791408 RSI: 0000000000000000 RDI: 0000000000030b60
kernel: RBP: ffff9bd940206000 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9bdb50fb9900
kernel: R13: 0000000000408d40 R14: ffffffffa11813fb R15: 0000000000408d40
kernel: FS:  00007f90f41ff640(0000) GS:ffff9bdc4ea00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f90f50fc000 CR3: 000000034d074000 CR4: 00000000003506e0

Last edited by hamelg (2021-11-20 16:19:10)

seth · 2021-11-19 06:40:03

many kernel errors during the semi-hang

So why did you post only one?

This is a memory allocation error when ext4 tries to sync a mount, typically because it's unmounted for the shutdown.
The problem is *likely* ahead of this.
Please post the entire journal for a boot w/ hanging shutdown (and please don't copy it out of the pager; redirect it into a file or a pastebin service).

hamelg · 2021-11-19 07:27:19

https://app.box.com/s/usdspdogwcyxwoegkp1ewzyjrqygfjz6

seth · 2021-11-19 14:42:38

There's no context (no preceding errors). For 25 minutes you keep getting crashes in (predominantly) kmem_cache_alloc on attempted file access.
Reproducible? Possibly OOM? You don't have swap, do you?
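
To check which functions the crashes land in, the kernel-mode RIP lines of a saved journal can be tallied. A rough sketch in Python, assuming the journal was redirected into a plain text file and that its lines look like the ones in the first post; the function names `count_faulting_functions` and `summarize` are made up for illustration:

```python
import re
from collections import Counter

# Matches kernel-mode RIP lines such as
#   kernel: RIP: 0010:kmem_cache_alloc+0x10a/0x320
# User-space RIP lines (segment 0033, no symbol name) are not matched.
RIP_RE = re.compile(r"RIP: 0010:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def count_faulting_functions(lines):
    """Count how often each function name appears in kernel-mode RIP lines."""
    counts = Counter()
    for line in lines:
        match = RIP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(path):
    """Print the tally for a journal saved as text (path is hypothetical)."""
    with open(path, errors="replace") as journal:
        for func, n in count_faulting_functions(journal).most_common():
            print(n, func)
```

Each oops in the log above prints its RIP line twice (once at the top, once after the "end trace" marker), so the counts are roughly double the number of crashes.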

hamelg · 2021-11-19 16:37:45

Right, I have no swap (no need for it), and there was no specific context: no errors or OOM conditions preceding the kernel errors.
Just before rebooting, I ran the free command to check memory and didn't see anything alarming.
If this issue happens again, what can I check?

seth · 2021-11-20 08:20:38

I meant that there's no context to the crashes in the logs (like FS access errors, devices falling off the bus, etc.).

Do you use "non-standard" parameters for https://wiki.archlinux.org/title/Zswap ?
You could also run https://wiki.archlinux.org/title/Stress … ing_memory overnight to ensure the RAM is intact.

hamelg · 2021-11-20 16:18:45

seth wrote:

You could also run https://wiki.archlinux.org/title/Stress … ing_memory overnight to ensure the RAM is intact.

Well spotted!
One of my memory modules is faulty.
Thanks a lot.
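
In hindsight, the faulting address in the first post fits that finding. The sketch below assumes 4-level paging (48-bit virtual addresses), where bits 47 to 63 of a valid address must all be equal; the helper names are invented for illustration. It compares the faulting address with another kernel pointer from the same register dump:

```python
def is_canonical(addr, va_bits=48):
    """Return True if bits 63..(va_bits-1) of a 64-bit address are all equal."""
    upper = addr >> (va_bits - 1)              # the sign-extension bits
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits for va_bits=48
    return upper in (0, all_ones)


def bits_differing_from_kernel_half(addr, va_bits=48):
    """List upper-bit positions that are 0 where a kernel-half pointer has 1."""
    return [bit for bit in range(va_bits - 1, 64) if not (addr >> bit) & 1]


FAULT_ADDR = 0xF7FF9BDAC3BF8F38  # from the "general protection fault" line
SANE_PTR = 0xFFFF9BD940206000    # RBP in the same register dump

print(is_canonical(SANE_PTR))                       # True
print(is_canonical(FAULT_ADDR))                     # False
print(bits_differing_from_kernel_half(FAULT_ADDR))  # [59]
```

The faulting address differs from a well-formed kernel pointer (0xffff9bdac3bf8f38) in exactly one bit. A single flipped bit like this is consistent with a bad memory cell, though on its own it doesn't prove one; the memory test is what confirmed it here.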

Arch Linux
