
#1 2019-09-29 18:53:27

emacsomancer
Member
Registered: 2014-09-20
Posts: 211

System hard freeze / "BUG: kernel NULL pointer dereference, address.."

For the past few weeks I have been getting hard system freezes (i.e. I can't switch to ttys or use the magic SysRq keys). When I check `journalctl -b -1`, this seems to be the relevant issue:

Sep 29 12:36:07 sindhu kernel: BUG: kernel NULL pointer dereference, address: 00000000000002ac
Sep 29 12:36:07 sindhu kernel: #PF: supervisor read access in kernel mode
Sep 29 12:36:07 sindhu kernel: #PF: error_code(0x0000) - not-present page
Sep 29 12:36:07 sindhu kernel: PGD 0 P4D 0
Sep 29 12:36:07 sindhu kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Sep 29 12:36:07 sindhu kernel: CPU: 1 PID: 3359537 Comm: kworker/u8:0 Tainted: P           OE     5.3.1-arch1-1-ARCH #1
Sep 29 12:36:07 sindhu kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Anniversary, BIOS P2.10 03/08/>
Sep 29 12:36:07 sindhu kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Sep 29 12:36:07 sindhu kernel: RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Sep 29 12:36:07 sindhu kernel: Code: f6 09 d1 8b 50 58 83 f1 01 0f b6 c9 e8 e3 10 34 f3 e9 70 ff ff ff 66 66 2e 0f 1f 84 00 00>
Sep 29 12:36:07 sindhu kernel: RSP: 0000:ffff95a48e413a70 EFLAGS: 00010202
Sep 29 12:36:07 sindhu kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000003
Sep 29 12:36:07 sindhu kernel: RDX: ffffffffc0768250 RSI: 0000000000000000 RDI: ffff8ce507050000
Sep 29 12:36:07 sindhu kernel: RBP: ffff8ce201d90000 R08: ffff8ce201d90000 R09: 0000000000000000
Sep 29 12:36:07 sindhu kernel: R10: ffff8ce201d90000 R11: ffff8ce50ebaa5b0 R12: 0000000000000000
Sep 29 12:36:07 sindhu kernel: R13: ffff8ce50728dc48 R14: 0000000000000006 R15: ffff8ce507050000
Sep 29 12:36:07 sindhu kernel: FS:  0000000000000000(0000) GS:ffff8ce50ea80000(0000) knlGS:0000000000000000
Sep 29 12:36:07 sindhu kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 12:36:07 sindhu kernel: CR2: 00000000000002ac CR3: 00000003890ea002 CR4: 00000000001606e0
Sep 29 12:36:07 sindhu kernel: Call Trace:
Sep 29 12:36:07 sindhu kernel:  dc_commit_state+0x99/0x590 [amdgpu]
Sep 29 12:36:07 sindhu kernel:  amdgpu_dm_atomic_commit_tail+0xfd1/0x1d00 [amdgpu]
Sep 29 12:36:07 sindhu kernel:  ? __update_load_avg_se+0x217/0x310
Sep 29 12:36:07 sindhu kernel:  ? enqueue_entity+0x627/0xcc0
Sep 29 12:36:07 sindhu kernel:  ? commit_tail+0x3c/0x70 [drm_kms_helper]
Sep 29 12:36:07 sindhu kernel:  commit_tail+0x3c/0x70 [drm_kms_helper]
Sep 29 12:36:07 sindhu kernel:  process_one_work+0x1d1/0x3a0
Sep 29 12:36:07 sindhu kernel:  worker_thread+0x4a/0x3d0
Sep 29 12:36:07 sindhu kernel:  kthread+0xfb/0x130
Sep 29 12:36:07 sindhu kernel:  ? process_one_work+0x3a0/0x3a0
Sep 29 12:36:07 sindhu kernel:  ? kthread_park+0x80/0x80
Sep 29 12:36:07 sindhu kernel:  ret_from_fork+0x35/0x40
Sep 29 12:36:07 sindhu kernel: Modules linked in: tun rfcomm fuse xt_tcpudp ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ip>
Sep 29 12:36:07 sindhu kernel:  snd_pcm_dmaengine snd_hwdep e1000e i2c_i801 lpc_ich mei snd_pcm snd_timer snd evdev soundcore >
Sep 29 12:36:07 sindhu kernel: CR2: 00000000000002ac
Sep 29 12:36:07 sindhu kernel: ---[ end trace 569b890ac624cd7b ]---
Sep 29 12:36:07 sindhu kernel: RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Sep 29 12:36:07 sindhu kernel: Code: f6 09 d1 8b 50 58 83 f1 01 0f b6 c9 e8 e3 10 34 f3 e9 70 ff ff ff 66 66 2e 0f 1f 84 00 00>
Sep 29 12:36:07 sindhu kernel: RSP: 0000:ffff95a48e413a70 EFLAGS: 00010202
Sep 29 12:36:07 sindhu kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000003
Sep 29 12:36:07 sindhu kernel: RDX: ffffffffc0768250 RSI: 0000000000000000 RDI: ffff8ce507050000
Sep 29 12:36:07 sindhu kernel: RBP: ffff8ce201d90000 R08: ffff8ce201d90000 R09: 0000000000000000
Sep 29 12:36:07 sindhu kernel: R10: ffff8ce201d90000 R11: ffff8ce50ebaa5b0 R12: 0000000000000000
Sep 29 12:36:07 sindhu kernel: R13: ffff8ce50728dc48 R14: 0000000000000006 R15: ffff8ce507050000
Sep 29 12:36:07 sindhu kernel: FS:  0000000000000000(0000) GS:ffff8ce50ea80000(0000) knlGS:0000000000000000
Sep 29 12:36:07 sindhu kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 12:36:07 sindhu kernel: CR2: 00000000000002ac CR3: 00000003890ea002 CR4: 00000000001606e0

I'm not sure how to proceed with debugging.
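One possible starting point, offered as a sketch rather than a definitive procedure: the RIP line names the faulting function, its offset and the module (here dc_stream_log+0x6 in amdgpu), and a faulting address as small as 0x2ac usually suggests a field being read through a NULL structure pointer. The helper below (`rip_location` is a made-up name, not an existing tool) pulls those three fields out of kernel log lines so that several freezes can be compared. It only matches RIP lines that carry a module name in brackets.

```shell
# rip_location: read kernel log lines on stdin and print
# "function offset module" for every RIP line that names a module.
# Hypothetical helper for illustration; it is not part of any package.
rip_location() {
    sed -n -E 's/.*RIP: [0-9a-f]+:([A-Za-z0-9_.]+)\+(0x[0-9a-f]+)\/0x[0-9a-f]+ \[([^]]+)\].*/\1 \2 \3/p'
}

# Possible usage against the previous boot (requires systemd's journalctl):
#   journalctl -k -b -1 --no-pager | rip_location | sort | uniq -c
```

If every freeze reports the same function and offset, that points at one reproducible driver bug rather than at random memory corruption.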


#2 2019-09-29 19:01:58

paulkerry
Member
From: Sheffield, UK
Registered: 2014-10-02
Posts: 611

Re: System hard freeze / "BUG: kernel NULL pointer dereference, address.."

emacsomancer wrote:

I'm not sure how to proceed with debugging.

and neither can we when you post truncated logs and no other information, like hardware for instance :-)
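The ">" at the end of several log lines above is probably the pager chopping long lines. A minimal sketch for capturing the same messages at full length and checking a file for chopped lines before posting it follows; the function names and the output file name are made up for illustration, while `-k`, `-b -1` and `--no-pager` are standard journalctl options.

```shell
# save_prev_boot_kernel_log OUT: write the previous boot's kernel messages
# to OUT. --no-pager keeps long lines intact instead of chopping them.
# (Hypothetical helper; requires systemd's journalctl.)
save_prev_boot_kernel_log() {
    journalctl -k -b -1 --no-pager > "$1"
}

# count_truncated FILE: print how many lines end in ">", the marker that
# a chopped line typically carries when copied from the pager.
count_truncated() {
    grep -c '>$' "$1" || true
}

# Possible usage:
#   save_prev_boot_kernel_log prev-boot-kernel.log
#   count_truncated prev-boot-kernel.log
```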


#3 2019-09-29 19:14:24

emacsomancer
Member
Registered: 2014-09-20
Posts: 211

Re: System hard freeze / "BUG: kernel NULL pointer dereference, address.."

paulkerry wrote:
emacsomancer wrote:

I'm not sure how to proceed with debugging.

and neither can we when you post truncated logs and no other information, like hardware for instance :-)

What other log information would be relevant? Everything else in journalctl is mbsync fetching messages or znapzend backing up snapshots. Should I be pulling log information from elsewhere as well?

Here is the system information:

CPU: Quad Core Intel Core i5-4460 (-MCP-) speed/min/max: 2313/800/3400 MHz Kernel: 5.3.1-arch1-1-ARCH x86_64 Up: 34m 
Mem: 11341.7/15965.0 MiB (71.0%) Storage: 17.06 TiB (66.7% used) Procs: 543 Shell: bash 5.0.9 inxi: 3.0.36 
slade@sindhu:~$ inxi -Fx
System:    Host: sindhu Kernel: 5.3.1-arch1-1-ARCH x86_64 bits: 64 compiler: gcc v: 9.1.0 Desktop: KDE Plasma 5.16.5 
           Distro: Arch Linux 
Machine:   Type: Desktop Mobo: ASRock model: Z97 Anniversary serial: <root required> UEFI: American Megatrends v: P2.10 
           date: 03/08/2018 
Battery:   Device-1: hidpp_battery_0 model: Logitech M510 charge: 55% (should be ignored) status: Discharging 
CPU:       Topology: Quad Core model: Intel Core i5-4460 bits: 64 type: MCP arch: Haswell rev: 3 L2 cache: 6144 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 25555 
           Speed: 1038 MHz min/max: 800/3400 MHz Core speeds (MHz): 1: 1038 2: 1298 3: 1183 4: 1037 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] 
           vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 01:00.0 
           Display: x11 server: X.Org 1.20.5 driver: amdgpu unloaded: modesetting resolution: 1920x1080~60Hz 
           OpenGL: renderer: AMD Radeon RX 470 Graphics (POLARIS10 DRM 3.33.0 5.3.1-arch1-1-ARCH LLVM 8.0.1) 
           v: 4.5 Mesa 19.1.7 direct render: Yes 
Audio:     Device-1: Intel 9 Series Family HD Audio vendor: ASRock driver: snd_hda_intel v: kernel bus ID: 00:1b.0 
           Device-2: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: Micro-Star MSI driver: snd_hda_intel 
           v: kernel bus ID: 01:00.1 
           Sound Server: ALSA v: k5.3.1-arch1-1-ARCH 
Network:   Device-1: Intel Ethernet I218-V vendor: ASRock driver: e1000e v: 3.2.6-k port: f040 bus ID: 00:19.0 
           IF: enp0s25 state: up speed: 1000 Mbps duplex: full mac: d0:50:99:68:e1:3d 
           IF-ID-1: tun0 state: unknown speed: 10 Mbps duplex: full mac: N/A 
Sensors:   System Temperatures: cpu: 37.0 C mobo: N/A gpu: amdgpu temp: 46 C 
           Fan Speeds (RPM): N/A gpu: amdgpu fan: 1214 
Info:      Processes: 544 Uptime: 35m Memory: 15.59 GiB used: 11.08 GiB (71.1%) Init: systemd Compilers: gcc: 9.1.0 
           clang: 8.0.1 Shell: bash v: 5.0.9 inxi: 3.0.36 

The GPU is specifically a Radeon RX 470 (inxi lists the whole range of cards).


#4 2019-09-29 19:40:34

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: System hard freeze / "BUG: kernel NULL pointer dereference, address.."

Pass "amdgpu.dc=0" on the kernel command line; that will likely sidestep the issue.
Since you're on KDE and likely X11, there's a chance that the bug is induced by either the xf86-video-amdgpu or the modesetting driver, so you could also try the one you're not using at the moment.
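To confirm after a reboot that the parameter actually reached the kernel, something like the following can be used. This is only a sketch: `has_kernel_param` is a made-up helper name, and it simply looks for the parameter as a whole word in `/proc/cmdline`.

```shell
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word in the kernel command line.
# CMDLINE defaults to the running kernel's /proc/cmdline.
# (Hypothetical helper for illustration.)
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null || true)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_kernel_param amdgpu.dc=0; then
    echo "amdgpu.dc=0 is active for this boot"
else
    echo "amdgpu.dc=0 is not set for this boot"
fi
```

If the machine boots through GRUB (an assumption; the thread does not say which bootloader is in use), the parameter is typically appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `grub-mkconfig -o /boot/grub/grub.cfg` and a reboot.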


#5 2019-09-29 20:05:54

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,428

Re: System hard freeze / "BUG: kernel NULL pointer dereference, address.."

@emacsomancer You've hit report instead of reply:

emacsomancer wrote:

Thanks.

So `amdgpu.dc=0` disables the Display Core (according to https://www.kernel.org/doc/html/latest/ … pu-dc.html ). Are there downsides to doing this?

(I recall that using modesetting was non-ideal for some reason or other.)


#6 2019-09-29 20:12:12

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: System hard freeze / "BUG: kernel NULL pointer dereference, address.."

You mean "Downsides" compared to a halted kernel?
IIRC some HiDPI features rely on DC, and DC will probably perform better even on older chips, but not when it halts the kernel.

I'd suggest to first try and see whether the problem is gone and whether you notice some actual downsides on your configuration…


#7 2019-09-29 22:55:20

emacsomancer
Member
Registered: 2014-09-20
Posts: 211

Re: System hard freeze / "BUG: kernel NULL pointer dereference, address.."

seth wrote:

You mean "Downsides" compared to a halted kernel?
IIRC some HiDPI features rely on DC, and DC will probably perform better even on older chips, but not when it halts the kernel.

I'd suggest to first try and see whether the problem is gone and whether you notice some actual downsides on your configuration…

I mean it only freezes once every few days, so... but in general I just wanted to have a better understanding of what the configuration option was doing (whether it was more of a short-term fix or a long-term fix &c.)

But since I'm not using HiDPI, maybe it's not a bad interim solution in any event.

(Apologies for hitting report rather than reply by mistake.)


#8 2019-09-29 23:15:45

loqs
Member
Registered: 2014-03-06
Posts: 17,197

Re: System hard freeze / "BUG: kernel NULL pointer dereference, address.."

Does the time frame coincide with the system being updated to linux 5.3?
Edit:
If it is https://bugzilla.kernel.org/show_bug.cgi?id=204181 then the fix should be in 5.3.2 which includes the patches from https://patchwork.freedesktop.org/series/64505/
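A quick way to see whether the running kernel is already at or past 5.3.2 is sketched below. `kernel_at_least` is a made-up helper name, and the comparison relies on `sort -V`, which GNU coreutils provides (assumed to be available here).

```shell
# kernel_at_least WANTED [RUNNING]
# Succeeds if RUNNING (default: `uname -r`) is version WANTED or newer.
# (Hypothetical helper; relies on the version sort of `sort -V`.)
kernel_at_least() {
    wanted=$1
    running=${2-$(uname -r)}
    # Drop the distribution suffix, e.g. "-arch1-1-ARCH", before comparing.
    running=${running%%-*}
    lowest=$(printf '%s\n%s\n' "$wanted" "$running" | sort -V | head -n 1)
    [ "$lowest" = "$wanted" ]
}

if kernel_at_least 5.3.2; then
    echo "running kernel is 5.3.2 or newer"
else
    echo "running kernel is older than 5.3.2"
fi
```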

Last edited by loqs (2019-09-29 23:19:49)


#9 2019-09-30 00:47:52

emacsomancer
Member
Registered: 2014-09-20
Posts: 211

Re: System hard freeze / "BUG: kernel NULL pointer dereference, address.."

loqs wrote:

Does the time frame coincide with the system being updated to linux 5.3?
Edit:
If it is https://bugzilla.kernel.org/show_bug.cgi?id=204181 then the fix should be in 5.3.2 which includes the patches from https://patchwork.freedesktop.org/series/64505/

No, it occurred with 5.2 as well.

