For the past few weeks I've been getting hard system freezes (i.e. I can't switch to ttys and can't use the magic SysRq keys). When I check `journalctl -b -1`, this seems to be the relevant issue:
Sep 29 12:36:07 sindhu kernel: BUG: kernel NULL pointer dereference, address: 00000000000002ac
Sep 29 12:36:07 sindhu kernel: #PF: supervisor read access in kernel mode
Sep 29 12:36:07 sindhu kernel: #PF: error_code(0x0000) - not-present page
Sep 29 12:36:07 sindhu kernel: PGD 0 P4D 0
Sep 29 12:36:07 sindhu kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Sep 29 12:36:07 sindhu kernel: CPU: 1 PID: 3359537 Comm: kworker/u8:0 Tainted: P OE 5.3.1-arch1-1-ARCH #1
Sep 29 12:36:07 sindhu kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Anniversary, BIOS P2.10 03/08/>
Sep 29 12:36:07 sindhu kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Sep 29 12:36:07 sindhu kernel: RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Sep 29 12:36:07 sindhu kernel: Code: f6 09 d1 8b 50 58 83 f1 01 0f b6 c9 e8 e3 10 34 f3 e9 70 ff ff ff 66 66 2e 0f 1f 84 00 00>
Sep 29 12:36:07 sindhu kernel: RSP: 0000:ffff95a48e413a70 EFLAGS: 00010202
Sep 29 12:36:07 sindhu kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000003
Sep 29 12:36:07 sindhu kernel: RDX: ffffffffc0768250 RSI: 0000000000000000 RDI: ffff8ce507050000
Sep 29 12:36:07 sindhu kernel: RBP: ffff8ce201d90000 R08: ffff8ce201d90000 R09: 0000000000000000
Sep 29 12:36:07 sindhu kernel: R10: ffff8ce201d90000 R11: ffff8ce50ebaa5b0 R12: 0000000000000000
Sep 29 12:36:07 sindhu kernel: R13: ffff8ce50728dc48 R14: 0000000000000006 R15: ffff8ce507050000
Sep 29 12:36:07 sindhu kernel: FS: 0000000000000000(0000) GS:ffff8ce50ea80000(0000) knlGS:0000000000000000
Sep 29 12:36:07 sindhu kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 12:36:07 sindhu kernel: CR2: 00000000000002ac CR3: 00000003890ea002 CR4: 00000000001606e0
Sep 29 12:36:07 sindhu kernel: Call Trace:
Sep 29 12:36:07 sindhu kernel: dc_commit_state+0x99/0x590 [amdgpu]
Sep 29 12:36:07 sindhu kernel: amdgpu_dm_atomic_commit_tail+0xfd1/0x1d00 [amdgpu]
Sep 29 12:36:07 sindhu kernel: ? __update_load_avg_se+0x217/0x310
Sep 29 12:36:07 sindhu kernel: ? enqueue_entity+0x627/0xcc0
Sep 29 12:36:07 sindhu kernel: ? commit_tail+0x3c/0x70 [drm_kms_helper]
Sep 29 12:36:07 sindhu kernel: commit_tail+0x3c/0x70 [drm_kms_helper]
Sep 29 12:36:07 sindhu kernel: process_one_work+0x1d1/0x3a0
Sep 29 12:36:07 sindhu kernel: worker_thread+0x4a/0x3d0
Sep 29 12:36:07 sindhu kernel: kthread+0xfb/0x130
Sep 29 12:36:07 sindhu kernel: ? process_one_work+0x3a0/0x3a0
Sep 29 12:36:07 sindhu kernel: ? kthread_park+0x80/0x80
Sep 29 12:36:07 sindhu kernel: ret_from_fork+0x35/0x40
Sep 29 12:36:07 sindhu kernel: Modules linked in: tun rfcomm fuse xt_tcpudp ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ip>
Sep 29 12:36:07 sindhu kernel: snd_pcm_dmaengine snd_hwdep e1000e i2c_i801 lpc_ich mei snd_pcm snd_timer snd evdev soundcore >
Sep 29 12:36:07 sindhu kernel: CR2: 00000000000002ac
Sep 29 12:36:07 sindhu kernel: ---[ end trace 569b890ac624cd7b ]---
Sep 29 12:36:07 sindhu kernel: RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Sep 29 12:36:07 sindhu kernel: Code: f6 09 d1 8b 50 58 83 f1 01 0f b6 c9 e8 e3 10 34 f3 e9 70 ff ff ff 66 66 2e 0f 1f 84 00 00>
Sep 29 12:36:07 sindhu kernel: RSP: 0000:ffff95a48e413a70 EFLAGS: 00010202
Sep 29 12:36:07 sindhu kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000003
Sep 29 12:36:07 sindhu kernel: RDX: ffffffffc0768250 RSI: 0000000000000000 RDI: ffff8ce507050000
Sep 29 12:36:07 sindhu kernel: RBP: ffff8ce201d90000 R08: ffff8ce201d90000 R09: 0000000000000000
Sep 29 12:36:07 sindhu kernel: R10: ffff8ce201d90000 R11: ffff8ce50ebaa5b0 R12: 0000000000000000
Sep 29 12:36:07 sindhu kernel: R13: ffff8ce50728dc48 R14: 0000000000000006 R15: ffff8ce507050000
Sep 29 12:36:07 sindhu kernel: FS: 0000000000000000(0000) GS:ffff8ce50ea80000(0000) knlGS:0000000000000000
Sep 29 12:36:07 sindhu kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 12:36:07 sindhu kernel: CR2: 00000000000002ac CR3: 00000003890ea002 CR4: 00000000001606e0
I'm not sure how to proceed with debugging.
Offline
I'm not sure how to proceed with debugging.
and neither can we when you post truncated logs and no other information, like hardware for instance :-)
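For what it's worth, the lines ending in `>` above look like they were cut off by the journalctl pager rather than in the journal itself. A minimal sketch of how one might grab the previous boot's kernel messages untruncated and check a paste for chopped lines; the helper names are made up for illustration, only the `journalctl` options (`-b -1`, `-k`, `--no-pager`) are real:

```python
import subprocess


def previous_boot_kernel_log_cmd():
    """Build the journalctl call: previous boot, kernel messages only, no pager."""
    return ["journalctl", "-b", "-1", "-k", "--no-pager"]


def has_truncated_lines(log_text):
    """Return True if any line ends with '>', the mark the pager leaves on chopped lines."""
    return any(line.rstrip().endswith(">") for line in log_text.splitlines())


def fetch_previous_boot_kernel_log():
    """Run journalctl and return its output as text (needs a systemd machine)."""
    result = subprocess.run(
        previous_boot_kernel_log_cmd(),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `has_truncated_lines(fetch_previous_boot_kernel_log())` before pasting is a quick sanity check that the full `Code:` and `Modules linked in:` lines made it through.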
Offline
emacsomancer wrote: I'm not sure how to proceed with debugging.
and neither can we when you post truncated logs and no other information, like hardware for instance :-)
What other log information would be relevant? Everything else in journalctl is mbsync fetching messages or znapzend backing up snapshots. Should I be pulling log information from elsewhere in addition?
Here is the system information:
CPU: Quad Core Intel Core i5-4460 (-MCP-) speed/min/max: 2313/800/3400 MHz Kernel: 5.3.1-arch1-1-ARCH x86_64 Up: 34m
Mem: 11341.7/15965.0 MiB (71.0%) Storage: 17.06 TiB (66.7% used) Procs: 543 Shell: bash 5.0.9 inxi: 3.0.36
slade@sindhu:~$ inxi -Fx
System: Host: sindhu Kernel: 5.3.1-arch1-1-ARCH x86_64 bits: 64 compiler: gcc v: 9.1.0 Desktop: KDE Plasma 5.16.5
Distro: Arch Linux
Machine: Type: Desktop Mobo: ASRock model: Z97 Anniversary serial: <root required> UEFI: American Megatrends v: P2.10
date: 03/08/2018
Battery: Device-1: hidpp_battery_0 model: Logitech M510 charge: 55% (should be ignored) status: Discharging
CPU: Topology: Quad Core model: Intel Core i5-4460 bits: 64 type: MCP arch: Haswell rev: 3 L2 cache: 6144 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 25555
Speed: 1038 MHz min/max: 800/3400 MHz Core speeds (MHz): 1: 1038 2: 1298 3: 1183 4: 1037
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 01:00.0
Display: x11 server: X.Org 1.20.5 driver: amdgpu unloaded: modesetting resolution: 1920x1080~60Hz
OpenGL: renderer: AMD Radeon RX 470 Graphics (POLARIS10 DRM 3.33.0 5.3.1-arch1-1-ARCH LLVM 8.0.1)
v: 4.5 Mesa 19.1.7 direct render: Yes
Audio: Device-1: Intel 9 Series Family HD Audio vendor: ASRock driver: snd_hda_intel v: kernel bus ID: 00:1b.0
Device-2: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: Micro-Star MSI driver: snd_hda_intel
v: kernel bus ID: 01:00.1
Sound Server: ALSA v: k5.3.1-arch1-1-ARCH
Network: Device-1: Intel Ethernet I218-V vendor: ASRock driver: e1000e v: 3.2.6-k port: f040 bus ID: 00:19.0
IF: enp0s25 state: up speed: 1000 Mbps duplex: full mac: d0:50:99:68:e1:3d
IF-ID-1: tun0 state: unknown speed: 10 Mbps duplex: full mac: N/A
Sensors: System Temperatures: cpu: 37.0 C mobo: N/A gpu: amdgpu temp: 46 C
Fan Speeds (RPM): N/A gpu: amdgpu fan: 1214
Info: Processes: 544 Uptime: 35m Memory: 15.59 GiB used: 11.08 GiB (71.1%) Init: systemd Compilers: gcc: 9.1.0
clang: 8.0.1 Shell: bash v: 5.0.9 inxi: 3.0.36
The GPU is a Radeon RX 470 (since inxi reports the whole range of cards).
Offline
Pass "amdgpu.dc=0" on the kernel command line; that will likely sidestep the issue.
Since you're on KDE and likely X11, there's a chance that the bug is induced by either the xf86-video-amdgpu or the modesetting driver, so you could also try the one you're not using atm.
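After adding the parameter in the bootloader and rebooting, it's worth confirming that it actually reached the kernel. A rough sketch that reads `/proc/cmdline`; the function names are hypothetical, and the simple whitespace split assumes no quoted values containing spaces:

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    A bare flag without '=' yields an empty string; the last occurrence wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, rest = token.partition("=")
        if key == name:
            value = rest if sep else ""
    return value


def running_kernel_param(name):
    """Look up `name` on the command line of the currently running kernel."""
    with open("/proc/cmdline", encoding="utf-8") as handle:
        return kernel_param(handle.read(), name)
```

If `running_kernel_param("amdgpu.dc")` comes back as `None`, the bootloader configuration was probably not regenerated or the wrong entry was booted.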
Online
@emacsomancer You've hit report instead of reply:
Thanks.
So `amdgpu.dc=0` disables the Display Core (according to https://www.kernel.org/doc/html/latest/ … pu-dc.html ). Are there downsides to doing this?
(I recall that using modesetting was non-ideal for some reason or other.)
Offline
You mean "Downsides" compared to a halted kernel?
Iirc some HiDPI features rely on it and it will probably perform better even on older chips - but not when it halts the kernel.
I'd suggest to first try and see whether the problem is gone and whether you notice some actual downsides on your configuration…
Online
You mean "Downsides" compared to a halted kernel?
Iirc some HiDPI features rely on it and it will probably perform better even on older chips - but not when it halts the kernel.
I'd suggest to first try and see whether the problem is gone and whether you notice some actual downsides on your configuration…
I mean it only freezes once every few days, so... but in general I just wanted to have a better understanding of what the configuration option was doing (whether it was more of a short-term fix or a long-term fix &c.)
But since I'm not using HiDPI, maybe it's not a bad interim solution in any event.
(Apologies for hitting report rather than reply by mistake.)
Offline
Does the time frame coincide with the system being updated to linux 5.3?
Edit:
If it is https://bugzilla.kernel.org/show_bug.cgi?id=204181 then the fix should be in 5.3.2 which includes the patches from https://patchwork.freedesktop.org/series/64505/
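To check whether a given machine is already on a kernel at or past 5.3.2, something like the following would do. It is only a sketch: the helper names are invented here, and it compares just the leading numeric part of the release string (e.g. `5.3.1-arch1-1-ARCH`), ignoring any distro suffix:

```python
import platform
import re


def parse_kernel_release(release):
    """Turn '5.3.1-arch1-1-ARCH' into (5, 3, 1); a missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: " + release)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def kernel_at_least(release, minimum):
    """True if the release's numeric version is >= the `minimum` tuple."""
    return parse_kernel_release(release) >= minimum


def running_kernel_at_least(minimum):
    """Same check against the running kernel, as reported by platform.release()."""
    return kernel_at_least(platform.release(), minimum)
```

For the kernel in the log above, `kernel_at_least("5.3.1-arch1-1-ARCH", (5, 3, 2))` is false, so that system would not yet have the patches.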
Last edited by loqs (2019-09-29 23:19:49)
Offline
Does the time frame coincide with the system being updated to linux 5.3?
Edit:
If it is https://bugzilla.kernel.org/show_bug.cgi?id=204181 then the fix should be in 5.3.2 which includes the patches from https://patchwork.freedesktop.org/series/64505/
No, it occurred with 5.2 as well.
Offline