#1 2020-01-13 05:54:43

Nagefire
Member
Registered: 2016-03-21
Posts: 11

Monitor enters sleep state without warning

After a recent update, my monitors suddenly black out at random intervals. After each blackout, the monitor
refuses to wake back up and I'm forced to do a hard reboot. At first, it only occurred when I left my desktop
unattended for ~20 minutes, which led me to believe that DPMS or the screensaver was enabled. However, xset q
showed that both DPMS and the screensaver are disabled (relevant output below):

Screen Saver:
  prefer blanking:  yes    allow exposures:  yes
  timeout:  0    cycle:  600
DPMS (Energy Star):
  Standby: 600    Suspend: 600    Off: 600
  DPMS is Disabled
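Checking this by hand works, but a scripted check can be rerun after every reboot. The sketch below is a hypothetical Python helper (the function name is made up for illustration) that takes text in the layout quoted above, for example captured with subprocess.run(["xset", "q"], capture_output=True, text=True).stdout, and reports whether the screensaver timeout is 0 and DPMS is disabled. It assumes only the "timeout:" field and the "DPMS is ..." line shown above.

```python
import re


def screensaver_and_dpms_off(xset_output: str) -> bool:
    """Return True if `xset q`-style text shows screensaver timeout 0 and DPMS disabled."""
    # Screen Saver section: "timeout:  <seconds>"; 0 means the saver never activates.
    timeout = re.search(r"timeout:\s+(\d+)", xset_output)
    # DPMS section ends with "DPMS is Enabled" or "DPMS is Disabled".
    dpms = re.search(r"DPMS is (Enabled|Disabled)", xset_output)
    if timeout is None or dpms is None:
        raise ValueError("text does not look like `xset q` output")
    return int(timeout.group(1)) == 0 and dpms.group(1) == "Disabled"
```

Note that the Standby/Suspend/Off values (600 here) are ignored on purpose: they only matter while DPMS is enabled.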

The next time my monitor blacked out, I was writing a document in LaTeX, so I've ruled out DPMS and the screensaver as
possibilities. After rebooting, I reviewed the journal and found these kernel messages from the graphics card:

Jan 12 20:59:06 NageArch kernel: radeon 0000:03:00.0: ring 0 stalled for more than 10330msec
Jan 12 20:59:06 NageArch kernel: radeon 0000:03:00.0: GPU lockup (current fence id 0x000000000005e2b1 last fence id 0x000000000005e2cc on ring 0)
Jan 12 20:59:06 NageArch kernel: radeon 0000:03:00.0: failed to get a new IB (-35)
Jan 12 20:59:06 NageArch kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
Jan 12 20:59:06 NageArch kernel: radeon 0000:03:00.0: failed to get a new IB (-35)
Jan 12 20:59:06 NageArch kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
Jan 12 20:59:06 NageArch kernel: BUG: unable to handle page fault for address: ffffb3cac0a65ffc
Jan 12 20:59:06 NageArch kernel: #PF: supervisor read access in kernel mode
Jan 12 20:59:06 NageArch kernel: #PF: error_code(0x0000) - not-present page
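To pull messages like these out of a long journal, the kernel log of the previous boot can be dumped with journalctl -k -b -1 (assuming the journal is persistent) and filtered. The Python sketch below does the filtering; the pattern list is a hypothetical selection taken from the messages quoted above, and the line regex assumes journalctl's default short output format ("Mon DD HH:MM:SS host kernel: message").

```python
import re

# Hypothetical patterns, copied from the radeon messages quoted above; extend as needed.
GPU_ERROR_PATTERNS = [
    r"ring \d+ stalled",
    r"GPU lockup",
    r"failed to get a new IB",
    r"BUG: unable to handle page fault",
]


def gpu_error_lines(journal_lines):
    """Yield (timestamp, message) for kernel lines matching any GPU error pattern."""
    combined = re.compile("|".join(GPU_ERROR_PATTERNS))
    # Assumed short format: "Jan 12 20:59:06 host kernel: message"
    line_re = re.compile(r"^(\w{3} +\d+ [\d:]{8}) \S+ kernel: (.*)$")
    for line in journal_lines:
        m = line_re.match(line)
        if m and combined.search(m.group(2)):
            yield m.group(1), m.group(2)
```

Lines from other processes (no "kernel:" tag) and kernel lines that match none of the patterns are skipped.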

Those kernel messages were followed by a dump of the kernel state and a call trace:

Jan 12 20:59:06 NageArch kernel: Call Trace:
Jan 12 20:59:06 NageArch kernel:  radeon_gpu_reset+0xc7/0x2f0 [radeon]
Jan 12 20:59:06 NageArch kernel:  radeon_cs_ioctl+0x28d/0x7d0 [radeon]
Jan 12 20:59:06 NageArch kernel:  ? __switch_to_asm+0x34/0x70
Jan 12 20:59:06 NageArch kernel:  ? radeon_cs_parser_init+0x500/0x500 [radeon]
Jan 12 20:59:06 NageArch kernel:  drm_ioctl_kernel+0xb2/0x100 [drm]
Jan 12 20:59:06 NageArch kernel:  drm_ioctl+0x209/0x360 [drm]
Jan 12 20:59:06 NageArch kernel:  ? radeon_cs_parser_init+0x500/0x500 [radeon]
Jan 12 20:59:06 NageArch kernel:  radeon_drm_ioctl+0x49/0x80 [radeon]
Jan 12 20:59:06 NageArch kernel:  do_vfs_ioctl+0x43d/0x6c0
Jan 12 20:59:06 NageArch kernel:  ksys_ioctl+0x5e/0x90
Jan 12 20:59:06 NageArch kernel:  __x64_sys_ioctl+0x16/0x20
Jan 12 20:59:06 NageArch kernel:  do_syscall_64+0x4e/0x140
Jan 12 20:59:06 NageArch kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
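Each call-trace entry has the form symbol+0xOFFSET/0xSIZE, optionally followed by the module name in brackets; entries prefixed with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are usually discounted. The Python sketch below (hypothetical names) splits an entry into those parts; the same pattern also fits the RIP line of an oops.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    symbol: str
    offset: int            # bytes into the function
    size: int              # total size of the function in bytes
    module: Optional[str]  # None for code built into the kernel image
    reliable: bool         # False when the trace prefixes the entry with "?"


# Matches "[? ]symbol+0xOFF/0xSIZE[ [module]]" anywhere in a journal line.
FRAME_RE = re.compile(
    r"(?P<unreliable>\?\s+)?(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one call-trace (or RIP) entry; return None if the line contains none."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        symbol=m["sym"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["mod"],
        reliable=m["unreliable"] is None,
    )
```

Applied to the trace above and keeping only reliable frames from the radeon module, this leaves radeon_gpu_reset, radeon_cs_ioctl and radeon_drm_ioctl, i.e. the ioctl path from Xorg into the radeon reset code.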

After reading these logs, I narrowed it down to a driver error, so I checked my video driver package (xf86-video-amdgpu).
However, its last update was in October 2019 and this bug only started recently, so I'm not sure
whether the driver is at fault or something else is. I haven't been able to reproduce the error on purpose, but I suspect
that it may be linked to glrnvim, since I was running it both times before the bug occurred. Any help is welcome.

#2 2020-01-13 09:15:34

seth
Member
Registered: 2012-09-03
Posts: 51,017

Re: Monitor enters sleep state without warning

It's more likely the kernel module than the X11 driver.
Try passing

radeon.audio=0 radeon.dpm=0 radeon.aspm=0 radeon.runpm=0 radeon.bapm=0 radeon.backlight=0

to the kernel.
https://wiki.archlinux.org/index.php/Kernel_parameters

This will disable audio, several power-management features, and backlight handling. See whether the problem remains; if it doesn't, try to isolate the crucial option (probably one of the PM features).
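Before drawing conclusions from a test run, it's worth confirming that the parameters actually reached the kernel; the running kernel's command line is exposed in /proc/cmdline. This is a minimal Python sketch (hypothetical helper names) that compares that line against the six options above. It splits on whitespace and ignores the kernel's quoting rules, which is enough for simple key=value options like these.

```python
# The six options suggested above, as they should appear in /proc/cmdline.
WANTED = {
    "radeon.audio": "0",
    "radeon.dpm": "0",
    "radeon.aspm": "0",
    "radeon.runpm": "0",
    "radeon.bapm": "0",
    "radeon.backlight": "0",
}


def parse_cmdline(cmdline: str) -> dict:
    """Map key=value tokens to their value and bare flags (e.g. "rw") to None.

    Simplification: splits on whitespace and ignores the kernel's quoting rules.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split on the first "=" only
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: dict) -> dict:
    """Return the entries of `wanted` that are absent or have a different value."""
    present = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if k not in present or present[k] != v}
```

On the affected machine, missing_params(open("/proc/cmdline").read(), WANTED) should return an empty dict if every option was applied.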

There's also a chance that the xf86 driver triggers this, but xf86-video-amdgpu doesn't work with the radeon module anyway; it works with the newer amdgpu module, used for Southern Islands chips and newer.

#3 2020-01-14 00:08:02

Nagefire
Member
Registered: 2016-03-21
Posts: 11

Re: Monitor enters sleep state without warning

I rebooted with those parameters passed to the kernel, but this seemed to make the bug worse. Both times I used them, the monitor
blacked out within ~7 minutes, with almost identical error logs. Notably, I was watching YouTube both times, and the blackout
happened within seconds of the video beginning. Previously, this bug did not occur while I was watching videos, so I can only
conclude that one of the parameters caused this behavior. I will be testing them to isolate which one.

#4 2020-01-16 02:02:02

Nagefire
Member
Registered: 2016-03-21
Posts: 11

Re: Monitor enters sleep state without warning

So, I did a little more digging, and it seems that AMD GPUs have been having trouble with the kernel since version 5.1.14. I looked all the way back to my logs from the day the GPU error first
occurred (2020-01-12) and found that the module is causing a kernel oops. I'm not sure what is causing it, but logs of a similar error on other AMD GPUs show that
those systems attempt a soft reboot. Advice I found here suggests passing the kernel parameters

idle=nomwait rcu_nocbs=0-3

I've rebooted with these parameters and will add any further results to the thread.
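rcu_nocbs takes a kernel CPU list, so 0-3 means CPUs 0 through 3. The oops quoted below was reported on CPU 8, so this machine has at least nine logical CPUs and 0-3 covers only part of them; if the intent of that advice was to cover every CPU, the range would need to match the CPU count. The Python sketch below (hypothetical helpers) expands a CPU list and reports which CPUs it leaves out. It handles comma-separated numbers and inclusive ranges, but not every form the kernel accepts (for example, stride syntax).

```python
from typing import Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Expand a CPU list such as "0-3" or "0,2,4-7" into a set of CPU numbers.

    Handles comma-separated numbers and inclusive ranges only (no stride syntax).
    """
    cpus: Set[int] = set()
    for part in spec.split(","):
        start, sep, end = part.partition("-")
        if sep:
            lo, hi = int(start), int(end)
            if lo > hi:
                raise ValueError(f"invalid range: {part!r}")
            cpus.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            cpus.add(int(start))
    return cpus


def uncovered_cpus(spec: str, cpu_count: int) -> Set[int]:
    """Return the CPUs in 0..cpu_count-1 that `spec` does not include."""
    return set(range(cpu_count)) - parse_cpu_list(spec)
```

On the running system, uncovered_cpus("0-3", os.cpu_count()) would list the CPUs left out (os.cpu_count() counts logical CPUs and can return None if the count is unknown).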
On another note, I found the first kernel oops:

Jan 12 16:13:52 NageArch kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 12 16:13:52 NageArch kernel: CPU: 8 PID: 575 Comm: Xorg:rcs0 Not tainted 5.4.10-arch1-1 #1
Jan 12 16:13:52 NageArch kernel: Hardware name: Dell Inc. Precision T3600/08HPGT, BIOS A14 09/29/2014
Jan 12 16:13:52 NageArch kernel: RIP: 0010:radeon_ring_backup+0xc0/0x140 [radeon]
Jan 12 16:13:52 NageArch kernel: Code: db 49 89 06 48 85 c0 74 7b 41 8d 7c 24 ff 31 d2 48 c1 e7 02 eb 07 49 8b 06 48 83 c2 04 48 8b 75 08 8d 4b 01 89 db 48 8d 34 9e <8b> 36 89 34 10 23 4d 54 89 cb 48 39 d7 75 dd 4c 89 ef e8 e9 ef 34
Jan 12 16:13:52 NageArch kernel: RSP: 0018:ffffa085c08079d8 EFLAGS: 00010206
Jan 12 16:13:52 NageArch kernel: RAX: ffff8ba6b1700000 RBX: 00000000ffffffff RCX: 0000000000000000
Jan 12 16:13:52 NageArch kernel: RDX: 0000000000000000 RSI: ffffa089c16d4ffc RDI: 00000000000d4b00
Jan 12 16:13:52 NageArch kernel: RBP: ffff8ba7f13b94c8 R08: 00000000000301c7 R09: 00000000001eb000
Jan 12 16:13:52 NageArch kernel: R10: 00000000000301c0 R11: 0000000000000000 R12: 00000000000352c1
Jan 12 16:13:52 NageArch kernel: R13: ffff8ba7f13b94a8 R14: ffffa085c0807a40 R15: ffff8ba7f13b8000
Jan 12 16:13:52 NageArch kernel: FS:  00007fa46fdcd700(0000) GS:ffff8ba7ff600000(0000) knlGS:0000000000000000
Jan 12 16:13:52 NageArch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 16:13:52 NageArch kernel: CR2: ffffa089c16d4ffc CR3: 0000000438d5c004 CR4: 00000000000606e0

Reviewing my pacman.log, I found that I had upgraded mesa to the latest version just before this event, which makes it the most likely culprit.
The log shows that libpulse, shaderc, and pulseaudio were also upgraded, although it seems less likely that any of these would cause screen blackouts.
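To list exactly which packages changed shortly before a crash, the upgrade lines in /var/log/pacman.log can be filtered by time. This Python sketch assumes the ISO 8601 timestamps with a UTC offset written by newer pacman releases (for example [2020-01-12T15:40:00-0500]) and skips lines in any other format; the function name and the window size are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Assumed line format (newer pacman releases), e.g.
# "[2020-01-12T15:40:00-0500] [ALPM] upgraded somepkg (1.0-1 -> 1.1-1)"
UPGRADE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def upgrades_before(log_lines, crash: datetime, window: timedelta):
    """Return (time, package, old, new) for upgrades between crash - window and crash.

    `crash` must be timezone-aware, because the log timestamps carry a UTC offset.
    """
    hits = []
    for line in log_lines:
        m = UPGRADE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not an upgrade line (installs, hooks, commands, ...)
        try:
            when = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S%z")
        except ValueError:
            continue  # older timestamp format; not handled in this sketch
        if crash - window <= when <= crash:
            hits.append((when, m["pkg"], m["old"], m["new"]))
    return hits
```

Usage would look like opening /var/log/pacman.log and passing the file object together with the oops time (with the local UTC offset) and, say, timedelta(hours=2).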

#5 2020-01-16 04:50:15

Nagefire
Member
Registered: 2016-03-21
Posts: 11

Re: Monitor enters sleep state without warning

After performing the reboot, a screen blackout occurred shortly after I opened LibreOffice. After rebooting again, I passed the parameter

iommu=pt

and so far I've opened LibreOffice again without problems.
