#1 2023-07-14 14:47:39

bradwiggo
Member
Registered: 2023-06-05
Posts: 4

GPU Hang Issue, Laptop completely freezes except cursor, hard reset

A while ago my laptop was suffering GPU hangs that froze the entire system. I fixed it by moving from Ubuntu, which I was using at the time, to Arch, so I assumed the newer kernel helped, as my laptop is quite new (specs at the bottom of this post). Today, however, the issue suddenly came back. It happened while I was playing a game (Terraria) and watching a video.

I'll post the relevant lines from journalctl below. If any other log files would be useful, I can post them as well.

Relevant journalctl lines:

Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Main Thread [11143]
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] Resetting chip for GuC failed to reset engine mask=0x1
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] Main Thread[11143] context reset due to GPU hang
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.5.1
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated!
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
Jul 14 15:25:10 laptop kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
Jul 14 15:25:21 laptop kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[462]:376b8a timed out (hint:intel_atomic_commit_ready [i915])
Jul 14 15:25:21 laptop kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[462]:376b8a timed out (hint:intel_atomic_commit_ready [i915])
Jul 14 15:25:22 laptop kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[462]:376b8a timed out (hint:intel_atomic_commit_ready [i915])
Jul 14 15:25:30 laptop kernel: Fence expiration time out i915-0000:00:02.0:Main Thread<11143>:22a49e!
Jul 14 15:25:33 laptop kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[462]:376b96 timed out (hint:intel_atomic_commit_ready [i915])
Jul 14 15:25:33 laptop kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[462]:376b96 timed out (hint:intel_atomic_commit_ready [i915])
Jul 14 15:25:33 laptop kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[462]:376b96 timed out (hint:intel_atomic_commit_ready [i915])
Jul 14 15:25:53 laptop dbus-daemon[627]: [session uid=1000 pid=627] Activating service name='org.xfce.Xfconf' requested by ':1.14' (uid=1000 pid=751 comm="xfce4-panel --display :0.0 --sm-client-id 200e9553")
Jul 14 15:25:53 laptop dbus-daemon[627]: [session uid=1000 pid=627] Successfully activated service 'org.xfce.Xfconf'
Jul 14 15:26:07 laptop kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[462]:376b96 timed out (hint:intel_atomic_commit_ready [i915])
Jul 14 15:26:32 laptop kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[462]:376b96 timed out (hint:intel_atomic_commit_ready [i915])
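
To compare hangs across reports, lines like the ones above can be reduced to the error code, process name and PID. A minimal Python sketch, assuming the message format shown in this log; `find_gpu_hangs` and `HANG_RE` are names made up for illustration, not part of any tool:

```python
import re

# Matches kernel messages such as:
#   "... [drm] GPU HANG: ecode 12:1:84dffffb, in Main Thread [11143]"
# The pattern is an assumption based on the log lines in this thread.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-f:]+), in (?P<process>.+?) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(lines):
    """Return one dict per 'GPU HANG' line found in an iterable of log lines."""
    hangs = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs.append({
                "ecode": match.group("ecode"),
                "process": match.group("process"),
                "pid": int(match.group("pid")),
            })
    return hangs
```

Feeding it the output of `journalctl -k -b --no-pager` (kernel messages from the current boot), one line at a time, should list every hang recorded in that boot.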

What could the issue be? Has anybody else experienced this, and if so, how did you fix it?

Important Info:

Laptop: Lenovo Yoga 7i Slim Pro
CPU: i7-1260p
RAM: 16GB DDR5
GPU: Integrated
OS: Arch Latest (last fully updated a few days ago)
DE: Xfce 4 with Xfwm
Kernel: 6.4.2-arch1-1


#2 2023-09-10 20:24:31

Moxon
Member
Registered: 2017-01-30
Posts: 10

Re: GPU Hang Issue, Laptop completely freezes except cursor, hard reset

I've had the same issue for a couple of days. Interestingly, it also happens while playing Terraria. I tried a 6.4 kernel, a 6.5 kernel and the linux-next kernel; all of them lock up X after a while of gameplay.

Unfortunately I have no solution.

$ inxi -a
CPU: 14-core (6-mt/8-st) 13th Gen Intel Core i9-13900H (-MST AMCP-)
speed/min/max: 454/400/5200:5400:4100 MHz
Kernel: 6.5.0-next-20230908-1-next-git-14143-gaf3c30d33476 x86_64 Up: 30m
Mem: 4.85/30.99 GiB (15.6%) Storage: 5.5 TiB (46.5% used) Procs: 447
Shell: Bash 5.1.16 inxi: 3.3.29

journalctl:

$ journalctl  --boot -t kernel | rg i915
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] VT-d active for gfx access
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: vgaarb: deactivate vga console
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] Using Transparent Hugepages
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adlp_dmc.bin (v2.20)
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.5.1
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
Sep 10 21:50:52 xox kernel: [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
Sep 10 21:50:52 xox kernel: fbcon: i915drmfb (fb0) is primary device
Sep 10 21:50:52 xox kernel: i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
Sep 10 21:50:53 xox kernel: mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])
Sep 10 21:50:53 xox kernel: mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
Sep 10 21:50:53 xox kernel: snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Main Thread [5298]
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] Resetting chip for GuC failed to reset engine mask=0x1
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] Main Thread[5298] context reset due to GPU hang
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.5.1
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
Sep 10 22:15:22 xox kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
Sep 10 22:15:30 xox kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dfbfff, in Main Thread [5298]
Sep 10 22:15:30 xox kernel: i915 0000:00:02.0: [drm] Main Thread[5298] context reset due to GPU hang
Sep 10 22:15:41 xox kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[3167]:24594 timed out (hint:intel_atomic_commit_ready [i915])


#3 2023-11-04 15:46:23

bradwiggo
Member
Registered: 2023-06-05
Posts: 4

Re: GPU Hang Issue, Laptop completely freezes except cursor, hard reset

Moxon wrote:

I've had the same issue for a couple of days. Interestingly, it also happens while playing Terraria. I tried a 6.4 kernel, a 6.5 kernel and the linux-next kernel; all of them lock up X after a while of gameplay.

Unfortunately I have no solution.


Interesting that it was also happening with Terraria.

Have you played other games without it crashing? For me it would happen after 2-3+ hours of playing; was it the same for you?

The problem I ran into is that there is no way to reliably reproduce the issue *without* just playing the game for a couple of hours, which makes testing rather difficult. I even set up a simple script to move the character around and fire a weapon in the game, but it never crashed in about 4 hours of testing (same kernel, same everything). I haven't tested it recently (I ended up playing on Switch instead), but maybe I'll give it another go with a script and see if I can get a reproducible crash.
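
Since the hang only shows up after long sessions, it may help to record how long each test run lasted before the first hang. A rough Python sketch, assuming journal lines in the default short format (`Mon DD HH:MM:SS host ...`) with English month names; the function name and the `year` parameter are hypothetical:

```python
from datetime import datetime


def seconds_until_first_hang(lines, year=2023):
    """Seconds from the first log line to the first 'GPU HANG' line, or None.

    Assumes every non-blank line starts with a 15-character
    'Mon DD HH:MM:SS' stamp. The short journal format omits the year,
    so one has to be supplied (an assumption of this sketch).
    """
    def stamp(line):
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

    start = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if start is None:
            start = stamp(line)  # first log line marks the start of the run
        if "GPU HANG" in line:
            return (stamp(line) - start).total_seconds()
    return None  # no hang found in these lines
```

Run over the kernel log of one boot, this gives a number to compare between kernels or settings instead of a rough "a couple of hours".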

Last edited by bradwiggo (2023-11-04 15:46:34)


#4 2023-11-24 15:22:16

saveman71
Member
Registered: 2017-01-03
Posts: 9

Re: GPU Hang Issue, Laptop completely freezes except cursor, hard reset

I had the same thing happen to me today:

i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Renderer [4652]
i915 0000:00:02.0: [drm] Resetting chip for GuC failed to reset engine mask=0x1
i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
i915 0000:00:02.0: [drm] Renderer[4652] context reset due to GPU hang
i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.13.1
i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled

However, in my case the GPU hang was recovered successfully (I'm on Wayland/sway) and things became responsive again within a second or two.

I've already had a few dozen freezes with this laptop, sometimes once per day, since removing these kernel parameters: `i915.enable_psr=0 i915.enable_dc=0 intel_idle.max_cstate=2 i915.enable_fbc=0`. With them I had NO crashes, BUT they come with poorer performance, awful battery life while sleeping, and a noisier, hotter laptop in general, so I don't recommend them, and neither does the Arch wiki (can't find the source, unfortunately).
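
To confirm which of these parameters a running system was actually booted with, the kernel command line in `/proc/cmdline` can be checked. A small Python sketch, assuming a simple space-separated command line (quoted values containing spaces are not handled); the helper names are illustrative only:

```python
# The four workaround parameters mentioned in this post.
WORKAROUND_PARAMS = (
    "i915.enable_psr",
    "i915.enable_dc",
    "intel_idle.max_cstate",
    "i915.enable_fbc",
)


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def active_workarounds(cmdline):
    """Return the subset of WORKAROUND_PARAMS present on the command line."""
    params = parse_cmdline(cmdline)
    return {name: params[name] for name in WORKAROUND_PARAMS if name in params}
```

Calling `active_workarounds(open("/proc/cmdline").read())` on the affected machine should return an empty dict once the parameters have been removed from the bootloader configuration.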

I just updated the kernel 3 hours ago and am back on zen (I was on LTS before to try to avoid these freezes), and this is the first time this has happened to me, so either I was lucky to catch it in the journal, or this is a completely different problem... Usually I have to hard reset the laptop, so the disk doesn't have time to persist the logs... It's been hard to debug, so I hope it's the same freeze.

Let's hope that from now on they always recover and I can avoid these hard resets...

Specs: Dell XPS 15 9520

CPU: 14-core (6-mt/8-st) 12th Gen Intel Core i9-12900HK (-MST AMCP-)
speed/min/max: 890/400/4900:5000:3800 MHz Kernel: 6.6.2-zen1-1-zen x86_64
Up: 6h 8m Mem: 16.21/31.02 GiB (52.2%) Storage: 953.87 GiB (51.5% used)
Procs: 560 Shell: Zsh inxi: 3.3.31

Last edited by saveman71 (2023-11-24 15:24:15)


#5 2024-01-02 11:42:09

Dingisoul
Member
Registered: 2023-12-22
Posts: 4

Re: GPU Hang Issue, Laptop completely freezes except cursor, hard reset

I have the same issue. The Intel integrated GPU hangs in idle mode and doesn't resume. Additionally, the external monitor shows only a backlight, with no content displayed.

The logs related to the i915 driver:

Jan 02 19:03:09 ArchLu kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in Xwayland [1292]
Jan 02 19:03:09 ArchLu kernel: i915 0000:00:02.0: [drm] Xwayland[1292] context reset due to GPU hang
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Xwayland [1292]
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] Resetting chip for GuC failed to reset engine mask=0x1
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] Renderer[1638] context reset due to GPU hang
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] Xwayland[1292] context reset due to GPU hang
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.13.1
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
Jan 02 19:03:18 ArchLu kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
Jan 02 19:03:29 ArchLu kernel: Asynchronous wait on fence 0000:00:02.0:sway[1179]:2d55c6 timed out (hint:intel_atomic_commit_ready [i915])
Jan 02 19:03:29 ArchLu kernel: Asynchronous wait on fence 0000:00:02.0:sway[1179]:2d55c2 timed out (hint:intel_atomic_commit_ready [i915])

There is a section in the wiki dedicated to this issue: https://wiki.archlinux.org/title/Intel_ … tel_driver. However, it seems to impact performance, so I'll consider trying it later if other solutions don't resolve the problem.


#6 2024-03-26 17:55:59

0xlogn
Member
Registered: 2023-02-09
Posts: 5

Re: GPU Hang Issue, Laptop completely freezes except cursor, hard reset

