#26 2019-12-11 10:22:43

Archanfel80HUN
Member
Registered: 2017-04-27
Posts: 6

Re: i915 Skylake GPU hangs with kernel 5.3.11

5.4.2 still freezes here, sadly.
Tried today.

loqs wrote:

673 / 674 have been marked fixed: https://patchwork.freedesktop.org/patch/344105/
670, 713 and 712 were also marked as duplicates of 673.
Edit:
Attempt to rebase the patch onto 5.4.2:

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 06a506c29463..b70a59cdcdf2 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -471,12 +471,6 @@ lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
 	return desc;
 }
 
-static void unwind_wa_tail(struct i915_request *rq)
-{
-	rq->tail = intel_ring_wrap(rq->ring, rq->wa_tail - WA_TAIL_BYTES);
-	assert_ring_tail_valid(rq->ring, rq->tail);
-}
-
 static struct i915_request *
 __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
@@ -495,7 +489,6 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 			continue; /* XXX */
 
 		__i915_request_unsubmit(rq);
-		unwind_wa_tail(rq);
 
 		/*
 		 * Push the request back into the queue for later resubmission.
@@ -649,13 +642,29 @@ execlists_schedule_out(struct i915_request *rq)
 	i915_request_put(rq);
 }
 
-static u64 execlists_update_context(const struct i915_request *rq)
+static u64 execlists_update_context(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->hw_context;
-	u64 desc;
+	u64 desc = ce->lrc_desc;
+	u32 tail;
 
-	ce->lrc_reg_state[CTX_RING_TAIL + 1] =
-		intel_ring_set_tail(rq->ring, rq->tail);
+	/*
+	 * WaIdleLiteRestore:bdw,skl
+	 *
+	 * We should never submit the context with the same RING_TAIL twice
+	 * just in case we submit an empty ring, which confuses the HW.
+	 *
+	 * We append a couple of NOOPs (gen8_emit_wa_tail) after the end of
+	 * the normal request to be able to always advance the RING_TAIL on
+	 * subsequent resubmissions (for lite restore). Should that fail us,
+	 * and we try and submit the same tail again, force the context
+	 * reload.
+	 */
+	tail = intel_ring_set_tail(rq->ring, rq->tail);
+	if (unlikely(ce->lrc_reg_state[CTX_RING_TAIL] == tail))
+		desc |= CTX_DESC_FORCE_RESTORE;
+	ce->lrc_reg_state[CTX_RING_TAIL] = tail;
+	rq->tail = rq->wa_tail;
 
 	/*
 	 * Make sure the context image is complete before we submit it to HW.
@@ -674,9 +683,7 @@ static u64 execlists_update_context(const struct i915_request *rq)
 	 */
 	mb();
 
-	desc = ce->lrc_desc;
 	ce->lrc_desc &= ~CTX_DESC_FORCE_RESTORE;
-
 	return desc;
 }
 
@@ -1149,16 +1156,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			if (!list_is_last(&last->sched.link,
 					  &engine->active.requests))
 				return;
-
-			/*
-			 * WaIdleLiteRestore:bdw,skl
-			 * Apply the wa NOOPs to prevent
-			 * ring:HEAD == rq:TAIL as we resubmit the
-			 * request. See gen8_emit_fini_breadcrumb() for
-			 * where we prepare the padding after the
-			 * end of the request.
-			 */
-			last->tail = last->wa_tail;
 		}
 	}
 

#27 2019-12-11 12:04:26

loqs
Member
Registered: 2014-03-06
Posts: 10,083

Re: i915 Skylake GPU hangs with kernel 5.3.11

Thank you for trying.  So no improvement with the patch on your system.
Is there anything in dmesg?

#28 2019-12-11 15:00:12

dcc24
Member
Registered: 2009-10-31
Posts: 731

Re: i915 Skylake GPU hangs with kernel 5.3.11

Just ran into this issue for the first time with kernel 5.4.2. Relevant journalctl logs:

Dec 11 14:30:44 laptop kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Dec 11 14:30:44 laptop kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec 11 14:30:44 laptop kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec 11 14:30:44 laptop kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec 11 14:30:44 laptop kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 11 14:30:44 laptop kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dec 11 14:30:44 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:30:48 laptop kernel: Asynchronous wait on fence i915:gnome-shell[1210]:2a2dd0 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 11 14:30:48 laptop kernel: Asynchronous wait on fence i915:gnome-shell[1210]:2a2dd0 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 11 14:30:48 laptop kernel: Asynchronous wait on fence i915:gnome-shell[1210]:2a2dd0 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 11 14:30:52 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:30:54 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:30:56 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:30:58 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:00 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:02 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:04 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:06 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:08 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:10 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:12 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:14 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:16 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:18 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:20 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:22 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:24 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:26 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:28 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:30 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:32 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:34 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:36 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:38 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:40 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:42 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:44 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:46 laptop kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 11 14:31:48 laptop kernel: i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.

Unfortunately, I don't have the crash dump from /sys/class/drm/card0/error, as I had to shut down with the power button.
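
For anyone hitting this later: the dump at /sys/class/drm/card0/error lives in sysfs, so it does not survive a power-off. If the machine is still reachable while the display is frozen (for example over SSH, which is an assumption about your setup), a small helper along these lines could copy it to disk before rebooting. The function name and the default destination are made up for illustration; only the sysfs path comes from the kernel message above, and reading it typically needs root.

```shell
# Hypothetical helper: copy the i915 error state to a regular file.
# $1: source (defaults to the path printed in the kernel log)
# $2: destination file (the default name is an arbitrary choice)
save_gpu_error() {
    src="${1:-/sys/class/drm/card0/error}"
    dest="${2:-$HOME/i915-error-$(date +%Y%m%d-%H%M%S).txt}"

    # Nothing to save if the file is missing or unreadable.
    if [ ! -r "$src" ]; then
        echo "cannot read $src (try as root?)" >&2
        return 1
    fi

    # Read the whole dump and write it to persistent storage.
    cat "$src" > "$dest" && echo "saved to $dest"
}

# Usage (as root, e.g. from an SSH session while the display is frozen):
#   save_gpu_error
```

The saved file is what the kernel message asks to be attached to a bug report.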


It is better to keep your mouth shut and be thought a fool than to open it and remove all doubt. (Mark Twain)

#29 2019-12-11 17:15:08

bpunktm
Member
Registered: 2018-10-06
Posts: 5

Re: i915 Skylake GPU hangs with kernel 5.3.11

Hi,

My computer (a Lenovo T480 [20L5...]) is sadly also affected by this error and it is very painful.

Dez 11 17:15:05 bmxc kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Dez 11 17:15:05 bmxc kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dez 11 17:15:05 bmxc kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dez 11 17:15:05 bmxc kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dez 11 17:15:05 bmxc kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dez 11 17:15:05 bmxc kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dez 11 17:15:05 bmxc kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dez 11 17:15:05 bmxc kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dez 11 17:15:05 bmxc kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Dez 11 17:15:05 bmxc kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dez 11 17:15:05 bmxc kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dez 11 17:15:11 bmxc kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dez 11 17:15:19 bmxc kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dez 11 17:15:21 bmxc kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dez 11 17:15:23 bmxc kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dez 11 17:15:25 bmxc kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dez 11 17:15:27 bmxc kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

Unfortunately, I don't have a crash report from /sys/class/drm/card0/error either, because I also had to force my computer off with the power button.

Does anyone know if the bug is being worked on and when a patch is expected?

#30 2019-12-11 17:20:02

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 9,062

Re: i915 Skylake GPU hangs with kernel 5.3.11

loqs linked the patch, so it "should be fixed" if you apply it properly to the kernel (though there might be some incompatibilities or missing dependencies when applying it to 5.4.2). If you rebuild the kernel with that patch applied and it doesn't fix the hang, that would be an important data point for potentially reopening the bug report.

#31 2019-12-11 20:31:21

bpunktm
Member
Registered: 2018-10-06
Posts: 5

Re: i915 Skylake GPU hangs with kernel 5.3.11

What does "should be fixed" mean? If there are incompatibilities with the 5.4.2 kernel, which kernel should I apply the patch to?

#32 2019-12-11 20:36:33

loqs
Member
Registered: 2014-03-06
Posts: 10,083

Re: i915 Skylake GPU hangs with kernel 5.3.11

The link I posted in #25 contains the original commit; I do not know which tree it was based on.
The code block I posted contains the commit after I modified it to apply cleanly to 5.4.2.
Edit:
The original commit was on top of drm-intel/for-linux-next-fixes / drm-intel/drm-intel-fixes https://github.com/freedesktop/drm-intel
https://gitlab.freedesktop.org/drm/inte … ote_359912

Last edited by loqs (2019-12-11 20:41:31)

#33 2019-12-11 21:16:03

bpunktm
Member
Registered: 2018-10-06
Posts: 5

Re: i915 Skylake GPU hangs with kernel 5.3.11

#34 2019-12-11 21:21:20

loqs
Member
Registered: 2014-03-06
Posts: 10,083

Re: i915 Skylake GPU hangs with kernel 5.3.11

xf86-video-intel is userspace and separate from the i915 kernel module.
https://github.com/mkahola/drm-intel-mika Highly experimental Linux kernel tree for i915 development (based on 5.5-rc1)
Edit:

bpunktm wrote:

I see. The bottom line: the solution is near.

If it fixes your issue.  It did not fix dcc24's.
This is why you were asked to test to avoid waiting for it to pass through drm-intel -> drm-tip -> mainline -> stable only then to find it does not fix your issue.

Last edited by loqs (2019-12-11 21:28:12)

#35 2019-12-11 21:40:37

bpunktm
Member
Registered: 2018-10-06
Posts: 5

Re: i915 Skylake GPU hangs with kernel 5.3.11

loqs wrote:

This is why you were asked to test to avoid waiting for it to pass through drm-intel -> drm-tip -> mainline -> stable only then to find it does not fix your issue.

How was I supposed to test that? It already fails at the first line:

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c

diff --git is an unknown option. Or did he mean git --diff??

And how do I recompile the kernel with this change?

#36 2019-12-11 21:50:16

loqs
Member
Registered: 2014-03-06
Posts: 10,083

Re: i915 Skylake GPU hangs with kernel 5.3.11

https://bugs.archlinux.org/task/64725#comment184468 contains an archive with a PKGBUILD that has the patch applied.  Ensure the base-devel group is installed on the system.

bsdtar -xf linux-5.4.2.arch1-1.src.tar.gz
cd linux
gpg --recv-keys 8218F88849AAC522E94CF470A5E9288C4FA415FA
makepkg -rsi

Edit:
Changed gpg --fetch-keys to gpg --recv-keys.
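
On the "diff --git is an unknown option" question above: the block starting with `diff --git` is not a command to run but the content of a patch file. If you want to check it by hand instead of using the prepared PKGBUILD, a rough sketch, assuming you saved that text to a file and have an extracted kernel source tree, could look like this. The helper name and the example paths are made up; `patch -Np1` is simply the invocation commonly seen in PKGBUILDs.

```shell
# Hypothetical helper: check whether a saved diff applies to a source tree
# without modifying anything.
# $1: top of the source tree (the directory containing drivers/)
# $2: absolute path to the patch file (the text starting at "diff --git ...")
try_patch() {
    srcdir="$1"
    patchfile="$2"

    # -p1 strips the leading a/ and b/ from the paths in the diff,
    # -N skips hunks that look as if they were already applied,
    # --dry-run reports what would happen but writes nothing.
    ( cd "$srcdir" && patch -Np1 --dry-run -i "$patchfile" )
}

# Usage (paths are placeholders):
#   try_patch "$HOME/build/linux-5.4.2" "$HOME/i915-fix.patch"
# Remove --dry-run from the function to actually apply the patch.
```

A non-zero exit status means at least one hunk did not apply cleanly to that tree.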

Last edited by loqs (2019-12-11 22:32:11)

#37 2019-12-11 22:11:43

bpunktm
Member
Registered: 2018-10-06
Posts: 5

Re: i915 Skylake GPU hangs with kernel 5.3.11

loqs wrote:
gpg --fetch-keys 8218F88849AAC522E94CF470A5E9288C4FA415FA

gpg: WARNING: the URI 8218F... cannot be fetched: Syntax error in URI

#38 2019-12-11 22:20:23

wioo
Member
Registered: 2017-05-18
Posts: 24

Re: i915 Skylake GPU hangs with kernel 5.3.11

Try with "--recv-keys"

#39 2019-12-12 01:11:15

Fandekasp
Member
From: Japan
Registered: 2012-02-12
Posts: 22

Re: i915 Skylake GPU hangs with kernel 5.3.11

I tried to apply the patch, but my screen froze during the makepkg xD

#40 2019-12-12 01:17:05

joanbrugueram
Member
Registered: 2018-11-12
Posts: 13

Re: i915 Skylake GPU hangs with kernel 5.3.11

I rebuilt the official 5.4.2 Arch Linux kernel using the patch provided by loqs, but unfortunately it seems to cause other problems. For reference my system is an Intel(R) Core(TM) i5-7200U / Intel Corporation HD Graphics 620 (rev 02).

When I boot the patched kernel I see these messages in dmesg:

[    5.896163] fb0: switching to inteldrmfb from EFI VGA
[    5.897995] iwlwifi 0000:02:00.0: Detected Intel(R) Dual Band Wireless AC 3168, REV=0x220
[    5.899752] Console: switching to colour dummy device 80x25
[    5.899798] i915 0000:00:02.0: vgaarb: deactivate vga console
[    5.901752] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    5.901753] [drm] Driver supports precise vblank timestamp query.
[    5.902481] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    5.903130] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
[    5.905917] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    5.908739] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input12
[    5.919366] iwlwifi 0000:02:00.0: base HW address: 94:b8:6d:34:82:f8
[    5.920193] audit: type=1130 audit(1576112387.370:17): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lm_sensors comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
--
[    5.955980] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[    5.970041] ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs'
[    5.970645] thermal thermal_zone6: failed to read out thermal zone (-61)
[    6.123040] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
[    6.226906] hp_wmi: query 0xd returned error 0x5
[    6.226975] input: HP WMI hotkeys as /devices/virtual/input/input13
[    6.253060] i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged!
[    6.253063] i915 0000:00:02.0: Please file a bug at https://bugs.freedesktop.org/enter_bug.cgi?product=DRI against DRM/Intel providing the dmesg log by booting with drm.debug=0xf
[    6.267023] [drm] Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0
[    6.270001] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[    6.271757] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input14
[    6.271960] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[    6.459947] fbcon: i915drmfb (fb0) is primary device
[    6.469242] Console: switching to colour frame buffer device 240x67
[    6.491479] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
[    6.568635] snd_hda_codec_realtek hdaudioC0D0: autoconfig for ALC3227: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:speaker
[    6.568638] snd_hda_codec_realtek hdaudioC0D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[    6.568640] snd_hda_codec_realtek hdaudioC0D0:    hp_outs=1 (0x21/0x0/0x0/0x0/0x0)

After this, the graphical environment still works (X server starts, etc), but performance seems to be way worse, and also video hardware acceleration doesn't work.

I'm pretty confident I patched/built the kernel correctly. Unfortunately I don't really have the time to investigate this further, so I'm just providing a data point.

Last edited by joanbrugueram (2019-12-12 01:18:50)

#41 2019-12-12 01:40:30

loqs
Member
Registered: 2014-03-06
Posts: 10,083

Re: i915 Skylake GPU hangs with kernel 5.3.11

Thank you for testing.  It could easily be that my backport of the patch to 5.4.2 is flawed.

#42 2019-12-12 08:11:31

wioo
Member
Registered: 2017-05-18
Posts: 24

Re: i915 Skylake GPU hangs with kernel 5.3.11

I used loqs's PKGBUILD.
My dmesg:

[    1.527003] i915 0000:00:02.0: vgaarb: deactivate vga console
[    1.527618] i915 0000:00:02.0: Direct firmware load for i915/gvt/vid_0x8086_did_0x191b_rid_0x06.golden_hw_state failed with error -2
[    1.542694] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    1.543077] [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
[    1.762339] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
[    1.825795] i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged!
[    1.825807] i915 0000:00:02.0: Please file a bug at https://bugs.freedesktop.org/enter_bug.cgi?product=DRI against DRM/Intel providing the dmesg log by booting with drm.debug=0xf
[    1.870978] [drm] Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0
[    1.883139] fbcon: i915drmfb (fb0) is primary device
[    1.918468] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
[    1.953269] i915 0000:00:02.0: MDEV: Registered

#43 2019-12-12 08:26:21

joanbrugueram
Member
Registered: 2018-11-12
Posts: 13

Re: i915 Skylake GPU hangs with kernel 5.3.11

loqs wrote:

Thank you for testing.  It could easily be that my backport of the patch to 5.4.2 is flawed.

After taking a look at the backport, the thing that looks most suspicious is this: in one of the lines deleted by the original patch there is the expression `ce->lrc_reg_state[CTX_RING_TAIL]`, yet in the 5.4.2 kernel the equivalent usage is `ce->lrc_reg_state[CTX_RING_TAIL + 1]`. When the backported patch adds code again later, however, it uses `ce->lrc_reg_state[CTX_RING_TAIL]` instead of `ce->lrc_reg_state[CTX_RING_TAIL + 1]`.

Of course it's completely possible it doesn't work anyway, but it's worth a try.
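
If the index is indeed the problem, the added lines of the backport can be adjusted mechanically before rebuilding. A minimal sketch with sed, assuming the patch was saved to a file (the function and file names are made up for illustration). It only touches lines starting with `+`, so context lines and removed lines stay as they are:

```shell
# Hypothetical helper: rewrite the added lines of the backported patch so they
# index the register state with CTX_RING_TAIL + 1, as the 5.4.2 code does.
# $1: patch file to read; the adjusted patch is written to stdout.
fix_ring_tail_index() {
    sed -e '/^+/ s/lrc_reg_state\[CTX_RING_TAIL\]/lrc_reg_state[CTX_RING_TAIL + 1]/g' "$1"
}

# Usage (file names are placeholders):
#   fix_ring_tail_index i915-backport.patch > i915-backport-v2.patch
```

The number of lines in each hunk does not change, so the hunk headers stay valid. Whether this is the only adjustment the backport needs is not something this sketch can tell you.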

#44 2019-12-12 10:41:53

loqs
Member
Registered: 2014-03-06
Posts: 10,083

Re: i915 Skylake GPU hangs with kernel 5.3.11

Well spotted.  If you use `makepkg -ersi` you should only need to rebuild the i915 module.

#45 2019-12-12 11:04:10

wioo
Member
Registered: 2017-05-18
Posts: 24

Re: i915 Skylake GPU hangs with kernel 5.3.11

nvm

Last edited by wioo (2019-12-12 11:05:45)

#46 2019-12-12 12:20:00

loqs
Member
Registered: 2014-03-06
Posts: 10,083

Re: i915 Skylake GPU hangs with kernel 5.3.11

#47 2019-12-12 17:46:53

joanbrugueram
Member
Registered: 2018-11-12
Posts: 13

Re: i915 Skylake GPU hangs with kernel 5.3.11

I just built the kernel with `ce->lrc_reg_state[CTX_RING_TAIL + 1]` in the patch and so far it seems to be working correctly. I'm not sure if heftig's comment was referring to just that or to further changes; I will report back if I experience any errors.

#48 2019-12-13 01:30:11

FlashDaggerX
Member
From: Plainville, CT
Registered: 2019-11-25
Posts: 2

Re: i915 Skylake GPU hangs with kernel 5.3.11

The issue is still present on 5.4.2. The system isn't actually frozen, just X (audio will still play while the display is frozen). The issue can be worked around by forcing the system to suspend, either by closing the laptop lid (if you have a laptop) or by pressing Alt+PrintScreen.

System is a Dell Inspiron 7580

Output from the kernel ring buffer:

[20488.603461] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20490.523378] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20492.656763] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20494.580150] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20496.710098] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20498.630146] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20500.550098] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20502.683461] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20504.603446] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20506.523403] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20508.656767] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20510.576771] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20512.710124] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20514.630142] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20516.550116] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20518.683449] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
[20518.683704] i915 0000:00:02.0: Resetting chip for hang on bcs0
[20518.685584] [drm] GuC communication enabled
[20518.685647] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 submission:disabled
[20518.685654] i915 0000:00:02.0: HuC firmware i915/kbl_huc_ver02_00_1810.bin version 2.0 authenticated:yes
[20520.603450] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
[20520.603637] i915 0000:00:02.0: Resetting chip for hang on bcs0
[20520.605460] [drm] GuC communication enabled
[20520.605515] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 submission:disabled
[20520.605520] i915 0000:00:02.0: HuC firmware i915/kbl_huc_ver02_00_1810.bin version 2.0 authenticated:yes
[20528.713425] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20536.603447] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20538.523496] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0
[20540.660120] i915 0000:00:02.0: Resetting bcs0 for hang on bcs0

Last edited by FlashDaggerX (2019-12-13 01:30:52)

#49 2019-12-14 00:52:21

loqs
Member
Registered: 2014-03-06
Posts: 10,083

Re: i915 Skylake GPU hangs with kernel 5.3.11

#50 2019-12-14 22:29:38

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 225

Re: i915 Skylake GPU hangs with kernel 5.3.11

So this has started happening to me too. It seems to have started with 5.4.2 (?) and is also present in 5.4.3.

Here's my error log.

Interestingly, while looking into this: I used to have GuC/HuC loading enabled by passing enable_guc=-1 to i915, but going by journalctl it appears this stopped actually loading anything after about 12/2:

$ journalctl -b | grep uC 
...
Dec 02 15:25:23 lap kernel: [drm] HuC: Loaded firmware i915/skl_huc_ver01_07_1398.bin (version 1.7)
Dec 02 15:25:23 lap kernel: [drm] GuC: Loaded firmware i915/skl_guc_32.0.3.bin (version 32.0)
Dec 02 15:25:23 lap kernel: i915 0000:00:02.0: GuC firmware version 32.0
Dec 02 15:25:23 lap kernel: i915 0000:00:02.0: GuC submission disabled
Dec 02 15:25:23 lap kernel: i915 0000:00:02.0: HuC enabled
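
To see what the running module was actually given, the current value can be read back from sysfs. A small sketch: `/sys/module/<module>/parameters/<name>` is the generic kernel location for module parameters, the helper name is made up, and reading i915 parameters may need root.

```shell
# Hypothetical helper: print the current value of an i915 module parameter.
# $1: parameter name, e.g. enable_guc
# $2: parameters directory (defaults to the sysfs location for i915)
i915_param() {
    name="$1"
    dir="${2:-/sys/module/i915/parameters}"

    if [ ! -r "$dir/$name" ]; then
        echo "cannot read $dir/$name" >&2
        return 1
    fi
    printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
}

# Usage:
#   i915_param enable_guc
```

This only shows the parameter value; whether firmware was actually loaded still has to be read from the kernel log as above.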

Here's what I did on 12/2. Note that I was not installing xf86-video-intel for the first time; I'd just been testing with modesetting for a few days before.

$ grep 2019-12-02 /var/log/pacman.log | egrep "upgraded|installed"
[2019-12-02T07:29:51-0500] [ALPM] upgraded qt5-webengine (5.13.2-3 -> 5.13.2-4)
[2019-12-02T08:01:08-0500] [ALPM] installed libxvmc (1.0.12-2)
[2019-12-02T08:01:09-0500] [ALPM] installed xf86-video-intel (1:2.99.917+897+g0867eea6-1)
[2019-12-02T16:23:56-0500] [ALPM] upgraded gnupg (2.2.18-1 -> 2.2.18-2)
[2019-12-02T16:24:03-0500] [ALPM] upgraded papirus-icon-theme (20191101-1 -> 20191201-1)
[2019-12-02T17:59:43-0500] [ALPM] upgraded linux (5.3.13.1-1 -> 5.4.1.arch1-1)
[2019-12-02T17:59:48-0500] [ALPM] upgraded linux-headers (5.3.13.1-1 -> 5.4.1.arch1-1)

Not sure if the GuC/HuC thing is related or just a coincidence.

Edit: -- Also, in my case, all my hangs have happened while using Chromium, which is one of the few applications using XWayland (though I'm now using FF under XWayland as well). Not sure whether that's relevant either.
Edit 2: -- I see @loqs has already listed the available options at this point; it looks like I'll have to wait for the backport into 5.4.

Last edited by CarbonChauvinist (2019-12-14 22:47:07)


"the wind-blown way, wanna win? don't play"
