
#151 2020-02-14 17:01:30

loqs
Member
Registered: 2014-03-06
Posts: 9,823

Re: i915 Skylake GPU hangs with kernel 5.3.11

@damige: https://gitlab.freedesktop.org/drm/intel/issues/1196 was marked as a duplicate of 1201; see post #142 in this thread.

Offline

#152 2020-02-14 22:05:33

kihra1
Member
Registered: 2019-12-31
Posts: 4

Re: i915 Skylake GPU hangs with kernel 5.3.11

loqs wrote:

Now that linux 5.5.1 is out of testing, is anyone able to reproduce the issue?

kihra1 wrote:

I just updated earlier today and have been running smooth since then (i9-9900 / Intel UHD Graphics 630 using modesetting). See post #82 for ref on what I was seeing.

kihra1 wrote:

Spoke too soon. Although the issue feels the same (frozen screen, only reboot will cure), I can't verify as there was no info in the systemd log like before. Moving back to 11/17 again.

I've been running 5.5.2 kernel for a week now without any issue. I think it's very possible the hang I had on 5.5.1 (see quoted text) was not the same. Either way, 5.5.2 looks pretty good so far running with XFCE and no mods (besides using modesetting instead of intel drivers).

Offline

#153 2020-02-18 09:38:49

pftbest
Member
Registered: 2017-06-22
Posts: 3

Re: i915 Skylake GPU hangs with kernel 5.3.11

Got one today on 5.5.3-arch1-1

general protection fault: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 667 Comm: Xorg Tainted: G           OE     5.5.3-arch1-1 #1
Hardware name: Gigabyte Technology Co., Ltd. Z97-D3H/Z97-D3H-CF, BIOS F9 09/18/2015
RIP: 0010:kmem_cache_alloc+0x7d/0x210
Code: 75 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 75 01 00 00 41 8b 5e 20 49 8b 3e 48 8d 8a 00 02 00 00 4c 89 e8 4c 01 eb <48> 33 1b 49 33 9e 70 01 00 00 65 48 0f c7 0f>
RSP: 0018:ffffb955405ffa30 EFLAGS: 00010286
RAX: eae9cb4abb983b06 RBX: eae9cb4abb983b06 RCX: 000000000dfa6a03
RDX: 000000000dfa6803 RSI: 000000000dfa6803 RDI: 0000000000033490
RBP: 0000000000000cc0 R08: 0000000000000000 R09: ffff9d62c6de80a0
R10: 0000000000000000 R11: 0000000000000002 R12: ffffffffc0b7fa15
R13: eae9cb4abb983b06 R14: ffff9d62fc98d340 R15: ffff9d62fc98d340
FS:  00007f9e82298dc0(0000) GS:ffff9d62ffd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9e7c8a5020 CR3: 00000007f32f8001 CR4: 00000000001606e0
Call Trace:
 i915_active_ref+0x65/0x180 [i915]
 i915_vma_move_to_active+0x22/0x150 [i915]
 i915_gem_do_execbuffer+0xd7b/0x18a0 [i915]
 i915_gem_execbuffer2_ioctl+0x1df/0x3d0 [i915]
 ? _raw_spin_lock_irqsave+0x26/0x50
 ? i915_gem_execbuffer_ioctl+0x2f0/0x2f0 [i915]
 drm_ioctl_kernel+0xb2/0x100 [drm]
 drm_ioctl+0x209/0x360 [drm]
 ? i915_gem_execbuffer_ioctl+0x2f0/0x2f0 [i915]
 do_vfs_ioctl+0x4b7/0x730
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x4e/0x150
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f9e830de2eb
Code: 0f 1e fa 48 8b 05 a5 8b 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 8b 0c>
RSP: 002b:00007ffe687eeda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe687eedf0 RCX: 00007f9e830de2eb
RDX: 00007ffe687eedf0 RSI: 0000000040406469 RDI: 000000000000000e
RBP: 0000000040406469 R08: 00005590517b2330 R09: 0000000000100000
R10: 0000000000000000 R11: 0000000000000246 R12: 000055905176fd90
R13: 000000000000000e R14: ffffffffffffffff R15: 00007f9e81bfc528
Modules linked in: rfcomm nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay cmac algif_hash algif_skcipher af_alg bnep ipt_REJECT>
 snd_pcm_dmaengine r8169 snd_hwdep mei_me realtek lpc_ich sysfillrect e1000e sysimgblt libphy fb_sys_fops mei snd_pcm snd_timer evdev snd soundcore mac_hid vboxnetflt(OE) vboxnet>
---[ end trace a2db1602b59cc33b ]---
RIP: 0010:kmem_cache_alloc+0x7d/0x210
Code: 75 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 75 01 00 00 41 8b 5e 20 49 8b 3e 48 8d 8a 00 02 00 00 4c 89 e8 4c 01 eb <48> 33 1b 49 33 9e 70 01 00 00 65 48 0f c7 0f>
RSP: 0018:ffffb955405ffa30 EFLAGS: 00010286
RAX: eae9cb4abb983b06 RBX: eae9cb4abb983b06 RCX: 000000000dfa6a03
RDX: 000000000dfa6803 RSI: 000000000dfa6803 RDI: 0000000000033490
RBP: 0000000000000cc0 R08: 0000000000000000 R09: ffff9d62c6de80a0
R10: 0000000000000000 R11: 0000000000000002 R12: ffffffffc0b7fa15
R13: eae9cb4abb983b06 R14: ffff9d62fc98d340 R15: ffff9d62fc98d340
FS:  00007f9e82298dc0(0000) GS:ffff9d62ffd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9e7c8a5020 CR3: 00000007f32f8001 CR4: 00000000001606e0

Offline

#154 2020-02-18 13:22:36

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 8,712

Re: i915 Skylake GPU hangs with kernel 5.3.11

The current kernel is 5.5.4, which has a load of i915 patches; any reports from older kernels are moot.
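For anyone unsure whether their running kernel is at least 5.5.4 before reporting, a rough check could look like the sketch below. `kernel_at_least` is a hypothetical helper name, it only compares the upstream part of the release string (the text before the first "-"), and it assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
# kernel_at_least: hypothetical helper; succeeds if the upstream part of a
# kernel release string (text before the first "-") is >= the wanted version.
kernel_at_least() {
    # $1 = release such as 5.5.3-arch1-1, $2 = minimum such as 5.5.4
    have=${1%%-*}
    # The smaller of the two versions sorts first with "sort -V" (assumed available).
    lowest=$(printf '%s\n%s\n' "$have" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage on a running system (not executed here):
#   kernel_at_least "$(uname -r)" 5.5.4 || echo "older than 5.5.4"
```

With the versions mentioned in this thread, 5.5.3-arch1-1 would fail the check and 5.5.5-arch1-1 would pass it.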

Offline

#155 2020-02-18 20:33:45

clydedroid
Member
Registered: 2020-01-22
Posts: 5

Re: i915 Skylake GPU hangs with kernel 5.3.11

Just had it happen on 5.5.3 right after I updated to 5.5.4. I haven't had another since the reboot, but will report here if I see it happen again.

Offline

#156 2020-02-22 08:55:15

samurai
Member
From: Turkey
Registered: 2010-04-03
Posts: 27

Re: i915 Skylake GPU hangs with kernel 5.3.11

What is the latest status? I'm still on 4.19. Is it safe to update now?

Offline

#157 2020-02-23 16:59:50

Panda_Foss
Member
Registered: 2019-05-06
Posts: 1

Re: i915 Skylake GPU hangs with kernel 5.3.11

samurai wrote:

What is the latest status? I'm still on 4.19. Is it safe to update now?

I updated from version 4.19.x to 5.5.5 recently and, so far, I have had no problems.

Offline

#158 2020-02-24 02:08:17

jghodd
Member
Registered: 2013-02-10
Posts: 66

Re: i915 Skylake GPU hangs with kernel 5.3.11

I'm still seeing it with 5.5.5.

My understanding is that the fixes were merged into 5.6-rc1. Any chance we'll see a backport of these patches?

https://gitlab.freedesktop.org/drm/intel/issues/1201

Last edited by jghodd (2020-02-24 02:43:07)

Offline

#159 2020-02-24 02:48:06

loqs
Member
Registered: 2014-03-06
Posts: 9,823

Re: i915 Skylake GPU hangs with kernel 5.3.11

jghodd wrote:

I'm still seeing it with 5.5.5.

My understanding is that the fixes were merged into 5.6-rc1. Any chance we'll see a backport of these patches?

https://gitlab.freedesktop.org/drm/intel/issues/1201

They are included in 5.5.5-arch1; see https://git.archlinux.org/linux.git/log/?h=v5.5.5-arch1

What is the dmesg output from 5.5.5-arch1-1?

Last edited by loqs (2020-02-24 02:49:24)

Offline

#160 2020-02-24 03:20:27

jghodd
Member
Registered: 2013-02-10
Posts: 66

Re: i915 Skylake GPU hangs with kernel 5.3.11

Unfortunately, dmesg doesn't work very well when your entire system is frozen. This is from kernel.log.

Feb 23 18:55:52 bslxhp64 kernel: GpuWatchdog[3044]: segfault at 0 ip 000055e460245eec sp 00007f473f4cc4b0 error 6 in chromium[55e45c81a000+6f5b000]
Feb 23 18:55:52 bslxhp64 kernel: Code: ed e8 88 96 f7 fe eb e6 41 8b 84 24 08 01 00 00 85 c0 74 7d 48 8d 3d cd a4 a9 fb be 01 00 00 00 ba 03 00 00 00 e8 14 47 ee fe <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 1a 2d cb 03 01 eb 5b 49 8b
Feb 23 18:55:52 bslxhp64 kernel: audit: type=1701 audit(1582502152.764:232): auid=1001 uid=1001 gid=1001 ses=2 pid=3026 comm="GpuWatchdog" exe="/usr/lib/chromium/chromium" sig=11 res=1
Feb 23 18:55:55 bslxhp64 kernel: audit: type=1130 audit(1582502155.640:233): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-28755-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 23 18:58:15 bslxhp64 kernel: GpuWatchdog[28869]: segfault at 0 ip 00005562f8b5eeec sp 00007f40477114b0 error 6 in chromium[5562f5133000+6f5b000]
Feb 23 18:58:15 bslxhp64 kernel: Code: ed e8 88 96 f7 fe eb e6 41 8b 84 24 08 01 00 00 85 c0 74 7d 48 8d 3d cd a4 a9 fb be 01 00 00 00 ba 03 00 00 00 e8 14 47 ee fe <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 1a 2d cb 03 01 eb 5b 49 8b
Feb 23 18:58:15 bslxhp64 kernel: audit: type=1701 audit(1582502295.231:234): auid=1001 uid=1001 gid=1001 ses=2 pid=28833 comm="GpuWatchdog" exe="/usr/lib/chromium/chromium" sig=11 res=1
Feb 23 18:58:19 bslxhp64 kernel: audit: type=1130 audit(1582502299.314:235): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@1-29175-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 23 19:01:34 bslxhp64 kernel: INFO: task (coredump):29177 blocked for more than 122 seconds.
Feb 23 19:01:34 bslxhp64 kernel:       Tainted: G           OE     5.5.5-arch1-1 #1
Feb 23 19:01:34 bslxhp64 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 23 19:01:34 bslxhp64 kernel: (coredump)      D    0 29177      1 0x00004080

Last edited by jghodd (2020-02-24 03:21:49)

Offline

#161 2020-02-24 03:45:54

loqs
Member
Registered: 2014-03-06
Posts: 9,823

Re: i915 Skylake GPU hangs with kernel 5.3.11

Can you provide all the kernel messages for that boot? If it was the previous boot (the one before the current one), you could use

journalctl -o cat  -kb -1

Otherwise, change -1 to the correct offset.
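To find the right offset and cut a long boot log down to the lines that matter here, something like the following might help. It is only a sketch: `gpu_lines` is a hypothetical helper name, and the patterns are simply strings that appear in the traces posted in this thread, not a complete list of what the kernel can log for this bug.

```shell
# gpu_lines: hypothetical helper; reads kernel messages on stdin and keeps
# only lines matching strings seen in the traces posted in this thread.
gpu_lines() {
    grep -E '\[i915\]|general protection fault|blocked for more than|segfault at'
}

# Usage on a systemd machine (not executed here):
#   journalctl --list-boots                    # shows the offset of each recorded boot
#   journalctl -o cat -k -b -1 | gpu_lines     # kernel messages of the previous boot, filtered
```

The full unfiltered output is still what should be posted; the filter is only meant for a quick look at whether a given boot contains anything suspicious.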

Offline

#162 2020-02-24 04:20:30

jghodd
Member
Registered: 2013-02-10
Posts: 66

Re: i915 Skylake GPU hangs with kernel 5.3.11

Feb 23 18:55:52 bslxhp64 kernel: GpuWatchdog[3044]: segfault at 0 ip 000055e460245eec sp 00007f473f4cc4b0 error 6 in chromium[55e45c81a000+6f5b000]
Feb 23 18:55:52 bslxhp64 kernel: Code: ed e8 88 96 f7 fe eb e6 41 8b 84 24 08 01 00 00 85 c0 74 7d 48 8d 3d cd a4 a9 fb be 01 00 00 00 ba 03 00 00 00 e8 14 47 ee fe <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 1a 2d cb 03 01 eb 5b 49 8b
Feb 23 18:55:52 bslxhp64 kernel: audit: type=1701 audit(1582502152.764:232): auid=1001 uid=1001 gid=1001 ses=2 pid=3026 comm="GpuWatchdog" exe="/usr/lib/chromium/chromium" sig=11 res=1
Feb 23 18:55:55 bslxhp64 kernel: audit: type=1130 audit(1582502155.640:233): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-28755-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 23 18:58:15 bslxhp64 kernel: GpuWatchdog[28869]: segfault at 0 ip 00005562f8b5eeec sp 00007f40477114b0 error 6 in chromium[5562f5133000+6f5b000]
Feb 23 18:58:15 bslxhp64 kernel: Code: ed e8 88 96 f7 fe eb e6 41 8b 84 24 08 01 00 00 85 c0 74 7d 48 8d 3d cd a4 a9 fb be 01 00 00 00 ba 03 00 00 00 e8 14 47 ee fe <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 1a 2d cb 03 01 eb 5b 49 8b
Feb 23 18:58:15 bslxhp64 kernel: audit: type=1701 audit(1582502295.231:234): auid=1001 uid=1001 gid=1001 ses=2 pid=28833 comm="GpuWatchdog" exe="/usr/lib/chromium/chromium" sig=11 res=1
Feb 23 18:58:19 bslxhp64 kernel: audit: type=1130 audit(1582502299.314:235): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@1-29175-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 23 19:01:34 bslxhp64 kernel: INFO: task (coredump):29177 blocked for more than 122 seconds.
Feb 23 19:01:34 bslxhp64 kernel:       Tainted: G           OE     5.5.5-arch1-1 #1
Feb 23 19:01:34 bslxhp64 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 23 19:01:34 bslxhp64 kernel: (coredump)      D    0 29177      1 0x00004080
Feb 23 19:01:34 bslxhp64 kernel: Call Trace:
Feb 23 19:01:34 bslxhp64 kernel:  ? __schedule+0x2e8/0x7a0
Feb 23 19:01:34 bslxhp64 kernel:  schedule+0x46/0xf0
Feb 23 19:01:34 bslxhp64 kernel:  rwsem_down_write_slowpath+0x2a2/0x530
Feb 23 19:01:34 bslxhp64 kernel:  ? kmem_cache_alloc_trace+0x17b/0x220
Feb 23 19:01:34 bslxhp64 kernel:  register_shrinker_prepared+0x15/0x70
Feb 23 19:01:34 bslxhp64 kernel:  register_shrinker+0x1f/0x30
Feb 23 19:01:34 bslxhp64 kernel:  nfsd_reply_cache_init+0x93/0x170 [nfsd]
Feb 23 19:01:34 bslxhp64 kernel:  nfsd_init_net+0x6e/0x150 [nfsd]
Feb 23 19:01:34 bslxhp64 kernel:  ops_init+0x3a/0x100
Feb 23 19:01:34 bslxhp64 kernel:  setup_net+0xd3/0x210
Feb 23 19:01:34 bslxhp64 kernel:  copy_net_ns+0xf5/0x220
Feb 23 19:01:34 bslxhp64 kernel:  create_new_namespaces+0x113/0x200
Feb 23 19:01:34 bslxhp64 kernel:  unshare_nsproxy_namespaces+0x55/0xa0
Feb 23 19:01:34 bslxhp64 kernel:  ksys_unshare+0x1df/0x3a0
Feb 23 19:01:34 bslxhp64 kernel:  __x64_sys_unshare+0xe/0x20
Feb 23 19:01:34 bslxhp64 kernel:  do_syscall_64+0x4e/0x150
Feb 23 19:01:34 bslxhp64 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 23 19:01:34 bslxhp64 kernel: RIP: 0033:0x7f931016509b
Feb 23 19:01:34 bslxhp64 kernel: Code: Bad RIP value.
Feb 23 19:01:34 bslxhp64 kernel: RSP: 002b:00007ffe249945c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
Feb 23 19:01:34 bslxhp64 kernel: RAX: ffffffffffffffda RBX: 000055f34f26d738 RCX: 00007f931016509b
Feb 23 19:01:34 bslxhp64 kernel: RDX: 0000000000000000 RSI: 00007ffe24994530 RDI: 0000000040000000
Feb 23 19:01:34 bslxhp64 kernel: RBP: 00000000fffffff5 R08: 0000000000000000 R09: 000055f34f201bf0
Feb 23 19:01:34 bslxhp64 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Feb 23 19:01:34 bslxhp64 kernel: R13: 0000000000000000 R14: 000055f34f072210 R15: 00007ffe24994a00
Feb 23 19:03:37 bslxhp64 kernel: INFO: task systemd-coredum:28798 blocked for more than 122 seconds.
Feb 23 19:03:37 bslxhp64 kernel:       Tainted: G           OE     5.5.5-arch1-1 #1
Feb 23 19:03:37 bslxhp64 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 23 19:03:37 bslxhp64 kernel: systemd-coredum D    0 28798      1 0x80004186
Feb 23 19:03:37 bslxhp64 kernel: Call Trace:
Feb 23 19:03:37 bslxhp64 kernel:  ? __schedule+0x2e8/0x7a0
Feb 23 19:03:37 bslxhp64 kernel:  schedule+0x46/0xf0
Feb 23 19:03:37 bslxhp64 kernel:  rwsem_down_write_slowpath+0x2a2/0x530
Feb 23 19:03:37 bslxhp64 kernel:  unregister_memcg_shrinker.isra.0+0x18/0x40
Feb 23 19:03:37 bslxhp64 kernel:  unregister_shrinker+0x6e/0x80
Feb 23 19:03:37 bslxhp64 kernel:  deactivate_locked_super+0x29/0x70
Feb 23 19:03:37 bslxhp64 kernel:  cleanup_mnt+0x104/0x160
Feb 23 19:03:37 bslxhp64 kernel:  task_work_run+0x93/0xb0
Feb 23 19:03:37 bslxhp64 kernel:  do_exit+0x36b/0xb30
Feb 23 19:03:37 bslxhp64 kernel:  do_group_exit+0x3a/0xa0
Feb 23 19:03:37 bslxhp64 kernel:  get_signal+0x132/0x8c0
Feb 23 19:03:37 bslxhp64 kernel:  do_signal+0x43/0x680
Feb 23 19:03:37 bslxhp64 kernel:  exit_to_usermode_loop+0x7f/0x100
Feb 23 19:03:37 bslxhp64 kernel:  do_syscall_64+0x11f/0x150
Feb 23 19:03:37 bslxhp64 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 23 19:03:37 bslxhp64 kernel: RIP: 0033:0x7fb680916567
Feb 23 19:03:37 bslxhp64 kernel: Code: Bad RIP value.
Feb 23 19:03:37 bslxhp64 kernel: RSP: 002b:00007ffed4c6ad18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Feb 23 19:03:37 bslxhp64 kernel: RAX: 00000000000019c8 RBX: 000000000003d88a RCX: 00007fb680916567
Feb 23 19:03:37 bslxhp64 kernel: RDX: 000000000003d88a RSI: 00007fb67fa5e010 RDI: 0000000000000007
Feb 23 19:03:37 bslxhp64 kernel: RBP: 00007fb67fa5e010 R08: 0000000000080000 R09: 8902020289000000
Feb 23 19:03:37 bslxhp64 kernel: R10: 00007fb680a42010 R11: 0000000000000246 R12: 0000000000000007
Feb 23 19:03:37 bslxhp64 kernel: R13: 0000000000000000 R14: 000000003de80000 R15: 000000000003d88a
Feb 23 19:03:37 bslxhp64 kernel: INFO: task (coredump):29177 blocked for more than 245 seconds.
Feb 23 19:03:37 bslxhp64 kernel:       Tainted: G           OE     5.5.5-arch1-1 #1
Feb 23 19:03:37 bslxhp64 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 23 19:03:37 bslxhp64 kernel: (coredump)      D    0 29177      1 0x00004084
Feb 23 19:03:37 bslxhp64 kernel: Call Trace:
Feb 23 19:03:37 bslxhp64 kernel:  ? __schedule+0x2e8/0x7a0
Feb 23 19:03:37 bslxhp64 kernel:  schedule+0x46/0xf0
Feb 23 19:03:37 bslxhp64 kernel:  rwsem_down_write_slowpath+0x2a2/0x530
Feb 23 19:03:37 bslxhp64 kernel:  ? kmem_cache_alloc_trace+0x17b/0x220
Feb 23 19:03:37 bslxhp64 kernel:  register_shrinker_prepared+0x15/0x70
Feb 23 19:03:37 bslxhp64 kernel:  register_shrinker+0x1f/0x30
Feb 23 19:03:37 bslxhp64 kernel:  nfsd_reply_cache_init+0x93/0x170 [nfsd]
Feb 23 19:03:37 bslxhp64 kernel:  nfsd_init_net+0x6e/0x150 [nfsd]
Feb 23 19:03:37 bslxhp64 kernel:  ops_init+0x3a/0x100
Feb 23 19:03:37 bslxhp64 kernel:  setup_net+0xd3/0x210
Feb 23 19:03:37 bslxhp64 kernel:  copy_net_ns+0xf5/0x220
Feb 23 19:03:37 bslxhp64 kernel:  create_new_namespaces+0x113/0x200
Feb 23 19:03:37 bslxhp64 kernel:  unshare_nsproxy_namespaces+0x55/0xa0
Feb 23 19:03:37 bslxhp64 kernel:  ksys_unshare+0x1df/0x3a0
Feb 23 19:03:37 bslxhp64 kernel:  __x64_sys_unshare+0xe/0x20
Feb 23 19:03:37 bslxhp64 kernel:  do_syscall_64+0x4e/0x150
Feb 23 19:03:37 bslxhp64 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 23 19:03:37 bslxhp64 kernel: RIP: 0033:0x7f931016509b
Feb 23 19:03:37 bslxhp64 kernel: Code: Bad RIP value.
Feb 23 19:03:37 bslxhp64 kernel: RSP: 002b:00007ffe249945c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
Feb 23 19:03:37 bslxhp64 kernel: RAX: ffffffffffffffda RBX: 000055f34f26d738 RCX: 00007f931016509b
Feb 23 19:03:37 bslxhp64 kernel: RDX: 0000000000000000 RSI: 00007ffe24994530 RDI: 0000000040000000
Feb 23 19:03:37 bslxhp64 kernel: RBP: 00000000fffffff5 R08: 0000000000000000 R09: 000055f34f201bf0
Feb 23 19:03:37 bslxhp64 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Feb 23 19:03:37 bslxhp64 kernel: R13: 0000000000000000 R14: 000055f34f072210 R15: 00007ffe24994a00

Offline

#163 2020-02-24 06:59:08

Kinslayer11
Member
Registered: 2011-07-17
Posts: 5

Re: i915 Skylake GPU hangs with kernel 5.3.11

The "segfault ... error 6" crashes in Chrome (and other browsers) come from the recent glibc upgrade (2.30-3 -> 2.31-1). It did seem to cause system instability and crashes for me too, often (but not always) still with i915 showing up in the logs. Upgrade Chromium to the latest version and see how you go (or reboot into an old, previously working kernel and you'll see that you suddenly have issues there too). I've been fine since. Some relevant links:

* Arch BBS:  [SOLVED] Was "Kernel 5.5.2 breaks DRM", should be glibc 2.31 breaks it
* CrBugs Issue 1025739: Chromium needs to allow sys_clock_nanosleep to work with the latest glibc code on Linux (note the final comment about it affecting glibc 2.31)
* RedHat Bug 1778559 - [abrt] firefox: __open64_nocancel(): firefox killed by SIGSYS

I will note that this was more obvious for me, as I'd have a number of crashes in Chrome tabs (especially Gmail) with the system still working for a while before the eventual big crash and freeze. I'll also note that I'm using Google Chrome, but I'm assuming the fixes land in Chromium at the same time.

Offline

#164 2020-02-24 18:38:26

jghodd
Member
Registered: 2013-02-10
Posts: 66

Re: i915 Skylake GPU hangs with kernel 5.3.11

I am currently running the latest version of chromium (80.0.3987.116-1) and am using glibc-2.31. Guess I'll wait for the next chromium or glibc update then, although these crashes are seriously compromising my daily use of Arch Linux, which is my only OS environment, not to mention the anxiety and irritation that goes along with it. Because I was also experiencing the 5.5.4 i915 crashes, I can say that this one feels identical to the others. The system freezes completely and requires a cold reset + live system boot + fsck's + reboot to come back up. I did note that the log's stack trace references nfsd instead of gem_* function calls, but the same GpuWatchdog error is occurring.
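A quick way to see which modules a call trace actually passes through (nfsd here, i915 and drm in the earlier reports) is to count the "[module]" tags at the end of the frames. A minimal sketch, with `trace_modules` as a hypothetical helper name, assuming the trace is saved as plain text with one frame per line:

```shell
# trace_modules: hypothetical helper; reads a kernel call trace on stdin and
# counts frames per module, using the "[module]" tag at the end of each frame.
# Frames without a tag (core kernel functions) are ignored.
trace_modules() {
    grep -oE '\[[A-Za-z0-9_]+\]$' | sed 's/[][]//g' | sort | uniq -c | sort -rn
}

# Usage (not executed here):
#   journalctl -o cat -k -b -1 | trace_modules
```

A trace that never touches i915 does not prove the GPU driver is innocent, since a hung task can simply be waiting on a lock held elsewhere, but it is a hint that the report may be a different problem.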

Offline

#165 Yesterday 01:44:28

Fandekasp
Member
From: Japan
Registered: 2012-02-12
Posts: 22

Re: i915 Skylake GPU hangs with kernel 5.3.11

Just got a screen freeze for the first time in a while (edit: 1~2 months). Using Linux 5.5.5-arch1-1 with sway.

Strangely, journalctl doesn't report any error from the last boot, not even a segfault:

hub 2-3:1.0: 1 port detected
audit: type=1100 audit(1582589607.180:115): pid=16561 uid=1000 auid=1000 ses=2 msg='op=PAM:unix_chkpwd acct="dori" exe="/usr/bin/unix_chkpwd" hostname=? addr=? terminal=? res=success'
audit: type=1100 audit(1582591031.600:116): pid=83578 uid=1000 auid=1000 ses=2 msg='op=PAM:unix_chkpwd acct="dori" exe="/usr/bin/unix_chkpwd" hostname=? addr=? terminal=? res=success'
audit: type=1100 audit(1582591746.307:117): pid=116253 uid=1000 auid=1000 ses=2 msg='op=PAM:unix_chkpwd acct="dori" exe="/usr/bin/unix_chkpwd" hostname=? addr=? terminal=? res=success'
audit: type=1100 audit(1582592788.825:118): pid=166627 uid=1000 auid=1000 ses=2 msg='op=PAM:unix_chkpwd acct="dori" exe="/usr/bin/unix_chkpwd" hostname=? addr=? terminal=? res=success'
audit: type=1100 audit(1582593214.473:119): pid=185784 uid=1000 auid=1000 ses=2 msg='op=PAM:unix_chkpwd acct="dori" exe="/usr/bin/unix_chkpwd" hostname=? addr=? terminal=? res=success'
perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79500

The only thing I changed yesterday was re-adding the initrd=/boot/intel-ucode.img parameter to my refind_linux.conf boot loader entry. Removing it again.

Last edited by Fandekasp (Yesterday 06:47:27)

Offline

#166 Yesterday 05:52:27

Kinslayer11
Member
Registered: 2011-07-17
Posts: 5

Re: i915 Skylake GPU hangs with kernel 5.3.11

jghodd wrote:

I am currently running the latest version of chromium (80.0.3987.116-1) and am using glibc-2.31. Guess I'll wait for the next chromium or glibc update then, although these crashes are seriously compromising my daily use of Arch Linux, which is my only OS environment, not to mention the anxiety and irritation that goes along with it. Because I was also experiencing the 5.5.4 i915 crashes, I can say that this one feels identical to the others. The system freezes completely and requires a cold reset + live system boot + fsck's + reboot to come back up. I did note that the log's stack trace references nfsd instead of gem_* function calls, but the same GpuWatchdog error is occurring.

Trust me, I feel your pain, this is my only system too.  Have you had any luck with known stable configurations?  My longest run so far was with

linux-mainline 5.5-1

with the

intel_idle.max_cstate=1 i915.enable_dc=0

options. That lasted 15 days straight, but then it crashed after my glibc upgrade and kept crashing more frequently than ever until I upgraded Chrome (same as your current version, though). You could also try downgrading glibc back to 2.30-3 and see if it makes any difference.
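When testing with kernel parameters like these, it is worth confirming that they actually reached the kernel, since a typo in the boot loader entry goes unnoticed otherwise. A small sketch, with `has_param` as a hypothetical helper name, that checks a command line string for an exact name=value entry:

```shell
# has_param: hypothetical helper; succeeds if the exact parameter (name=value)
# appears as a whole word in the given kernel command line string.
has_param() {
    # $1 = command line text, $2 = parameter such as i915.enable_dc=0
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system (not executed here):
#   has_param "$(cat /proc/cmdline)" "i915.enable_dc=0" && echo "enable_dc=0 is set"
#   has_param "$(cat /proc/cmdline)" "intel_idle.max_cstate=1" && echo "max_cstate=1 is set"
```

For the i915 option, the value the driver ended up with should also be readable from /sys/module/i915/parameters/enable_dc while the module is loaded.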

Fandekasp wrote:

Just got a screen freeze for the first time in a while. Using Linux 5.5.5-arch1-1 with sway.

How long is "in a while"? :)

Offline

#167 Yesterday 06:18:59

cornetto
Member
Registered: Yesterday
Posts: 1

Re: i915 Skylake GPU hangs with kernel 5.3.11

You guys must be masochists. I tried 5.4 and 5.5 for a few days, and went back to 5.3.13. Works fine for me, no freezes, everything runs smoothly.

I do get occasional USB-C hard lock-ups due to my use of an external monitor, but they are rare (15 to 30 days in between) and they have been happening for years, so unrelated to this.

Bottom line, i915 is a shit show and the kernel developers, while talented, are clueless when it comes to regression testing. A lot more money needs to be spent on this, unfortunately the big companies are not interested in it.

Offline
