#1 2017-12-07 01:51:02

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 6,068

How can I preserve the dump from a GPU hang through reboot?

How can I preserve the dump produced when the GPU hangs so that I can post it with an upstream bug report?

Sometimes, when my laptop is left alone, it appears unresponsive when I return. Key presses etc. are registered, but the display remains blank. The only remedy I've found is to hard reset the machine. As far as I can tell, this happens when KDE's power manager attempts to suspend the machine and a GPU hang is triggered. At least, it just happened and I found this in the journal:

Rha 07 00:32:10 MyComputer org_kde_powerdevil[1112]: powerdevil: Suspend session triggered with QMap(("GraceFade", QVariant(bool, true))("Type", QVariant(uint, 0)))
Rha 07 00:32:17 MyComputer kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, reason: Hang on rcs0, action: reset
Rha 07 00:32:17 MyComputer kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Rha 07 00:32:17 MyComputer kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Rha 07 00:32:17 MyComputer kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Rha 07 00:32:17 MyComputer kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Rha 07 00:32:17 MyComputer kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Rha 07 00:32:17 MyComputer kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
Rha 07 00:32:29 MyComputer kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
Rha 07 00:32:41 MyComputer kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang

However, /sys/class/drm/card0/error is empty, presumably because I can view it only after rebooting.

Because this issue cannot be reliably reproduced, I cannot ssh in or hook up an external monitor. To do these things, I'd need to be in a particular room in a particular building on campus. If I could reproduce it reliably, I could go there, configure sshd on the laptop and then trigger the bug. But the bug is triggered only occasionally - most of the time, I don't see it. So, short of moving in for a month or so, this kind of diagnosis is not a realistic option.

Is there some way of retrieving the file after rebooting? Or of configuring things so the dump is saved to disk? As I understand it from https://01.org/linuxgraphics/documentat … eport-bugs, the file does not actually have any content, but is generated only on being read. So even a script which automatically copied that file to disk on file change wouldn't, as I understand things, actually do the job.

Note that this question is related to https://bbs.archlinux.org/viewtopic.php?id=231954, but I did not know that a GPU hang was involved when I described the problem there (as the journal didn't contain this information then) and this seemed to me a different question (how do I collect specific diagnostic information?) and better asked separately. Apologies in advance if this is an incorrect judgement - I wasn't sure whether this should be a separate thread or not.


How To Ask Questions The Smart Way | Help Vampires

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Corporation Wireless 8265 / 8275 | US keyboard with Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#2 2017-12-07 02:18:20

Trilby
Forum Fellow
From: Massachusetts, USA
Registered: 2011-11-29
Posts: 17,573
Website

Re: How can I preserve the dump from a GPU hang through reboot?

cfr wrote:

the file does not actually have any content, but is generated only on being read. So even a script which automatically copied that file to disk on file change wouldn't, as I understand things, actually do the job.

There are probably better approaches, but the above isn't really a limitation.

`cp` itself may not work, but any reading of the file will generate the output - it need not be human reading.  You could collect the output of `cat /sys/class/drm/card0/error >> /path/to/log` in a loop - or as a systemd timer to gather the results.


Resist the GNU world order.

Offline

#3 2017-12-07 02:51:23

loqs
Member
Registered: 2014-03-06
Posts: 3,099

Re: How can I preserve the dump from a GPU hang through reboot?

Perhaps something like the following

dmesg -w | awk '/GPU crash dump saved to \/sys\/class\/drm\/card0\/error/ {system("cat /sys/class/drm/card0/error | bzip2 > error.bz2")}'

Edit:
removed quotes from string match that cause it to fail

Last edited by loqs (2017-12-07 03:22:22)

Offline

#4 2017-12-07 03:09:15

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 6,068

Re: How can I preserve the dump from a GPU hang through reboot?

Trilby wrote:
cfr wrote:

the file does not actually have any content, but is generated only on being read. So even a script which automatically copied that file to disk on file change wouldn't, as I understand things, actually do the job.

There are probably better approaches, but the above isn't really a limitation.

`cp` itself may not work, but any reading of the file will generate the output - it need not be human reading.  You could collect the output of `cat /sys/class/drm/card0/error >> /path/to/log` in a loop - or as a systemd timer to gather the results.

I guess my thought was that I could have something watch the file and trigger the copy, if the file itself changed to include the dump. Whereas otherwise I have to keep triggering the script, which I thought might be problematic given how infrequently I'm triggering the bug.

What would a better approach be? I only asked about this because I couldn't think of any alternatives. If I've posed an XY problem, I'm happy to learn about Y (or X - can't remember which way round they go).


Offline

#5 2017-12-07 11:21:54

seth
Member
Registered: 2012-09-03
Posts: 4,971

Re: How can I preserve the dump from a GPU hang through reboot?

inotifywatch/inotifywait (inotify-tools), but no guarantees that it's gonna work reliably on sysfs (I'd say "no" but haven't tried in years)

Maybe you could setup yourself a shortcut to dump /sys/class/drm/card0/error into a regular file, sync the filesystem and reboot/stop X11/wayland and reload i915?

Offline

#6 2017-12-07 12:42:35

Trilby
Forum Fellow
From: Massachusetts, USA
Registered: 2011-11-29
Posts: 17,573
Website

Re: How can I preserve the dump from a GPU hang through reboot?

cfr wrote:

Whereas otherwise I have to keep triggering the script

That's why I suggested the loop.

while grep -q '^No error' /sys/class/drm/card0/error; do sleep 1; done
cat /sys/class/drm/card0/error > /path/to/my/log
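
A slightly fuller sketch of the same polling idea, in case it is useful. It treats an empty read or a first line starting with "No error" as "no dump yet", then writes the dump to a timestamped file and calls sync so the copy has a chance of surviving a hard reset. The function names, the /var/tmp destination and the one-second default interval are assumptions for illustration, and reading the node will probably need root.

```shell
#!/bin/sh
# Sketch only: poll the i915 error node and save the first real dump.

# has_dump NODE: succeed if NODE currently holds something other than
# nothing at all or the "No error ..." placeholder.
has_dump() {
    first_line=$(head -n 1 "$1" 2>/dev/null) || return 1
    case "$first_line" in
        '' | 'No error'*) return 1 ;;
        *) return 0 ;;
    esac
}

# save_dump NODE OUTDIR: copy NODE to a timestamped file in OUTDIR,
# flush it to disk, and print the path that was written.
save_dump() {
    out="$2/gpu-error-$(date +%Y%m%d-%H%M%S).txt"
    cat "$1" > "$out" && sync && printf '%s\n' "$out"
}

# wait_and_save NODE OUTDIR [INTERVAL]: block until a dump appears,
# then save it. INTERVAL is in seconds and defaults to 1.
wait_and_save() {
    until has_dump "$1"; do
        sleep "${3:-1}"
    done
    save_dump "$1" "$2"
}
```

Something like `wait_and_save /sys/class/drm/card0/error /var/tmp` could then be left running in the background or started from a service.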

Offline

#7 2017-12-07 13:53:18

loqs
Member
Registered: 2014-03-06
Posts: 3,099

Re: How can I preserve the dump from a GPU hang through reboot?

I had the same thought: watching dmesg with awk will just sit there until dmesg produces a new line.

Offline

#8 2017-12-07 14:05:56

Trilby
Forum Fellow
From: Massachusetts, USA
Registered: 2011-11-29
Posts: 17,573
Website

Re: How can I preserve the dump from a GPU hang through reboot?

If these errors are logged in dmesg then the problem is much simpler as they should also then be in the journal (which is maintained across a reboot).
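
For reference, a minimal sketch of pulling those kernel messages back out after a reboot. It assumes the journal is persistent, so that a previous boot is available to `journalctl -b -1`. The helper name is made up, and the patterns are taken from the log lines quoted in the first post. This recovers only the log lines, not the contents of the sysfs file.

```shell
#!/bin/sh
# Keep only the kernel log lines that relate to an i915 GPU hang.
# The patterns mirror the journal excerpt quoted in the first post.
gpu_hang_lines() {
    grep -E 'GPU HANG|GPU crash dump saved to|Resetting .* after gpu hang'
}

# Intended use (kernel messages from the previous boot):
#   journalctl -k -b -1 --no-pager | gpu_hang_lines
```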


Offline

#9 2017-12-07 14:12:08

loqs
Member
Registered: 2014-03-06
Posts: 3,099

Re: How can I preserve the dump from a GPU hang through reboot?

The problem is that in the journal you get

Rha 07 00:32:17 MyComputer kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error

but the journal does not contain the contents of /sys/class/drm/card0/error.
So the idea with awk was to follow the kernel log with dmesg -w, watch for the line mentioning /sys/class/drm/card0/error, and then run a system call that compresses the contents of /sys/class/drm/card0/error with bzip2 and writes them to some location.
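
The same idea written out as a small shell function, as a hedged sketch rather than a tested fix. It reads kernel log lines on standard input and, each time one announces a saved crash dump, compresses the error node into a timestamped file and calls sync. The function name and output directory are made up for illustration, and gzip stands in for bzip2 here only because it is more commonly installed.

```shell
#!/bin/sh
# watch_for_dump NODE OUTDIR
# Reads kernel log lines on stdin. Each time a line reports that a crash
# dump was saved, compress NODE into a new timestamped file in OUTDIR.
watch_for_dump() {
    node=$1
    outdir=$2
    while IFS= read -r line; do
        case "$line" in
            *'GPU crash dump saved to'*)
                out="$outdir/gpu-error-$(date +%Y%m%d-%H%M%S).gz"
                # The node's text is generated on read, so read it
                # straight into gzip and flush the result to disk.
                gzip -c < "$node" > "$out" && sync
                ;;
        esac
    done
}

# Intended use, as root (hypothetical destination directory):
#   dmesg -w | watch_for_dump /sys/class/drm/card0/error /var/tmp
```

Note that `dmesg -w` prints the existing buffer before following, so an old "crash dump saved" line already in the buffer would also trigger a capture when the watcher starts.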

Last edited by loqs (2017-12-07 14:13:12)

Offline

#10 2017-12-07 14:13:52

slithery
Member
Registered: 2013-12-01
Posts: 2,075

Re: How can I preserve the dump from a GPU hang through reboot?

Can you still SSH into the machine after a hang has occurred?


No, it didn't "fix" anything. It just shifted the brokeness one space to the right. - jasonwryan

aur - dotfiles

Offline

#11 2017-12-07 14:14:03

Trilby
Forum Fellow
From: Massachusetts, USA
Registered: 2011-11-29
Posts: 17,573
Website

Re: How can I preserve the dump from a GPU hang through reboot?

Oops, yes, sorry I missed that part.


Offline

#12 2017-12-07 15:26:41

seth
Member
Registered: 2012-09-03
Posts: 4,971

Re: How can I preserve the dump from a GPU hang through reboot?

@slithery, he *could* but lacks the ability to ssh into it in general (see his first post)

Offline

#13 2017-12-07 17:59:30

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 6,068

Re: How can I preserve the dump from a GPU hang through reboot?

loqs wrote:

So the idea with awk was to follow the kernel log with dmesg -w, watch for the line mentioning /sys/class/drm/card0/error, and then run a system call that compresses the contents of /sys/class/drm/card0/error with bzip2 and writes them to some location.

Ah! OK. Now I understand. I didn't realise you could do that with dmesg. That sounds ideal. I'll see if I can get this working.


Offline

#14 2017-12-07 18:04:41

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 6,068

Re: How can I preserve the dump from a GPU hang through reboot?

seth wrote:

inotifywatch/inotifywait (inotify-tools), but no guarantees that it's gonna work reliably on sysfs (I'd say "no" but haven't tried in years)

Maybe you could setup yourself a shortcut to dump /sys/class/drm/card0/error into a regular file, sync the filesystem and reboot/stop X11/wayland and reload i915?

I'm not sure. I tried (1) switching to another TTY and (2) killing X during the last incident. No trace of these attempts appeared in the journal. In contrast, brightness key presses were registered. (I had thought that maybe the screen just had brightness set to zero - I didn't know about the GPU hang, as it didn't appear in the journal when this happened before.) There was also no trace of my pressing the power button briefly. But the sleep key and lid open/close were registered. So I'm not clear what the system will or won't register when in this state.


Offline

#15 2017-12-07 18:06:21

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 6,068

Re: How can I preserve the dump from a GPU hang through reboot?

Trilby wrote:

If these errors are logged in dmesg then the problem is much simpler as they should also then be in the journal (which is maintained across a reboot).

This is actually how I know it is a GPU hang. Last time, I didn't get this. But this time, the GPU hang was logged to the journal. Unfortunately, as noted above, I need the stuff from sysfs to file a useful bug report.


Offline

#16 2017-12-07 19:00:37

slithery
Member
Registered: 2013-12-01
Posts: 2,075

Re: How can I preserve the dump from a GPU hang through reboot?

seth wrote:

@slithery, he *could* but lacks the ability to ssh into it in general (see his first post)

Oops, I missed that.

Are you really not able to SSH in at other locations, for example by using a smartphone?


Offline

#17 2017-12-09 03:52:51

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 6,068

Re: How can I preserve the dump from a GPU hang through reboot?

slithery wrote:
seth wrote:

@slithery, he *could* but lacks the ability to ssh into it in general (see his first post)

Oops, I missed that.

Are you really not able to SSH in at other locations, for example by using a smartphone?

No smart phone. I have a phone, but I don't think you can ssh by SMS, can you?


Offline
