#1 2022-07-18 12:48:36

coxe87b
Member
From: Canberra
Registered: 2019-12-08
Posts: 67

[SOLVED] Troubleshooting random crashes

I am hoping someone can help me with a random crashing issue on my laptop. I've had it for a few months now, and every now and then it crashes. I haven't been able to pinpoint the root cause yet, but I have a few suspicions. Note that this is an older installation of Arch from my previous laptop; I carried the SSD over into the new one.

The laptop came with a single 8GB memory module, and I upgraded to 16GB by adding a module with matching specs. Given that I hadn't experienced any crashes before upgrading, I suspected a faulty memory module. However, I ran sudo memtester 16G and it seems to run fine for multiple passes over a few hours.

I am aware that the journal dump and data provided are not from the current kernel, but this has been happening across several kernel versions. I have since upgraded to the current 5.18.12.arch1-1 kernel, which also means switching away from the LTS kernel, so I will see how that goes. The problem is that the crash doesn't happen every time, and I don't really know what other data I should collect to troubleshoot further. I have also updated Brave browser in case that is the cause of the issue.

I did notice this in the journal right before the latest crash, which would seem to point to a cause, but searching for it doesn't reveal much to me:

~ ɸ sudo journalctl -eb -1 | tail -n 20
Jul 18 22:13:54 skywalker kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffff920a1c908738
Jul 18 22:13:54 skywalker kernel: R13: 00000000ffffffff R14: ffff920a1c908718 R15: 0000000000000000
Jul 18 22:13:54 skywalker kernel: FS:  00007f2ddd58a640(0000) GS:ffff920d0f880000(0000) knlGS:0000000000000000
Jul 18 22:13:54 skywalker kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 18 22:13:54 skywalker kernel: CR2: 000005a0032a8000 CR3: 000000012903c002 CR4: 00000000003706e0
Jul 18 22:13:54 skywalker kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 18 22:13:54 skywalker kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 18 22:13:54 skywalker audit[1286]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=2 pid=1286 comm="Chrome_ChildIOT" exe="/usr/lib/brave-bin/brave" sig=11 res=1
Jul 18 22:13:54 skywalker kernel: audit: type=1701 audit(1658146434.279:253): auid=1000 uid=1000 gid=1000 ses=2 pid=1286 comm="Chrome_ChildIOT" exe="/usr/lib/brave-bin/brave" sig=11 res=1
Jul 18 22:13:54 skywalker systemd[1]: Created slice Slice /system/systemd-coredump.
Jul 18 22:13:54 skywalker audit: BPF prog-id=31 op=LOAD
Jul 18 22:13:54 skywalker audit: BPF prog-id=32 op=LOAD
Jul 18 22:13:54 skywalker audit: BPF prog-id=33 op=LOAD
Jul 18 22:13:54 skywalker kernel: audit: type=1334 audit(1658146434.309:254): prog-id=31 op=LOAD
Jul 18 22:13:54 skywalker kernel: audit: type=1334 audit(1658146434.309:255): prog-id=32 op=LOAD
Jul 18 22:13:54 skywalker kernel: audit: type=1334 audit(1658146434.309:256): prog-id=33 op=LOAD
Jul 18 22:13:54 skywalker systemd[1]: Started Process Core Dump (PID 10932/UID 0).
Jul 18 22:13:54 skywalker kernel: BUG: unable to handle page fault for address: 0000000000004008
Jul 18 22:13:54 skywalker kernel: #PF: supervisor read access in kernel mode
Jul 18 22:13:54 skywalker kernel: #PF: error_code(0x0000) - not-present page
 ~ ɸ
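
The excerpt above only shows the last 20 lines, so most of the trace is cut off. A minimal sketch for pulling the kernel's crash markers, plus some trailing context, out of the previous boot's log; `kernel_crash_context` is a hypothetical helper name, and both the marker list and the default of 30 trailing lines are assumptions:

```shell
# Hypothetical helper: print every kernel line that starts a crash report
# (BUG:, Oops:, RIP:, Call Trace:) together with the N lines that follow it
# (default 30). Reads journal text on stdin, so it works on a live journal
# as well as on a saved log file.
kernel_crash_context() {
    grep -E -A "${1:-30}" 'kernel: (BUG:|Oops:|RIP:|Call Trace:)'
}

# Assumed usage against the kernel messages of the previous boot:
#   sudo journalctl -k -b -1 --no-pager | kernel_crash_context 40
```

Reading from stdin keeps the helper independent of where the log comes from, so the same filter can be run on a downloaded paste.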

System specs:

HP Elitebook 820 G4
Kernel 5.15.53-2-lts
i3wm
Intel Core i7-7600U
16GB DDR4 2133MHz

Full journalctl dump for last boot
https://pastebin.com/fggxtAV9

Last edited by coxe87b (2022-08-30 11:20:49)


Desktop: Arch Linux  |  i3-gaps WM  |  Intel Core i5-9600K  |  16GB RAM  |  AMD Radeon RX 6700XT  |  Dual monitors @ 1440p + 1080p
Laptop: Garuda Linux  |  Sway WM  |  Dell Latitude E7270  |  Intel Core i5-6300U  |  16GB RAM
~ Do or do not, there is no try ~

#2 2022-07-18 15:54:19

seth
Member
Registered: 2012-09-03
Posts: 56,483

Given that I hadn't experienced any crashes before upgrading, I suspected a faulty memory module. However, I ran sudo memtester 16G and it seems to run fine for multiple passes over a few hours.

memtest86, "days" but at least overnight.

Is it always brave that's crashing?

Jul 18 22:13:54 skywalker systemd[1]: Created slice Slice /system/systemd-coredump.

https://wiki.archlinux.org/title/Core_d … _core_dump

Do you have swap space? (That pastebin is not the complete journal.)

#3 2022-07-18 23:24:42

coxe87b
Member
From: Canberra
Registered: 2019-12-08
Posts: 67

I will do an overnight run of memtest86 and report findings.

It's the whole system that crashes, though I think Brave is usually open when it does. The system locks up: I cannot open another TTY, and commands do not respond. Not even the caps lock light changes on a key press. I have to do a hard reset.

I will get back to you with the coredump.

Sorry, what I meant was that it's the complete journalctl log for the previous boot, which includes the crash. I can provide the full journalctl log if that helps.

No, I do not have any swap space allocated, but my RAM usage never goes above 50%.


#4 2022-07-19 06:28:21

seth
Member
Registered: 2012-09-03
Posts: 56,483

No, I do not have any swap space allocated, but my RAM usage never goes above 50%.

You should™ add a swap file as a spillway - a rogue process can allocate 8GB of RAM in no time.
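
A rough sketch of what adding such a swap file could look like. The helper only prints the commands so they can be reviewed before anything runs as root; the name `swapfile_commands`, the `/swapfile` path and the 4 GiB size are all assumptions, and it presumes a filesystem where a plain dd-created swap file works (e.g. ext4):

```shell
# Hypothetical helper: print (not run) the commands that would create and
# enable a swap file. Assumes a filesystem such as ext4; Btrfs and others
# need extra steps that are not covered here.
swapfile_commands() {
    path="${1:-/swapfile}"    # assumed location of the swap file
    size_mib="${2:-4096}"     # assumed size in MiB (4 GiB)
    printf 'dd if=/dev/zero of=%s bs=1M count=%s status=progress\n' "$path" "$size_mib"
    printf 'chmod 600 %s\n' "$path"
    printf 'mkswap %s\n' "$path"
    printf 'swapon %s\n' "$path"
}

# Review the output first, then for example:
#   swapfile_commands | sudo sh
```

To keep the swap file across reboots, an /etc/fstab entry along the lines of `/swapfile none swap defaults 0 0` would be needed as well.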

However, the cut-off is already suspicious. I guess you didn't replace the old DIMM but just added a new one that is probably not the same model/vendor, let alone the same batch?
=> Alter the RAM timings in the BIOS to the most conservative values possible.

#5 2022-07-19 13:58:54

coxe87b
Member
From: Canberra
Registered: 2019-12-08
Posts: 67

I'm not convinced that I'm running out of RAM: free, htop and my polybar monitor all report less than 4GB of RAM usage most of the time while I'm using the laptop. If nothing else works, I'll consider allocating swap, but I've never needed it in the past and I haven't really changed my usage patterns.

No, I didn't replace the old DIMM. I did try to match the new module as closely as possible to the existing one, however. The vendor, timings and frequency are identical across both modules.

So far memtest86 has run 2 passes and been running for a few hours with no errors. I'll let it continue to run overnight.


#6 2022-07-19 14:28:42

qinohe
Member
From: Netherlands
Registered: 2012-06-20
Posts: 1,494

coxe87b wrote:

...If nothing else works, I'll consider allocating swap, but I've never needed it in the past ...

It's your free choice to do what you like ;) but take it from seth, he's quite right: a process could use 10G of RAM in a second, we've all seen that!
It's not about 'if nothing else works'. A sane system has a swap file/partition to (hopefully) prevent it from breaking/hanging/becoming unreachable etc. due to RAM spiking. I know I could get some noise from people who are against swap - so be it ;)

#7 2022-07-20 13:41:49

coxe87b
Member
From: Canberra
Registered: 2019-12-08
Posts: 67

qinohe wrote:
coxe87b wrote:

...If nothing else works, I'll consider allocating swap, but I've never needed it in the past ...

It's your free choice to do what you like ;) but take it from seth, he's quite right: a process could use 10G of RAM in a second, we've all seen that!
It's not about 'if nothing else works'. A sane system has a swap file/partition to (hopefully) prevent it from breaking/hanging/becoming unreachable etc. due to RAM spiking. I know I could get some noise from people who are against swap - so be it ;)

I take what you are saying on board, though if that were the issue would I not be seeing the memory usage spike in one of my performance monitors? I have a live reading of memory and CPU usage in my polybar at the top of my screen which updates every second, and as I said before, I've never seen it over 3GB usage, so I still have plenty in reserve. But I do take your point, I won't rule out the possibility.

~ ɸ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       1.4Gi        12Gi       382Mi       1.6Gi        13Gi
Swap:             0B          0B          0B
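
A once-per-second reading in a bar can still miss a spike that happens right before a lock-up, so one way to test this would be to log the available memory to disk and read the end of the log after the next crash. A rough sketch, where the helper names and the log path are assumptions:

```shell
# Hypothetical helper: print MemAvailable in KiB from a meminfo-style file
# (defaults to /proc/meminfo).
mem_available_kib() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical logger: append a timestamped reading every second and flush
# it to disk, so the last values are still in the file after a hard reset.
log_mem_available() {
    log="${1:-$HOME/mem-available.log}"    # assumed log location
    while :; do
        printf '%s %s\n' "$(date '+%F %T')" "$(mem_available_kib)" >> "$log"
        sync "$log" 2>/dev/null || sync
        sleep 1
    done
}

# Assumed usage, left running in a spare terminal:
#   log_mem_available
```

If the last entries before a crash still show several GiB available, memory exhaustion becomes a much less likely explanation.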
 
seth wrote:

memtest86, "days" but at least overnight.

I have done this overnight, booting directly off the memtest86 USB image and it passed with no errors. So it seems that the memory is physically ok.

I have not had a crash since updating my system and updating Brave through the AUR, though the uptime so far is only a bit over an hour. I wonder if it could be that simple: could a bug in Brave render my whole system unresponsive, even to the point that I can't open another TTY or change the caps lock status?

seth wrote:

I guess you didn't replace the old DIMM but just added a new one that is probably not the same model/vendor, let alone the same batch?

~ ɸ sudo lshw -c memory
  *-memory
       description: System Memory
       physical id: 0
       slot: System board or motherboard
       size: 16GiB
     *-bank:0
          description: SODIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
          product: 9905625-004.A03LF
          vendor: Kingston
          physical id: 0
          serial: 66D28227
          slot: Bottom-Slot 1(left)
          size: 8GiB
          width: 64 bits
          clock: 2133MHz (0.5ns)
     *-bank:1
          description: SODIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
          product: HP26D4S9S8MD-8
          vendor: Kingston
          physical id: 1
          serial: 032F285B
          slot: Bottom-Slot 2(right)
          size: 8GiB
          width: 64 bits
          clock: 2133MHz (0.5ns)

To my understanding, the product numbers differ because one module is manufactured for HP as OEM RAM and the other is a retail version of the same memory. The specs seemed identical when I bought it.
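
To compare the modules side by side, the lshw output can be condensed to one line per bank. A minimal sketch, assuming the field layout shown above (`product:`, `vendor:` and `size:` lines under each `*-bank` entry); `dimm_summary` is a hypothetical helper name:

```shell
# Hypothetical helper: condense `lshw -c memory` output (on stdin) to one
# line per bank: bank, vendor, product, size. Only the first word of each
# field is kept, which is enough for the values seen in this thread.
dimm_summary() {
    awk '
        /\*-bank/     { bank = $1; sub(/^\*-/, "", bank); product = ""; vendor = "" }
        /^ *product:/ { product = $2 }
        /^ *vendor:/  { vendor = $2 }
        /^ *size:/ && bank != "" { print bank, vendor, product, $2; bank = "" }
    '
}

# Assumed usage:
#   sudo lshw -c memory | dimm_summary
```

The top-level `size:` line of the `*-memory` entry is skipped because no bank has been seen at that point.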


#8 2022-07-20 14:14:05

seth
Member
Registered: 2012-09-03
Posts: 56,483

could a bug in brave render my whole system unresponsive?

Brave segfaults and triggers a kernel bug in the audit code, audit_kill_trees => kill_rules.
The pattern rarely shows up on Google (one Fedora and one Oracle hit).
You can try the impact of disabling the audit framework: https://wiki.archlinux.org/title/Audit_ … stallation
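
A minimal sketch for checking after a reboot whether the audit framework was actually switched off, assuming the kernel parameter `audit=0` is the method used; the helper name is hypothetical:

```shell
# Hypothetical helper: succeed if "audit=0" is among the kernel parameters.
# Reads /proc/cmdline by default; another file can be passed instead.
audit_disabled_on_cmdline() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'audit=0'
}

# Assumed usage after adding audit=0 to the boot loader entry and rebooting:
#   audit_disabled_on_cmdline && echo "audit=0 is set" || echo "audit=0 is not set"
```

Matching whole words avoids a false positive from a longer parameter that merely contains the same text.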

Brave will likely still crash, but then it won't necessarily take down the system. However, if a specific userspace process segfaults and that triggers some rare issue in kernel space through a coredump, and browsers and coredumps both tend to be fat beasts, and the memory is physically ok…

seth wrote:

a rogue process can allocate 8GB of RAM in no time.

#9 2022-08-30 11:20:11

coxe87b
Member
From: Canberra
Registered: 2019-12-08
Posts: 67

I am marking this as solved as I suspect the issue was related to a hardware fault with either the motherboard or CPU. The board no longer works and I'm replacing it.

