Hey!
I'm having random system hangs on my old laptop where I can't do anything besides force a power off (long power button press). This mainly happens while using a browser (chromium/firefox) and watching twitch/youtube streams or videos. There is literally zero info in the logs: they look like normal system logs and then the power off.
I found this thread because the issue looked similar and I was using xf86-video-intel too. Unfortunately, removing xf86-video-intel didn't solve the issue.
Any thoughts or suggestions?
Offline
Might not be super helpful but have you ruled out hardware issues? Have you run memtest86 to see if the RAM is okay?
Also, can you use CTRL+ALT+F(#) to change to another virtual console when the system hangs?
Offline
Have you run memtest86 to see if the RAM is okay?
Yeah, like 5-6 times - zero errors, zero hangs during memtest.
Also, can you use CTRL+ALT+F(#) to change to another virtual console when the system hangs?
When the system hangs I can't do anything besides force a power off, the system just doesn't react to mouse/keyboard.
Also the issue is pretty random: the system can work fine without a single hang for 3-5 days, or it can hang 2-3 times a day.
Offline
Struggled with an issue like this myself on an old laptop.
The setting I posted here solved it eventually: https://bbs.archlinux.org/viewtopic.php … 3#p1961833
Offline
Struggled with an issue like this myself on an old laptop.
The setting I posted here solved it eventually: https://bbs.archlinux.org/viewtopic.php … 3#p1961833
Thanks for the suggestion, but it looks like you had a Baytrail-specific issue and my CPU is a bit older - Arrandale.
Right now I'm trying another thing. I mainly use Chromium with GUI/video hardware acceleration enabled, which might cause host crashes according to the wiki article. Testing with default chromium flags and no hardware acceleration right now.
Offline
Does your system log MCEs (Machine Check Exceptions)? They are a low level function and the logs for them are available from the pre-boot configuration menu (sometimes called the BIOS menu, though that name is no longer accurate). I am thinking your machine shut down to protect itself from an over temperature condition.
Offline
Does your system log MCEs (Machine Check Exceptions)? They are a low level function and the logs for them are available from the pre-boot configuration menu (sometimes called the BIOS menu, though that name is no longer accurate). I am thinking your machine shut down to protect itself from an over temperature condition.
There are no high temperature warnings in the BIOS, the last warning was years ago and it was about battery state. CPU temp before the hang was 55-65 °C. Also my machine doesn't shut itself down, it just hangs with a static image of a browser window on the display.
Offline
You may try to limit the c-state regardless of the CPU generation.
Have you reproduced the freeze w/o the browser HW accel?
Because of "mainly" - what were some other circumstances?
Offline
You may try to limit the c-state regardless of the CPU generation.
Have you reproduced the freeze w/o the browser HW accel?
Because of "mainly" - what were some other circumstances?
Right now I'm testing the system without chromium HW accel, no hangs so far. But as I said before - the issue is pretty random and system can work fine for a few days. Need more tests before trying c-state limit.
I had a hang in firefox (while watching YT) once.
Offline
Tested firefox with default flags (no HW accel) and just got a ff crash while watching YT again. But, fortunately, there was no system hang and I managed to get crash logs with some kernel OOPS logs - here.
Offline
Aug 10 15:02:39 laptop kernel: BUG: unable to handle page fault for address: ffffffffab07de98
Aug 10 15:02:39 laptop kernel: #PF: supervisor write access in kernel mode
Aug 10 15:02:39 laptop kernel: #PF: error_code(0x0003) - permissions violation
Aug 10 15:02:39 laptop kernel: PGD 197415067 P4D 197415067 PUD 197416063 PMD 8000000196e000e1
Aug 10 15:02:39 laptop kernel: Oops: 0003 [#1] PREEMPT SMP PTI
Aug 10 15:02:39 laptop kernel: CPU: 2 PID: 9653 Comm: firefox Tainted: G OE 5.13.9-arch1-1 #1
Aug 10 15:02:39 laptop kernel: Hardware name: Hewlett-Packard HP G62 Notebook PC /143A, BIOS F.48 11/09/2011
Aug 10 15:02:39 laptop kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x1c2/0x200
Aug 10 15:02:39 laptop kernel: Code: ff ff f3 90 8b 02 85 c0 74 ee eb f6 c1 ef 12 83 e0 03 83 ef 01 48 c1 e0 05 48 63 ff 48 05 00 da 02 00 48 03 04 fd 00 89 04 ab <48> 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 48 8b 39
Aug 10 15:02:39 laptop kernel: RSP: 0018:ffffb00680aafdf8 EFLAGS: 00010282
Aug 10 15:02:39 laptop kernel: RAX: ffffffffab07de98 RBX: ffffb00680aafe70 RCX: ffff9c5353cada00
Aug 10 15:02:39 laptop kernel: RDX: ffff9c5316575a18 RSI: 00000000000c0000 RDI: 00000000000003bf
Aug 10 15:02:39 laptop kernel: RBP: 000000000000260c R08: 00000000000c0000 R09: ffff9c531a1b1e00
Aug 10 15:02:39 laptop kernel: R10: ffff9c53136d58e0 R11: 0000000000000036 R12: ffff9c52decff750
Aug 10 15:02:39 laptop kernel: R13: ffff9c5316575a18 R14: ffff9c53165759c0 R15: ffff9c5282e91598
Aug 10 15:02:39 laptop kernel: FS: 00007f60ba2bb780(0000) GS:ffff9c5353c80000(0000) knlGS:0000000000000000
Aug 10 15:02:39 laptop kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 10 15:02:39 laptop kernel: CR2: ffffffffab07de98 CR3: 000000019afa0002 CR4: 00000000000206e0
Aug 10 15:02:39 laptop kernel: Call Trace:
Aug 10 15:02:39 laptop kernel: _raw_spin_lock+0x21/0x30
Aug 10 15:02:39 laptop kernel: d_walk+0xc6/0x2a0
Aug 10 15:02:39 laptop kernel: ? select_collect2+0xb0/0xb0
Aug 10 15:02:39 laptop kernel: shrink_dcache_parent+0x4c/0x120
Aug 10 15:02:39 laptop kernel: vfs_rmdir+0xe9/0x180
Aug 10 15:02:39 laptop kernel: do_rmdir+0x1b5/0x1e0
Aug 10 15:02:39 laptop kernel: do_syscall_64+0x40/0x80
Aug 10 15:02:39 laptop kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Aug 10 15:02:39 laptop kernel: RIP: 0033:0x7f60ba3b142b
Aug 10 15:02:39 laptop kernel: Code: 73 01 c3 48 8b 0d 45 ea 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 54 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 ea 0c 00 f7 d8 64 89 01 48
Aug 10 15:02:39 laptop kernel: RSP: 002b:00007ffe3ad57328 EFLAGS: 00000202 ORIG_RAX: 0000000000000054
Aug 10 15:02:39 laptop kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f60ba3b142b
Aug 10 15:02:39 laptop kernel: RDX: 0000000000000003 RSI: 0000000000000003 RDI: 00007f6078159b88
Aug 10 15:02:39 laptop kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
Aug 10 15:02:39 laptop kernel: R10: 0000000000000093 R11: 0000000000000202 R12: 00007f6061266f40
Aug 10 15:02:39 laptop kernel: R13: 00007f6060bfa500 R14: 0000000080520015 R15: 00007f6060d04d00
Aug 10 15:02:39 laptop kernel: Modules linked in: iptable_mangle xt_TCPWIN(OE) xt_tcpudp brcmsmac brcmutil snd_hda_codec_realtek b43 snd_hda_codec_generic ledtrig_audio joydev cordic mousedev mac80211 cfg80211 ssb snd_hda_intel i915 mmc_core snd_intel_dspcfg snd_intel_sdw_acpi uvcvideo pcmcia hp_wmi platform_profile pcmcia_core iTCO_wdt snd_hda_codec sparse_keymap wmi_bmof intel_pmc_bxt rfkill libarc4 iTCO_vendor_support at24 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 r8169 intel_powerclamp videobuf2_common i2c_algo_bit coretemp snd_hda_core intel_cstate intel_uncore realtek snd_hwdep drm_kms_helper snd_pcm videodev mdio_devres psmouse pcspkr snd_timer libphy wmi mc bcma mei_me cec snd syscopyarea mei sysfillrect i2c_i801 lpc_ich sysimgblt fb_sys_fops soundcore mac_hid video i2c_smbus intel_agp intel_gtt acpi_cpufreq vfat fat drm crypto_user fuse agpgart ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq ecb crypto_simd cryptd xts dm_crypt cbc encrypted_keys trusted
Aug 10 15:02:39 laptop kernel: asn1_encoder tee tpm rng_core dm_mod serio_raw atkbd libps2 crc32c_intel sr_mod cdrom i8042 serio
Aug 10 15:02:39 laptop kernel: CR2: ffffffffab07de98
Aug 10 15:02:39 laptop kernel: ---[ end trace 404a77a4955ff7a0 ]---
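As a side note on reading the trace above: the `error_code(0x0003)` value is the x86 page-fault error code, and its low bits can be decoded by hand. The following is only a minimal sketch of that decoding; the function name and the wording of the labels are made up for illustration, only the bit layout is architectural.

```python
# Low bits of the x86 page-fault error code.
# Each entry: (bit mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return human-readable flags for a page-fault error code (hypothetical helper)."""
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


# The value from the oops posted above.
print(decode_pf_error_code(0x0003))
# → ['protection violation', 'write access', 'supervisor mode']
```

This agrees with the two `#PF:` lines in the log: a supervisor write to a page that is mapped but not writable.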
Can you play local videos w/o problem?
Can eg. youtube-viewer make the system crash?
Do you use wifi (and the broadcom chip)?
Can you try a wired NIC (looks like there's some r8169 chip)?
Offline
Can you play local videos w/o problem?
Can eg. youtube-viewer make the system crash?
Do you use wifi (and the broadcom chip)?
Can you try a wired NIC (looks like there's some r8169 chip)?
I barely watch local videos but I can try it, is mpv+yt-dl okay?
Yes I use wifi, BCM94313HMG2L with brcmsmac driver.
It's a bit troublesome but I can switch to r8169 for a few days.
Do you think it might be a wifi card?
Offline
It's a kernelspace bug, the context suggests likely either the GPU or the NIC - and by default we blame broadcom :-P
(Therefore mpv+yt-dl is only helpful if you segment the steps, ie. yt-dl your desired por… cat videos and play the local file w/ mpv - we want to know whether the traffic or the playback triggers this)
Offline
It's a kernelspace bug, the context suggests likely either the GPU or the NIC - and by default we blame broadcom :-P
(Therefore mpv+yt-dl is only helpful if you segment the steps, ie. yt-dl your desired por… cat videos and play the local file w/ mpv - we want to know whether the traffic or the playback triggers this)
Oh, I see. I can test the system with another wireless NIC - using my smartphone in tethering mode, that should be fine right?
Offline
Yup.
Offline
Tested with another NIC (my smartphone) and tried switching to the wl driver instead of brcmsmac. And today I just had a kernel panic (caps lock blinking) with no logs caught, unfortunately.
I googled similar issues and there are a lot of suggestions pointing to RAM. I ran the latest memtest from the repos for 2-4 passes - no errors. I'm not sure if that was enough to exclude RAM issues from the list. Should I run memtest longer, like 5-8 hours, or is that pointless?
Offline
I just had a kernel panic
Circumstances? While playing a (youtube) video?
This can be a RAM issue (pretty much everything can happen when your RAM returns bogus values), but a memtest requires maaaaany cycles ("overnight" - at least) to be significant.
If however this only emerges w/ a very specific usage pattern, random memory errors become less likely (could eg. be heat)
Also try to collect more backtraces (if possible) - the posted one was in the VFS. This could be totally random if the kernel space is tainted, but if it ends there every single time, that'd no longer be a coincidence.
Offline
Circumstances? While playing a (youtube) video?
Yep, chromium while watching a stream.
Also try to collect more backtraces (if possible) - the posted one was in the VFS. This could be totally random if the kernel space is tainted, but if it ends there every single time, that'd no longer be a coincidence.
I tried to grep my current journal for more backtraces - here.
Full log from Aug 8 here.
There are some btrfs funcs in the backtraces, and I found a few NFS-related issues that contain a page_remove_rmap call. I'm not sure whether this is related to the kernel's FS/disk subsystem, but I'm mentioning it just for more info. I'm using LUKS with btrfs (with subvolumes and compression) on a brand new SATA SSD (I had the same hang issues on my 10-year-old HDD too).
Last edited by Yukarin (2021-08-12 17:34:01)
Offline
Did you see the impressive amount of list_del corruptions?
Smells very much filesystem related - you could try some live distro and watch youtube from there - my money would be on compression…
Offline
Did you see the impressive amount of list_del corruptions?
grep 'list_del corruption' /tmp/journal | wc -l
says there are 25 results, all from Aug 8, no new messages since then.
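The same count can be broken down per day, which makes the "all from Aug 8" observation easy to re-check after later crashes. This is only a rough sketch: it assumes journal lines in the short `Mon DD HH:MM:SS host ...` format seen in the oops above, and the two `list_del` sample lines are made-up placeholders, not real log output.

```python
import re
from collections import Counter

# Matches the "Mon DD" prefix of a short-format journal line, e.g. "Aug 10 15:02:39 ...".
DATE_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2})\s")


def count_by_day(lines, needle="list_del corruption"):
    """Count lines containing `needle`, grouped by their "Mon DD" prefix."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = DATE_RE.match(line)
        # Collapse padding such as "Aug  8" into "Aug 8"; keep unparsable lines visible.
        day = " ".join(match.group(1).split()) if match else "unknown"
        counts[day] += 1
    return counts


# Placeholder input: the first two lines are invented for illustration,
# the third is taken from the trace posted earlier in the thread.
sample = [
    "Aug 08 12:00:00 laptop kernel: list_del corruption (placeholder text)",
    "Aug 08 12:00:01 laptop kernel: list_del corruption (placeholder text)",
    "Aug 10 15:02:39 laptop kernel: Call Trace:",
]
for day, n in count_by_day(sample).items():
    print(day, n)
```

For a real run, pass an open file object for the journal dump (for example `/tmp/journal` from the command above) instead of `sample`.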
Offline
It's the only log you posted
Other "kernel: BUG" lines for the other days?
(Please don't just post the bug line, the entire segments below to the "end trace" are gonna be relevant)
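Pulling out the whole segment from the `kernel: BUG` line down to `end trace` can be scripted. What follows is a minimal sketch, not a tested tool: the helper names are hypothetical, the marker strings are the ones visible in the oops posted earlier, and the sample reuses lines from that oops plus one invented filler line.

```python
import re

# Kernel-space RIP lines carry a symbol name ("RIP: 0010:func+0x.../0x...");
# the user-space RIP line ("RIP: 0033:0x7f...") has no symbol and is skipped.
RIP_RE = re.compile(r"RIP: \d{4}:([A-Za-z_][\w.]*)\+0x")


def extract_bug_segments(lines, start="kernel: BUG", end="end trace"):
    """Return lists of lines, each running from a BUG line to its end-trace line."""
    segments, current = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if current is None:
            if start in line:
                current = [line]
        else:
            current.append(line)
            if end in line:
                segments.append(current)
                current = None
    if current is not None:
        # The log stopped before "end trace", e.g. because of a hard reset.
        segments.append(current)
    return segments


def rip_symbol(segment):
    """Return the kernel function named in the first symbolic RIP line, or None."""
    for line in segment:
        match = RIP_RE.search(line)
        if match:
            return match.group(1)
    return None


sample = [
    "Aug 10 15:02:38 laptop example[1]: filler line, invented for illustration",
    "Aug 10 15:02:39 laptop kernel: BUG: unable to handle page fault for address: ffffffffab07de98",
    "Aug 10 15:02:39 laptop kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x1c2/0x200",
    "Aug 10 15:02:39 laptop kernel: RIP: 0033:0x7f60ba3b142b",
    "Aug 10 15:02:39 laptop kernel: ---[ end trace 404a77a4955ff7a0 ]---",
]
for seg in extract_bug_segments(sample):
    print(len(seg), "lines, faulting in", rip_symbol(seg))
```

Comparing the faulting symbol across several segments is one way to see whether the crashes keep landing in the same subsystem or are scattered at random.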
Offline
Other "kernel: BUG" lines for the other days?
(Please don't just post the bug line, the entire segments below to the "end trace" are gonna be relevant)
Nope, only Aug 8 and Aug 10.
There are no new lines below "rtkit-daemon[653]: Demoted 3 threads."; the boot logs after the hard reset start right after it.
Offline
So the only data we have points towards the filesystem.
I'd give it a shot and mount btrfs w/o compression and see whether the problem still occurs (weird because of the context, but hey…)
Offline
Tried btrfs w/o compression and just had another kernel panic, no logs caught...
Offline
Just found the old vendor RAM sticks and swapped them in for the current ones. Right after powering on and booting into plasma I saw some random graphic glitches - some buttons/lists/textures turn into plain black rectangles and then go back to normal. The glitches disappear after minimizing/maximizing the window a few times or after clicking some buttons/tabs. They may also randomly appear after closing and opening the lid. Testing browsers with video playback right now, no hangs/kernel panics yet...
I'm 100% sure that the vendor RAM is fine because I never had such a problem before on the same machine with this RAM, so that's probably a GPU issue?
Offline