Hi!
When I run `sudo pacman -Syu`, the update starts, but a few package upgrades in, it suddenly stops (it shows "Upgrading packagename" and the progress bar stops moving). At this point Ctrl+C no longer stops pacman, but rather prints ^C in my terminal. The mouse still works and I can open and close some apps, but I cannot open new tabs in Firefox. I can open a terminal, but it freezes after typing a few characters or instantly when hitting Enter, so I cannot run anything.
I had chaotic-aur enabled, but have since disabled it and the issue still occurs. The only major thing I have changed recently is switching my terminal shell from zsh to fish, but I made sure to try running pacman from zsh just in case, and it still repros every time.
I am still able to install individual packages (I used `paru -S packagename` multiple times with no issue), but I cannot do a system upgrade.
Here is a photo of the frozen install:
https://i.imgur.com/f2yV0mL.png
Sometimes when I try to run a command in this state and the shell freezes, it prints the warning `Locking the history file took too long (0.770 seconds).`, which I've never seen before.
These are the only logs right after the pam_unix ones indicating when I ran the pacman command, before I forced it to power off:
Sep 01 17:43:26 apg systemd[1]: Reloading requested from client PID 1985336 ('systemctl') (unit session-3.scope)...
Sep 01 17:43:26 apg systemd[1]: Reloading...
Sep 01 17:43:26 apg systemd[1]: /usr/lib/systemd/system/libvirtd.service:34: Unknown key name 'LimitNOFile' in section 'Service', ignoring.
Sep 01 17:43:26 apg systemd[1]: /usr/lib/systemd/system/virtlockd.service:21: Unknown key name 'LimitNOFile' in section 'Service', ignoring.
Sep 01 17:43:26 apg systemd[1]: /usr/lib/systemd/system/virtlogd.service:21: Unknown key name 'LimitNOFile' in section 'Service', ignoring.
Sep 01 17:43:26 apg systemd[1]: Reloading finished in 232 ms.
Sep 01 17:43:26 apg systemd[1]: Starting Daily man-db regeneration...
Sep 01 17:43:26 apg nm-applet[2526]: gtk_widget_get_scale_factor: assertion 'GTK_IS_WIDGET (widget)' failed
Sep 01 17:43:36 apg systemd[1]: man-db.service: Deactivated successfully.
Sep 01 17:43:36 apg systemd[1]: Finished Daily man-db regeneration.
Sep 01 17:43:36 apg systemd[1]: man-db.service: Consumed 8.411s CPU time.
Sep 01 17:44:47 apg systemd[1]: Reloading requested from client PID 1997430 ('systemctl') (unit session-3.scope)...
Sep 01 17:44:47 apg systemd[1]: Reloading...
Sep 01 17:44:47 apg systemd[1]: /usr/lib/systemd/system/libvirtd.service:34: Unknown key name 'LimitNOFile' in section 'Service', ignoring.
Sep 01 17:44:47 apg systemd[1]: /usr/lib/systemd/system/virtlockd.service:21: Unknown key name 'LimitNOFile' in section 'Service', ignoring.
Sep 01 17:44:47 apg systemd[1]: /usr/lib/systemd/system/virtlogd.service:21: Unknown key name 'LimitNOFile' in section 'Service', ignoring.
Sep 01 17:44:47 apg systemd[1]: Reloading finished in 231 ms.
Another freeze after disabling chaotic-aur:
https://media.discordapp.net/attachment … height=292
I am running a regular Arch install on an ASUS Zephyrus G14 2022 (AMD edition).
Last edited by TheSunCat (2023-09-09 16:51:37)
Sounds like a disk or RAM issue, given the point at which it's happening. Check/post the SMART information from smartctl -a, preferably after having run a test, and post your journal from an affected boot:
sudo journalctl -b # or -b -1 if you had to force power off
https://wiki.archlinux.org/title/List_o … n_services (if the log even from a previous boot after a forced power off doesn't contain anything conclusive, instead of powering off hard enable the sysrq sequence and use REISUB for a "safer" reboot) https://wiki.archlinux.org/title/Keyboa … el_(SysRq)
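For reference, a minimal sketch of checking whether the current `kernel.sysrq` value would permit the whole REISUB sequence. It assumes the bitmask meanings from the kernel's SysRq documentation (4 = keyboard control, 16 = sync, 32 = remount read-only, 64 = signalling processes, 128 = reboot); `reisub_allowed` is a hypothetical helper name, not an existing tool.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel.sysrq value allows every
# key of REISUB. A value of 1 enables all functions; otherwise it is a bitmask.
reisub_allowed() {
    value=$1
    # R needs 4 (keyboard control), E and I need 64 (signalling processes),
    # S needs 16 (sync), U needs 32 (remount read-only), B needs 128 (reboot).
    needed=$((4 + 16 + 32 + 64 + 128))
    [ "$value" -eq 1 ] || [ $((value & needed)) -eq "$needed" ]
}

# Usage (only reads the live setting, nothing is changed):
#   reisub_allowed "$(cat /proc/sys/kernel/sysrq)" && echo "REISUB available"
# To enable all SysRq functions until the next reboot (as root):
#   sysctl kernel.sysrq=1
```

If the check fails, only part of the sequence may work, so a "safe" reboot could silently skip the sync or remount step.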
Mod side note, please only link to bigger images and/or use thumbnails.
Last edited by V1del (2023-09-08 15:36:46)
Thanks for your quick reply! Here is a full journal of the past boot (in which I booted, tried to upgrade, observed the freeze, then REISUB'd).
https://0x0.st/Hfmy.txt
I see there is a kernel panic (caps lock was not blinking, though). This has recently happened multiple times while playing OMORI on Steam, maybe it's related to the pacman lockups. In the case of the OMORI panics, `journalctl -b-1` does show the kernel panic in the log despite the non-REISUB hard reboot:
Aug 28 15:00:01.480376 apg kernel: asus_wmi: Unknown key code 0xcf
Aug 28 17:22:31.181352 apg kernel: BUG: unable to handle page fault for address: ffff9d351d57c280
Aug 28 17:22:31.184733 apg kernel: #PF: supervisor read access in kernel mode
Aug 28 17:22:31.192539 apg kernel: #PF: error_code(0x0000) - not-present page
Aug 28 17:22:31.196250 apg kernel: PGD 0 P4D 0
Aug 28 17:22:31.200126 apg kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Aug 28 17:22:31.200188 apg kernel: CPU: 8 PID: 1111797 Comm: kworker/u32:2 Tainted: G W 6.4.10-201.fsync.fc38.x86_64 #1
Aug 28 17:22:31.200223 apg kernel: Hardware name: ASUSTeK COMPUTER INC. ROG Zephyrus G14 GA402RJ_GA402RJ/GA402RJ, BIOS GA402RJ.502.FT 03/22/2023
Here is the result of `smartctl -a` after having run `sudo smartctl -t short /dev/nvme0` and waited for completion:
https://0x0.st/HfmY.txt
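For anyone skimming a long SMART report, here is a rough sketch of pulling out the fields most relevant to a failing drive. The field names are an assumption about how smartmontools labels NVMe health data, and `smart_summary` is a hypothetical filter name.

```sh
#!/bin/sh
# Hypothetical filter: reads `smartctl -a` output on stdin and keeps only
# the NVMe health lines that most directly hint at a failing drive.
# Assumption: the field labels match what smartmontools prints for NVMe.
smart_summary() {
    grep -E '^(Critical Warning|Media and Data Integrity Errors|Error Information Log Entries|Percentage Used):'
}

# Usage (as root), after the short self-test has finished:
#   smartctl -t short /dev/nvme0
#   smartctl -a /dev/nvme0 | smart_summary
```

A clean summary does not rule out the disk, but nonzero media errors or a critical warning would point strongly at it.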
I wasn't sure how to post images (I used the img tag as explained in help.php but I think I did it wrong). Is there something I need to add to get it to display as a thumbnail? It looked fine in the preview.
It hadn't come to mind, but I have recently upgraded my RAM (replaced SODIMM 8GiB stick with a 32GiB 4800 MHz one). After performing the upgrade, I ran a memtest86 overnight and it reported no issues.
The filesystems (btrfs & vfat) aren't clean, did you trigger the last reboot w/ the power button?
In general: https://wiki.archlinux.org/title/Solid_ … leshooting
You've systemd-networkd & NetworkManager enabled, pick one, disable the other.
Also iwd doesn't look like it's invoked by NM?
Does this here actually coincide w/ a pacman update attempt?
Sep 08 18:28:03 apg systemd[1]: Startup finished in 7.049s (firmware) + 458ms (loader) + 746ms (kernel) + 1.450s (initrd) + 2min 2.962s (userspace) = 2min 12.667s.
Sep 08 18:28:04 apg systemd[1]: libvirtd.service: Deactivated successfully.
Sep 08 18:28:04 apg systemd[1]: libvirtd.service: Unit process 1501 (dnsmasq) remains running after unit stopped.
Sep 08 18:28:04 apg systemd[1]: libvirtd.service: Unit process 1502 (dnsmasq) remains running after unit stopped.
Sep 08 18:28:11 apg kernel: BUG: unable to handle page fault for address: ffff8abf0b8aea80
Sep 08 18:28:11 apg kernel: #PF: supervisor read access in kernel mode
Sep 08 18:28:11 apg kernel: #PF: error_code(0x0000) - not-present page
Sep 08 18:28:11 apg kernel: PGD 0 P4D 0
Sep 08 18:28:11 apg kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Sep 08 18:28:11 apg kernel: CPU: 15 PID: 844 Comm: kworker/u32:15 Not tainted 6.4.12-201.fsync.fc38.x86_64 #1
Sep 08 18:28:11 apg kernel: Hardware name: ASUSTeK COMPUTER INC. ROG Zephyrus G14 GA402RJ_GA402RJ/GA402RJ, BIOS GA402RJ.502.FT 03/22/2023
Sep 08 18:28:11 apg kernel: Workqueue: events_unbound btrfs_preempt_reclaim_metadata_space
Sep 08 18:28:11 apg kernel: RIP: 0010:extent_buffer_bitmap_set+0x5c/0x140
Sep 08 18:28:11 apg kernel: Code: e0 48 83 ec 08 48 8b 1f 89 4c 24 04 81 e3 ff 0f 00 00 44 88 44 24 03 48 01 f3 48 01 c3 49 89 df 81 e3 ff 0f 00 00 49 c1 ef 0c <4a> 8b 6c ff 70 48 89 ee e8 57 99 ff ff 8b 4c 24 04 48 89 e8 ba 08
Sep 08 18:28:11 apg kernel: RSP: 0018:ffffb476411f3ac0 EFLAGS: 00010212
Sep 08 18:28:11 apg kernel: RAX: 0001ffffffffffe2 RBX: 00000000000003da RCX: 0000000000000002
Sep 08 18:28:11 apg kernel: RDX: 00000000000000ff RSI: 00000000000023f8 RDI: ffff89bf0b8aea00
memtest86+ on 32GB will take more than "while I was sleeping" to yield meaningful results (you made one cycle?), but if this is happening somewhat reliably and fixing your network stack doesn't cut it, try to downclock the RAM (or choose the least aggressive timings in the BIOS).
did you trigger the last reboot w/ the power button?
I used SysRq+REISUB to reboot (with an external keyboard, waiting a few seconds between each key).
You've systemd-networkd & NetworkManager enabled, pick one, disable the other.
Thanks, that's probably why I have so many wifi connectivity issues. Must have accidentally reenabled systemd-networkd.
Also iwd doesn't look like it's invoked by NM?
I use iwgtk to connect to wifi networks, and I also have iwd set as the backend for NetworkManager (so nmtui and such still work). Is this OK?
Does this here actually coincide w/ a pacman update attempt?
Yes, this is when I tried to run pacman -Syu and it panicked halfway through installing updates.
try to downclock the RAM (or choose the least aggressive timings in the BIOS)
There are sadly no BIOS settings for RAM speeds on this laptop. I believe I'm already at the lowest DDR5 speed (4800MHz) hardware-wise.
I ran memtest86 and pressed the labeled key to enable multithreading, but it appeared to just start the test. It ran for about 12 hours until I got back to it, and it said all tests had passed. It's my first time using memtest86; I thought it selected the tests itself, haha.
I used SysRq+REISUB to reboot (with an external keyboard, waiting a few seconds between each key).
I meant before the boot of the posted journal.
The FS are corrupted and unless that happened because of a hard reboot, it might indicate the cause of the freeze (the nvme)
I also have iwd set as the backend for NetworkManager
Please post the output of
find /etc/systemd -type l -exec test -f {} \; -print | awk -F'/' '{ printf ("%-40s | %s\n", $(NF-0), $(NF-1)) }' | sort -f
iwd starts before NM and is not started by it.
You might be OK enabling the service alongside setting it as the backend for NM (NM is still talking to it), but that is neither necessary nor suggested by https://wiki.archlinux.org/title/Networ … Fi_backend
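As a sketch of what the `find` pipeline above reports: `find` lists the symlinks under /etc/systemd that resolve to regular files (in practice, the enabled units), and `awk` splits each path on `/` to print the unit name (`$NF`) next to the directory holding the link (`$(NF-1)`). The sample path below is illustrative, and `format_unit_links` is a hypothetical name for the awk stage.

```sh
#!/bin/sh
# Hypothetical wrapper around the awk stage: reads symlink paths on stdin
# and prints "unit name | containing directory", unit padded to 40 columns.
format_unit_links() {
    awk -F'/' '{ printf ("%-40s | %s\n", $NF, $(NF-1)) }'
}

# Illustrative input; on a real system the paths come from
#   find /etc/systemd -type l -exec test -f {} \; -print
printf '%s\n' /etc/systemd/system/multi-user.target.wants/iwd.service | format_unit_links
```

A unit listed under multi-user.target.wants has been enabled explicitly, which is what makes a stray iwd.service easy to spot in the output.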
Oh, could be! I just reproduced it again with `journalctl --follow` running and got a stack trace for the crash. Unfortunately, since I can't run any commands while it's frozen, I have to resort to sharing a picture of the screen with the crash log. It does indeed look like a btrfs issue.
https://i.imgur.com/LcNSLk6.jpg
The FS probably has some corruption because I have to very frequently do a hard reboot, as I have been affected by this unsolved issue for the past year I've had this laptop: https://gitlab.freedesktop.org/drm/amd/-/issues/2068
Is there a safe way to repair it/clear the corrupted parts? I don't mind some data loss as my important files are not on this computer.
Here's the output of that find command: http://0x0.st/HfBy.txt
I disabled iwd just now and added the wifi_backend config as explained in the article you linked. I remember having done this in the past, but perhaps I cleared it at some point as it was not done.
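For reference, the change described here amounts to roughly the following. This is a sketch assuming the drop-in name and location suggested by the linked wiki article (/etc/NetworkManager/conf.d/wifi_backend.conf); `write_wifi_backend_conf` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: writes the NetworkManager drop-in that selects iwd as
# the Wi-Fi backend. The directory defaults to the real location (assumed
# from the wiki); pass another directory to preview the file first.
write_wifi_backend_conf() {
    conf_dir=${1:-/etc/NetworkManager/conf.d}
    printf '[device]\nwifi.backend=iwd\n' > "$conf_dir/wifi_backend.conf"
}

# Usage (as root), then let NetworkManager start iwd by itself:
#   write_wifi_backend_conf
#   systemctl disable --now iwd.service
#   systemctl restart NetworkManager.service
```

With this in place iwd.service stays disabled, and NetworkManager starts iwd on demand.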
Last edited by TheSunCat (2023-09-08 21:07:34)
iwd.service | multi-user.target.wants
That's not supposed to be there.
I have to resort to sharing a picture of the screen with the crash log.
Picture is fine (as absolutely necessary), but please replace the oversized image w/ a link.
Is there a safe way to repair it/clear the corrupted parts?
Way: Yes. Safe: No.
https://wiki.archlinux.org/title/Btrfs#btrfs_check
See the warning.
I have been affected by this unsolved issue for the past year
amdgpu.dpm=0 amdgpu.bapm=0 amdgpu.runpm=0 amdgpu.aspm=0 pcie_aspm=off
"amdgpu.dpm=0" was very recently reported to no longer prevent booting (and to indeed solve a problem), so give that a try along with the other kernel parameters.
I've disabled iwd since (as explained in the article you linked, thanks!).
Got it, I replaced the image with a plain url.
To use btrfs-check, I have to boot a live USB and use it from there, right? I don't think I can unmount my running filesystem.
I'd previously had issues caused by the WinBtrfs driver when I was dual booting for Respondus Lockdown Browser. There were dmesg logs about a file being corrupted. I found the inode and deleted the file, which caused the errors to go away (I assumed it was solved, and removed WinBtrfs). I understand this is a different situation from that, however.
I've applied the parameters, thanks! Hopefully that means fewer hard reboots. I don't always have an external keyboard handy with a printscreen key, and this laptop has none.
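A quick way to confirm the parameters are actually active after a reboot is to compare them against /proc/cmdline. A minimal sketch; `cmdline_has_all` is a hypothetical helper name, and how the parameters get onto the command line depends on the boot loader in use.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if every listed parameter appears as a whole
# word in the given kernel command line; reports the first missing one.
cmdline_has_all() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "missing: $param" >&2; return 1 ;;
        esac
    done
}

# Usage after rebooting with the new parameters:
#   cmdline_has_all "$(cat /proc/cmdline)" amdgpu.dpm=0 amdgpu.bapm=0 \
#       amdgpu.runpm=0 amdgpu.aspm=0 pcie_aspm=off && echo "all set"
```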
To use btrfs-check, I have to boot a live USB and use it from there, right?
Yes.
There were dmesg logs about a file being corrupted. I found the inode and deleted the file, which caused the errors to go away
You can try that, but there's no guarantee for the corruption to be limited to the extents indicated in the journal.
Does pacman trigger the crash always for the same package?
Nope, it's a different package every time. Always on the installing phase though, and only pacman (and very rarely OMORI, though not sure if they're related since OMORI isn't writing any files when it freezes).
I booted a Fedora live USB and ran btrfs check on the disk. It showed issues with free space being wrong, so I deleted the free space cache v2. I then verified that the errors were gone from btrfs check.
I then mounted the subvolumes and ran a btrfs scrub, which found that a qcow2 VM image was corrupted. It holds no valuable data, so I was able to delete it without issues. I rebooted into my system and ran another scrub; it did the first ~30% fine and afterward just printed lots of this:
BTRFS warning (device nvme0n1p2): skipping scrub of block group 1073772232704 due to active swapfile
However, it doesn't find any errors. I will try a system upgrade after posting this reply and see how it goes!
EDIT: First system upgrade since last week! Thanks so much for your help. Wifi also works great, thanks to fixing iwd.
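For anyone landing here with the same symptoms, the repair steps above can be sketched roughly as follows. This is a simplified outline, not the exact commands that were run: it mounts the filesystem once rather than each subvolume, `/mnt` and the `run`/`repair_btrfs` names are hypothetical, and it only prints the commands unless `DRY_RUN=0` is set. The warning on the btrfs check wiki page linked earlier still applies.

```sh
#!/bin/sh
# Dry-run by default: prints each command instead of running it.
# Set DRY_RUN=0 (as root, from a live USB, filesystem unmounted) to execute.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_btrfs() {
    dev=$1   # e.g. /dev/nvme0n1p2, the device named in the scrub warning
    mnt=$2   # hypothetical mount point, e.g. /mnt
    run btrfs check --readonly "$dev" || true                  # report problems, change nothing
    run btrfs check --clear-space-cache v2 "$dev" || return 1  # drop the free space tree
    run btrfs check --readonly "$dev" || return 1              # confirm the errors are gone
    run mount "$dev" "$mnt" || return 1
    run btrfs scrub start -B "$mnt"                            # -B: run in the foreground
}

repair_btrfs /dev/nvme0n1p2 /mnt
```

Any files the scrub reports as corrupted still have to be restored or deleted by hand, as with the qcow2 image here.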
Last edited by TheSunCat (2023-09-09 16:51:06)