Hello everybody!
I have a Dell XPS 13 9365 notebook and experience random freezes. The problem has persisted for some time, and I cannot pin it on any specific kernel version, package update, or running application. Sometimes the system recovers by itself after several minutes and I can continue working; sometimes I have to hard-reboot by holding the power button, because the computer becomes completely unresponsive.
I'm not very knowledgeable about debugging such problems. I don't usually see any errors in the `journalctl` log corresponding exactly to the time of a freeze; however, there are a lot of error messages like these:
Mar 25 19:32:04 notebook kernel: could not locate request for tag 0x0
Mar 25 19:32:04 notebook kernel: nvme nvme0: invalid id 49152 completed on queue 4
Mar 25 19:32:04 notebook kernel: could not locate request for tag 0x666
Mar 25 19:32:04 notebook kernel: nvme nvme0: invalid id 26214 completed on queue 4
...
Mar 25 19:32:08 notebook kernel: nvme nvme0: I/O 320 (Write) QID 4 timeout, aborting
Mar 25 19:32:08 notebook kernel: nvme nvme0: Abort status: 0x0
Mar 25 19:32:38 notebook kernel: nvme nvme0: I/O 320 QID 4 timeout, reset controller
Mar 25 19:32:38 notebook kernel: nvme nvme0: 4/0/0 default/read/poll queues
Mar 25 19:32:38 notebook kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
(please find the full log here, freeze happened at around 19:32, this time right when error messages appear).
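For anyone who wants to pull the same kind of messages out of their own journal, here is a minimal sketch. It assumes the kernel messages look like the ones quoted above; the function name `nvme_events` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only the NVMe-related kernel lines from a
# journal dump read on stdin. The patterns are taken from the messages
# quoted in this thread and may not match other kernel versions.
nvme_events() {
    grep -E 'nvme nvme[0-9]+:|could not locate request for tag'
}

# Typical use on a live system (kernel messages of the current boot):
#   journalctl -k -b --no-pager | nvme_events
```

Piping through a filter like this makes it easier to line up the timestamps of the timeouts with the moments the desktop froze.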
Before I installed Arch Linux on this drive, it ran Windows 10, and I don't remember having this kind of problem then. However, I didn't use the laptop as actively when Windows was installed.
Here is some additional information that I hope might help; please let me know if there is something else to check.
General information about the drive:
$ sudo nvme list
Node          Generic     SN            Model            Namespace  Usage                  Format       FW Rev
------------  ----------  ------------  ---------------  ---------  ---------------------  -----------  --------
/dev/nvme0n1  /dev/ng0n1  2J1720133167  ADATA SX6000PNP  1          512.11 GB / 512.11 GB  512 B + 0 B  V9001b31
Smart log:
$ sudo nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0
temperature : 34°C (307 Kelvin)
available_spare : 100%
available_spare_threshold : 32%
percentage_used : 0%
endurance group critical warning summary: 0
Data Units Read : 6,882,332 (3.52 TB)
Data Units Written : 7,695,374 (3.94 TB)
host_read_commands : 116,586,424
host_write_commands : 60,750,244
controller_busy_time : 0
power_cycles : 6,822
power_on_hours : 1,596
unsafe_shutdowns : 90
media_errors : 0
num_err_log_entries : 287
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
Error log:
$ sudo nvme error-log /dev/nvme0
Error Log Entries for device:nvme0 entries:8
.................
Entry[ 0]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 1]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 2]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 3]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 4]
.................
error_count : 1219368206019409473
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 5]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 6]
.................
error_count : 1
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 7]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(Successful Completion: The command completed without error)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Last edited by xaxa1 (2023-03-25 17:32:39)
Offline
Any advice will be greatly appreciated, because this is starting to really affect my work.
Offline
Don't bump, https://wiki.archlinux.org/title/Genera … es#Bumping
https://wiki.archlinux.org/title/Solid_ … leshooting
Try "nvme_core.default_ps_max_latency_us=0" and "iommu=soft"
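After adding the parameters and rebooting, it is worth confirming that the running kernel actually received them. A minimal sketch, assuming a POSIX shell; the helper name `has_kernel_param` is hypothetical, and the optional second argument exists only so the check can be tried on a saved command line:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel parameter appears as a whole
# word in a command-line file (defaults to /proc/cmdline, i.e. the
# command line of the running kernel).
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # One parameter per line, then an exact fixed-string match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# After rebooting:
#   has_kernel_param nvme_core.default_ps_max_latency_us=0 && echo "parameter is active"
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

The sysfs file in the last comment should read 0 if the APST latency limit was applied.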
Offline
Try disabling APST: https://wiki.archlinux.org/title/Solid_ … ST_support
Edit: Fuck, why did I not F5? It's not even minutes but a full hour now.
Last edited by V1del (2023-03-30 14:30:01)
Offline
Sorry for bumping.
I just tried it and unfortunately the problem still persists. Namely, I added those two options to my systemd-boot entry:
$ cat /boot/loader/entries/2023-02-23_13-07-30_linux.conf
# Created by: archinstall
# Created on: 2023-02-23_13-07-30
title Arch Linux (linux)
linux /vmlinuz-linux
initrd /intel-ucode.img
initrd /initramfs-linux.img
options root=PARTUUID=5017a3a1-e027-4cb1-a629-ad24680e3c60 rw intel_pstate=no_hwp rootfstype=ext4 nvme_core.default_ps_max_latency_us=0 iommu=soft
then rebooted. Here is the full log of my session after the reboot. You can see the error messages start around 18:45:13, and then there is a window of about 30 seconds before the controller is reset at 18:45:42. During this time the system was frozen.
...
Mar 30 18:44:31 notebook sudo[2880]: work : TTY=pts/2 ; PWD=/home/work ; USER=root ; COMMAND=/usr/bin/journalctl -f
Mar 30 18:44:31 notebook sudo[2880]: pam_unix(sudo:session): session opened for user root(uid=0) by work(uid=1001)
Mar 30 18:45:12 notebook kernel: nvme nvme0: I/O 243 (Write) QID 3 timeout, aborting
Mar 30 18:45:13 notebook kernel: nvme nvme0: Abort status: 0x0
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x5a4
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 30116 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x5
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 4101 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x291
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 49809 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x879
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 63609 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x3a3
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 54179 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x7ec
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 6124 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x22f
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 559 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x507
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 34055 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x12a
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 4394 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x2d4
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 25300 completed on queue 3
Mar 30 18:45:13 notebook kernel: could not locate request for tag 0x607
Mar 30 18:45:13 notebook kernel: nvme nvme0: invalid id 13831 completed on queue 3
Mar 30 18:45:42 notebook kernel: nvme nvme0: I/O 243 QID 3 timeout, reset controller
Mar 30 18:45:43 notebook kernel: nvme nvme0: 4/0/0 default/read/poll queues
Mar 30 18:45:43 notebook kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
Mar 30 18:46:30 notebook dbus-daemon[466]: [session uid=1001 pid=466] Activating via systemd: service name='org.blueman.Manager' unit='blueman-manager.service' requested by ':1.50' (uid=1001 pid=1254 comm="/usr/bin/python /usr/bin/blueman-applet")
...
What I find especially confusing is that there doesn't seem to be anything happening before the errors start to appear.
Offline
Mar 30 18:43:31 notebook dbus-daemon[276]: [system] Activating via systemd: service name='org.freedesktop.UPower' unit='upower.service' requested by ':1.34' (uid=1001 pid=1130 comm="/usr/lib/xfce4/panel/wrapper-2.0 /usr/lib/xfce4/pa")
Mar 30 18:43:31 notebook dbus-daemon[276]: [system] Successfully activated service 'org.freedesktop.UPower'
If you log into eg. an openbox session and kill some time browsing po… cat videos (or watching porn w/ mpv on the console) - does the nvme still hiccup?
(Alternatively mask upower)
Offline
Thank you for the suggestions. I started by masking upower, unfortunately it doesn't seem to help. Here is a fresh log.
I'll try installing Openbox, since masking seems to have produced errors.
Also, I've searched the internet and tried another kernel option,
pcie_aspm=off
which seems to disable power-saving features, also to no avail...
Offline
tried another kernel option
Mar 31 10:41:20 notebook kernel: ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
I'd have suggested it otherwise.
upower is masked and xfce4-power-manager crashes because of that but otherwise there's no change.
This is grasping at straws, but https://wiki.archlinux.org/title/Intel_ … Intel_CPUs
Offline
I added the ahci.mobile_lpm_policy=1 kernel parameter, and it seems the problem now appears less often. I can't be sure yet, because I use the laptop less on the weekend, but so far it has only happened once (fresh log). I'll continue testing and let you know.
Thanks a lot!
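To judge "less often" with numbers rather than by feel, one option is to count controller resets per boot. A rough sketch, assuming every recovered freeze ends with a "timeout, reset controller" line as in the logs above; the function name `count_nvme_resets` is made up:

```shell
#!/bin/sh
# Hypothetical helper: count NVMe controller resets in a journal dump
# read on stdin. grep -c prints 0 but exits non-zero when nothing
# matches, so the exit status is deliberately ignored.
count_nvme_resets() {
    grep -c 'timeout, reset controller' || :
}

# Current boot and previous boot, respectively:
#   journalctl -k -b    --no-pager | count_nvme_resets
#   journalctl -k -b -1 --no-pager | count_nvme_resets
```

Hard freezes that required the power button will not show up in this count, since the controller never got as far as logging a reset.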
Offline
OK, it just froze completely twice in a row; I had to reboot using the power button.
Here's the first log: https://0x0.st/HHDx.log
Here's the second one: https://0x0.st/HHDk.log
The first time, the system should have suspended, but instead it froze.
The second time, I was actually typing a new post here on the forum describing the first problem, ha-ha.
If I understand correctly, the problem seems to be related to suspending/sleeping/power management of the disk? If so, would it make sense for me to try:
1. spend as much time as possible with the laptop unplugged from AC? Maybe that would change the power management strategy
2. switch to a different DE/go WM-only? Maybe xfce's power management is doing something funny
3. unplug some peripherals (external monitor, my wired keyboard, etc.)? They consume some power from the laptop anyway
Could it be that there's some corrupted data on the disk that is preventing the system from suspending properly?
Last edited by xaxa1 (2023-04-03 11:52:53)
Offline
Could it be that there's some corrupted data
sudo LC_ALL=C pacman -Qkk | grep -v ', 0 altered files'
But I don't think that'd be the cause.
The NVMe drive stops responding at some point; "ADATA SX6000PNP" shows up quite a few times, including in https://bbs.archlinux.org/viewtopic.php?id=242516
But that of course won't help you.
Offline
Here's the result of pacman command:
[root@notebook ~]# LC_ALL=C pacman -Qkk | grep -v ', 0 altered files'
warning: code: /usr/lib/code/product.json (Modification time mismatch)
warning: code: /usr/lib/code/product.json (Size mismatch)
warning: code: /usr/lib/code/product.json (MD5 checksum mismatch)
warning: code: /usr/lib/code/product.json (SHA256 checksum mismatch)
backup file: at: /var/spool/atd/.SEQ (Modification time mismatch)
backup file: at: /var/spool/atd/.SEQ (Size mismatch)
backup file: at: /var/spool/atd/.SEQ (MD5 checksum mismatch)
backup file: at: /var/spool/atd/.SEQ (SHA256 checksum mismatch)
code: 1343 total files, 1 altered file
warning: filesystem: /root (Permissions mismatch)
backup file: filesystem: /etc/fstab (Modification time mismatch)
backup file: filesystem: /etc/fstab (Size mismatch)
backup file: filesystem: /etc/fstab (MD5 checksum mismatch)
backup file: filesystem: /etc/fstab (SHA256 checksum mismatch)
backup file: filesystem: /etc/group (Modification time mismatch)
backup file: filesystem: /etc/group (Size mismatch)
backup file: filesystem: /etc/group (MD5 checksum mismatch)
backup file: filesystem: /etc/group (SHA256 checksum mismatch)
backup file: filesystem: /etc/gshadow (Modification time mismatch)
backup file: filesystem: /etc/gshadow (Size mismatch)
backup file: filesystem: /etc/gshadow (MD5 checksum mismatch)
backup file: filesystem: /etc/gshadow (SHA256 checksum mismatch)
backup file: filesystem: /etc/passwd (Modification time mismatch)
backup file: filesystem: /etc/passwd (Size mismatch)
backup file: filesystem: /etc/passwd (MD5 checksum mismatch)
backup file: filesystem: /etc/passwd (SHA256 checksum mismatch)
backup file: filesystem: /etc/resolv.conf (Modification time mismatch)
backup file: filesystem: /etc/resolv.conf (Size mismatch)
backup file: filesystem: /etc/resolv.conf (MD5 checksum mismatch)
backup file: filesystem: /etc/resolv.conf (SHA256 checksum mismatch)
backup file: filesystem: /etc/shadow (Modification time mismatch)
backup file: filesystem: /etc/shadow (Size mismatch)
backup file: filesystem: /etc/shadow (MD5 checksum mismatch)
backup file: filesystem: /etc/shadow (SHA256 checksum mismatch)
backup file: filesystem: /etc/shells (Modification time mismatch)
backup file: filesystem: /etc/shells (Size mismatch)
backup file: filesystem: /etc/shells (MD5 checksum mismatch)
backup file: filesystem: /etc/shells (SHA256 checksum mismatch)
backup file: filesystem: /etc/subgid (Modification time mismatch)
backup file: filesystem: /etc/subgid (Size mismatch)
backup file: filesystem: /etc/subgid (MD5 checksum mismatch)
backup file: filesystem: /etc/subgid (SHA256 checksum mismatch)
backup file: filesystem: /etc/subuid (Modification time mismatch)
backup file: filesystem: /etc/subuid (Size mismatch)
backup file: filesystem: /etc/subuid (MD5 checksum mismatch)
backup file: filesystem: /etc/subuid (SHA256 checksum mismatch)
filesystem: 120 total files, 1 altered file
warning: intel-ucode: /boot/intel-ucode.img (Permissions mismatch)
warning: intel-ucode: /boot/intel-ucode.img (Modification time mismatch)
warning: keybase-bin: /usr/bin/keybase-redirector (Permissions mismatch)
backup file: glibc: /etc/locale.gen (Modification time mismatch)
backup file: glibc: /etc/locale.gen (Size mismatch)
backup file: glibc: /etc/locale.gen (MD5 checksum mismatch)
backup file: glibc: /etc/locale.gen (SHA256 checksum mismatch)
intel-ucode: 7 total files, 1 altered file
keybase-bin: 3173 total files, 1 altered file
warning: libutempter: /usr/lib/utempter/utempter (GID mismatch)
warning: libutempter: /usr/lib/utempter/utempter (Permissions mismatch)
libutempter: 20 total files, 1 altered file
backup file: lightdm: /etc/lightdm/lightdm.conf (Modification time mismatch)
backup file: lightdm: /etc/lightdm/lightdm.conf (Size mismatch)
backup file: lightdm: /etc/lightdm/lightdm.conf (MD5 checksum mismatch)
backup file: lightdm: /etc/lightdm/lightdm.conf (SHA256 checksum mismatch)
backup file: lightdm-gtk-greeter: /etc/lightdm/lightdm-gtk-greeter.conf (Modification time mismatch)
backup file: lightdm-gtk-greeter: /etc/lightdm/lightdm-gtk-greeter.conf (Size mismatch)
backup file: lightdm-gtk-greeter: /etc/lightdm/lightdm-gtk-greeter.conf (MD5 checksum mismatch)
backup file: lightdm-gtk-greeter: /etc/lightdm/lightdm-gtk-greeter.conf (SHA256 checksum mismatch)
backup file: mkinitcpio: /etc/mkinitcpio.conf (Modification time mismatch)
backup file: mkinitcpio: /etc/mkinitcpio.conf (Size mismatch)
backup file: mkinitcpio: /etc/mkinitcpio.conf (MD5 checksum mismatch)
backup file: mkinitcpio: /etc/mkinitcpio.conf (SHA256 checksum mismatch)
backup file: pacman: /etc/pacman.conf (Modification time mismatch)
backup file: pacman: /etc/pacman.conf (Size mismatch)
backup file: pacman: /etc/pacman.conf (MD5 checksum mismatch)
backup file: pacman: /etc/pacman.conf (SHA256 checksum mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (Modification time mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (Size mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (MD5 checksum mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (SHA256 checksum mismatch)
backup file: reflector: /etc/xdg/reflector/reflector.conf (Modification time mismatch)
backup file: reflector: /etc/xdg/reflector/reflector.conf (MD5 checksum mismatch)
backup file: reflector: /etc/xdg/reflector/reflector.conf (SHA256 checksum mismatch)
warning: systemd: /var/log/journal (GID mismatch)
backup file: systemd: /etc/systemd/timesyncd.conf (Modification time mismatch)
backup file: systemd: /etc/systemd/timesyncd.conf (Size mismatch)
backup file: systemd: /etc/systemd/timesyncd.conf (MD5 checksum mismatch)
backup file: systemd: /etc/systemd/timesyncd.conf (SHA256 checksum mismatch)
systemd: 1330 total files, 1 altered file
Offline
The only mildly concerning thing there is
warning: filesystem: /root (Permissions mismatch)
but that's for sure also not the cause of your freezes.
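Most of that output consists of `backup file:` lines, which are configuration files that are expected to differ from the packaged version once edited. A small sketch for looking only at the `warning:` lines; the function name `pacman_warnings` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: keep only the "warning:" lines of `pacman -Qkk`
# output read on stdin, dropping "backup file:" entries and the
# per-package summary lines.
pacman_warnings() {
    grep '^warning: '
}

# 2>&1 is added in case the mismatch messages are written to stderr:
#   sudo LC_ALL=C pacman -Qkk 2>&1 | pacman_warnings
```

On the output posted above, this would leave a short list in which the /root permissions line is easy to spot.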
Offline
I tried running on battery for some time and the problem still happens. However, when I eventually plugged the power cable back in, I noticed the following error messages in the logs:
Apr 06 12:25:42 notebook kernel: ACPI Error: Thread 2108751872 cannot release Mutex [PATM] acquired by thread 3233226944 (20221020/exmutex-378)
Apr 06 12:25:42 notebook kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/psparse-529)
Apr 06 12:25:45 notebook kernel: ACPI Error: Thread 2113536000 cannot release Mutex [PATM] acquired by thread 3117433216 (20221020/exmutex-378)
Apr 06 12:25:45 notebook kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/psparse-529)
I've only had time for a quick search so far and didn't find anything definitive. However, I'm posting it here in case someone more experienced recognizes this.
If that matters, here are my current boot loader settings:
title Arch Linux (linux)
linux /vmlinuz-linux
initrd /intel-ucode.img
initrd /initramfs-linux.img
options root=PARTUUID=5017a3a1-e027-4cb1-a629-ad24680e3c60 rw intel_pstate=no_hwp rootfstype=ext4 ahci.mobile_lpm_policy=1
Offline
Are the ACPI errors a direct response to plugging in the power cable, or do they show up all the time (eg. when booting on external power)?
I'm rather convinced of the nvme being the culprit, though.
You could try to boot the install iso (or some live distro), NOT mount or otherwise touch the nvme, and see whether the system eventually freezes.
There're no firmware updates available for the nvme, are there?
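A quick way to read the current firmware revision for comparison against whatever the vendor offers is sketched below. It assumes `nvme id-ctrl` prints the revision on a line of the form `fr : <revision>`; the helper name `nvme_fw_rev` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the firmware revision ("fr" field) from
# `nvme id-ctrl` output read on stdin. The field name is matched
# exactly so that "frmw" is not picked up by mistake.
nvme_fw_rev() {
    awk -F':' '$1 ~ /^fr[[:space:]]*$/ {
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", $2)
        print $2
    }'
}

# On the affected machine:
#   sudo nvme id-ctrl /dev/nvme0 | nvme_fw_rev
#   fwupdmgr get-updates   # only helps if the vendor publishes through fwupd
```

The value should match the "FW Rev" column of `nvme list` (V9001b31 in the first post).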
Offline