Hello everyone.
I am puzzled by random freezes and reboots. Sometimes the machine gets stuck looping a short fragment of music while everything else is frozen.
My laptop is an HP ZHAN 66 Pro A 14 G4 Notebook PC SBKPF.
Arch Linux lets my CPU reach a higher frequency: it is 2.3 GHz in Windows and now 4.2 GHz in Linux.
The fan spins up easily when I watch videos, whether in a browser (Firefox, Chromium) or in VLC.
Even in light use, such as browsing pages without videos, the fan sometimes runs heavily. I opened the system monitor and saw CPU core temperatures above 60 °C. When the temperature drops below 60 °C, the fan stops.
The fan also keeps running when the screen is locked, even though I set Energy Saving to 'do nothing' (no sleep, no hibernate...) in KDE System Settings.
I edited the first message to try to make it clearer. I am sorry that I did not ask as described in How To Ask Questions The Smart Way.
Last edited by zbridge71 (2024-04-19 00:43:39)
Offline
Neofetch isn't really useful. Does the system freeze or reboot, and if it freezes, what do you do to unfreeze it? Hold the power button? That isn't really good and loses context. Does it maybe freeze on system updates or is it truly random? Can you switch VTs?
The only useful thing here is getting us a journal from the freeze. If you remember a timeframe/a specific boot, get us a journal from that https://wiki.archlinux.org/title/System … ing_output
Otherwise enable REISUB, reproduce the freeze and, if you can't switch VTs/SSH in, use the REISUB sequence to safely reboot and then post
sudo journalctl -b
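To get the journal of one specific boot into a file that can be uploaded, something along these lines may help. This is only a sketch: save_boot_journal is a hypothetical helper name, and it assumes your user is allowed to read the system journal (otherwise run it as root). Offset 0 is the current boot and -1 the one before it.

```shell
# Hypothetical helper: write the journal of one boot to a text file.
# $1: boot offset (0 = current boot, -1 = previous boot); defaults to 0
# $2: output file; defaults to journal.txt
save_boot_journal() {
    offset="${1:-0}"
    out="${2:-journal.txt}"
    journalctl -b "$offset" --no-pager > "$out"
}

# Example, run after rebooting from a freeze:
#   journalctl --list-boots            # see which boots are recorded
#   save_boot_journal -1 freeze.txt    # journal of the boot that froze
```

If the freeze was ended with the power button, the last lines before the hang may be missing from that file.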
Offline
Thank you for your help.
I use the power button to reboot when it freezes.
Here is the journal from this boot (after the last time I rebooted the frozen machine).
http://0x0.st/X-X2.txt
And here is the journal from before the freeze.
http://0x0.st/X-Xe.txt
Last edited by zbridge71 (2024-04-11 06:24:50)
Offline
Your UEFI is out of date, start there.
Offline
Thank you for your advice.
I used a bootable USB with Windows to update it, but the update screen never appeared on reboot as the instructions describe.
However, I found that the problem also occurs in Windows 10. During that period it rebooted more frequently and also froze.
Offline
I used fwupd to upgrade my device firmware.
fwupdmgr get-history
HP HP ZHAN 66 Pro A 14 G4 Notebook PC
│
└─UEFI dbx:
│ Device ID: 362301da643102b9f38477387e2193e57abaa590
│ Previous version: 220
│ Update State: Success
│ Last modified: 2024-04-11 02:23
│ GUID: 7689caf4-c147-5c67-bff9-5dbe59a441bd
│ Device Flags: • Internal device
│ • Updatable
│ • Supported on remote server
│ • Needs a reboot after installation
│ • Reported to remote server
│ • Device is usable for the duration of the update
│ • Only version upgrades are allowed
│ • Signed Payload
│
└─Secure Boot dbx Configuration Update:
New version: 371
Remote ID: lvfs
Release ID: 35287
Summary: UEFI Secure Boot Forbidden Signature Database
Variant: x64
It seems to have updated successfully.
Last edited by zbridge71 (2024-04-11 03:09:36)
Offline
3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.
Pushing the power button means losing the journal (tail)
But, generically
Apr 10 22:52:24 ZHAN kernel: smpboot: CPU0: AMD Ryzen 5 5600U with Radeon Graphics (family: 0x19, model: 0x50, stepping: 0x0)
https://wiki.archlinux.org/title/Ryzen#Troubleshooting
Apr 10 22:52:24 ZHAN kernel: nvme0n1: p1 p2 p3
https://wiki.archlinux.org/title/Solid_ … leshooting
Also
And arch Linux releases the frequency of my CPU. It is 2.3GHZ in windows. The fan is easy to run when I watch videos
can you please elaborate on what that means?
CPU steps up and fans run faster when playing videos (in a browser?) in linux?
The browser-"support" for video acceleration on linux is a disgrace.
Especially with chromium(based) browsers you've to jump through a lot of hoops to convince the browser that "yeah, it's ok, you can use the hardware"
https://bbs.archlinux.org/viewtopic.php?id=244031&p=40
Wrt the topic of this thread: are you suggesting that the system might overheat when it freezes/reboots?
Offline
I am sorry and I edited the topic and the first message.
Apr 10 22:52:24 ZHAN kernel: smpboot: CPU0: AMD Ryzen 5 5600U with Radeon Graphics (family: 0x19, model: 0x50, stepping: 0x0)
And there is no configuration in my BIOS that can adjust the voltage.
Last edited by zbridge71 (2024-04-11 08:55:07)
Offline
It froze just now. I switched to tty4 and got some nvme messages.
Please wait a minute. I will pick out some messages from the picture I took and upload them.
Offline
nvme nvme0: controller is down; will reset : CSTS=0xffffffff, PCI_STATUS=0x10
nvme nvme0: Does your device have a faulty power saving mode enabled?
nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
nvme0n1: Read(0x2) @ LBA xxxxx. 256 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
I/O error, dev nvme0n1, sector xxxx op 0x0:(READ) flags 0x84700 phys_seg xx prio class 0
...
nvme nvme0: failed to set APST feature (8194)
BTRFS error (device nvme0n1p3): bdev /dev/nvme0n1p3 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
BTRFS error (device nvme0n1p3): bdev /dev/nvme0n1p3 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
BTRFS error (device nvme0n1p3): bdev /dev/nvme0n1p3 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
nvme_log_error: 8 callbacks suppressed
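To see how often this happens in a saved log, the reset message can simply be counted. A minimal sketch, assuming the kernel log was written to a text file first; count_nvme_resets is a hypothetical helper name and the file name is only an example.

```shell
# Hypothetical helper: count NVMe controller resets recorded in a saved log file.
count_nvme_resets() {
    # grep -c prints the number of matching lines; "|| true" keeps a zero
    # count from being reported as a failure.
    grep -c 'nvme.*controller is down; will reset' "$1" || true
}

# Example:
#   journalctl -k -b -1 --no-pager > kernel.txt
#   count_nvme_resets kernel.txt
```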
Offline
I have added amd_iommu=fullflush in
Apr 10 22:52:24 ZHAN kernel: nvme0n1: p1 p2 p3
Offline
Please don't power-post.
Edit your previous post if nobody has yet replied.
From the output you desperately want "nvme_core.default_ps_max_latency_us=0"
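After adding the parameter to the kernel command line and rebooting, the value that nvme_core actually uses can be read back from sysfs. A small sketch; apst_latency_limit is a hypothetical helper name, and the default path assumes the nvme_core module exposes its parameters under /sys/module as usual.

```shell
# Hypothetical helper: print the APST latency limit currently in effect.
# $1: parameter file; defaults to the usual sysfs location (assumption)
apst_latency_limit() {
    param_file="${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "unknown"
        return 1
    fi
}

# Example:
#   apst_latency_limit    # prints 0 when APST has been disabled this way
```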
Offline
Thanks for all the instructions. The machine seems to be running normally now. I have enabled REISUB, so I can get the logs the next time it freezes.
Offline
Thanks for all the instructions. The machine seems to be running normally now. I have enabled REISUB, so I can get the logs the next time it freezes.
Hello there, I noticed that you are also using TiPlus 5000
Apr 10 22:55:04 ZHAN systemd[1]: Starting Load/Save Screen Backlight Brightness of backlight:amdgpu_bl1...
Apr 10 22:55:04 ZHAN systemd[1]: Found device ZHITAI TiPlus5000 2TB myArch.
I'm also running this nvme drive on my host machine, in PCIe gen-3 mode on a PCIe gen-4 M.2 slot. My fault is exactly the same as yours, with the nvme0 controller resetting as follows:
nvme nvme0: controller is down; will reset : CSTS=0xffffffff, PCI_STATUS=0x10
nvme nvme0: Does your device have a faulty power saving mode enabled?
nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
I'm also experiencing this sort of bug on both the 6.8 stable and 6.6 LTS kernels, so I'm guessing it is some sort of firmware bug or kernel regression. Maybe we can share some details to investigate this further?
BTW, my nvme drive is relatively new: my old TiPlus 5000 broke because its controller died, and this drive is its replacement. My system is also new because I recently bought a new laptop.
Last edited by toolmanp (2024-04-14 14:06:16)
Offline
nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Though you'll likely get away w/o disabling aspm entirely, https://wiki.archlinux.org/title/Solid_ … ST_support
Offline
I'm also experiencing this sort of bug on both the 6.8 stable and 6.6 LTS kernels, so I'm guessing it is some sort of firmware bug or kernel regression. Maybe we can share some details to investigate this further?
I am sorry, I am a newbie and do not know how to investigate this bug. The most useful thing I can do is upload some logs. Of course we can share details by email; my time zone is GMT+8. Also, a friend of mine boots Arch Linux from a TiPlus 1TB in a Lenovo laptop, and his system runs without these problems.
Offline
If toolmanp uses the exact same HW and has issues w/ the exact same nvme and the kernel is issuing the warning that relates to a common issue w/ several nvmes, it's probably a rather good idea to follow the kernel's suggestion or the arch wiki and disable APST…
Offline
If toolmanp uses the exact same HW and has issues w/ the exact same nvme and the kernel is issuing the warning that relates to a common issue w/ several nvmes, it's probably a rather good idea to follow the kernel's suggestion or the arch wiki and disable APST…
I have now turned off APST completely and am going to daily drive this nvme drive for another couple of days. Hopefully it will work fine.
I'm using exactly the same TiPlus 5000 2TB for this build.
Offline
It's happening again. Turning off APST does not solve the problem, but it does relieve it a little, extending the time between failures from one hour to several hours. Yikes :-(
The system completely freezes and I cannot even use journalctl to get the logs... I tried plugging in my USB drive to flush the logs, but the kernel refuses to even identify the block device. How do I rescue it?
Offline
Did you also set pcie_aspm=off?
cat /proc/cmdline
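Reading the whole line by eye is error-prone, so a check for one exact parameter can be scripted. A minimal sketch; has_kernel_param is a hypothetical helper name, and it only matches complete, space-separated parameters.

```shell
# Hypothetical helper: succeed if PARAM appears as a whole word in CMDLINE.
# $1: kernel command line text, $2: parameter to look for
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel for the two parameters the nvme message suggests.
if [ -r /proc/cmdline ]; then
    for param in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off; do
        if has_kernel_param "$(cat /proc/cmdline)" "$param"; then
            echo "$param: set"
        else
            echo "$param: not set"
        fi
    done
fi
```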
The system completely freezes and I cannot even use journalctl to get the logs... I tried plugging in my USB drive to flush the logs, but the kernel refuses to even identify the block device.
Can you please elaborate on this? If the "system completely freezes" how can you even attempt to "use journalctl to get the logs" and how would you assess that "the kernel refuses to even identify the block device"?
Can you reboot w/ the https://wiki.archlinux.org/title/Keyboa … el_(SysRq) (nb. that you've to explicitly enable that feature first!)?
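Whether the whole REISUB sequence is permitted depends on the kernel.sysrq value. A sketch of the check, based on the bitmask in the kernel's SysRq documentation (4 keyboard control, 16 sync, 32 remount read-only, 64 signalling processes, 128 reboot); reisub_allowed is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a kernel.sysrq value permits every key in REISUB.
# 1 enables all functions; otherwise bits 4+16+32+64+128 = 244 must all be set.
reisub_allowed() {
    value="$1"
    needed=244
    [ "$value" -eq 1 ] || [ $(( value & needed )) -eq "$needed" ]
}

# Example:
#   reisub_allowed "$(cat /proc/sys/kernel/sysrq)" && echo "REISUB is enabled"
```

If the check fails, setting kernel.sysrq = 1 in a file under /etc/sysctl.d/ enables all functions on the next boot.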
Offline
Try to enable REISUB.
Otherwise enable REISUB, reproduce the freeze and, if you can't switch VTs/SSH in, use the REISUB sequence to safely reboot and then post
Offline
Did you also set pcie_aspm=off?
cat /proc/cmdline
The system completely freezes and I cannot even use journalctl to get the logs... I tried plugging in my USB drive to flush the logs, but the kernel refuses to even identify the block device.
Can you please elaborate on this? If the "system completely freezes" how can you even attempt to "use journalctl to get the logs" and how would you assess that "the kernel refuses to even identify the block device"?
Can you reboot w/ the https://wiki.archlinux.org/title/Keyboa … el_(SysRq) (nb. that you've to explicitly enable that feature first!)?
I rechecked the cmdline before rebooting with SysRq, and pcie_aspm=off is indeed set.
I managed to take a picture of /proc/kmsg, since I cannot even run dmesg any more after the nvme crashes.
journalctl also refuses to work, since it always tries to read from the rootfs, which is dead because of the nvme failure. It seems bizarre to me that I can still launch the terminal emulator and my window manager keeps functioning; I can switch shells from fish to bash and run some commands like ls, but other commands simply yield an I/O error. The keyboard and mouse also work fine; only the drive is failing.
After rebooting with SysRq, journalctl only seems to show the log from before the controller reset.
Sorry, my English kinda sucks, so I might not convey the meaning correctly. Maybe I should always keep an external USB drive connected and try to dump the log there if the controller goes down again.
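One way to try the USB idea is to start copying the kernel log to the stick before the failure, because new programs may not load once the root drive is gone. This is only a sketch under assumptions: start_kmsg_capture is a hypothetical helper name, /mnt/usb is an assumed mount point, and reading /dev/kmsg normally needs root.

```shell
# Hypothetical helper: copy kernel messages to a file on another drive, in the background.
# $1: output file, e.g. on a mounted USB stick
# $2: source to read; defaults to /dev/kmsg, which keeps delivering new messages
start_kmsg_capture() {
    out="$1"
    src="${2:-/dev/kmsg}"
    cat "$src" >> "$out" &
    echo "$!"   # PID of the background cat, so it can be stopped later
}

# Example (as root, with the stick mounted at the assumed path /mnt/usb):
#   pid="$(start_kmsg_capture /mnt/usb/kmsg.log)"
#   kill "$pid"    # stop capturing
```

The data only reaches the stick once it has been flushed, so a sync (for example the S in REISUB) before rebooting is still needed.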
Offline
Try "iommu=soft"
https://wiki.archlinux.org/title/Solid_ … nd_support
Your English is fine, it just lacked some context.
Maybe look up "complete"
Since "only" access to the root device is affected, everything that's already in memory and at least doesn't attempt any bus IO will continue to function.
Offline
My machine froze again. I used the REISUB sequence to safely reboot.
This is the log from journalctl -b
http://0x0.st/Xo-r.txt/
Offline
journalctl -b-1 would be more interesting after the reboot (the log of the boot prior to your current (working) one). Though these failing coredumps and a seemingly wrong coredump pattern might also be worth looking into. But if you have I/O issues with the root drive, chances are the "safely" flushed information also gets lost...
Offline