Update: The latest BIOS update fixed this issue.
Hi everyone.
I have an HP ZBook Power laptop with a Ryzen 7 7840HS CPU and an RTX 4050 Max-Q GPU. It has been running Arch fine with the dGPU disabled in the BIOS.
Yesterday I did a full system upgrade, then enabled the dGPU and installed the NVIDIA proprietary driver. The dGPU worked fine, but I then discovered that a suspend and resume causes the NVMe drive to be remounted read-only. The system is unusable after this, and the system journal is not written to disk.
I was able to save the output of "journalctl -b" using a USB drive: https://0x0.st/H_wy.txt
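To pull the relevant lines out of a saved log like this one, a small filter along these lines may help. This is only a sketch: the function name `nvme_resume_errors` and the file name `journal.txt` are made up for illustration, and the patterns are simply the kernel messages seen in this log plus the `PM: suspend` markers, so other failure modes may need other patterns.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, the kernel messages that
# mark a suspend cycle and the NVMe controller failing to come back.
# Reads the given file(s), or stdin when called without arguments.
nvme_resume_errors() {
    grep -n -E \
        'nvme.*(Unable to change power state|Disabling device after reset failure)|PM: suspend (entry|exit)' \
        "$@"
}
```

Usage would be something like `nvme_resume_errors journal.txt`, or `journalctl -b | nvme_resume_errors` on a system that is still writable.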
Excerpt from around line 4553:
...
Aug 11 10:12:22 zbook.local kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Aug 11 10:12:22 zbook.local kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Aug 11 10:12:22 zbook.local kernel: nvme nvme0: Disabling device after reset failure: -19
Aug 11 10:12:22 zbook.local kernel: nvme0n1: detected capacity change from 2000409264 to 0
Aug 11 10:12:22 zbook.local kernel: EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 1835329 starting block 4375776)
Aug 11 10:12:22 zbook.local kernel: Buffer I/O error on device nvme0n1p2, logical block 3326944
...
Things I've tried (in this order):
1. Downgrade the kernel to the previous version (6.4.9-arch1-1 -> 6.4.6-arch1-1)
2. Remove the NVIDIA proprietary driver
3. Add kernel parameter `amd_iommu=fullflush`[1]
4. Disable dGPU in BIOS, the issue is gone
5. Enable dGPU in BIOS again, the issue is back again
6. Switch to linux-lts
7. Switch to linux-mainline
8. Disable dGPU in BIOS and switch back to stable kernel, the issue is gone again
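When testing kernel parameters as in step 3, it is worth confirming after the reboot that the parameter really reached the kernel, since a bootloader entry that was not regenerated silently keeps the old command line. A minimal sketch, assuming a POSIX shell; `has_kernel_param` is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact parameter (e.g.
# "amd_iommu=fullflush") appears as a whole word on the kernel command line.
# The optional second argument replaces /proc/cmdline, which is handy for
# checking a bootloader entry before rebooting.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    # Pad with spaces so that only whole words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_kernel_param amd_iommu=fullflush && echo active` prints `active` only if the running kernel was booted with that parameter.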
Info of the NVMe drive:
# smartctl -a /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.9-arch1-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZVL41T0HBLB-00BH1
Serial Number: S6B7NJ0W271680
Firmware Version: HPS3NHAV
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Controller ID: 7
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Utilization: 539,495,309,312 [539 GB]
Namespace 1 Formatted LBA Size: 4096
Namespace 1 IEEE EUI-64: 002538 e23142a09a
Local Time is: Fri Aug 11 13:49:28 2023 CST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 83 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.94W       -        -    0  0  0  0        0       0
 1 +     5.29W       -        -    1  1  1  1        0       0
 2 +     2.86W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    3  3  3  3      200    2800
 4 -   0.0050W       -        -    4  4  4  4     4000   19000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         0
 1 +    4096       0         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 33 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Data Units Read: 1,383,365 [708 GB]
Data Units Written: 2,426,192 [1.24 TB]
Host Read Commands: 13,970,420
Host Write Commands: 29,602,523
Controller Busy Time: 9
Power Cycles: 548
Power On Hours: 340
Unsafe Shutdowns: 9
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 33 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                   Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Extended          Completed without error             336            -     -   -   -    -
 1   Short             Completed without error               3            -     -   -   -    -
 2   Short             Completed without error               0            -     -   -   -    -
I keep the dGPU disabled for now and it's working fine, but is there anything else that I can try? Any help would be appreciated.
--
[1]: Recommended by this Arch wiki article: https://wiki.archlinux.org/title/Solid_ … nd_support
Last edited by maxxie (2023-10-02 19:09:11)
Sounds like a firmware/mainboard bug. Try getting a UEFI update?
Aug 11 10:01:39 zbook.local kernel: DMI: HP HP ZBook Power 15.6 inch G10 A Mobile Workstation PC/8B95, BIOS V85 Ver. 01.01.06 05/18/2023
If there's no FW update or it doesn't help (try first!)
Aug 11 10:11:16 zbook.local kernel: PM: suspend entry (s2idle)
No S3?
Did you try "iommu=soft" and "nvme_core.default_ps_max_latency_us=0"?
Is this on battery or AC? Does it matter?
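Both of those questions can be answered from sysfs after the reboot. The following is a rough sketch rather than a definitive check: the helper names are hypothetical, it assumes the `nvme_core` module exposes `default_ps_max_latency_us` under `/sys/module`, and it assumes the charger shows up as a power supply of type `Mains` (some USB-C chargers report a different type, in which case this will say "battery").

```shell
#!/bin/sh
# Hypothetical helper: print the APST latency limit nvme_core is using.
# A value of 0 means autonomous power state transitions are disabled.
# The optional argument overrides the sysfs path.
nvme_ps_max_latency() {
    cat "${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}"
}

# Hypothetical helper: print "AC" if a supply of type "Mains" reports
# online=1, otherwise print "battery". The optional argument overrides
# the power_supply class directory.
power_source() {
    dir="${1:-/sys/class/power_supply}"
    for supply in "$dir"/*; do
        [ -r "$supply/type" ] || continue
        if [ "$(cat "$supply/type")" = "Mains" ] &&
           [ "$(cat "$supply/online" 2>/dev/null)" = "1" ]; then
            echo "AC"
            return 0
        fi
    done
    echo "battery"
}
```

Running `nvme_ps_max_latency; power_source` right before each suspend test records which combination was actually being tested.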
Thank you for the replies!
Sounds like a firmware/mainboard bug. Try getting a UEFI update?
I've just updated the BIOS to the newest version.
Aug 11 17:47:04 zbook.local kernel: DMI: HP HP ZBook Power 15.6 inch G10 A Mobile Workstation PC/8B95, BIOS V85 Ver. 01.01.10 07/05/2023
Unfortunately the issue is still there.
No S3?
It seems so:
$ cat /sys/power/mem_sleep
[s2idle]
Did you try "iommu=soft" and "nvme_core.default_ps_max_latency_us=0"?
I just tried them, didn't help.
Is this on battery or AC? Does it matter?
It's the same on battery and AC power.
Last edited by maxxie (2023-08-11 10:24:43)
Still no S3 after the BIOS update?
Can you switch between internal, hybrid and dedicated GPU in the BIOS?
Does the dedicated GPU alone also cause this?
Still no S3. There's only s2idle even after the BIOS update.
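For reference, `/sys/power/mem_sleep` lists every suspend mode the platform offers and puts the active one in brackets; S3 appears there as `deep`. A minimal sketch of reading it, with hypothetical helper names and an optional path argument that exists only so the functions can be pointed at a test file:

```shell
#!/bin/sh
# Hypothetical helper: print the active suspend mode, i.e. the entry the
# kernel wraps in [brackets]. "s2idle [deep]" yields "deep".
active_sleep_mode() {
    file="${1:-/sys/power/mem_sleep}"
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# Hypothetical helper: succeed if S3 ("deep") is offered at all,
# whether or not it is the active mode.
supports_s3() {
    file="${1:-/sys/power/mem_sleep}"
    tr -d '[]' < "$file" | tr ' ' '\n' | grep -qx 'deep'
}
```

On this laptop `active_sleep_mode` would print `s2idle` and `supports_s3` would fail, matching the output quoted earlier in the thread.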
The BIOS only allows me to switch between UMA and hybrid graphics, so I'm not able to verify if the dedicated GPU alone can cause this.
Last edited by maxxie (2023-08-11 14:38:14)