I had been running a self-compiled 5.10 kernel for a long time, but nvidia DKMS started failing a few weeks ago after an update.
Rebuilding with the 5.12 kernel still gives the same error:
/var/lib/dkms/nvidia/465.31/build/nvidia/nv-caps.c: In function ‘nv_cap_close_fd’:
/var/lib/dkms/nvidia/465.31/build/nvidia/nv-caps.c:598:5: error: implicit declaration of function ‘sys_close’ [-Werror=implicit-function-declaration]
598 | sys_close(fd);
| ^~~~~~~~~

BTW, this is a separate issue that I haven't been able to resolve: I just can't get nvidia DKMS to work for any kernel version I compile, even though it worked fine a few weeks ago.
So I switched back to the stock Arch kernel, and now on random reboots the system stalls for 20 to 30 seconds.
I believe I have seen a red error message scroll by very fast when the delay ends, but I can't see that in dmesg or journalctl.
systemd-analyze shows this:
$ systemd-analyze blame
29.943s dev-nvme0n1p3.device
1.570s systemd-resolved.service
1.308s systemd-logind.service
357ms systemd-journal-flush.service
150ms user@1000.service
141ms udisks2.service
134ms upower.service
--- snip ---

Since it is almost exactly 30 seconds, it looks like some kind of timeout thing.
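To pick out just the slow entries from that list, here is a minimal sketch of a filter. The function name `slow_units` and the 20 second threshold are my own choices, and it only understands the `s` and `ms` time formats shown above (longer formats such as minutes are skipped), so treat it as a rough helper rather than anything official.

```shell
# Print the units from `systemd-analyze blame` output (read on stdin)
# whose activation time is at least $1 seconds.
# Assumption: times look like "29.943s" or "357ms"; other formats are ignored.
slow_units() {
    awk -v limit="$1" '
        # Skip lines whose first field is not a plain s/ms duration.
        $1 !~ /^[0-9.]+m?s$/ { next }
        # Convert milliseconds to seconds.
        $1 ~ /^[0-9.]+ms$/ { secs = substr($1, 1, length($1) - 2) / 1000 }
        # Plain seconds: strip the trailing "s".
        $1 ~ /^[0-9.]+s$/  { secs = substr($1, 1, length($1) - 1) + 0 }
        secs >= limit + 0  { print $2 }
    '
}
```

Usage would be `systemd-analyze blame | slow_units 20`. On the output above, only dev-nvme0n1p3.device is over that threshold.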
My root filesystem is Btrfs, striped across partitions on two disks: /dev/nvme0n1p3 and /dev/nvme1n1p4
$ s fdisk -l
Disk /dev/nvme1n1: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 970 EVO Plus 250GB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 0AD970CF-ECB4-47B5-A1C9-F11A84A57FDB
Device Start End Sectors Size Type
/dev/nvme1n1p1 302254080 369362943 67108864 32G Linux swap
/dev/nvme1n1p2 134219776 138414079 4194304 2G EFI System
/dev/nvme1n1p3 369362944 488396799 119033856 56.8G Linux filesystem
/dev/nvme1n1p4 138414080 302254079 163840000 78.1G Linux filesystem
Partition table entries are not in disk order.
Disk /dev/nvme0n1: 238.47 GiB, 256060514304 bytes, 500118192 sectors
Disk model: PC300 NVMe SK hynix 256GB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A45FE626-3D7A-4D2B-8E26-7C3B86EAC4CF
Device Start End Sectors Size Type
/dev/nvme0n1p1 298059776 500117503 202057728 96.3G Linux filesystem
/dev/nvme0n1p2 2048 134219775 134217728 64G Linux filesystem
/dev/nvme0n1p3 134219776 298059775 163840000 78.1G Linux filesystem
Partition table entries are not in disk order.

smartctl shows no apparent errors:
$ s smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.12-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: PC300 NVMe SK hynix 256GB
Serial Number: FS6AN51781090AJ05
Firmware Version: 20004A00
PCI Vendor/Subsystem ID: 0x1c5c
IEEE OUI Identifier: 0xace42e
Controller ID: 0
NVMe Version: 1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: ace42e 619001c890
Local Time is: Mon Jul 5 14:47:25 2021 IST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0016): Format Frmw_DL Self_Test
Optional NVM Commands (0x001e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02): Cmd_Eff_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 88 Celsius
Critical Comp. Temp. Threshold: 90 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 5.87W - - 0 0 0 0 5 5
1 + 2.40W - - 1 1 1 1 30 30
2 + 1.90W - - 2 2 2 2 100 100
3 - 0.1000W - - 3 3 3 3 1000 1000
4 - 0.0060W - - 3 3 3 3 1000 5000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 - 512 0 0
1 - 4096 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 45 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 2%
Data Units Read: 10,469,631 [5.36 TB]
Data Units Written: 25,543,398 [13.0 TB]
Host Read Commands: 174,338,837
Host Write Commands: 519,710,729
Controller Busy Time: 9,448
Power Cycles: 6,674
Power On Hours: 763
Unsafe Shutdowns: 1,788
Media and Data Integrity Errors: 0
Error Information Log Entries: 913,662
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 45 Celsius
Error Information (NVMe Log 0x01, 16 of 255 entries)
No Errors Logged

The delay occurs randomly; some boots are not affected at all.
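Going back to the smartctl output above: although the error log itself says "No Errors Logged", the "Error Information Log Entries" and "Unsafe Shutdowns" counters are both non-zero. I don't know whether that is meaningful for this drive. In case it helps to compare them between a good boot and a delayed one, this is a minimal sketch for pulling a single counter out of the output. The function name `smart_counter` is my own, and it assumes the "Label: value" layout shown above.

```shell
# Print the value of one "Label: value" line from `smartctl -a` output
# (read on stdin), with thousands separators removed.
# $1 is the label exactly as smartctl prints it, without the colon.
smart_counter() {
    awk -F': *' -v label="$1" '
        $1 == label { gsub(/,/, "", $2); print $2 }
    '
}
```

For example, `sudo smartctl -a /dev/nvme0 | smart_counter 'Error Information Log Entries'` prints the counter as a plain number, which can then be compared across boots.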
How do I figure this out?
Thanks in advance!