
#1 2021-07-05 09:25:52

rep_movsd
Member
Registered: 2013-08-24
Posts: 148

30 second delay on boot since installing latest kernel

I had been running a self-compiled 5.10 kernel for a long time, but the nvidia DKMS build started failing a few weeks ago after an update.
Rebuilding with a 5.12 kernel still gives the same error:

/var/lib/dkms/nvidia/465.31/build/nvidia/nv-caps.c: In function ‘nv_cap_close_fd’:
/var/lib/dkms/nvidia/465.31/build/nvidia/nv-caps.c:598:5: error: implicit declaration of function ‘sys_close’ [-Werror=implicit-function-declaration]
  598 |     sys_close(fd);
      |     ^~~~~~~~~

BTW, this is a separate issue that I haven't been able to resolve: I just can't get nvidia DKMS to build for any kernel version I compile, although it worked fine until a few weeks ago.


So I switched back to the stock Arch kernel, and on some reboots the system stalls for 20 to 30 seconds.
I believe I have seen a red error message scroll by very quickly when the delay ends, but I can't find it in dmesg or journalctl.
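
For reference, one way to narrow the journal down to error-level messages for a single boot is sketched below. The helper name `boot_errors` is made up for illustration; `-b`, `-p` and `--no-pager` are standard journalctl options. Looking at a previous boot (offset -1) assumes the journal is stored persistently, which may not be the case on every setup.

```shell
# boot_errors: show journal messages of priority "err" or worse for one boot.
# Argument is a boot offset: 0 = current boot (default), -1 = previous boot.
# (hypothetical helper name; only standard journalctl options are used)
boot_errors() {
    journalctl -b "${1:-0}" -p err --no-pager
}
```

Usage would be something like `boot_errors` right after a slow boot, or `boot_errors -1` to look at the boot before the current one.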

systemd-analyze shows this:

$ systemd-analyze blame
29.943s dev-nvme0n1p3.device
 1.570s systemd-resolved.service
 1.308s systemd-logind.service
  357ms systemd-journal-flush.service
  150ms user@1000.service
  141ms udisks2.service
  134ms upower.service

--- snip ----

Since it is almost exactly 30 seconds, it looks like some kind of timeout.
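
To compare boots, the blame output can be filtered down to the units that take suspiciously long. The following is only a sketch: `slow_units` is a hypothetical helper, the 10 second default threshold is an arbitrary assumption, and it only understands the `ms`, `s` and `min` time components.

```shell
# slow_units: read `systemd-analyze blame` output on stdin and print the
# units whose activation time is at least LIMIT seconds (default: 10).
# (hypothetical helper; the threshold is an arbitrary assumption)
slow_units() {
    awk -v limit="${1:-10}" '
    {
        secs = 0
        # Every field except the last is a time component, e.g. "1min 2.345s".
        # awk uses the leading numeric part when a field is used as a number.
        for (i = 1; i < NF; i++) {
            if ($i ~ /^[0-9.]+ms$/)       secs += $i / 1000
            else if ($i ~ /^[0-9.]+min$/) secs += $i * 60
            else if ($i ~ /^[0-9.]+s$/)   secs += $i
        }
        # The last field is the unit name.
        if (secs >= limit) printf "%.3f %s\n", secs, $NF
    }'
}
```

It would be run as `systemd-analyze blame | slow_units 10` after each boot; on the output above only dev-nvme0n1p3.device passes the threshold.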

My root file system is BTRFS, striped across partitions on 2 disks: /dev/nvme0n1p3 and /dev/nvme1n1p4

$ s fdisk -l
Disk /dev/nvme1n1: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 970 EVO Plus 250GB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 0AD970CF-ECB4-47B5-A1C9-F11A84A57FDB

Device             Start       End   Sectors  Size Type
/dev/nvme1n1p1 302254080 369362943  67108864   32G Linux swap
/dev/nvme1n1p2 134219776 138414079   4194304    2G EFI System
/dev/nvme1n1p3 369362944 488396799 119033856 56.8G Linux filesystem
/dev/nvme1n1p4 138414080 302254079 163840000 78.1G Linux filesystem

Partition table entries are not in disk order.


Disk /dev/nvme0n1: 238.47 GiB, 256060514304 bytes, 500118192 sectors
Disk model: PC300 NVMe SK hynix 256GB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A45FE626-3D7A-4D2B-8E26-7C3B86EAC4CF

Device             Start       End   Sectors  Size Type
/dev/nvme0n1p1 298059776 500117503 202057728 96.3G Linux filesystem
/dev/nvme0n1p2      2048 134219775 134217728   64G Linux filesystem
/dev/nvme0n1p3 134219776 298059775 163840000 78.1G Linux filesystem

Partition table entries are not in disk order.

smartctl shows no apparent errors on /dev/nvme0:

$ s smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.12-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       PC300 NVMe SK hynix 256GB
Serial Number:                      FS6AN51781090AJ05
Firmware Version:                   20004A00
PCI Vendor/Subsystem ID:            0x1c5c
IEEE OUI Identifier:                0xace42e
Controller ID:                      0
NVMe Version:                       1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            ace42e 619001c890
Local Time is:                      Mon Jul  5 14:47:25 2021 IST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x001e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     88 Celsius
Critical Comp. Temp. Threshold:     90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.87W       -        -    0  0  0  0        5       5
 1 +     2.40W       -        -    1  1  1  1       30      30
 2 +     1.90W       -        -    2  2  2  2      100     100
 3 -   0.1000W       -        -    3  3  3  3     1000    1000
 4 -   0.0060W       -        -    3  3  3  3     1000    5000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    2%
Data Units Read:                    10,469,631 [5.36 TB]
Data Units Written:                 25,543,398 [13.0 TB]
Host Read Commands:                 174,338,837
Host Write Commands:                519,710,729
Controller Busy Time:               9,448
Power Cycles:                       6,674
Power On Hours:                     763
Unsafe Shutdowns:                   1,788
Media and Data Integrity Errors:    0
Error Information Log Entries:      913,662
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               45 Celsius

Error Information (NVMe Log 0x01, 16 of 255 entries)
No Errors Logged

The delay is intermittent: it occurs on some boots and not on others.

How do I figure this out?

Thanks in advance!
