#1 2017-12-03 00:44:50

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,132

SSD temperature jumps +43C & system rendered unusable: how to respond?

I just forced a shutdown after the machine became essentially unusable. It was not a complete freeze, but the machine was so busy doing something that it had no time for me. Looking at the journal from the previous boot:

Rha 03 00:00:21 MyComp mandb[28980]: 0 man subdirectories contained newer manual pages.
Rha 03 00:00:21 MyComp mandb[28980]: 0 manual pages were added.
Rha 03 00:00:21 MyComp mandb[28980]: 0 stray cats were added.
Rha 03 00:00:21 MyComp mandb[28980]: 0 old database entries were purged.
Rha 03 00:00:21 MyComp systemd[1]: Started Update man-db cache.
Rha 03 00:00:24 MyComp systemd[1]: Started Update locate database.
Rha 03 00:00:43 MyComp systemd[1]: Started pkgfile database update.
Rha 03 00:06:00 MyComp smartd[698]: Device: /dev/nvme0n1, Temperature changed +43 Celsius to 64 Celsius (Min/Max 15/64!)
Rha 03 00:15:55 MyComp kernel: IN=wlan0 OUT= MAC= SRC=fe80:0000:0000:0000:f696:34ff:fedc:aaf5 DST=ff02:0000:0000:0000:0000:0000:0000:00fb LEN=84 TC=0 HOPLIMIT=255 FLOWLBL=207639 PROTO=UDP SPT=5353 DPT=5353 LEN=44 

So I can see that the SSD temperature jumped by 43C, but almost nothing else is logged until I forced the shutdown.
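
To make the arithmetic in that smartd line explicit, here is a minimal Python sketch that pulls out the delta and the new reading and derives the previous one. The regular expression is modelled only on the single line in my journal, so treat it as an assumption rather than smartd's documented format, and `parse_smartd_temp` is a name I made up for illustration.

```python
import re

# Pattern modelled on the one smartd message quoted above (an assumption,
# not a documented format). The Min/Max part is treated as optional.
SMARTD_TEMP = re.compile(
    r"Device: (?P<device>[^,\s]+), Temperature changed "
    r"(?P<delta>[+-]\d+) Celsius to (?P<now>\d+) Celsius"
    r"(?: \(Min/Max (?P<min>\d+)/(?P<max>\d+)!?\))?"
)


def parse_smartd_temp(line):
    """Return the device, delta, new and previous temperature, or None."""
    match = SMARTD_TEMP.search(line)
    if match is None:
        return None
    delta = int(match.group("delta"))  # int() accepts a leading '+' or '-'
    now = int(match.group("now"))
    return {
        "device": match.group("device"),
        "delta": delta,
        "now": now,
        # The reading at the previous smartd check is the new one minus the delta.
        "previous": now - delta,
        "min": int(match.group("min")) if match.group("min") else None,
        "max": int(match.group("max")) if match.group("max") else None,
    }


line = ("Rha 03 00:06:00 MyComp smartd[698]: Device: /dev/nvme0n1, "
        "Temperature changed +43 Celsius to 64 Celsius (Min/Max 15/64!)")
print(parse_smartd_temp(line))
```

So the previous reading was 21C. As far as I know smartd only compares against its last check, and its default check interval is 30 minutes, so the +43 says the drive heated up at some point between two checks rather than in an instant.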

I don't know how to judge timings from Xorg logs, but the old Xorg log ends with

[108199.045] (EE) event16 - (EE) SynPS/2 Synaptics TouchPad: (EE) kernel bug: Touch jump detected and discarded.
See https://wayland.freedesktop.org/libinput/doc/1.9.2/touchpad_jumping_cursor.html for details
[128442.591] (EE) event16 - (EE) SynPS/2 Synaptics TouchPad: (EE) kernel bug: Touch jump detected and discarded.
See https://wayland.freedesktop.org/libinput/doc/1.9.2/touchpad_jumping_cursor.html for details
[128499.138] (EE) event16 - (EE) SynPS/2 Synaptics TouchPad: (EE) kernel bug: Touch jump detected and discarded.
See https://wayland.freedesktop.org/libinput/doc/1.9.2/touchpad_jumping_cursor.html for details
[131071.173] (EE) event16 - (EE) SynPS/2 Synaptics TouchPad: (EE) kernel bug: Touch jump detected and discarded.
See https://wayland.freedesktop.org/libinput/doc/1.9.2/touchpad_jumping_cursor.html for details
[131082.220] (EE) event16 - (EE) SynPS/2 Synaptics TouchPad: (EE) kernel bug: Touch jump detected and discarded.
See https://wayland.freedesktop.org/libinput/doc/1.9.2/touchpad_jumping_cursor.html for details
[131390.351] (EE) event16 - (EE) SynPS/2 Synaptics TouchPad: (EE) kernel bug: Touch jump detected and discarded.
See https://wayland.freedesktop.org/libinput/doc/1.9.2/touchpad_jumping_cursor.html for details
[131819.354] (EE) libinput bug: timer event16 tap: offset negative (-808833)
[131820.656] (EE) libinput bug: timer event16 tap: offset negative (-1157554)
[131824.863] (EE) libinput bug: timer event16 tap: offset negative (-1786908)
[131824.899] (EE) libinput bug: timer event16 tap: offset negative (-1734675)
[131824.937] (EE) libinput bug: timer event16 tap: offset negative (-321906)
[131824.989] (EE) libinput bug: timer event16 tap: offset negative (-308800)
[131826.822] (EE) libinput bug: timer event16 tap: offset negative (-321072)
[131828.145] (EE) libinput bug: timer event16 tap: offset negative (-625640)
[131828.178] (EE) libinput bug: timer event16 tap: offset negative (-609275)
[131955.131] (EE) libinput bug: timer event16 tap: offset negative (-25833)
[132522.727] (II) event5  - (II) Logitech USB-PS/2 Optical Mouse: (II) SYN_DROPPED event - some input events have been lost.
[132523.413] (II) event16 - (II) SynPS/2 Synaptics TouchPad: (II) SYN_DROPPED event - some input events have been lost.

I have no idea whether this has to do with the update of pkgfile's database (which seems to be the last scheduled job started in the journal), some input bug, Firefox or something else.

I wanted to get to a terminal to run top/free/sensors during the chew-up, but in the end I forced a shutdown as the machine was just getting hotter and hotter. (I know it will presumably power off in self-defence eventually, but I didn't really want to wait for that to happen.)
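
Since I couldn't get to a terminal in time, one option would be to leave something running beforehand that records the same kind of information. Below is a rough Python sketch of that idea, not a tested tool: it appends load average, available memory and whatever temperatures the kernel exposes under /sys/class/hwmon to a file, and fsyncs each line so it should survive a forced shutdown. The function names and the log path are hypothetical, and which sensors show up (if any) depends on the kernel and drivers.

```python
import glob
import os
import time


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.readline().strip()
    except OSError:
        return None


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def hwmon_temps(root="/sys/class/hwmon"):
    """Read every temp*_input under root; values are millidegrees Celsius."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        raw = read_first_line(path)
        if raw is None or not raw.lstrip("-").isdigit():
            continue
        chip = read_first_line(os.path.join(os.path.dirname(path), "name"))
        key = "{}:{}".format(chip or "unknown", os.path.basename(path))
        temps[key] = int(raw) / 1000.0
    return temps


def snapshot():
    """Build one log line from /proc/loadavg, /proc/meminfo and hwmon."""
    load = read_first_line("/proc/loadavg") or "?"
    try:
        with open("/proc/meminfo") as fh:
            mem = parse_meminfo(fh.read())
    except OSError:
        mem = {}
    return "{} load=[{}] MemAvailable={}kB SwapFree={}kB temps={}".format(
        time.strftime("%Y-%m-%d %H:%M:%S"), load,
        mem.get("MemAvailable", "?"), mem.get("SwapFree", "?"), hwmon_temps())


def log_forever(path, interval=10):
    """Append a snapshot every `interval` seconds, syncing each one to disk."""
    with open(path, "a") as out:
        while True:
            out.write(snapshot() + "\n")
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval)
```

Calling something like `log_forever("/var/tmp/hang-watch.log")` (a made-up path) from a terminal or a service would then leave a trail to read after the next forced shutdown. Writing to the very SSD that is overheating is not ideal, but each line is small.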

I've had the OOM killer kill Firefox before, but nothing like this. And I can't think why the SSD would jump in temperature like that. +43??!!

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.13.12-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPEKKF512G7L
Serial Number:                      BTPY73240SPG512F
Firmware Version:                   121P
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            5cd2e4 1871a15aa8
Local Time is:                      Sun Dec  3 00:43:57 2017 GMT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x001e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        5       5
 1 +     4.60W       -        -    1  1  1  1       30      30
 2 +     3.80W       -        -    2  2  2  2       30      30
 3 -   0.0700W       -        -    3  3  3  3    10000     300
 4 -   0.0050W       -        -    4  4  4  4     2000   10000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        22 Celsius
Available Spare:                    97%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    3,578,881 [1.83 TB]
Data Units Written:                 1,826,108 [934 GB]
Host Read Commands:                 145,302,808
Host Write Commands:                17,890,735
Controller Busy Time:               320
Power Cycles:                       246
Power On Hours:                     745
Unsafe Shutdowns:                   61
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    3
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
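
To put the 64C peak in context against the thresholds the drive reports above, here is a small Python sketch. The numbers are copied from the smartctl output and the smartd line; `classify_temp` is just an illustrative helper, and the greater-than-or-equal comparison is my reading of how the NVMe warning and critical thresholds are meant to apply.

```python
def classify_temp(temp_c, warning_c, critical_c):
    """Place a composite temperature relative to the drive's two thresholds."""
    if temp_c >= critical_c:
        return "critical"
    if temp_c >= warning_c:
        return "warning"
    return "below warning"


WARNING_C = 70   # Warning  Comp. Temp. Threshold from smartctl
CRITICAL_C = 80  # Critical Comp. Temp. Threshold from smartctl

for label, temp in (("current", 22), ("peak logged by smartd", 64)):
    state = classify_temp(temp, WARNING_C, CRITICAL_C)
    headroom = WARNING_C - temp
    print("{}: {} C -> {}, {} C below the warning threshold".format(
        label, temp, state, headroom))
```

Even the 64C peak is 6C short of the warning threshold. The "Warning Comp. Temperature Time: 3" counter is a lifetime total, in minutes if I read the NVMe specification correctly, so on its own it doesn't say the drive crossed 70C during this episode.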

What should I be thinking?


Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L
