as the hardware forum is for
Problems and questions concerning kernel and hardware support.
I don't see my topic fitting there - and since there's no other section it would fit better, I guess this is the correct one - although I guess this means it won't get the attention I would like
anyway
https://lim.cryptearth.de/~cryptearth/journal.txt (@seth: yes, I know, it contains my public ipv6 - but that's scattered across the forum anyway - but feel free to point it out anyway)
today my system ran into an issue - or rather: my nvme ssd did (in the log above the fun starts at 19:35:05 - roughly 2/3 to 3/4 down - but as I got it I included the full journal)
booted the system - all fine
started steam - still fine
started arma3 launcher - still fine
clicked "play" in the launcher to start the game proper - hell broke loose - or rather: it froze over
nothing - no mouse cursor movement, no num-lock state change, no tty change, no ssh in from phone - dead
my machine is a bit more than an arm's length away from where I sit - and just as I figured "well, crashed" it finally switched to tty and showed this on top:
I/O error, dev nvme0n1, sector 170866816 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
ok - nvme just died - tried to login anyway - which worked - and rebooted instantly
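For reference, the sector in the error line above can be turned into a byte offset on the device, which helps when checking whether later errors hit the same region. This is only a rough sketch: the regex is written for this one line format, `parse_io_error` is a made-up helper name, and it assumes the kernel reports sectors in 512-byte units.

```python
import re

# Kernel block-layer error line as quoted in the post.
LINE = ("I/O error, dev nvme0n1, sector 170866816 op 0x0:(READ) "
        "flags 0x80700 phys_seg 4 prio class 2")

# Assumption: "sector" is counted in 512-byte units, independent of the
# drive's formatted LBA size (which is also 512 on this drive).
SECTOR_BYTES = 512

PATTERN = re.compile(
    r"I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+) "
    r"op (?P<op>0x[0-9a-f]+):\((?P<opname>\w+)\)"
)

def parse_io_error(line):
    """Extract device, sector and operation; return None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    sector = int(m.group("sector"))
    return {
        "dev": m.group("dev"),
        "sector": sector,
        "op": m.group("opname"),
        "byte_offset": sector * SECTOR_BYTES,
    }

info = parse_io_error(LINE)
print(info)
print(f"offset: {info['byte_offset'] / 2**30:.1f} GiB into {info['dev']}")
```

Under that assumption the failed read sits roughly 81.5 GiB into the device. That is well inside the 240 GB the namespace reports as utilized.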
as the system rebooted the uefi POST took quite some time - and then suddenly pxe kicked in and booted from network
it all still took quite some time - managed to boot archinstall - the nvme still showed up in lspci but was inaccessible (didn't show up in lsblk)
shutdown - and thought: "well, that's it"
booted again - again into pxe - back into archinstall - and wow - nvme showed up again - which allowed me to get the journal
to get you the full picture: every once in a while I do get a warning at boot about the nvme exceeding its temp rating - until now I thought it's just some hotspot ... yes, when I installed it I made sure to remove the cover from the heat pad and screw down tightly - so it should work as intended
I ran smartctl - but nothing stands out - about 10% "used" - and as I only have the OS on it (home lives on a zfs pool) there's nothing I would lose (well, except the ability to boot into an OS, as I haven't set up my pool to boot from it) but the drive isn't that old (from oct 2022) and not used that much (only for OS and some caches)
about its temp issue: is it just a bad batch that was doomed from manufacturing? or was it just some hiccup, maybe due to overload?
it's not about "throwing it out and replacing it" - 512gb is about 60 bucks - and I don't even use half of it - 256gb would go for 30 - it's just: "is it toast already?"
a Samsung Electronics Co Ltd NVMe SSD Controller 980 (DRAM-less) should get me further than just about 3 years of just OS - I don't even use it as cache for my zfs pool (that was recommended against, as such a use case would have shredded it within months - for that I use a regular sata ssd - and already the 2nd one - so yes, using flash as zfs cache does shred it fast)
is there any test I can run that could give more insight?
thx anyway
Offline
as the hardware forum is for
is there any test I can run that could give more insight?
thx anyway
Give qdiskinfo a go and see what it thinks of the smartctl results.
Offline
Please add the output of smartctl.
Moderator Note
as the hardware forum is for
Problems and questions concerning kernel and hardware support.
I don't see my topic fitting there - and since there's no other section it would fit better, I guess this is the correct one - although I guess this means it won't get the attention I would like
Your problem could be caused by a kernel change, in my opinion it fits fine in the kernel & hardware board.
Moving to "Kernel & Hardware".
Offline
screw down tightly
"too"? Make sure the board/card isn't under tension.
That aside and ceterum censo: https://wiki.archlinux.org/title/Solid_ … leshooting
@seth: yes, I know, it contains my public ipv6
Since you're collecting our IPs that's just fair
Online
Give qdiskinfo a go and see what it thinks of the smartctl results.
qdiskinfo reports all green - overall rating is "good 94%"
Please add the output of smartctl.
here you go
smartctl -x /dev/nvme0
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.18.5-arch1-1] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 980 500GB
Serial Number: S64DNF0T311968E
Firmware Version: 2B4QFXO7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 500.107.862.016 [500 GB]
Unallocated NVM Capacity: 0
Controller ID: 5
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 500.107.862.016 [500 GB]
Namespace 1 Utilization: 240.077.103.104 [240 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 d321b96266
Local Time is: Tue Jan 13 17:14:17 2026 CET
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0055): Comp DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Namespace 1 Features (0x10): NP_Fields
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.24W       -        -    0  0  0  0        0       0
 1 +     4.49W       -        -    1  1  1  1        0       0
 2 +     2.19W       -        -    2  2  2  2        0     500
 3 -   0.0500W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     1000    9000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 36 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 6%
Data Units Read: 108.368.519 [55,4 TB]
Data Units Written: 50.134.344 [25,6 TB]
Host Read Commands: 1.476.762.504
Host Write Commands: 1.661.538.581
Controller Busy Time: 2.217
Power Cycles: 942
Power On Hours: 806
Unsafe Shutdowns: 41
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 1334
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 36 Celsius
Temperature Sensor 2: 38 Celsius
Thermal Temp. 2 Transition Count: 79284
Thermal Temp. 2 Total Time: 71912
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged
as shown: there's quite a high count of over-temp warnings
I also had a logger running with quite a high polling rate - and it seems to be only very short spikes - often at the start of some access - but sometimes even during long-running i/o - under constant load (copying a few gb) it does sometimes reach the high 50s/low 60s - but stays there - hence I'm not sure if this spike can even be trusted or if it's maybe just a faulty thermal diode p/n-junction
also: when the spike occurs it's always exactly 84°C
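To put the thermal counters from the smartctl output into perspective, here is a small back-of-the-envelope calculation. It assumes the units the NVMe spec defines for these fields as far as I know them (minutes for "Warning Comp. Temperature Time", seconds for "Thermal Temp. 2 Total Time"). "Power On Hours" may not count time spent in non-operational power states, so the share is only approximate.

```python
# Values copied from the smartctl output above.
power_on_hours = 806
warning_temp_minutes = 1334   # "Warning Comp. Temperature Time" (assumed minutes)
tmt2_transitions = 79284      # "Thermal Temp. 2 Transition Count"
tmt2_total_seconds = 71912    # "Thermal Temp. 2 Total Time" (assumed seconds)

# Time spent at or above the 82 Celsius warning threshold.
warning_hours = warning_temp_minutes / 60
warning_share = warning_hours / power_on_hours

# Average duration of a single thermal-throttling episode.
avg_throttle_seconds = tmt2_total_seconds / tmt2_transitions

print(f"over warning threshold: {warning_hours:.1f} h "
      f"({warning_share:.1%} of power-on time)")
print(f"throttled in total:     {tmt2_total_seconds / 3600:.1f} h")
print(f"average per transition: {avg_throttle_seconds:.2f} s")
```

With these numbers that is about 22 hours over the warning threshold (roughly 2.8% of power-on time) and on average less than one second per throttling event. That at least matches the "very short spikes" the logger shows. It doesn't tell whether the sensor reading itself can be trusted.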
Moderator Note
Your problem could be caused by a kernel change, in my opinion it fits fine in the kernel & hardware board.
Moving to "Kernel & Hardware".
appreciated
screw down tightly
"too"? Make sure the board/card isn't under tension.
board is a msi b550-a pro: https://de.msi.com/Motherboard/B550-A-PRO/support
the nvme is installed in the top-most slot between cpu and gpu
the installation is explained on pages 31/32 of the english manual, which I followed very carefully, paying close attention as this was my first time installing an nvme
it's installed correctly with the little standoff at the correct length location - the "stick" is parallel to the board and there're no additional standoffs under it - as it should be
I also made sure to place the cover correctly and made sure to have the plastic peel cover removed
I only used a regular fitting screwdriver and only torqued "hand tight" - as in "firm but not overstressing" - I didn't use a powered screwdriver or any means of additional leverage
so, as this was my very first nvme, I guess I did everything ok, closely following the manual (yes, it's not rocket science - but it was my first - and as it was newly bought I paid extra attention)
That aside and ceterum censo: https://wiki.archlinux.org/title/Solid_ … leshooting
I'll give that a shot
@seth: yes, I know, it contains my public ipv6
Since you're collecting our IPs that's just fair
btw - off-topic: even if someone knows my current ipv6 - an incoming request shouldn't be able to "just bypass" the nat-"firewall" (yes, I know: NAT is NOT a firewall) of my router without me explicitly opening a port forwarding - or am I wrong on this?
Offline
smart data looks unsuspicious, see whether it's just APST.
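As a rough illustration of what checking APST means for this drive: as far as I understand the kernel's nvme driver, it only lets the drive autonomously enter non-operational power states whose entry plus exit latency fits within `nvme_core.default_ps_max_latency_us` (default 100000, and 0 disables APST). The sketch below applies that rule to the power state table from the smartctl output. The helper names are made up for this example and the selection rule is my reading of the driver, not a copy of it.

```python
# Rows of "Supported Power States" from the smartctl output above.
# Columns: St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
TABLE = """\
0 + 5.24W - - 0 0 0 0 0 0
1 + 4.49W - - 1 1 1 1 0 0
2 + 2.19W - - 2 2 2 2 0 500
3 - 0.0500W - - 3 3 3 3 210 1200
4 - 0.0050W - - 4 4 4 4 1000 9000
"""

def parse_states(table):
    """Turn the whitespace-separated rows into dicts."""
    states = []
    for row in table.splitlines():
        f = row.split()
        states.append({
            "state": int(f[0]),
            "operational": f[1] == "+",   # "-" marks a non-operational state
            "max_watts": float(f[2].rstrip("W")),
            "entry_us": int(f[9]),        # assumed microseconds
            "exit_us": int(f[10]),        # assumed microseconds
        })
    return states

def apst_candidates(states, max_latency_us):
    """Non-operational states whose entry+exit latency fits the limit."""
    return [s["state"] for s in states
            if not s["operational"]
            and s["entry_us"] + s["exit_us"] <= max_latency_us]

states = parse_states(TABLE)
for limit in (0, 1500, 100000):
    print(limit, apst_candidates(states, limit))
```

For this table the combined latencies are 1410 us for state 3 and 10000 us for state 4. A limit between those two values would keep state 3 but rule out the deepest state, and 0 rules out both.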
There's no NAT w/ IPv6, your router might very much still block inbound traffic, though.
You can throw eg. https://pentest-tools.com/network-vulne … nline-nmap at it and see what's open.
Online
smart data looks unsuspicious, see whether it's just APST.
I'll report back if anything changes
off-top
There's no NAT w/ IPv6, your router might very much still block inbound traffic, though.
You can throw eg. https://pentest-tools.com/network-vulne … nline-nmap at it and see what's open.
ok, didn't know that
anyway - the linked site doesn't allow IPv6 (claims "invalid ip") - but I did a quick test from my root server: according to lsof my pc only has SSH (tcp/22) and LLMNR name resolution (tcp/5355) open (both ipv4 and ipv6) to public (*:22, *:5355) - all other listen ports are bound to localhost anyway
when I try to connect from my external root I just get a permission denied - unless I set a port-forwarding in my routers interface - then I'm able to connect
so it seems that my router does have a proper ipv6 firewall which blocks incoming traffic unless I set a specific rule - it can even redirect incoming ports (say ssh from public 2022 to internal 22) - but I guess that's down to my specific router rather than standard (fyi: avm 6660 cable, FW 8.21)
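The same kind of check can be scripted instead of done by hand. Below is a minimal sketch of a TCP connect test that works for both IPv4 and IPv6. `tcp_port_open` is a made-up helper name, and the demo only talks to a loopback listener so it runs without network access. For a real test it would have to be run from an external host against the pc's global IPv6 address.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    getaddrinfo picks the address family, so IPv4 and IPv6
    literals as well as host names work.
    """
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(addr)
                return True
        except OSError:
            # Refused, filtered (timeout) or unreachable: try next address.
            continue
    return False

def demo():
    # Listen on an ephemeral loopback port, probe it, then probe it
    # again after the listener has been closed.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        port = srv.getsockname()[1]
        print("while listening:", tcp_port_open("127.0.0.1", port))
    print("after close:    ", tcp_port_open("127.0.0.1", port))

if __name__ == "__main__":
    demo()
```

Note that the function returns False both for a refused and for a silently dropped connection, so it can't tell a closed port on the pc from one blocked by the router.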
i do have the option to set my "modem-router-wifi"-combo into bridge mode and have a sbc with multiple nics - maybe gonna do some experiments when i find time for it
// off-top
Offline