#1 2025-06-28 16:16:10

ZeroSkill
Member
Registered: 2025-06-28
Posts: 5

Ideapad Gaming 3 15ACH6 - NVMe drive locks up system

I've been using Arch on this machine for about 10 months now with absolutely no issues.

A few days ago I noticed during regular use (compiling some code for a project of mine) that the whole system unexpectedly froze for a good minute or so. My first instinct was to check dmesg, which showed the following:

4Be06F1.png

and also this, occasionally:

LSiafSa.png

smartctl -a /dev/nvme0

smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.15.3-arch1-1] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Micron MTFDKCD512QFM-1BD1AABLA
Serial Number:                      233442E047A7
Firmware Version:                   1002V3LN
PCI Vendor/Subsystem ID:            0x1344
IEEE OUI Identifier:                0x00a075
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            00a075 0142e047a7
Local Time is:                      Sat Jun 28 18:42:14 2025 +03
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1a):         Cmd_Eff_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     87 Celsius
Critical Comp. Temp. Threshold:     90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    3000
 4 -   0.0040W       -        -    3  3  3  3     8000   35000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        46 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    5%
Data Units Read:                    17,953,336 [9.19 TB]
Data Units Written:                 20,528,746 [10.5 TB]
Host Read Commands:                 238,361,808
Host Write Commands:                370,969,172
Controller Busy Time:               6,766
Power Cycles:                       731
Power On Hours:                     3,575
Unsafe Shutdowns:                   118
Media and Data Integrity Errors:    0
Error Information Log Entries:      538
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               46 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Extended          Completed without error                3467            -     -   -   -    -
 1   Short             Completed without error                3467            -     -   -   -    -
 2   Short             Completed without error                3467            -     -   -   -    -
 3   Short             Completed without error                3467            -     -   -   -    -

Immediately I went ahead and made another backup of my data for obvious reasons. What confuses me is that this drive hasn't really had enough use for it to suddenly fail like this.

Since then I've observed that whenever the drive is put under load, even something as simple as dd'ing some zeros into a test file, the system locks up after a few seconds, and the process has to be terminated because the system otherwise remains unusable. On Windows 11, which is dual-booted from the same drive, putting the drive under load does not cause this behavior. There is also a SATA SSD in this machine, on which I installed a second copy of Arch in case something in my main install was causing the issue. When booted from the SATA SSD, the test works fine on that drive, yet writing to the NVMe drive results in the same lockup.
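
For reference, the write test described above can be sketched roughly as follows. This is a minimal sketch, not the exact command that was run: the helper name `write_test`, the target path and the size are all assumptions, and `conv=fsync` and `status=progress` assume GNU dd.

```shell
#!/bin/sh
# write_test FILE COUNT_MIB  (hypothetical helper, not from the original post)
# Writes COUNT_MIB mebibytes of zeros to FILE and flushes them to disk
# before dd exits, so the drive itself is exercised rather than the page cache.
write_test() {
    file="$1"
    count_mib="$2"
    dd if=/dev/zero of="$file" bs=1M count="$count_mib" conv=fsync status=progress
}
```

Calling something like `write_test /path/on/the/nvme/ddtest.bin 4096` would write 4 GiB to a file on the NVMe drive (the path here is a placeholder); the file should be deleted afterwards. Keeping `dmesg -w` open in a second terminal makes it easier to catch the kernel messages at the moment the lockup starts.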

To me, given the SMART data, it doesn't seem like the drive is actually *failing*, so to speak: the only "abnormality" I can see is the error log entry count (538 entries, even though the error log itself reports "No Errors Logged"). I also ran short and extended self-tests, shown above, which completed without error.
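
The fields that matter for that judgement can be pulled out of the smartctl output with a small filter. This is a minimal sketch, assuming the field labels look exactly like those in the output above; `smart_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# smart_summary  (hypothetical helper)
# Reads `smartctl -a` output on stdin and prints three health fields
# as LABEL=VALUE lines.
smart_summary() {
    awk -F: '
        /^Critical Warning:/ ||
        /^Media and Data Integrity Errors:/ ||
        /^Error Information Log Entries:/ {
            gsub(/^[ \t]+/, "", $2)   # drop the padding before the value
            print $1 "=" $2
        }'
}
```

It would be used as `smartctl -a /dev/nvme0 | smart_summary`, which normally needs root. Comparing the "Error Information Log Entries" value before and after a lockup would show whether the counter grows with each freeze.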

On the NVMe wiki page I noticed similar errors under the troubleshooting section, which suggests that this might be some kind of sleep state error? Even with the fix for that (adding the mentioned kernel parameter), the issue remains.
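
Whether such a parameter is actually active on the running kernel can be checked from the shell. This is a minimal sketch, assuming the parameter in question is `nvme_core.default_ps_max_latency_us` (the post does not name it, so that is a guess based on the usual NVMe power-state advice); `has_apst_override` is a hypothetical helper.

```shell
#!/bin/sh
# Assumption: the wiki parameter is nvme_core.default_ps_max_latency_us.
PARAM="nvme_core.default_ps_max_latency_us"

# has_apst_override CMDLINE  (hypothetical helper)
# Succeeds if the given kernel command line sets the parameter.
has_apst_override() {
    case " $1 " in
        *" $PARAM="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line the running kernel was booted with.
if has_apst_override "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "$PARAM is set on the kernel command line"
else
    echo "$PARAM is not set on the kernel command line"
fi

# Show the value the driver is using, if the nvme_core module exposes it.
SYSFS="/sys/module/nvme_core/parameters/default_ps_max_latency_us"
if [ -r "$SYSFS" ]; then
    echo "effective value: $(cat "$SYSFS")"
fi
```

If the command line shows the parameter but the lockups continue, that at least rules out a bootloader entry that was never regenerated.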

nvme error-log /dev/nvme0

shows either ones looking like this:
GykTedx.png
or (mostly) this:
bgSUcrb.png

At this point I would think it's either a controller issue (which doesn't explain why it works on Windows), or an onboard issue.

I'm a bit stumped at this point. My worry is that it's software-related somehow and that if I shell out to get another drive the same thing would happen.

#2 2025-06-28 18:29:41

ZeroSkill
Member
Registered: 2025-06-28
Posts: 5

Re: Ideapad Gaming 3 15ACH6 - NVMe drive locks up system

Issue persists with kernel 6.15.4

#3 2025-06-29 09:00:11

ZeroSkill
Member
Registered: 2025-06-28
Posts: 5

Re: Ideapad Gaming 3 15ACH6 - NVMe drive locks up system

Further testing on an Ubuntu 24.04.2 LTS live USB shows the test completing without problems, which makes me strongly believe this is related to a recent update, but I have no idea what it could be other than the kernel itself.

#4 2025-06-29 11:13:40

ZeroSkill
Member
Registered: 2025-06-28
Posts: 5

Re: Ideapad Gaming 3 15ACH6 - NVMe drive locks up system

After even more testing I seem to have narrowed down the issue to a regression in the kernel(?).

From the testing it appears the issue started with kernel 6.15.1-arch1, where the test will lock up the system.
Testing with just one version prior, 6.14.10, the test succeeds and the drive operates normally under load.

It seems like at some point between 6.14.10 and 6.15.1 something was changed in the nvme driver or perhaps someplace else that is now causing trouble. No idea if it's 6.15 or 6.15.1, as from what I can tell the repo only ever had 6.15.1.
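
To keep track of which boots fall on which side of that boundary, the running kernel version can be compared against the first version observed to misbehave. This is a minimal sketch, assuming GNU `sort -V` is available; `kernel_at_least` is a hypothetical helper, and 6.15.1 is only the first bad version seen in this testing, not a confirmed regression point.

```shell
#!/bin/sh
# kernel_at_least RELEASE MINIMUM  (hypothetical helper)
# Succeeds if RELEASE (e.g. the output of `uname -r`) is MINIMUM or newer.
kernel_at_least() {
    current="${1%%-*}"   # strip the distro suffix, e.g. "-arch1-1"
    minimum="$2"
    # Version-sort the two strings; if MINIMUM sorts first (or they are
    # equal), the running kernel is at least MINIMUM.
    lowest=$(printf '%s\n%s\n' "$current" "$minimum" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

if kernel_at_least "$(uname -r)" 6.15.1; then
    echo "$(uname -r): in the range where the lockups were observed"
else
    echo "$(uname -r): older than the first kernel observed to lock up"
fi
```

Logging this line next to the result of each load test gives a simple record of good and bad kernels, which is the starting point for a bisect if it comes to that.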

#5 2025-06-29 16:14:09

LuxFerre
Member
Registered: 2010-03-01
Posts: 110

Re: Ideapad Gaming 3 15ACH6 - NVMe drive locks up system

ZeroSkill wrote:

After even more testing I seem to have narrowed down the issue to a regression in the kernel(?).

From the testing it appears the issue started with kernel 6.15.1-arch1, where the test will lock up the system.
Testing with just one version prior, 6.14.10, the test succeeds and the drive operates normally under load.

It seems like at some point between 6.14.10 and 6.15.1 something was changed in the nvme driver or perhaps someplace else that is now causing trouble. No idea if it's 6.15 or 6.15.1, as from what I can tell the repo only ever had 6.15.1.

I would suspect a hardware issue first. I had problems with a two-month-old Patriot NVMe drive that showed similar behaviour: slowdowns, freezes, etc. In my experience, hardware issues with SSDs are not obvious to diagnose. I sent the logs to Patriot and they started an RMA; the replacement drive fixed it. The faulty drive also worked with Windows, but sometimes there were blue screens (although that's a Windows tradition :D).

With newer QLC consumer drives the quality has gone down in my opinion.

However, if it seems stable with kernel 6.14, stick to it for a few weeks (at least a month) and see if it really is stable.

#6 2025-07-20 13:47:47

ZeroSkill
Member
Registered: 2025-06-28
Posts: 5

Re: Ideapad Gaming 3 15ACH6 - NVMe drive locks up system

LuxFerre wrote:

I would suspect a hardware issue first. I had problems with a two-month-old Patriot NVMe drive that showed similar behaviour: slowdowns, freezes, etc. In my experience, hardware issues with SSDs are not obvious to diagnose. I sent the logs to Patriot and they started an RMA; the replacement drive fixed it. The faulty drive also worked with Windows, but sometimes there were blue screens (although that's a Windows tradition :D).

With newer QLC consumer drives the quality has gone down in my opinion.

However, if it seems stable with kernel 6.14, stick to it for a few weeks (at least a month) and see if it really is stable.

Been running the LTS kernel to play it safe for nearly a month now. Not a single system freeze since.

I seriously doubt this is a hardware issue at this point... I also am not getting any BSODs on Windows when I'm using it.
