
#1 2016-04-14 19:20:23

westpol
Member
Registered: 2016-04-14
Posts: 3

NVMe device suddenly unavailable

Hi,

I bought a Samsung 950 Pro SSD (512 GB) about three months ago.

06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller (rev 01)

It worked without any problems until yesterday, but now the device randomly becomes unavailable while the system is running. The root filesystem then stays unavailable until the system is rebooted.
The error occurs at random times, and it looks as if the device has simply been removed from the system.
The kernel prints the following error messages:

<4>[18393.590087] nvme 0000:06:00.0: Failed status: ffffffff, reset controller
<6>[18393.590621] nvme 0000:06:00.0: enabling device (0000 -> 0002)
<4>[18393.590881] nvme 0000:06:00.0: Removing after probe failure status: -19
<6>[18393.590910] nvme0n1: detected capacity change from 512110190592 to 0
<3>[18393.590935] blk_update_request: I/O error, dev nvme0n1, sector 497517384
<3>[18393.590954] blk_update_request: I/O error, dev nvme0n1, sector 497517368
<3>[18393.590972] blk_update_request: I/O error, dev nvme0n1, sector 497517352
<3>[18393.590983] blk_update_request: I/O error, dev nvme0n1, sector 497517344
<3>[18393.590992] blk_update_request: I/O error, dev nvme0n1, sector 497517336
<3>[18393.591000] blk_update_request: I/O error, dev nvme0n1, sector 497517328
<3>[18393.591008] blk_update_request: I/O error, dev nvme0n1, sector 497517320
<3>[18393.591015] blk_update_request: I/O error, dev nvme0n1, sector 376050872
...

I have no idea what to make of this, and Google was not very helpful here. I appreciate any help and hope I do not need to send the SSD back.

Cheers
westpol

Last edited by westpol (2016-04-14 19:22:30)
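For anyone trying to make sense of the kernel messages quoted in the first post, here is a minimal Python sketch that scans a saved kernel log for the same sequence: the controller status read returning ffffffff, the driver removing the device with status -19 (ENODEV, "No such device"), the capacity dropping to 0, and the block-layer I/O errors that follow. The regular expressions are copied from the messages above and are an assumption for other kernel versions, whose wording may differ. `scan` and `DropoutReport` are hypothetical names, not part of any existing tool.

```python
import re
from dataclasses import dataclass, field

# Patterns copied from the messages in the first post; other kernel
# versions may word them differently.
FAILED_STATUS = re.compile(r"nvme (\S+): Failed status: (\w+), reset controller")
PROBE_REMOVE = re.compile(r"nvme (\S+): Removing after probe failure status: (-?\d+)")
CAPACITY = re.compile(r"(nvme\d+n\d+): detected capacity change from (\d+) to (\d+)")
IO_ERROR = re.compile(r"I/O error, dev (nvme\d+n\d+), sector (\d+)")

# The block layer counts sectors in 512-byte units regardless of the
# drive's logical block size.
SECTOR_SIZE = 512


@dataclass
class DropoutReport:
    failed_status: list = field(default_factory=list)     # (pci_addr, status_hex)
    removed: list = field(default_factory=list)           # (pci_addr, negative errno)
    capacity_lost: list = field(default_factory=list)     # (disk, bytes before drop)
    io_error_sectors: list = field(default_factory=list)  # sector numbers

    @property
    def looks_like_surprise_removal(self) -> bool:
        # A status of all ones usually means the register read got no
        # answer, i.e. the device vanished from the PCIe bus.
        all_ones = any(s.lower() == "ffffffff" for _, s in self.failed_status)
        return all_ones and bool(self.removed)


def scan(lines) -> DropoutReport:
    """Collect NVMe dropout symptoms from an iterable of kernel log lines."""
    report = DropoutReport()
    for line in lines:
        if m := FAILED_STATUS.search(line):
            report.failed_status.append((m.group(1), m.group(2)))
        elif m := PROBE_REMOVE.search(line):
            report.removed.append((m.group(1), int(m.group(2))))
        elif m := CAPACITY.search(line):
            if int(m.group(3)) == 0:  # only record drops to zero
                report.capacity_lost.append((m.group(1), int(m.group(2))))
        elif m := IO_ERROR.search(line):
            report.io_error_sectors.append(int(m.group(2)))
    return report
```

For example, save the `dmesg` output to a file and call `scan(open("dmesg.txt"))`. Multiplying a reported sector by `SECTOR_SIZE` gives the byte offset of the failed request (sector 497517384 is about 254.7 GB into the 512110190592-byte disk). An all-ones status fits the "device was just removed" impression, but on its own it does not say whether the drive, the slot or the firmware is at fault.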


#2 2016-04-14 19:22:01

Docbroke
Member
From: India
Registered: 2015-06-13
Posts: 1,433

Re: NVMe device suddenly unavailable

Check the physical connection of your drive before considering anything else.


#3 2016-04-14 19:32:00

westpol
Member
Registered: 2016-04-14
Posts: 3

Re: NVMe device suddenly unavailable

Already did.

The pins are in perfect shape, and the screw should prevent any movement.


#4 2016-04-14 19:34:03

Docbroke
Member
From: India
Registered: 2015-06-13
Posts: 1,433

Re: NVMe device suddenly unavailable

The errors you are getting suggest a physical disconnection; an SSD failing within 3 months is very unlikely.


#5 2016-04-14 19:58:14

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: NVMe device suddenly unavailable

I'm guessing it's the M.2 form factor, in which case I would have to agree with the OP that a random physical disconnection is unlikely.

If the problems didn't start after an update (specifically a kernel update), I would be inclined to blame the hardware. I'd suggest booting a live USB and running smartctl to see whether any errors are logged (assuming NVMe devices support SMART).
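The smartctl suggestion can be made a little more concrete. smartmontools versions with NVMe support print a health section from `smartctl -a /dev/nvme0` containing lines such as `Critical Warning: 0x00` and `Media and Data Integrity Errors: 0`. The sketch below pulls those `Key: value` pairs out of captured output and flags watched counters that are not zero. The exact field names are assumed from smartmontools' NVMe output and may differ between versions; `parse_health` and `problems` are hypothetical helpers.

```python
import re

# Health counters to watch and their healthy value. The field names are
# assumed from smartctl's NVMe "SMART/Health Information" section.
WATCHED = {
    "Critical Warning": 0,
    "Media and Data Integrity Errors": 0,
    "Error Information Log Entries": 0,
}

# "Key:   value ..." -> ("Key", "value"); lines without such a pair are skipped.
LINE = re.compile(r"^([A-Za-z][A-Za-z0-9 /-]*?):\s+(\S+)")


def parse_health(text: str) -> dict:
    """Collect 'Key: value' pairs from captured `smartctl -a` output."""
    fields = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def _as_int(value: str) -> int:
    # Critical Warning is printed in hex (0x00); counters are decimal and
    # may contain thousands separators.
    value = value.replace(",", "")
    return int(value, 16) if value.lower().startswith("0x") else int(value)


def problems(fields: dict) -> list:
    """Return a description of every watched counter that is not healthy."""
    return [
        f"{key} = {fields[key]}"
        for key, healthy in WATCHED.items()
        if key in fields and _as_int(fields[key]) != healthy
    ]
```

To capture the text, `subprocess.run(["smartctl", "-a", "/dev/nvme0"], capture_output=True, text=True).stdout` should work when smartctl is installed and run as root. A nonzero error-log count is worth a closer look but is not, on its own, proof of a failing drive.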

If I were you, I'd also install Windows (on some other disk) so you can use Samsung's Magician software. Samsung will probably expect that in any case if you need to RMA the drive.



#6 2016-04-19 12:55:57

westpol
Member
Registered: 2016-04-14
Posts: 3

Re: NVMe device suddenly unavailable

Thanks so far.

I will check the health values of the device with Windows and Samsung's tool later this week.

I fear that debugging this problem will be a lot of pain. The system has now been up for 4 days without any problems.
The processor is overclocked, but if that were the problem, other issues would have shown up earlier.

I'll add more information as soon as I have installed Windows on one of my drives.


#7 2016-06-19 15:53:18

alcool
Member
Registered: 2016-06-19
Posts: 2

Re: NVMe device suddenly unavailable

Hi,

I'm experiencing exactly the same issue as the OP: the NVMe drive suddenly disappears after a random amount of time (dmesg logs: http://pastebin.com/Ad56v1HN).
I have the same Samsung 950 PRO 512 GB installed in a Dell XPS 9550 laptop (BIOS 1.20).

I'm currently running kernel 4.6.2, but I saw similar behavior with 4.4.0.
I dual-boot Windows and see similar issues there as well, even though the proprietary Samsung Magician software reports the drive as 'healthy'.

I thought the issue could be caused by a faulty motherboard (I'm waiting for Dell support), but after finding this post I suspect it might be the drive itself.
Unfortunately, I don't have another NVMe drive to test with, only the original SATA SSD, which works flawlessly.

A question for the OP: did you experience similar issues in Windows? Were you able to get rid of the issue?

cheers,
alcool

$ sudo lspci -vv
[..]
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at de410000 (64-bit, non-prefetchable) [size=16K]
	Region 2: I/O ports at c000 [size=256]
	Expansion ROM at de400000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s <4us, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=9 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00002000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [158 v1] Power Budgeting <?>
	Capabilities: [168 v1] #19
	Capabilities: [188 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [190 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
	Kernel driver in use: nvme
	Kernel modules: nvme
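The `lspci -vv` dump above is long, but the lines describing the PCIe link are `LnkCap` (what the device supports), `LnkCtl` (what is currently configured) and `LnkSta` (what the link actually negotiated). This sketch extracts those fields so dumps taken before and after a dropout, or from different machines, can be compared quickly. The regular expressions match the formatting shown here (and tolerate the "(ok)" suffix some newer pciutils versions append); other versions are an assumption. `link_summary` is a hypothetical helper.

```python
import re


def link_summary(lspci_vv: str) -> dict:
    """Extract PCIe link fields from one device's `lspci -vv` block."""
    out = {}
    # "LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, ..."
    cap = re.search(r"LnkCap:.*?Speed ([\d.]+GT/s), Width (x\d+), ASPM ([^,]+)", lspci_vv)
    # "LnkSta: Speed 8GT/s, Width x4, ..." (newer lspci may add "(ok)")
    sta = re.search(r"LnkSta:\s+Speed ([\d.]+GT/s)[^,]*, Width (x\d+)", lspci_vv)
    # "LnkCtl: ASPM L1 Enabled; RCB 64 bytes ..."
    ctl = re.search(r"LnkCtl:\s+ASPM ([^;]+);", lspci_vv)
    if cap:
        out["cap_speed"], out["cap_width"], out["aspm_supported"] = cap.groups()
    if sta:
        out["speed"], out["width"] = sta.groups()
    if ctl:
        out["aspm_enabled"] = ctl.group(1).strip()
    # The link is "degraded" if it trained below what the device advertises.
    out["degraded"] = bool(cap and sta) and sta.groups() != cap.groups()[:2]
    return out
```

Fed the dump above, it reports a link at 8GT/s x4 (matching the capability) with ASPM L1 enabled. Note that `lspci` needs root to print these capability lines at all, which is why the dump was taken with `sudo`.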


#8 2016-07-07 21:04:48

moxmi
Member
Registered: 2016-07-07
Posts: 2

Re: NVMe device suddenly unavailable

I am getting the same error on Fedora with kernel 4.6.3, also on an XPS 15 9550 with BIOS 1.2.
There is a firmware update for the Samsung 950 Pro, but I am running the Samsung PM951, and I am starting to get quite annoyed.


#9 2016-07-07 22:16:18

moxmi
Member
Registered: 2016-07-07
Posts: 2

Re: NVMe device suddenly unavailable

How did you guys resolve this? Did you buy replacement drives?


#10 2016-07-13 08:30:09

alcool
Member
Registered: 2016-06-19
Posts: 2

Re: NVMe device suddenly unavailable

After dealing with Dell support, I decided to sell the laptop.
They couldn't help me debug the problem because, unfortunately, the computer originally came with a SATA SSD in the M.2 slot rather than an NVMe SSD.
I simply didn't want to live with the doubt of owning a motherboard with a faulty NVMe subsystem.

In the meantime, I contacted Samsung support, and they agreed without hesitation to test my SSD under warranty. I got it back today; apparently they updated the firmware and verified that the drive is working properly.
This makes me think there could be known issues they are currently fixing.
However, I currently don't have a computer to test it in, lol.

I will soon buy another Dell XPS 9550 (with an NVMe drive), test it, and post my conclusions here.


#11 2017-02-14 19:52:20

vtyulb
Member
Registered: 2013-08-18
Posts: 16

Re: NVMe device suddenly unavailable

Same problem here: kernel 4.9.8-1, PM961 NVMe drive, ThinkPad P50. I hope this is a kernel problem and not a hardware one. When the "disconnect" happens, the rest of the system breaks too: it can't run some programs even though they are in the cache (df, for example, segfaults), but yakuake was still able to open a new terminal and run ls and cd. Unfortunately, I can't provide any logs, because journalctl was the first thing to break.

My laptop has an HDD LED (it reacts to the SSD too). When the break occurs, it lights up, meaning that something is happening with the SSD. A reboot helps, but this has already happened twice in the last month.
