
#26 2015-05-20 23:11:17

andrew.boren
Member
Registered: 2015-05-20
Posts: 1

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

Man, am I glad I found this. I've spent the last 3 days trying to narrow down this issue. Saw a sale on some Sammy 850 SSDs and decided to grab a couple to put into a 3-disk raid0. Ever since putting them into a raid0, I've had nothing but issues with files turning up zeroed out: same file size as they should be, but populated with null chars. Thought I had a bad disk that was causing this, and spent 3 days pulling them out of the raid, putting them in different combinations and watching for any dmesg errors. Finally today I started seeing file system errors, which led me to this page. Luckily the data on these drives is just for compiling android, and is all on github.

This is the only raid0 on this system, which is running 4.0.2. I have 3 other SSDs that are just single disks and have no issues. They are pretty heavily used, so I would notice if there was an issue. Looks like a 4.0.4 update dropped today or yesterday, going to give that a try.

Offline

#27 2015-05-21 00:25:44

zeroepoch
Member
From: San Francisco, CA
Registered: 2015-05-21
Posts: 6
Website

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

I believe the problem people are experiencing here is due to a raid0 bug with trim that I discovered on Fedora with 3.19.7 that was backported from 4.0.2.  It still hasn't been fixed in any release.

See, https://bugzilla.kernel.org/show_bug.cgi?id=98501

Offline

#28 2015-05-21 01:55:04

jdbrown
Member
Registered: 2014-01-03
Posts: 73

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

zeroepoch wrote:

I believe the problem people are experiencing here is due to a raid0 bug with trim that I discovered on Fedora with 3.19.7 that was backported from 4.0.2.  It still hasn't been fixed in any release.

See, https://bugzilla.kernel.org/show_bug.cgi?id=98501

Yeah, it seems the md/raid0 patch is the culprit, and it is likely to be fixed soon in future releases.

Offline

#29 2015-05-21 04:44:10

matthew02
Member
Registered: 2012-08-01
Posts: 42

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

Yep, that pretty clearly explains my experience since my last post.
Disabling NCQ obviously didn't help.
Experienced corruption on 3.19.8
Running fstrim burned my house down.

Thanks for sorting that out and posting here.

Offline

#30 2015-05-21 05:30:51

zeroepoch
Member
From: San Francisco, CA
Registered: 2015-05-21
Posts: 6
Website

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

I can pretty much agree with your "burned my house down" statement if you read my first comment in that bug report.  It basically destroyed my primary desktop after the kernel upgrade like I was playing Counter-Strike without a mouse.  My Arch Linux Chromebook was fine because it doesn't use raid.  I also first blamed it on an NCQ TRIM bug since I noticed the changelog for my SSD's firmware upgrade mentioned something related.  Surprisingly my Crucial MX100 w/ MU01 was only recently blacklisted but never had any noticeable problems before then.

I took the chance to upgrade the firmware on everything: BIOS, BD drive, SSDs.  Other than my HDD RAID 5 array, I started over trying to purge the problem, only to have it come back when I installed Fedora 22.  BTW, a clean install of Windows 8.1 will disable secure erase on "supported" drives due to TCG.  I installed Windows as a quick way to install the BD drive firmware update since the SSDs were already corrupted.  If anyone else runs into this problem, you need to do a PSID factory reset using the manufacturer tool (Micron Storage Executive in my case) and a 32-character hex number on the back of the drive.  A "security" feature they call it.

Hopefully they'll get this patch into mainline soon or distros will carry it separately until then.  My solution for now has been to disable trimming and keep an older kernel around (at least for Fedora) so I can trim if needed using fstrim.

Offline

#31 2015-05-21 06:35:08

Loong
Member
From: China
Registered: 2012-03-01
Posts: 6

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

I've been using linux-4.0.2 and 4.0.3 for a period of time. How can I know whether there is anything wrong in my filesystem?

Offline

#32 2015-05-21 06:38:01

zeroepoch
Member
From: San Francisco, CA
Registered: 2015-05-21
Posts: 6
Website

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

Loong wrote:

I've been using linux-4.0.2 and 4.0.3 for a period of time. How can I know whether there is anything wrong in my filesystem?

It's really hard to say.  If you haven't enabled the discard option in fstab or run fstrim, you probably have no issues.  Otherwise you just have to look for files with all zero bytes.  I had originally hacked together a script (gone now due to the bug, ironically) to find these files but eventually found it easier in Fedora to run a system-wide rpm verify.  Not sure if Arch Linux has the same thing with pacman.
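
For anyone who wants to do the "look for files with all zero bytes" check by hand, here is a minimal sketch in Python. It is not the lost script mentioned above, just one possible way to do it; the function names `is_all_zero` and `find_zeroed_files` are made up for illustration. It walks a directory tree and reports regular, non-empty files that contain nothing but NUL bytes.

```python
import os

CHUNK = 1 << 20  # read files 1 MiB at a time to keep memory use flat


def is_all_zero(path):
    """Return True if the file is non-empty and contains only NUL bytes."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                # End of file: all-zero only if we actually read something.
                return seen_data
            seen_data = True
            if chunk.count(0) != len(chunk):
                return False  # found a non-NUL byte, file has real data


def find_zeroed_files(root):
    """Yield paths of regular, non-empty files under root that are all NUL bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks, sockets, device nodes and the like.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                if is_all_zero(path):
                    yield path
            except OSError:
                continue  # unreadable or vanished file, skip it
```

Something like `for path in find_zeroed_files("/home"): print(path)` would list the candidates.  Treat hits as suspects rather than proof, since some files are legitimately zero-filled (preallocated or sparse files, for example), and a file that was only partly zeroed will not show up at all.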

Offline

#33 2015-05-21 06:44:02

Scimmia
Fellow
Registered: 2012-09-01
Posts: 11,466

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

zeroepoch wrote:

but eventually found it easier in Fedora to run a system-wide rpm verify.  Not sure if Arch Linux has the same thing with pacman.

pacman -Qkk is similar

Offline

#34 2015-05-21 09:36:21

zozi56
Member
Registered: 2012-03-10
Posts: 14

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

zeroepoch wrote:

I believe the problem people are experiencing here is due to a raid0 bug with trim that I discovered on Fedora with 3.19.7 that was backported from 4.0.2.  It still hasn't been fixed in any release.

See, https://bugzilla.kernel.org/show_bug.cgi?id=98501

Thank you, this indeed sounds like the problem we are having.

Related LKML threads:
https://lkml.org/lkml/2015/5/20/910
https://lkml.org/lkml/2015/5/21/167

I think Arch users should be warned/informed in a news entry on the front page.

Offline

#35 2015-05-22 01:54:16

Buddlespit
Member
From: Chesapeake, Va.
Registered: 2014-02-07
Posts: 501

Re: [SOLVED] RAID0 data corruption caused by upgrade to linux-4.0.2

Offline
