
#1 2023-02-11 18:07:18

erylflynn
Member
Registered: 2016-10-14
Posts: 11

[SOLVED] Recurring BTRFS Uncorrectable Errors

Over the last 6 months or so my server has started throwing these errors. Since the drive was a bit older and I needed a larger one anyway, I ordered a new drive to replace it. That did not fix the issue. I then ordered a new case to mount the drive better and replaced the motherboard, as it had a failed IPMI and I figured more could be wrong with it, and I used a new cable to connect the drive. I still get errors, and what is odd is that files are written to the drive fine at first; they only become corrupted later. I have a desktop and a server, both running Arch, and each has a matching BTRFS drive; rsync updates the copy on my desktop from the files on the server. The files copied to my desktop are fine, with no errors shown by BTRFS or when viewing the file, while I can see the damage to the same file on the server side.

At the moment, deleting the damaged file and copying it back from a good copy is only a band-aid. Something else is going wrong and I am at a loss for how to track it down. I did wipe everything and reinstall Arch when I put the new motherboard in; my thought was that if the old board was bad and causing issues, it was best to start with a clean slate.

What can I do to find the cause in the first place? A small sample of the output of sudo journalctl --dmesg --grep 'sda1':

Feb 11 09:12:24 HOST kernel: BTRFS warning (device sda1): csum failed root 5 ino 57458 off 142118912 csum 0xbd6525af expected csum 0x36ee345b mirror 1
Feb 11 09:12:24 HOST kernel: BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 11181, gen 0
Feb 11 09:12:25 HOST kernel: BTRFS warning (device sda1): csum failed root 5 ino 57458 off 142118912 csum 0xbd6525af expected csum 0x36ee345b mirror 1
Feb 11 09:12:25 HOST kernel: BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 11182, gen 0
Feb 11 09:12:26 HOST kernel: BTRFS warning (device sda1): csum failed root 5 ino 57458 off 142118912 csum 0xbd6525af expected csum 0x36ee345b mirror 1
Feb 11 09:12:26 HOST kernel: BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 11183, gen 0
Feb 11 09:12:27 HOST kernel: BTRFS warning (device sda1): csum failed root 5 ino 57458 off 142118912 csum 0xbd6525af expected csum 0x36ee345b mirror 1
Feb 11 09:12:27 HOST kernel: BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 11184, gen 0
Feb 11 09:12:28 HOST kernel: BTRFS warning (device sda1): csum failed root 5 ino 57458 off 142118912 csum 0xbd6525af expected csum 0x36ee345b mirror 1
Feb 11 09:12:28 HOST kernel: BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 11185, gen 0
Feb 11 09:12:29 HOST kernel: BTRFS warning (device sda1): csum failed root 5 ino 57458 off 142118912 csum 0xbd6525af expected csum 0x36ee345b mirror 1
Feb 11 09:12:29 HOST kernel: BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 11186, gen 0
Feb 11 09:12:30 HOST kernel: BTRFS warning (device sda1): csum failed root 5 ino 57458 off 142118912 csum 0xbd6525af expected csum 0x36ee345b mirror 1
Feb 11 09:12:30 HOST kernel: BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 11187, gen 0
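
Every warning in the log above names the same root, inode and offset, which suggests the same damaged block is being re-read each second, not that new damage is appearing. A minimal sketch for collapsing such output into one line per affected block, assuming bash with the standard sed, sort and uniq tools; the function name summarize_csum_failures is made up for illustration and is not part of btrfs-progs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log lines on stdin and print one line
# per distinct (root, inode, offset) that failed a checksum, prefixed by
# how many times it was reported. Lines without "csum failed" are ignored.
summarize_csum_failures() {
    sed -n 's/.*csum failed root \([0-9][0-9]*\) ino \([0-9][0-9]*\) off \([0-9][0-9]*\) .*/root=\1 ino=\2 off=\3/p' \
        | sort \
        | uniq -c \
        | sed 's/^ *//'   # strip the padding uniq -c puts before the count
}
```

It could be fed from the journal with sudo journalctl --dmesg --grep 'csum failed' | summarize_csum_failures. The reported inode can then be turned into a file path with btrfs inspect-internal inode-resolve 57458 <mountpoint>, where the mount point is wherever the filesystem is mounted (it is not given in this thread).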

Last edited by erylflynn (2023-02-23 00:24:42)


#2 2023-02-11 18:12:13

Head_on_a_Stick
Member
From: London
Registered: 2014-02-20
Posts: 7,680


Have you tried

# btrfs scrub start /dev/sda1

Then leave it a while to let it finish and check

# btrfs scrub status /dev/sda1

EDIT: https://wiki.archlinux.org/title/Btrfs#Scrub

Last edited by Head_on_a_Stick (2023-02-11 18:13:40)


#3 2023-02-11 20:57:54

erylflynn
Member
Registered: 2016-10-14
Posts: 11


That is exactly how I know there are errors. They are uncorrectable; sorry, I should have mentioned that. The last time I ran a scrub and cleared the issues was around January 23.

Scrub device /dev/sda1 (id 1) history
Scrub started:    Fri Feb 10 22:58:00 2023
Status:           finished
Duration:         5:46:12
Total to scrub:   4.73TiB
Rate:             237.33MiB/s
Error summary:    csum=6
  Corrected:      0
  Uncorrectable:  6
  Unverified:     0


#4 2023-02-11 21:19:45

topcat01
Member
Registered: 2019-09-17
Posts: 124


This might be totally random, but I wonder if your power supply has an issue. AIUI you have a new mainboard, drive, and cable?


#5 2023-02-11 21:47:18

erylflynn
Member
Registered: 2016-10-14
Posts: 11


Yes, new motherboard, cable and drive. I could look into a new PSU; I am running a Cooler Master MPY-7501-AFAAG-US MWE 750 Gold Full Modular, 80+ Gold Certified 750W power supply from 2020. So it is possible it has gone bad just from the normal power issues we get with the weather. Unless I hear any other ideas, I could give that a go. The issue is frustrating; other than the checksum errors I don't see anything in the journal at all.


#6 2023-02-12 00:25:39

erylflynn
Member
Registered: 2016-10-14
Posts: 11


I ordered a PSU; it should be here Monday. I had to repair my boot partition as well. I hate firing the parts cannon, but I'm not sure what else to do.


#7 2023-02-12 05:34:57

just4arch
Member
Registered: 2023-01-07
Posts: 74


erylflynn wrote:

Still get errors and what is odd is, they write to the drive at first fine.  It is later they get corrupted.
...
The files copied to my desktop are fine, no errors shown by BTRFS or by viewing the file, while I can see the damage to the file on the server side.

To be sure, there are no ram ECC errors on the server?
Memtest comes back clean?


#8 2023-02-12 06:10:13

erylflynn
Member
Registered: 2016-10-14
Posts: 11


No ECC RAM, and I have not run a memtest; I guess that is a good thing to try.


#9 2023-02-12 06:34:50

topcat01
Member
Registered: 2019-09-17
Posts: 124


just4arch wrote:

To be sure, there are no ram ECC errors on the server?
Memtest comes back clean?

Definitely check RAM. I forgot to mention that!


#10 2023-02-12 18:38:01

erylflynn
Member
Registered: 2016-10-14
Posts: 11


Well, memtest came back with A LOT of red, and I mean A LOT. Errors are over 1600 and still climbing while on the final test. The PSU likely should be replaced to be safe as well, but I just ordered some memory too.

To add: the memory should be here today, so I will pop it in and run another memtest to confirm.

Last edited by erylflynn (2023-02-12 18:45:59)


#11 2023-02-13 00:57:41

erylflynn
Member
Registered: 2016-10-14
Posts: 11


Likely my last post on this for a while: the new memory is in and passing tests, so I will put the server back in service, clean things up, and then see if I get any issues in the next few weeks.

Oh, and thanks for the advice on memtest.

Last edited by erylflynn (2023-02-13 00:58:17)


#12 2023-02-23 00:24:12

erylflynn
Member
Registered: 2016-10-14
Posts: 11


I have now run for over a week with no errors or issues showing. So I am as sure as possible that the issue was the memory pair, Corsair brand for anyone curious. I have other pairs of Corsair with no issues; this is the first bad memory I have had.
