Hi,
I have a weird problem I've never seen before, so I just wanted to hear if anyone has run into it and has any experience with it. I have a 3.5" SATA hard disk drive connected via USB3 to my pc, using something that is pretty close to this device: https://www.thermaltakeusa.com/thermalt … ation.html - the volume was formatted inside my Synology NAS, so I have it mounted on an Arch Linux pc using "mount /dev/md3 /mnt/mountpoint" and the filesystem is ext4. Some details (not sure if they mean anything, but now you have them):
# mdadm /dev/md3
/dev/md3: 7447.44GiB raid1 1 devices, 0 spares. Use mdadm --detail for more detail.
# mdadm --detail /dev/md3
/dev/md3:
Version : 1.2
Creation Time : Mon Jul 4 20:58:07 2016
Raid Level : raid1
Array Size : 7809204544 (7447.44 GiB 7996.63 GB)
Used Dev Size : 7809204544 (7447.44 GiB 7996.63 GB)
Raid Devices : 1
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Fri Dec 11 06:19:58 2020
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Consistency Policy : resync
Name : Syn:3
UUID : 14b58f99:13b5e0fa:4455458a:9041eb50
Events : 1132235
Number Major Minor RaidDevice State
0 8 51 0 active sync /dev/sdd3
I'm transferring a 3 TB file. The problem is that md5sum shows the copied file differs from the source file, e.g. the source-file md5sum is 9ac057361d8c725dfef90df4d37b7fb7 and the destination's is 8b3e2da08e3128e24d068827ef0e7b15. This is really bad. I think I had the same problem using rsync earlier, but at that time I didn't think it would happen again. Could it be that the https://www.thermaltakeusa.com/thermalt … ation.html just doesn't work? As mentioned, I think the problem also happened a few weeks ago when I used rsync (this time I used cp). I'm currently running:
badblocks -b 4096 -v /dev/md3 > badblocks_b4096_v_md3.txt
But other than that I have no idea about the cause of this problem, I've never seen it before. I found a single google post: https://serverfault.com/questions/33094 … -sometimes about a guy having the same problem, but I don't think that guy found the cause or a solution either, so I hope some of you have good ideas or relevant experience to share? The disk drive is around 8 TB, only 2 years old and hasn't been used much (for offline backups earlier). I now suspect the HDD docking station is the problem and I might want to repeat the experiment later. The obvious things to try after the "badblocks" command finishes are to run memtest for maybe 24 hours and to turn off the pc and repeat the copy using a SATA cable inside the computer - but the idea with a USB3 docking station is to avoid having to open and close the computer case... Never thought a docking station could be so bad - if that is the problem... Anyone seen something similar?
Last edited by newsboost (2020-12-16 00:07:17)
Offline
root@proxmox:/hugeZFS#
So the target is ext4, but what's the source and is it what that path suggests?
Offline
Is the file size identical?
You could compare files with 'cmp' instead of 'md5sum', that would tell you exactly which byte differs.
Offline
root@proxmox:/hugeZFS#
So the target is ext4, but what's the source and is it what that path suggests?
Right, the source FS is also ext4, the path from where the command was issued is not really relevant, sorry for the confusion. I've removed it from the post to avoid further confusion, but the source is also ext4... But different physical disks.
Is the file size identical?
You could compare files with 'cmp' instead of 'md5sum', that would tell you exactly which byte differs.
Yes, I should've written that the file size appears to be the same. "cmp" is an excellent suggestion: I assume it works fine with binary files too. The "badblocks" command is running now, I expect it to run for around 12 hours more. After that, I'll see if "cmp" can perhaps give me the offset and tell "how bad" the copy operation went. Excellent idea, thanks! I'll update later when I've got some more info...
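A minimal sketch of that comparison, assuming GNU coreutils and diffutils (the function name compare_copy is made up for illustration). cmp works on raw bytes, so binary files are fine; without -l it stops at the first differing byte and reports its 1-based offset:

```shell
# Hypothetical helper: compare a copy against its source.
# Assumes GNU stat (for -c %s) and cmp from diffutils.
compare_copy() {
    src=$1
    dst=$2
    # A size mismatch already proves the copy is bad; no need to read 3 TB.
    if [ "$(stat -c %s -- "$src")" != "$(stat -c %s -- "$dst")" ]; then
        echo "size mismatch"
        return 2
    fi
    # cmp exits 0 if the files are identical and 1 at the first difference,
    # in which case it prints the offending byte offset itself.
    if cmp -- "$src" "$dst"; then
        echo "identical"
    else
        return 1
    fi
}

# Usage (paths are placeholders):
#   compare_copy /path/to/srcfile /mnt/mountpoint/dstfile
```

Unlike md5sum, this reads both files in a single pass and gives up as soon as they diverge, which matters on a file this size.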
Offline
UPDATE:
The file sizes are the same. The "badblocks" command showed no errors, it indicated all was fine. After around 10 minutes, "cmp -l srcfile dstfile" had created a 22 MB file whose contents look something like this (byte offset, then the byte value in each file):
7837057025 365 32
7837057026 117 131
7837057027 147 4
7837057028 244 125
7837057029 333 207
7837057030 156 334
7837057031 302 372
7837057032 367 372
7837057033 323 144
7837057034 162 26
7837057035 40 353
...
...
etc etc etc etc - page up and down (don't see any reason to show more of this)
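Since "cmp -l" prints one line per differing byte (1-based decimal offset, then the two byte values in octal), a 22 MB listing is hard to read. A small sketch that collapses it into runs of consecutive offsets; the function name is invented here and only standard awk is assumed:

```shell
# Hypothetical helper: read `cmp -l` output on stdin and print each run of
# consecutive differing offsets as "first-last (N bytes)".
# %.0f is used instead of %d because some awk builds clamp %d to 32 bits,
# and these offsets are well beyond 2^31.
summarize_cmp() {
    awk '
        NR == 1        { start = $1; prev = $1; next }
        $1 != prev + 1 { printf "%.0f-%.0f (%.0f bytes)\n", start, prev, prev - start + 1; start = $1 }
                       { prev = $1 }
        END            { if (NR > 0) printf "%.0f-%.0f (%.0f bytes)\n", start, prev, prev - start + 1 }
    '
}

# Usage:
#   cmp -l srcfile dstfile | summarize_cmp
```

For the eleven lines quoted above this prints a single run, "7837057025-7837057035 (11 bytes)". Bytes inside a damaged region that happen to match by chance will split a run in two, so treat the run lengths as a lower bound.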
I think I used rsync the first time. I then repeated the copy, this time using "cp". Still, the same problem happened - different md5sum and different "cmp -l" output. I've never experienced anything weird on this machine, so I don't expect a RAM test to show anything. However, I came up with the excellent (if I may say so) idea of running "smartctl -a /dev/sdd", which revealed:
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD80EFZX-68UW8N0
Serial Number: VKJNEDUX
LU WWN Device Id: 5 000cca 254e578c8
Firmware Version: 83.H0A83
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Dec 13 03:49:59 2020 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 101) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1083) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   129   129   054    Pre-fail  Offline      -       124
  3 Spin_Up_Time            0x0007   146   146   024    Pre-fail  Always       -       453 (Average 450)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       86
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       4124
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       86
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       841
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       841
194 Temperature_Celsius     0x0002   139   139   000    Old_age   Always       -       43 (Min/Max 23/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       2539564
SMART Error Log Version: 1
ATA Error Count: 65535 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 65535 occurred at disk power-on lifetime: 4118 hours (171 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 a0 a7 70 65 40 Error: ICRC, ABRT 160 sectors at LBA = 0x006570a7 = 6647975
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 03 00 48 70 65 40 00 5d+05:08:38.816 WRITE DMA EXT
35 03 00 48 70 65 40 00 5d+05:08:38.796 WRITE DMA EXT
25 03 08 60 08 10 40 00 5d+05:08:38.796 READ DMA EXT
35 03 00 48 68 65 40 00 5d+05:08:38.792 WRITE DMA EXT
35 03 00 48 68 65 40 00 5d+05:08:38.772 WRITE DMA EXT
Error 65534 occurred at disk power-on lifetime: 4118 hours (171 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 20 27 78 65 40 Error: ICRC, ABRT 32 sectors at LBA = 0x00657827 = 6649895
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 03 00 48 70 65 40 00 5d+05:08:38.800 WRITE DMA EXT
25 03 08 60 08 10 40 00 5d+05:08:38.796 READ DMA EXT
35 03 00 48 68 65 40 00 5d+05:08:38.792 WRITE DMA EXT
35 03 00 48 68 65 40 00 5d+05:08:38.772 WRITE DMA EXT
35 03 00 48 68 65 40 00 5d+05:08:38.747 WRITE DMA EXT
Error 65533 occurred at disk power-on lifetime: 4118 hours (171 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 80 c7 69 65 40 Error: ICRC, ABRT 128 sectors at LBA = 0x006569c7 = 6646215
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 03 00 48 68 65 40 00 5d+05:08:38.773 WRITE DMA EXT
35 03 00 48 68 65 40 00 5d+05:08:38.747 WRITE DMA EXT
25 03 08 58 08 10 40 00 5d+05:08:38.747 READ DMA EXT
35 03 48 00 68 65 40 00 5d+05:08:38.745 WRITE DMA EXT
25 03 08 50 08 10 40 00 5d+05:08:38.745 READ DMA EXT
Error 65532 occurred at disk power-on lifetime: 4118 hours (171 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 b0 97 69 65 40 Error: ICRC, ABRT 176 sectors at LBA = 0x00656997 = 6646167
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 03 00 48 68 65 40 00 5d+05:08:38.748 WRITE DMA EXT
25 03 08 58 08 10 40 00 5d+05:08:38.747 READ DMA EXT
35 03 48 00 68 65 40 00 5d+05:08:38.745 WRITE DMA EXT
25 03 08 50 08 10 40 00 5d+05:08:38.745 READ DMA EXT
35 03 00 00 60 65 40 00 5d+05:08:38.741 WRITE DMA EXT
Error 65531 occurred at disk power-on lifetime: 4118 hours (171 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 40 bf 54 65 40 Error: ICRC, ABRT 64 sectors at LBA = 0x006554bf = 6640831
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 03 00 00 50 65 40 00 5d+05:08:38.715 WRITE DMA EXT
25 03 08 38 08 10 40 00 5d+05:08:38.711 READ DMA EXT
35 03 00 00 48 65 40 00 5d+05:08:38.707 WRITE DMA EXT
25 03 08 30 08 10 40 00 5d+05:08:38.707 READ DMA EXT
35 03 08 f8 44 65 40 00 5d+05:08:38.704 WRITE DMA EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3261 -
# 2 Short offline Completed without error 00% 2538 -
# 3 Short offline Completed without error 00% 1995 -
# 4 Short offline Completed without error 00% 1604 -
# 5 Short offline Completed without error 00% 814 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I think the line "Error 65535 occurred at disk power-on lifetime" tells something is wrong. It's a WD RED harddisk, so it should be able to be turned on 24/7/365. I googled a bit and it seems I've got some of these errors: "const char *icrc = "ICRC"; // INTERFACE CRC ERROR"? From google:
One important error is ICRC – the interface CRC error. This means that there are errors being detected on the IDE/SATA or PCIe bus the hard drive is connected to. Although this is rare and might be caused by the HDD itself, it might mean that your chipset (the hardware controlling e.g. SATA) is damaged – in this case, replacing the hard drive would not fix the issue. Possibly there is also an intermittent cable connection.
The command "smartctl -H /dev/sdd" says: "SMART overall-health self-assessment test result: PASSED". Googling even more led me to https://bbs.archlinux.org/viewtopic.php?id=246686 which says (quoting from the last post/the conclusion):
Amongst all the different things I tried, moving the two HDDs sdc and sdd from USB to SATA fixed the issue.
I guess the USB protocol does not handle well some kind of file transfert (I guess it's very frequent tiny modifications). Even though it's USB3.
Internally I don't have any SATA cables left, but I guess the next step is to (temporarily) remove an SSD I have and then repeat everything and see if the problem persists, just to get to the bottom of this... Unbelievable! I'm also using USB3 - but if the USB3 docking station cannot transfer files reliably, that's completely stupid!?! I've never heard of such a problem before, but that would mean these harddisk docking stations that transfer via USB cable cannot be used for anything serious?!? Amazing. If this is really the problem, how can manufacturers put crap like that (Thermaltake BlacX) on the market (if it really is the problem, which I suspect as the disk hasn't been used that much)??? Do any of you know anything about this hypothesis and can confirm it (or not), or is there some other linux command I could use to get more insight? Or maybe you understand the smartctl output better than me and can give your thoughts on this? I'd appreciate it, thanks!
Last edited by newsboost (2020-12-13 04:24:16)
Offline
Wow, that's a crazy high udma crc error count.
UDMA CRC is a 16 bit hash, so there should only be a 1:65536 chance for bad data to go through. For the occasional random blip on the sata line this should be good enough.
And if the cable is so bad you get tens of thousands of these errors, transfer speed would effectively drop to zero. You should notice.
Most likely the drive aborted these commands and never wrote the data it was supposed to write, so what is returned on read is old data that was there before.
The drive only logs the most recent errors, if you have journalctl syslogs of the time you did the copy operation, you could check those.
Until you determine the cause you can not trust the drive, sata cable, sata port/controller, usb enclosure (if applicable).
Also run a long selftest (smartctl -t long), as well as a memtest on the PC itself.
Good luck,
Offline
The syslog would be interesting in particular since you have that all running on top of a single-disk RAID 1. Normally when a drive fails, it would be kicked from RAID. It could be there is a problem with that when it's the only remaining drive in a RAID. It would be unfortunate if that in turn caused errors to not be reported to the userland copy utility.
I use single disk RAID myself, on my SSD, to allow for on-the-fly mirroring. It hasn't failed me yet, but this setup is after all a bit unusual, so I can't rule it out either. I'll have to do some testing, I guess...
Offline
https://www.amazon.de/Thermaltake-SATA- … geNumber=1
Seems a pattern…
Offline
Wow, that's a crazy high udma crc error count.
UDMA CRC is a 16 bit hash, so there should only be a 1:65536 chance for bad data to go through. For the occasional random blip on the sata line this should be good enough.
And if the cable is so bad you get tens of thousands of these errors, transfer speed would effectively drop to zero. You should notice.
I'm transferring a 3 TB file, so in any case it'll take half a day or several days; the pc is maybe 3-4 years old. It's the first time I've seen something like this, where things apparently seem ok, but as I was transferring a file that seemed to have become corrupt around 2 weeks ago, I was suspicious this time and the md5sum (+ cmp command) showed the data being written wasn't really the same as that being read... No obvious errors during the copy - shit.... What a piece of shit, that USB docking station... OMFG, it should be forbidden to sell such crap on the market where people otherwise think they can trust the equipment... Damn, it'll be thrown directly out with the garbage very soon... But I didn't notice the transfer speed dropping to zero, by the way - I'm not watching the speed from beginning to end of that 3 TB copy operation... For some reason the speed using both cp and rsync varies a lot, from around 5-10 MB/s to around 140-150 MB/s.
I can add that yesterday I replaced a 120 GB SSD with this 8 TB disk that previously was connected via the USB docking station - so now it's connected via SATA cables, internally in the machine (unfortunately I cannot use the SSD as there aren't more cables, but at least this seems better now). When I started a new copy of the file yesterday (using both cp + rsync) it started out at around 5 MB/sec and stayed like that for at least 10-20 minutes, and then I stopped watching and went to bed (maybe it was doing some auto-correction due to the many write errors there must've been, I don't know). When I woke up, it was more around 90-140 MB/s and the speed seemed to stay like that until it finished. I think the 110-130 MB/sec it ended up at for a long time is rather "normal" for a mechanical HDD (not SSD unfortunately, because the mechanical drives are much cheaper for the space I need).
By the way, do you know if I can reset that high UDMA crc error value, now where I'll throw out that ridiculous usb docking station?
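Whether or not that counter can be reset, it is only ever the change that matters: note the raw value of attribute 199 before a test copy and compare it afterwards. A sketch under the assumption that "smartctl -A" prints the attribute table in the layout shown earlier in this thread (the function name is made up):

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value (10th column) of attribute 199. Matching on the ID rather than the
# name, since the name can differ between drive database entries.
crc_error_count() {
    awk '$1 == "199" { print $10 }'
}

# Usage (device name is a placeholder):
#   before=$(smartctl -A /dev/sdd | crc_error_count)
#   ... run the test copy ...
#   after=$(smartctl -A /dev/sdd | crc_error_count)
#   echo "new CRC errors: $((after - before))"
```

If the difference stays at zero over a full 3 TB copy on the internal SATA cable, the old total is just history from the docking station.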
Most likely the drive aborted these commands and never wrote the data it was supposed to write, so what is returned on read is old data that was there before.
The drive only logs the most recent errors, if you have journalctl syslogs of the time you did the copy operation, you could check those.
Until you determine the cause you can not trust the drive, sata cable, sata port/controller, usb enclosure (if applicable).
Also run a long selftest (smartctl -t long), as well as a memtest on the PC itself.
Good luck,
1) I think it just wrote random data from time to time - no error messages - no warnings... Holy shit, I cannot believe companies put crap like that on the market. Note to self: NEVER EVER BUY ANYTHING FROM THERMAL-TAKE AGAIN... Even for consumer products, this is completely unacceptable. I wouldn't even pay a cent for that piece of shit if I had known how bad it is when you really need it.
2) About syslog: Sorry, I switched to using the internal SATA cable (actually took it from an SSD drive, which had been using the SATA cable from the cdrom drive, which therefore doesn't work right now). After I rebooted, it seems I can only see the messages from the current boot (also tried to look in /var/log, couldn't see anything useful):
# journalctl --list-boots
0 b14e2c3de3fc44edb66014d2a747e64f Sun 2020-12-13 07:07:48 UTC_Sun 2020-12-13 15:43:00 UTC
# journalctl -b -1
Specifying boot ID or boot offset has no effect, no persistent journal was found.
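That message means journald is only keeping logs in memory, so they are gone after a reboot. A sketch of how one might keep them for next time and then pick out likely link errors. The commented setup commands follow the systemd-journald man page; the function name and the grep pattern are only a guess at typical kernel messages, not an exhaustive list:

```shell
# One-time setup as root (left commented out so pasting this changes nothing):
#   mkdir -p /var/log/journal
#   systemd-tmpfiles --create --prefix /var/log/journal
#   journalctl --flush
#
# Hypothetical filter: keep kernel log lines that often accompany a bad
# USB/SATA link (block I/O errors, UAS error handling, USB resets, ATA
# exceptions). The pattern is an assumption, adjust it to what you see.
link_errors() {
    grep -Ei 'I/O error|uas_eh_|reset (high-speed|SuperSpeed).*USB device|ata[0-9]+(\.[0-9]+)?: (exception|failed command)'
}

# Usage, once a previous boot is available in the journal:
#   journalctl -k -b -1 | link_errors
```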
3) As written, I'm now using a SATA cable and the harddisk is now inside the pc. I've copied the 3 TB file over again and am in the process of doing a "cmp" on the 2 files. Currently it has verified 44% (1.3 TiB / 2.9 TiB), running at around 150 MiB/sec, 3 hours remaining. No errors so far - with the USB, when I did the same I think I started seeing errors after 20-30 minutes, which is a lot less than 44% of the file size. So everything looks MUCH better now - fortunately - so I'm pretty sure the drive itself is ok (it also hasn't been used much). I also haven't noticed any problems with memory or anything else before - but I might test when I'm done, just to be sure, as suggested. I'll try both the long test (smartctl -t long) and a memtest, but I'm relatively optimistic now...
The syslog would be interesting in particular since you have that all running on top of a single-disk RAID 1. Normally when a drive fails, it would be kicked from RAID. It could be there is a problem with that when it's the only remaining drive in a RAID. It would be unfortunate if that in turn caused errors to not be reported to the userland copy utility.
I use single disk RAID myself, on my SSD, to allow for on-the-fly mirroring. It hasn't failed me yet, but this setup is after all a bit unusual, so I can't rule it out either. I'll have to do some testing, I guess...
I have an old synology disk-station, which is where the disk was originally formatted. I then discovered I can mount it directly in Linux, which is really convenient. Inside the synology I have a volume consisting of 3 disks, but I don't have any machine with 3 available SATA cables, therefore the fourth disk (which I placed in the docking station and which caused all of this) is formatted as a single-disk RAID. Could you tell a bit about how you use this "single disk RAID" for doing "on-the-fly mirroring"? Sounds interesting...
https://www.amazon.de/Thermaltake-SATA- … geNumber=1
Seems a pattern…
Oh, damn, didn't see that... Thanks a lot - very interesting that I'm not the only one... I'm betting a LOT of people have lost a LOT of data because of those morons who made that docking station... And if you just look at the reviews, it has 4.4 out of 5 stars, which doesn't sound bad... There are always people who are unhappy about products, but this time it's really crazy this product even is or has been on the market... I think I'll never ever again buy a product from thermaltake, if I can avoid it. Actually I think some part of the 3 TB file is already corrupt (happened around 2-3 weeks ago), but it is so big that I wouldn't notice if e.g. only a tiny fraction is corrupt. Well, what's done is done. I'm happy I think I've found the cause and an explanation and know how to avoid the problem... Tomorrow I'll give a - probably final - update and I think everything will be fine by then (currently 48% of the new rsync/copy operation has been verified ok). I don't even know if it's only thermaltake products - or all usb3 harddisk docking stations - that could be malfunctioning, but hopefully the drive can last at least 2-3 years more before beginning to fail. Thanks all for now!
Offline
Final update: Sorry, almost forgot this - marking the thread as solved. Completed a new 3 TB transfer and md5sum matches (using SATA-cable). Threw away that crappy usb hd docking station. The long smartctl test completed as below - everything is fine - thanks for all the hints/feedback.
# smartctl -a /dev/sdb3 | grep -C 2 -i 'self-test execution'
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
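The status line above reads the same whether a test passed or none was ever run, so the self-test log is a less ambiguous place to look. A sketch, assuming the usual "smartctl -l selftest" layout with "# 1" as the most recent entry, as in the log shown earlier in the thread (the function name is made up):

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and
# succeed only if the most recent entry (# 1) is an extended (long) test
# that completed without error.
last_long_test_ok() {
    awk '
        /^# *1 / { found = 1
                   ok = (/Extended offline/ && /Completed without error/)
                   exit }
        END      { exit !(found && ok) }
    '
}

# Usage (device name is a placeholder):
#   smartctl -l selftest /dev/sdb | last_long_test_ok && echo "long test passed"
```

In the earlier output, entry # 1 was a "Short offline" test at 3261 hours, so this check would only succeed once a newer "Extended offline" line appears at the top of the log.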
Offline
There are a variety of sources for corruption. This case pretty conclusively shows it's the cable or connector. In other cases it could just as well be logic board memory, drive cache memory or drive firmware.
rsync --checksum --dry-run can be used to compare. It'll only tell you whether they're the same or different, not which one is correct. You'd need another source of truth. Both ZFS and Btrfs checksum data, and the checksums themselves are checksummed.
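On ext4, one way to get such a source of truth is to record checksums while the data is still known to be good and carry that list along with the copy. A rough sketch assuming GNU coreutils and findutils; the function names are invented for this example, and the manifest path passed to verify_manifest must be absolute because the function changes directory:

```shell
# Hypothetical helper: print "checksum  ./relative/path" for every regular
# file under directory $1, in a stable order.
make_manifest() (
    cd -- "$1" || exit 1
    find . -type f -print0 | sort -z | xargs -0 -r sha256sum
)

# Hypothetical helper: re-check directory $1 against manifest $2 (absolute
# path). Prints only failures; returns non-zero if any file does not match.
verify_manifest() (
    cd -- "$1" || exit 1
    sha256sum --quiet -c "$2"
)

# Usage (paths are placeholders):
#   make_manifest /data/backups > /root/backups.sha256
#   verify_manifest /mnt/mountpoint/backups /root/backups.sha256
```

This only detects a mismatch; as with rsync --checksum, deciding which side is the good one still depends on when the manifest was taken.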
Additionally, by default on spinning drives Btrfs uses "dup" (duplicate) profile for metadata, the file system itself. Upon detection of corruption of metadata, it can self-heal, by locating the good copy and overwriting the bad copy with the good.
XFS and ext4 somewhat recently (few years) have added metadata only checksumming, which will detect errors in the metadata but can't self-heal, though their repair utilities are quite good. Both ZFS and Btrfs offer a scrub feature that checks integrity of every block, metadata and data, compared to checksums, and will report mismatches. In such a case you can consider it a reliable indication of corruption.
Memory bitflips can be quite pernicious. We see them in btrfs land frequently. I don't exactly recommend using Btrfs to try and track down memory problems, but it does seem to have that feature as a side effect. As metadata is only a small portion (~4%) of what's on disk and data makes up the other 96%, data is a much larger target for any sort of corruption.
Offline