Hi, I am observing immediate corruption when mounting ext4 loopback images (.img files on disk) on Arch specifically. The issue occurs with both main (6.6.9) and LTS (6.1.70) kernels, but does not occur on my other machines running Gentoo with the same (6.6.9) kernel.
Here's all the investigation I have done, and a minimal reproducer. To ensure this wasn't a hardware issue with my local disk, I also tested creating the .img on an NFS mount, and the issue was still present.
$ wget "https://distfiles.gentoo.org/releases/x86/autobuilds/20240101T170201Z/stage3-i686-openrc-20240101T170201Z.tar.xz"
$ dd status=progress if=/dev/zero of=stage3-i686-openrc-20240101T170201Z.img bs=10M count=200 oflag=dsync
$ mkfs.ext4 stage3-i686-openrc-20240101T170201Z.img
$ mkdir stage3-i686-openrc-20240101T170201Z
$ sudo mount stage3-i686-openrc-20240101T170201Z.img stage3-i686-openrc-20240101T170201Z
$ cd stage3-i686-openrc-20240101T170201Z
$ sudo tar xvJf ../stage3-i686-openrc-20240101T170201Z.tar.xz --numeric-owner --xattrs-include='*.*'
At this point you will quickly see tar start to error out with a bunch of errors like "Bad message". The following gets dumped in dmesg:
EXT4-fs warning (device loop1): ext4_dirblock_csum_verify:405: inode #390021: comm tar: No space for directory leaf checksum. Please run e2fsck -D.
And if you run e2fsck, it spits out a lot of unrecoverable corruption. This occurs regardless of whether the .img is stored on local disk or on an NFS mount. To make sure it wasn't an xz issue, I also removed the compression on the tarball before unpacking it, but still got the same problem.
I repeated these steps on other machines running non-Arch distros with kernel 6.6.9 and did not see the issue occur. What I did next was create the image and untar on one of these machines, then copied the .img file to my Arch machine. e2fsck reported no errors. Then I mounted and unmounted the image, after which running e2fsck indicated that it was corrupted. So simply mounting the image seems to trigger the problem.
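For anyone who wants to repeat the mount/unmount check, here is a rough sketch of it as a script. The function names (fsck_verdict, check_image, mount_cycle) are made up for illustration and are not part of any tool. It assumes e2fsprogs is installed and that a plain `sudo mount` of the image file sets up the loop device automatically, as in the steps above. The exit status handling follows e2fsck(8), where 0 means no errors and bit 4 means errors were left uncorrected.

```shell
#!/bin/sh
# Sketch: check an ext4 image before and after one mount/unmount cycle.
# Function names are hypothetical; image and mount point are passed as arguments.

# Translate an e2fsck exit status into a short verdict.
# Per e2fsck(8) the status is a bit mask: 0 = no errors,
# bit 4 = filesystem errors left uncorrected.
fsck_verdict() {
    if [ "$1" -eq 0 ]; then
        echo clean
    elif [ $(( $1 & 4 )) -ne 0 ]; then
        echo corrupted
    else
        echo "other (exit status $1)"
    fi
}

# Forced check that never modifies the image:
# -f checks even if the filesystem is marked clean, -n answers "no" to every prompt.
check_image() {
    e2fsck -fn "$1" >/dev/null 2>&1
    fsck_verdict "$?"
}

# Check, mount, unmount, check again.
mount_cycle() {
    img=$1
    mnt=$2
    echo "before mount: $(check_image "$img")"
    sudo mount "$img" "$mnt" || return 1
    sudo umount "$mnt" || return 1
    echo "after mount:  $(check_image "$img")"
}
```

To use it, add a final line such as `mount_cycle "$@"` and run the script with the .img file and an empty directory as arguments. On an affected machine I would expect the first line to report clean and the second to report corrupted, but that is only what the description above suggests, not something this sketch guarantees.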
At first I thought this might be a case of https://lwn.net/Articles/954285/, but supposedly that issue does not affect kernel 6.6, and I am seeing this on 6.6.
Can somebody try out the steps above and see if they are able to reproduce this issue? Let me know if there are any other troubleshooting steps you would recommend.
works fine here, system was updated yesterday
Thanks for testing. I guess it might be some sort of memory issue then. I'll try to run a memtest overnight, though it's a laptop with soldered RAM, so there's not much I can do if it turns up issues...