The final solution was to do a lazy unmount of the partition being resized and then reboot. The process takes a while (a matter of minutes), then stops and lets the system reboot. I only noticed issues with files I added while the resize was running: roughly 10% of those files are corrupt. Anything that was already on the drive has not caused any issues so far. I have not run fsck yet, since that will also take forever, but at least you can kill that one.
The problem still remains, though, and my drive is not fully resized. My next step will be to try resizing it while unmounted, and if that is just as slow, I will simply let it continue online, as it should not take more than a few weeks... At least I am happy that no corruption of my existing data happened and everything went better than I expected. I love Linux!
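For reference, the offline attempt mentioned above could look roughly like the sketch below. The device path is the placeholder from my original command, the mount point `/mnt/storage` is purely hypothetical, and the `run` helper with its `DRY_RUN` switch is just my own convenience so the script prints the commands instead of executing them by default. An unmounted resize2fs insists on a forced e2fsck first, which is why that step is in there.

```shell
#!/bin/sh
# Sketch of an offline (unmounted) resize. Placeholder names, adjust to your setup.
LV=${LV:-/dev/vgname/lvname}            # logical volume holding the ext3 filesystem
MOUNTPOINT=${MOUNTPOINT:-/mnt/storage}  # hypothetical mount point
DRY_RUN=${DRY_RUN:-1}                   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

offline_resize() {
    # The && chain stops at the first command that returns non-zero.
    # Note that e2fsck also returns non-zero when it had to fix something,
    # so in that case the resize is not started automatically.
    run umount "$MOUNTPOINT" &&
    run e2fsck -f "$LV" &&
    run resize2fs -p "$LV" &&
    run mount "$LV" "$MOUNTPOINT"
}

offline_resize
```

With the default `DRY_RUN=1` this only lists the four steps; setting `DRY_RUN=0` (as root) would actually run them. The `-p` flag makes resize2fs print progress bars, which at least shows whether anything is moving.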
]]>First of all: all sizes given in this article are based on powers of 1024. I will still call them GB, TB, etc., because TiB is stupid.
Okay, right to the point: I have an LVM logical volume that consisted of 2 * 2.73 TB = 5.46 TB of drive space. I also had two "old" 1.82 TB drives (2 * 1.82 TB = 3.64 TB). I added those to the logical volume as well, which gave me a nice 9.10 TB of storage.
On the old (5.46 TB) logical volume I had created an ext3 partition. After growing my logical volume with those two other drives, I wanted to grow my partition, too. Since it was mounted (and my kernel supports online resizing), I launched
resize2fs /dev/vgname/lvname
That was 72 hours ago. When I checked the size before, it reported 5.46 TB (as expected); when I check it now, it reports 5.8 TB. Boy, that did not escalate quickly. Extrapolating from the time it has already taken and the space still to be added, this puts me at an amazing 770 hours, or 32 days.
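The extrapolation is a simple linear one, and can be sketched as below. The function name `eta_hours` is made up for this post, and it assumes the resize keeps going at a constant rate, which may well not hold. Fed with the rounded figures above it lands at roughly 700 hours; my 770 came from the unrounded sizes, but either way it is about a month.

```shell
#!/bin/sh
# eta_hours START_TB NOW_TB TARGET_TB ELAPSED_HOURS
# Linear extrapolation of the remaining time, assuming a constant growth rate.
eta_hours() {
    awk -v start="$1" -v now="$2" -v target="$3" -v elapsed="$4" 'BEGIN {
        grown = now - start                  # space gained so far, in TB
        if (grown <= 0) {
            print "no progress yet"
            exit 1
        }
        rate = grown / elapsed               # TB per hour
        printf "%.0f\n", (target - now) / rate
    }'
}

# Rounded figures from this post: 5.46 TB -> 5.8 TB in 72 h, target 9.10 TB.
eta_hours 5.46 5.8 9.10 72
```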
Why does this take so long? I knew it would not be fast, but I thought 2 days max.
Furthermore, though risky, I tried to end the process. It did not react to SIGINT, SIGHUP or SIGTERM, so I finally tried SIGKILL. Not even that could stop it. My research shows that a kernel operation can block SIGKILL, so I assume it is stuck inside one. This means only a reboot would stop it (but also possibly damage my drive).
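One way to check the "stuck in the kernel" assumption is to look at the process state that Linux exposes under /proc. The helper name `proc_state` is my own; the sketch just reads the `State:` line. A `D` (uninterruptible sleep), or an `R` with nearly all CPU time booked as system time, would be consistent with the signal simply not being handled until the kernel call returns, though it does not prove it.

```shell
#!/bin/sh
# proc_state PID: print the one-letter process state from /proc (Linux only).
# R = running, S = interruptible sleep, D = uninterruptible sleep, Z = zombie.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Look up a running resize2fs, if there is one, and report its state.
pid=$(pgrep -x resize2fs 2>/dev/null | head -n 1)
if [ -n "$pid" ]; then
    echo "resize2fs (pid $pid) is in state $(proc_state "$pid")"
else
    echo "no resize2fs process found"
fi
```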
Note: I cannot back up the data, so backup & format is not an option (who has 4 TB of storage lying around unused anyway?).
Also, every drive is individually encrypted with a keyfile, and the drives are merged together as an LVM after decryption (in case this hurts the speed).
I am willing to unmount the drive (for a day or so) if this would speed up the process. My most urgent task is to stop the process (as it slows down all affected drives) and to find a solution that is quicker. I also would prefer not losing the data, as you can imagine.
Finally, I noticed that most of the work goes into CPU usage. One core is constantly maxed out at 100% and the drive is barely touched (checked with the HDD LED and iotop).
]]>