
#1 2023-03-06 22:30:12

lv426
Member
Registered: 2022-11-11
Posts: 19

Copying a file to a USB drive shows the wrong file transfer times

When I copy a large file to a USB drive, the transfer times are inaccurate or not shown at all. I ran multiple tests with two file managers and rsync, and got similar, unexpected results.

- Copying a file with the PCManFM file manager to a USB drive formatted as ext4, FAT32, or NTFS shows a file transfer window that immediately jumps to 100%, but the file is still being copied in the background.
- Copying a file with the Thunar file manager to a USB drive formatted as ext4, FAT32, or NTFS shows a file transfer window that does not progress, but the file is still being copied in the background.
- Using rsync with the flags `-vP` shows 100% progress right away, but rsync keeps running in the background. The file is eventually copied.
- I have tried multiple USB drives and they all present the same problems.
- My test file was the Arch Linux ISO.

Is this a bug, or am I perhaps missing a package? I have both udisks and udiskie installed. I am running kernel 6.2.1-arch1-1.


#2 2023-03-07 01:16:29

mpan
Member
Registered: 2012-08-01
Posts: 1,211

Re: Copying a file to a USB drive shows the wrong file transfer times

This is not a bug. It’s inherent to how data storage operations are handled. It may be unexpected if one doesn’t know the underlying mechanics.

Data is read into RAM and then written to the underlying storage (your USB drive in this case). Writing does not happen instantly: it only happens after some time has passed or when pages must be evicted from RAM to make space for more pages. This increases performance, aids allocation on the target device and reduces wear on the device.
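The amount of such not-yet-written data can be watched in the `Dirty` line of `/proc/meminfo`. A minimal sketch, where the function name `dirty_kib` is only illustrative:

```sh
#!/bin/sh
# Hypothetical helper: print how much page-cache data is dirty, in KiB.
# "Dirty" counts pages that were modified in RAM but not yet written
# back to their storage device.
dirty_kib() {
    awk '/^Dirty:/ { print $2 }' /proc/meminfo
}
```

Running it right after a file manager reports a copy to a slow USB drive as finished will usually show a large value that shrinks as the kernel writes the data out.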

But this means that the “copy” transfer times actually measure how long it takes to read data from the source storage. The situation gets more complicated if there is a lot of data to be copied and the target device is slow. At first only reading happens; then the page cache fills up and data is actually written to the destination device; and, finally, not all of it is written immediately. So two different speeds are measured at two different stages, and the measurement still misses the time needed to write the final portion of data. This final flush is one of the reasons the “safely remove” or “eject” option exists.
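To see the two stages separately, one can time the copy and the final flush on their own. A sketch, assuming GNU coreutils (whose `sync` accepts a file argument and then flushes only that file); `copy_then_flush` is a made-up name, and the timings only have one-second resolution:

```sh
#!/bin/sh
# Hypothetical helper: copy SRC to DST, then flush, timing both steps.
# cp returns once the last data is in the page cache; sync returns
# only after the copied file's data has been written to the device.
copy_then_flush() {
    cf_dst=$2
    # If DST is a directory, flush the copied file inside it, not the directory.
    if [ -d "$cf_dst" ]; then
        cf_dst=$cf_dst/$(basename -- "$1")
    fi
    cf_start=$(date +%s)
    cp -- "$1" "$2" || return
    cf_copied=$(date +%s)
    # With a FILE argument, GNU sync calls fsync() on that file only.
    sync -- "$cf_dst" || return
    cf_end=$(date +%s)
    echo "cp returned after $((cf_copied - cf_start)) s;" \
        "flushing took another $((cf_end - cf_copied)) s"
}
```

For example, `copy_then_flush archlinux.iso /run/media/$USER/STICK/` (the paths are hypothetical).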

Accounting for this is possible only by interrupting the optimization and forcing writes, so that it is always the write speed that gets measured. But be aware this does not bring any real improvement to the copy operation itself: it takes no less time.⁽¹⁾ The user just gets a fancy UI element that displays changing numbers, which still lies, because you will need between a few and a dozen seconds for the final flush.

If you were using Windows and never noticed a similar situation: Windows is optimized for being appealing. It implements many special cases just to prevent users from developing unwanted thoughts, which could be detrimental to Microsoft. One of them is to periodically flush copied data if a large file is being copied. This is the behavior of Explorer. Windows also offers a centralized, vast API to increase uniformity. Among those APIs is CopyFileEx, which offers the same mechanism; that is why you might not have seen the situation even outside Explorer.

The situation is diametrically different in the Linux ecosystem, on many levels: from philosophy, through goals, to the choices made by the developers of your file management software.

If you really want to force a similar behavior, there are some options. They do require copying from a command line.

  • Run the copy operation in a cgroup with memory limited to a few dozen MiB.

  • There is a tool, quite old by now, called nocache. Supposedly it applies POSIX_FADV_DONTNEED (via posix_fadvise) to each open stream in the child process, which currently should keep written data from lingering in the cache.

  • One of the very few valid uses of dd:

    dd if=SRCFILE of=DSTFILE bs=32M oflag=dsync status=progress

    Do not use `conv=sync` or `oflag=direct`, which you may find advised on the internet. The first one has nothing to do with flushing data: it pads short input blocks with zero bytes up to the block size, so if dd ever actually applies it, it will corrupt the data. The other’s purpose is to offload the CPU, not to make data “be used directly”; it only coincidentally skips caches. Even then, only the system file caches are skipped, and the entire operation may fail altogether, because O_DIRECT mode has specific alignment requirements.
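The three options above could be wrapped as shell functions like the following. This is a sketch, not a tested recipe: the function names and the 64M limit are hypothetical; `copy_cgroup` assumes systemd with the memory controller delegated to your user manager (otherwise run it as root without `--user`), and `copy_nocache` assumes the third-party nocache tool is installed. Only `copy_dd` uses the exact command given above.

```sh
#!/bin/sh
# Sketches of the three approaches above; both arguments are file paths.

# 1. Run cp in a transient systemd scope whose memory, page cache included,
#    is capped, so dirty data should be written out early and often.
copy_cgroup() {
    systemd-run --user --scope --quiet \
        -p MemoryMax="${COPY_MEM_LIMIT:-64M}" -- cp -- "$1" "$2"
}

# 2. Let the nocache wrapper advise the kernel to drop cached pages.
copy_nocache() {
    nocache cp -- "$1" "$2"
}

# 3. dd with oflag=dsync: each 32 MiB write returns only after it has reached
#    the device, so status=progress (printed on stderr) tracks write speed.
copy_dd() {
    dd if="$1" of="$2" bs=32M oflag=dsync status=progress
}
```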

But be aware of the shortcomings I mentioned earlier. The behavior you observe now is not an oversight and you pay a price for a more accurate measurement.

____
⁽¹⁾ This is ‘≥’, not ‘=’. In practice it does not take the same time, but longer. If the flushing is performed too often, the performance penalty becomes huge.

Last edited by mpan (2023-03-07 02:07:53)


Sometimes I seem a bit harsh — don’t get offended too easily!


#3 2023-03-07 02:10:19

jonno2002
Member
Registered: 2016-11-21
Posts: 684

Re: Copying a file to a USB drive shows the wrong file transfer times

https://wiki.archlinux.org/title/Sysctl#Virtual_memory

This annoyed me too while copying data to USB sticks, so I fixed it with the following values:

vm.dirty_bytes=50331648
vm.dirty_background_bytes=16777216
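For reference, these values are 48 MiB (48 × 1024 × 1024 = 50331648) and 16 MiB (16 × 1024 × 1024 = 16777216). Per the kernel’s vm sysctl documentation, `vm.dirty_background_bytes` is the amount of dirty memory at which the background flusher threads start writeback, and `vm.dirty_bytes` is the amount at which a process generating writes starts writeback itself; setting a `_bytes` variant makes the matching `_ratio` sysctl read as 0. These limits are system-wide, not per device. A small sketch that generates such a fragment from MiB values (the helper name `dirty_conf` and the file name `99-dirty.conf` are hypothetical):

```sh
#!/bin/sh
# Hypothetical helper: print a sysctl.d fragment with dirty limits in MiB.
# $1 = vm.dirty_bytes in MiB, $2 = vm.dirty_background_bytes in MiB.
dirty_conf() {
    echo "vm.dirty_bytes = $(( $1 * 1024 * 1024 ))"
    echo "vm.dirty_background_bytes = $(( $2 * 1024 * 1024 ))"
}
```

Usage: `dirty_conf 48 16 | sudo tee /etc/sysctl.d/99-dirty.conf`, then `sudo sysctl --system` to load it without rebooting.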

