#1 2024-03-24 14:21:35

eomanis
Member
Registered: 2013-04-17
Posts: 50

[Solved] Backup of large file system to multiple smaller file systems

I have a large file system that I'd like to back up manually from time to time, but I do not have a target device that is large enough. I do however have a few separate smaller devices, and three or four of them together would provide enough storage space for all files on the large file system.
Does anyone know an application similar to rsync that can distribute the files over multiple targets sequentially in a somewhat lazy yet elegant manner, so that copying everything back together gives me the original file tree of the large file system?

Goals:

  1. Copies all metadata (permissions, user and group, ACLs, times ...) except possibly access times

  2. Is lazy, i.e. does not move stuff between the target devices unnecessarily. It might start out with roughly equal relative usage across all targets and go from there, so to speak

  3. Keeps each directory on a single target; if this is not possible, requires the user to add the affected directory to an allowlist of directories that may be distributed over multiple targets

  4. When a new subdirectory shows up in a distributed directory, places it on a target device predictably, e.g. in alphabetical order among its siblings

  5. Can do the backup with only a single target accessible at a time, tape-archive style, where you swap the targets out as you go

  6. Correctly identifies the target file systems by itself, maybe by placing a top-level marker file on them

  7. The resulting backup should be accessible/restorable without special software, and restoring it must work with only sequential access to the targets, i.e. copy everything together one target after another and you are good to go

  8. Can be run on the command line
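
Goal 6 could be approached with a small marker file at the top level of each target. This is only a minimal sketch: the marker file name `.backup-target-id`, the IDs such as `disk-1`, and both function names are assumptions made up for illustration, not a convention of rsync or any other tool.

```shell
#!/bin/sh

# Hypothetical marker file name; not a convention of rsync or any other tool
markerName=".backup-target-id"

# writeTargetMarker targetDir targetId
# Labels a backup target by writing its ID to the top-level marker file.
# ">|" overwrites an existing marker even if noclobber is set.
writeTargetMarker () {
    printf '%s\n' "$2" >| "$1/$markerName"
}

# isExpectedTarget targetDir expectedId
# Succeeds only if the marker file exists and contains exactly expectedId.
isExpectedTarget () {
    [ -f "$1/$markerName" ] || return 1
    [ "$(cat "$1/$markerName")" = "$2" ]
}
```

A check like `isExpectedTarget /mnt/storage-0-backup-disk-1 disk-1` before each rsync call would skip a disk that happens to be mounted at the wrong path, which matters as soon as any `--delete` option is in play.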

Out of scope:

  1. Incremental backup / history; mirror of the current file tree only

  2. Operation with the source file system and the targets on different systems (may be a stretch goal, but for starters a dedicated root sshfs hosted on the source may be used that is mounted on the target system, or the other way around)

Does anyone know an application that can do this?
I thought I'd ask before I go and write an rsync frontend myself.

Edit: Maybe I am overthinking stuff here and could cover the use case with some smart use of find and rsync.

Edit 2: Seeing that I have come up with a solution that needs only rsync and basic bash, I'd say the suspicion of overthinking was spot on.
Here is what I intend to use (not tested yet):

#!/bin/bash

set -o nounset
set -o noclobber
set -o errexit
shopt -qs inherit_errexit


# Helper functions
# ---------------------------------------------------------------------

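# cloneIfTargetMounted rsyncTarget [extraRsyncArgs...] rsyncSource \
#   [rsyncSource...]
# Runs rsync with the default arguments from the global rsyncArgs array
# (defined below, before the first call) plus the given extra arguments,
# but only if a file system is mounted at rsyncTarget.
cloneIfTargetMounted () {
    local target="$1"; shift

    # --mountpoint makes findmnt treat the argument strictly as a mount point
    if findmnt --mountpoint "$target" > /dev/null; then
        echo " INFO Cloning to \"$target\"..." >&2
        rsync "${rsyncArgs[@]}" "$@" "$target"
    else
        echo " INFO Nothing mounted at \"$target\", skipping" >&2
    fi
}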


# The default arguments for rsync
# ---------------------------------------------------------------------

rsyncArgs=()
rsyncArgs+=(--relative)
rsyncArgs+=(--info=progress2)
rsyncArgs+=(--no-inc-recursive)
rsyncArgs+=(--whole-file)
rsyncArgs+=(--secluded-args)
rsyncArgs+=(--open-noatime)
rsyncArgs+=(--numeric-ids)
rsyncArgs+=(--archive)
rsyncArgs+=(--hard-links)
rsyncArgs+=(--acls)
rsyncArgs+=(--xattrs)
rsyncArgs+=(--delete-before)

# Comment these two out for live runs (!)
rsyncArgs+=(--verbose)
rsyncArgs+=(--dry-run)

# Uncomment this one for the first run per target after you have
# changed any of your target-specific rsync calls, to clean up stuff
# that is now being backed up to a different target than before
#rsyncArgs+=(--delete-excluded)


# Distribute the contents of storage-0 to multiple smaller file
# systems
# ---------------------------------------------------------------------

# Everything on storage-0 except large/subdir
cloneIfTargetMounted /mnt/storage-0-backup-disk-0 \
    --exclude='/large/subdir' \
    /mnt/storage-0/./

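# storage-0/large/subdir/[^s-z]*: every entry that does not start with a
# lowercase s..z (digits, a..r, uppercase letters, dot files, ...)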
cloneIfTargetMounted /mnt/storage-0-backup-disk-1 \
    --include='/large/subdir/[^s-z]*' \
    --exclude='/large/subdir/*' \
    /mnt/storage-0/./large/subdir

# storage-0/large/subdir/[stu..z]*
cloneIfTargetMounted /mnt/storage-0-backup-disk-2 \
    --include='/large/subdir/[s-z]*' \
    --exclude='/large/subdir/*' \
    /mnt/storage-0/./large/subdir
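
Restoring (goal 7) should then amount to copying every target back on top of the same destination, one after another. Because the backup uses `--relative` with the `/./` anchor, each target holds its part of the tree under the original relative paths. The following is a minimal, untested sketch under that assumption; `restoreFromTarget` and the destination path in the comments are hypothetical names, not part of the script above.

```shell
#!/bin/sh

# restoreFromTarget targetDir destinationDir
# Copies the contents of one backup target into the destination tree.
# No --delete here: every target only holds part of the tree, so deleting
# would remove what was restored from the previous targets.
restoreFromTarget () {
    if [ ! -d "$1" ]; then
        echo " INFO \"$1\" is not available, skipping" >&2
        return 0
    fi
    echo " INFO Restoring from \"$1\"..." >&2
    rsync --archive --hard-links --acls --xattrs --numeric-ids \
        "$1/" "$2/"
}

# Example (hypothetical destination), one target at a time:
#   restoreFromTarget /mnt/storage-0-backup-disk-0 /mnt/storage-0-restored
#   restoreFromTarget /mnt/storage-0-backup-disk-1 /mnt/storage-0-restored
#   restoreFromTarget /mnt/storage-0-backup-disk-2 /mnt/storage-0-restored
```

The order of the targets should not matter for file contents, since no path is meant to exist on more than one target apart from the shared parent directories.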

Last edited by eomanis (2024-03-25 23:58:07)

#2 2024-03-24 21:18:13

Awebb
Member
Registered: 2010-05-06
Posts: 6,304

rsnapshot

#3 2024-03-25 23:19:00

eomanis
Member
Registered: 2013-04-17
Posts: 50

Awebb wrote:

rsnapshot

I can't quite see how rsnapshot would distribute a large file system over several smaller targets.

Anyway, I think I have figured out a reasonably simple rsync-only solution that I'll edit into the initial topic post.
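
The rsync-only approach in the first post splits large/subdir by the first character of each entry name, which only works if the two bracket patterns are complementary, so that every entry lands on exactly one disk. Here is a small sketch to check this for sample names. It assumes that shell `case` patterns behave like rsync's bracket patterns for such simple cases (shell negation is written `[!...]`, and the C locale is forced so that ranges compare bytewise); `diskForEntry` and the `disk-1`/`disk-2` labels are hypothetical.

```shell
#!/bin/sh

# diskForEntry entryName
# Prints which backup disk an entry of large/subdir would be assigned to.
# The body runs in a subshell so that the locale change does not leak out.
diskForEntry () (
    # Force bytewise ranges so that s-z means only lowercase s..z
    LC_ALL=C
    case "$1" in
        [!s-z]*) echo "disk-1" ;;   # mirrors --include='/large/subdir/[^s-z]*'
        [s-z]*)  echo "disk-2" ;;   # mirrors --include='/large/subdir/[s-z]*'
    esac
)
```

For example, `diskForEntry sample` prints `disk-2`, while `diskForEntry Zebra` prints `disk-1`, because uppercase letters are outside the range `s-z`.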
