
#1 2024-02-21 02:39:36

whittlers
Member
Registered: 2024-02-19
Posts: 58

(how to?) track differences using git, but not using diffs

i want to keep track of my personal files, for if they change or are deleted, but i cannot use git, since it tracks the contents of it and takes exceedingly long (and would make the .git file bloat to infinity)

i just want to track if files:
- changed
- were deleted (or moved)

with a simple mechanism like rsync does to check if a file has changed

the only example i know that does this with something that is a no-code project, is 'pass', with the .password-store folder

how much of a bad idea is this? is this doable in some (other) way?


sorry for bad english

Offline

#2 2024-02-21 08:56:35

seth
Member
Registered: 2012-09-03
Posts: 59,373

Re: (how to?) track differences using git, but not using diffs

What do you mean by "track"?
When? Where? How?
You could maintain a database of paths along with a checksum (sha1 or md5 is probably gonna do for performance) or file size and timestamps (though those can be manipulated, i.e. if I change some bytes and then reset the timestamps, you wouldn't notice)
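The caveat about reset timestamps can be reproduced with a small experiment. This is only a sketch, assuming GNU coreutils (`stat -c`, `sha1sum`, `touch -r`); the temporary files `f` and `ref` are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical scratch files; nothing outside the temp files is touched.
f=$(mktemp)
ref=$(mktemp)

printf 'AAAA\n' > "$f"
before_meta=$(stat -c '%s %Y' "$f")   # size in bytes, mtime in epoch seconds
before_sum=$(sha1sum < "$f")
touch -r "$f" "$ref"                  # remember the original timestamps

printf 'BBBB\n' > "$f"                # same size, different bytes
touch -r "$ref" "$f"                  # put the old timestamps back

after_meta=$(stat -c '%s %Y' "$f")
after_sum=$(sha1sum < "$f")

[ "$before_meta" = "$after_meta" ] && echo "size and mtime look unchanged"
[ "$before_sum" != "$after_sum" ] && echo "checksum differs"
```

A size-and-timestamp database is much cheaper to build, a checksum database catches this case; which one fits depends on what the tracking is for.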

People use github to manage their dotfiles and rsync et al. for backups - but I've no idea what your plan is here.

Online

#3 2024-02-21 22:37:30

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

i said what my plan is: to know if files changed or were deleted. you may not want to do that yourself, but i want to. it helps when i... change files or delete them. i guess i forgot to specify what "personal files" means, i mean pictures, videos, rtfs or whatever other binaries, and a full scale storage of them

the idea of saving the paths with checksum sounds good, but i wonder if that can difficult the difference tracking


sorry for bad english

Offline

#4 2024-02-21 22:51:14

seth
Member
Registered: 2012-09-03
Posts: 59,373

Re: (how to?) track differences using git, but not using diffs

i said what my plan is

No, you didn't.
Your subject is "track differences using git, but not using diffs" and immediately contradicted by the very first sentence "i want to keep track of my personal files, for if they change or are deleted, but i cannot use git"
Also git doesn't "track" anything, it's a version control system, where you have to actively commit changes (with a comment) and that allows for infinite undo.

to know if files changed or were deleted.

*when*?
When you boot? When it happens? When you check explicitly?

it helps when i... change files or delete them

Helps how?
Do you want to log that or when you deleted a file?
Do you want to track integrity?
And then what?
Is this meant to support you with backups/restoration?

https://en.wikipedia.org/wiki/XY_problem


the idea of saving the paths with checksum sounds good, but i wonder if that can difficult the difference tracking

What?
Calculating a checksum isn't difficult, but can consume time - if you meant "slow down".

Online

#5 2024-02-21 23:03:28

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,330

Re: (how to?) track differences using git, but not using diffs

If you want to track in real time, then you may want an inotify setup that logs any changes.  But if you want to take controlled "snapshots", a list of checksums would be better.
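A minimal sketch of the "snapshot" variant, assuming GNU findutils and coreutils. The `snapshot` helper and the file names `monday.sha1` and `sunday.sha1` are hypothetical; the idea is one sorted "checksum  path" list per run, compared with plain `diff`.

```shell
#!/bin/sh
# Hypothetical helper: snapshot DIR OUTFILE
# Writes one "checksum  path" line per file, sorted by path so that
# two snapshots line up when diffed.
snapshot() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha1sum) > "$2"
}

work=$(mktemp -d)
mkdir "$work/data"
printf 'one\n' > "$work/data/keep.txt"
printf 'two\n' > "$work/data/gone.txt"

snapshot "$work/data" "$work/monday.sha1"

printf 'changed\n' > "$work/data/keep.txt"   # modify one file
rm "$work/data/gone.txt"                     # delete another

snapshot "$work/data" "$work/sunday.sha1"

# diff exits with status 1 when the snapshots differ, hence "|| true".
# Lines starting with "<" are the old state, ">" the new state.
diff "$work/monday.sha1" "$work/sunday.sha1" || true
```

The snapshots are kept outside the tracked directory here so they do not list themselves.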


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman

Offline

#6 2024-02-21 23:30:46

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

seth, the only thing i want is to use something similar to git, to
- CONTROL the versions of
- EVERY file,
- at ANY time (periodically),
- without storing the contents of each file in each version (which is what git does(?) (i don't really know))

with "difficult" i mean: if i'm keeping some kind of list of files with their paths, say

find . -type f -exec sh -c 'echo -e "$(cksum "{}" | cut -d " " -f 1 | cut -b1-6)\t$(du -h "{}")"' \;

or

tree --du --dirsfirst -ahF --timefmt "%D" --sort mtime

whatever is better to be able to TRACK changes, running a diff over it, or something; i'm not creative to know what's the solution, that's why it's not easy lol

i want to see the differences clearly, like "oh, i moved this to this folder, that's why i didn't find it, my brain is rotting!"

Last edited by whittlers (2024-02-21 23:35:16)


sorry for bad english

Offline

#7 2024-02-21 23:40:17

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,330

Re: (how to?) track differences using git, but not using diffs

whittlers wrote:

the only thing you need to know is ... (i don't really know))

That about sums it up doesn't it.  You are most definitely wrong that what you say is all we need to know.

Of course, if you don't want help, continue to keep your feet firmly planted.  We don't need to know anything at all to not help you.  But if you want productive help, we need to know what the goal is.

whittlers wrote:

- CONTROL the versions of
- EVERY file,
- at ANY time (periodically),

What do you mean the "version" of files, and more importantly, what do you mean by "CONTROL"?

As for your example commands, they are definitely not a great approach.

When you say you want to have this "CONTROL" periodically, what does that mean?  What are you going to do, and / or what do you expect to happen in each period?  Say this is once a week on Sundays, what should happen when Sunday comes?  Do you want a list of files that have changed since the previous Sunday?  Do you want that list to include when they were changed (e.g., date/time of the change)?  Then what is to be done with this record of changes?  Do you want to be able to revert or undo some changes?  If so, would it matter if a file changed many times in the same week?  Should each change be logged?  Or is it still just one entry for the fact that on Sunday it ended up different than it was on the previous Sunday?

This is definitely an X-Y problem, and you keep using jargon words that you think represent your goal, but they don't, because - as you yourself admit - you don't understand the things to which those jargon words refer.  So now you have to make a choice: do you want to sound like you know what you are talking about and be wrong and not get help, or do you want to help us understand what you are talking about and get the assistance you seek?

Last edited by Trilby (2024-02-21 23:43:10)


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman

Offline

#8 2024-02-22 00:11:06

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

i'm sorry for my bad explanation and attitude. my only intention is to ask for a way to do something, and mentioning git is the only way i could explain, never intended to claim i know anything at all, and the main purpose is to get propositions on how to achieve my goal

When you say you want to have this "CONTROL" periodically, what does that mean?  What are you going to do, and / or what do you expect to happen in each period?  Say this is once a week on Sundays, what should happen when Sunday comes?  Do you want a list of files that have changed since the previous Sunday?  Do you want that list to include when they were changed (e.g., date/time of the change)?  Then what is to be done with this record of changes?  Do you want to be able to revert or undo some changes?  If so, would it matter if a file changed many times in the same week?  Should each change be logged?  Or is it still just one entry for the fact that on Sunday it ended up different than it was on the previous Sunday?

ideally, in real time, although it's not necessary because i personally will not need to check real time changes. i will check in short periods, like days or weeks when not doing any significant changes

This is definitely an X-Y problem, and you keep using jargon words that you think represent your goal, but they don't, because - as you yourself admit - you don't understand the things to which those jargon words refer.  So now you have to make a choice: do you want to sound like you know what you are talking about and be wrong and not get help, or do you want to help us understand what you are talking about and get the assistance you seek?

sorry for not using the proper words, english is not my main language, so it will not be precise lol

i have absolutely no idea of what git does internally, but i mention it because it achieves the same thing that i want, but has a little limitation that makes it unusable. so i don't know if git is the solution or not. git does a lot of things that are not intended for that, like branching and merging, patching, tags... i don't know. i'm not intending to use git in the way it was supposed to be used, but to use it like someone that uses github to store their keepass database file, for example; not the way github is intended to be used, but it accomplishes something

the only thing my ideal program must do, is knowing when a file was changed. i don't know what's the better way to approach this. i would prefer to keep a log of every file and their date, filesize, checksum or whatever, to be able to compare, than to monitor in real time if something is changing (however that would work), and finally to be able to check differences between versions of this paths log, for example from 3 days ago to now, or 1 year ago to now. mainly to the current version, not checking differences between old ones

Last edited by whittlers (2024-02-22 00:15:05)


sorry for bad english

Offline

#9 2024-02-22 00:59:09

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

current approach:

1. generate the "path list":

#!/bin/bash
find .. -type f -exec sh -c 'echo -e "$(stat -c "%y" "{}" | cut -d " " -f 1) $(stat -c "%y" "{}" | cut -d " " -f 2 | cut -d "." -f 1)\t$(stat -c "%w" "{}" | cut -d " " -f 1) $(stat -c "%w" "{}" | cut -d " " -f 2 | cut -d "." -f 1)\t$(cksum "{}" | cut -d " " -f 1 | cut -b1-8)\t$(du -h "{}")"' \; > "paths$(date +"%Y-%m-%d %H:%M:%S")"

2. and when you have two of those, use `diff old new`:

3052c3052
< 2024-01-21 06:09:03   2024-01-31 19:50:23     12548041        36K     ../Pics/GEVSNQLawAAO3pE.jpg
---
> 2024-02-22 01:50:20   2024-01-31 19:50:23     12548041        36K     ../Pics/GEVSNQLawAAO3pE.jpg

you can see the only modification date changed because i just touch'd it

stay tuned for updates, hit like comment, and subscribe for more

any suggestion is welcome


sorry for bad english

Offline

#10 2024-02-22 03:17:24

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,330
Website

Re: (how to?) track differences using git, but not using diffs

That is a ridiculously convoluted script.  There is absolutely no reason to call stat 4 times, nor any reason for dozens of pipes through `cut` and others.  But aside from simplifying that, you really should just set up `inotifywait` to maintain a real-time log of relevant changes.

Here's a starting point:

#!/bin/sh

inotifywait -mr \
	--format "%T %f %e" \
	--timefmt "%Y.%m.%d-%H%M" \
	-e create,delete,modify,move_self \
	.

Be sure to read the inotifywait man page and adjust the format, timefmt, and events to watch as needed.

Last edited by Trilby (2024-02-22 13:46:12)


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman

Offline

#11 2024-02-22 08:32:30

seth
Member
Registered: 2012-09-03
Posts: 59,373

Re: (how to?) track differences using git, but not using diffs

Also inotify has limits on the amounts of trackable inodes, you can't monitor your entire porn database with that (at least I can't…)
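To get a feeling for that limit, the kernel setting can be compared with the size of the tree. A rough sketch, assuming Linux: a recursive `inotifywait` needs one watch per directory, and the per-user cap is exposed under `/proc/sys/fs/inotify/`. The variable names are made up for illustration.

```shell
#!/bin/sh
# Per-user cap on inotify watches (Linux); may be absent elsewhere.
limit_file=/proc/sys/fs/inotify/max_user_watches
if [ -r "$limit_file" ]; then
    limit=$(cat "$limit_file")
else
    limit=unknown
fi

# A recursive watch needs one watch per directory below the start point.
dirs=$(find . -type d | wc -l)

echo "watch limit per user: $limit"
echo "directories under $(pwd): $dirs"

# Raising the limit needs root; 524288 is only an example value:
#   sysctl fs.inotify.max_user_watches=524288
```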

The git-without-patch-history example would mean you're keeping a journal about your data changes and
1. git is utterly unsuited for that (handling binary "opaque" data is its main design "flaw" because it was absolutely not designed for that)
2. it requires you to explicitly keep the journal ("git mv" instead of "mv" and "git commit" to actually log the changes)
3. from your example it doesn't look like you want to annotate changes either.

the only thing my ideal program must do, is knowing when a file was changed.

Does that mean "read the date of the last change" or "be notified to act whenever a file changes"?

[rather] than to monitor in real time if something is changing (however that would work)

inotify or audit, but apparently that's not it.
The last modification time is part of every file timestamp on any sane filesystem anyway?

i would prefer to keep a log of every file and their date, filesize, checksum or whatever, to be able to compare

If the actual purpose is integrity checks, you definitely want to keep checksums - date and filesize are not suited for this (the file can change w/o those changing and the date can change without any change to the file)
Storing paths along with checksums will also allow you to track an unmodified file through the filesystem (if it got moved/renamed the checksum will stay the same)
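How a moved file shows up when two "checksum  path" lists are compared could look like the sketch below. The two manifests are hand-written with made-up checksums and paths, and the awk script is just one possible way to classify entries; files with identical content share a checksum, so duplicates would need extra handling.

```shell
#!/bin/sh
work=$(mktemp -d)

# Made-up manifests in sha1sum's "checksum  path" layout.
cat > "$work/old" <<'EOF'
aaa111  ./pics/cat.jpg
bbb222  ./pics/dog.jpg
ccc333  ./notes.rtf
EOF
cat > "$work/new" <<'EOF'
aaa111  ./archive/cat.jpg
bbb222  ./pics/dog.jpg
ddd444  ./notes.rtf
EOF

report=$(awk '
    # Split each line by hand so paths containing spaces survive.
    { sum = $1; path = substr($0, length($1) + 3) }
    # First file: remember the old state in both directions.
    NR == FNR { oldpath[sum] = path; oldsum[path] = sum; next }
    # Second file: classify each entry against the old state.
    {
        seen[path] = 1
        if (path in oldsum) {
            if (oldsum[path] != sum) print "changed: " path
        } else if (sum in oldpath) {
            print "moved:   " oldpath[sum] " -> " path
            moved[oldpath[sum]] = 1
        } else {
            print "added:   " path
        }
    }
    END {
        for (p in oldsum)
            if (!(p in seen) && !(p in moved)) print "deleted: " p
    }
' "$work/old" "$work/new")

printf '%s\n' "$report"
```

With the sample data this reports `./pics/cat.jpg` as moved to `./archive/cat.jpg` and `./notes.rtf` as changed.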

And finally and wrt

i want to see the differences clearly, like "oh, i moved this to this folder, that's why i didn't find it, my brain is rotting!"

You're not just looking for mlocate, are you?

Online

#12 2024-02-22 09:07:49

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

seth wrote:

If the actual purpose is integrity checks, you definitely want to keep checksums - date and filesize are not suited for this (the file can change w/o those changing and the date can change without any change to the file)
Storing paths along with checksums will also allow you to track an unmodified file through the filesystem (if it got moved/renamed the checksum will stay the same)

are you saying rsync's way of doing it is bad? should i be using -c in rsync from now on? (not rhetorical)

Does that mean "read the date of the last change" or "be notified to act whenever a file changes"?

the first, but i do *not* want to track WHEN a file was "changed", if it was moved (which the program would unavoidably interpret as deleted, and then created in other place), for example

You're not just looking for mlocate, are you?

no

Last edited by whittlers (2024-02-22 09:09:25)


sorry for bad english

Offline

#13 2024-02-22 09:45:49

seth
Member
Registered: 2012-09-03
Posts: 59,373

Re: (how to?) track differences using git, but not using diffs

rsync does whatever you feel is necessary - there're shallow and thorough ways to compare files, with the performance being the trade-off.
99.8% of all times (number straight out of a place between my head and my legs) comparing size and timestamps will do the job, but it cannot detect storage corruption or malicious manipulation.
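The difference between the two modes can be seen in a dry run. A hedged sketch, assuming rsync is installed: `-n` is a dry run, `--itemize-changes` lists what would be transferred and `-c` switches from the size+mtime quick check to full checksums. The `src` and `dst` directories are throwaway examples.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
printf 'AAAA\n' > "$work/src/f"
printf 'BBBB\n' > "$work/dst/f"          # same size, different content
touch -r "$work/src/f" "$work/dst/f"     # and the same mtime

quick_hits=0
full_hits=0
if command -v rsync >/dev/null 2>&1; then
    # Default quick check: compares size and mtime only.
    quick=$(rsync -rtn --itemize-changes "$work/src/" "$work/dst/")
    # With -c every file is read and checksummed on both sides.
    full=$(rsync -rtnc --itemize-changes "$work/src/" "$work/dst/")
    # Count the lines that mention the file "f" (directory lines are ignored).
    quick_hits=$(printf '%s\n' "$quick" | grep -c ' f$')
    full_hits=$(printf '%s\n' "$full" | grep -c ' f$')
    echo "quick check lists f: $quick_hits time(s)"
    echo "checksum mode lists f: $full_hits time(s)"
fi
```

The price of `-c` is that every file gets read completely on both ends, which is the performance trade-off mentioned above.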

the first, but i do *not* want to track WHEN a file was "changed" if it was moved

I'll just assume the comma was misplaced - moving a file will not alter its modification timestamp.

cd /tmp
mkdir foo bar
touch foo/snafu
stat -c '%y' foo/snafu
sleep 60
mv foo/snafu bar
stat -c '%y' bar/snafu

Online

#14 2024-02-22 10:23:33

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

your provided code would give an error `touch: cannot touch 'foo/snafu': No such file or directory`.

as far as i know, a 'mv' does indeed change the modification date. but it's beside the point anyway. as you say, tracking by modification dates can lead to missing something, like if an external program moves it or whatever...

saving the full paths is the best approach for now

/tmp/t/jwbkcokgwp>stat a
Change: 2024-02-22 11:18:27.940006567 +0100
/tmp/t/jwbkcokgwp>mkdir jeje
/tmp/t/jwbkcokgwp>mv a jeje
/tmp/t/jwbkcokgwp>stat jeje/a
Change: 2024-02-22 11:18:42.659779047 +0100

(removed the non relevant parts of 'stat')

Last edited by whittlers (2024-02-22 10:24:12)


sorry for bad english

Offline

#15 2024-02-22 10:33:18

seth
Member
Registered: 2012-09-03
Posts: 59,373

Re: (how to?) track differences using git, but not using diffs

Did you skip the mkdir?
https://man.archlinux.org/man/core/core … tat.1.en#c
There're *four* timestamps, some will change, others won't.
You're looking for "Modify"
And on what actual OS/FS is that the output of "stat"?

At least not while you stay on the same FS.
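Both timestamps can be printed side by side to see which one a move touches. A small sketch, assuming GNU `stat` and a move within one filesystem; the directory and file names are placeholders. `%y` is the "Modify" time and `%z` the "Change" time.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/foo" "$work/bar"
touch "$work/foo/snafu"

mtime_before=$(stat -c '%y' "$work/foo/snafu")   # Modify: file content
ctime_before=$(stat -c '%z' "$work/foo/snafu")   # Change: inode metadata

sleep 1
mv "$work/foo/snafu" "$work/bar/"                # rename on the same filesystem

mtime_after=$(stat -c '%y' "$work/bar/snafu")
ctime_after=$(stat -c '%z' "$work/bar/snafu")

echo "modify: $mtime_before -> $mtime_after"
echo "change: $ctime_before -> $ctime_after"
```

The modify time stays the same; the change time is typically bumped by the rename, which matches the `stat` output posted earlier in the thread.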

tracking by modification dates can lead to missing something, like if an external program moves it or whatever...

No, not "like if an external program moves it"

Online

#16 2024-02-22 11:20:46

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

i mixed it. what changes is the "change" date, not "modify" (it actually says it in the code i sent)

Did you skip the mkdir?

what?

And on what actual OS/FS is that the output of "stat"?

i said at the end why, but yeah i could have used the -c

/tmp/t/drvamcqufb>ls --color=auto
b  e/
/tmp/t/drvamcqufb>stat -c "+%z" b
+2024-02-22 12:17:49.148017433 +0100
/tmp/t/drvamcqufb>mv b e
/tmp/t/drvamcqufb>stat -c "+%z" e/b
+2024-02-22 12:18:04.273543564 +0100
/tmp/t/drvamcqufb>

No, not "like if an external program moves it"

'No' what? i say it because you said "but it cannot detect storage corruption or malicious manipulation."


sorry for bad english

Offline

#17 2024-02-22 13:50:49

seth
Member
Registered: 2012-09-03
Posts: 59,373

Re: (how to?) track differences using git, but not using diffs

your provided code would give an error `touch: cannot touch 'foo/snafu': No such file or directory`.

Means you didn't create foo w/ mkdir…

stat -c "+%z" b

seth wrote:

There're *four* timestamps, some will change, others won't.
You're looking for "Modify"

%y, not %z …

'No' what?

There's no "external program move", everything goes through the same syscalls and then the FS implementation.

storage corruption or malicious manipulation

is
1. your disk is broken
2. somebody wants to actively screw you, this doesn't happen accidentally

You'll now either explain what actual problem you're trying to solve or you're going to solve it by yourself.
There's no point in re-reading manpages to you when we don't even know whether any of this is relevant.

Online

#18 2024-02-23 00:28:44

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

Means you didn't create foo w/ mkdir…

not commenting on that since it's not relevant

is
1. your disk is broken
2. somebody wants to actively screw you, this doesn't happen accidentally

well, never experienced or heard about that. i guess storing the cksum is still better than not doing it

here is the source code for my masterpiece database tracking tool:

find .. -type f -exec sh -c 'echo -e "$(stat -c "%y" "{}" | cut -d " " -f 1) $(stat -c "%y" "{}" | cut -d " " -f 2 | cut -d "." -f 1)\t$(stat -c "%w" "{}" | cut -d " " -f 1) $(stat -c "%w" "{}" | cut -d " " -f 2 | cut -d "." -f 1)\t$(stat -c "%z" "{}" | cut -d " " -f 1) $(stat -c "%z" "{}" | cut -d " " -f 2 | cut -d "." -f 1)\t$(cksum "{}" | cut -d " " -f 1 | cut -b1-9)\t$(du -h "{}")"' \; > "tree$(date +"%Y-%m-%d %H:%M:%S")"

the fun is running it, not reading it

    1. find .. -type f: this part of the command initiates a recursive search (find) starting from the parent directory (..). it looks for files (-type f).

    2. -exec sh -c '...' \;: for each file found by find, it executes the specified shell command within single quotes.

    3. inside the single quotes:
        echo -e: this command is used to print the following formatted information to the console.
        $(stat -c "%y" "{}" | cut -d " " -f 1) $(stat -c "%y" "{}" | cut -d " " -f 2 | cut -d "." -f 1): collects the modification time of the file, splitting it into date and time components.
        $(stat -c "%w" "{}" | cut -d " " -f 1) $(stat -c "%w" "{}" | cut -d " " -f 2 | cut -d "." -f 1): collects the birth (creation) time of the file, splitting it into date and time components.
        $(stat -c "%z" "{}" | cut -d " " -f 1) $(stat -c "%z" "{}" | cut -d " " -f 2 | cut -d "." -f 1): collects the change time of the file, splitting it into date and time components.
        $(cksum "{}" | cut -d " " -f 1 | cut -b1-9): calculates the checksum of the file and extracts the first 9 characters.
        $(du -h "{}"): fetches the human-readable size of the file.

    4. > "tree$(date +"%Y-%m-%d %H:%M:%S")": redirects the output of the entire command to a file named with the current date and time, prefixed with "tree". the date command generates the current date and time in the specified format ("%Y-%m-%d %H:%M:%S").

the output file will look something like this:

2023-12-16 19:14:46	2023-12-30 15:14:51	2024-01-18 18:56:01	198452955	2.0M	../Camera/old/image1.jpg
2023-12-16 19:14:48	2023-12-30 15:14:51	2024-01-18 18:56:01	413425810	1.9M	../Camera/old/image2.jpg
2023-12-16 19:14:49	2023-12-30 15:14:51	2024-01-18 18:56:01	188311049	1.6M	../Camera/old/image3.jpg
2023-12-16 19:14:40	2023-12-30 15:14:51	2024-01-18 18:56:01	195515848	1.8M	../Camera/old/image4.jpg

now you can run diff of it and see the changes of different versions of your database with ease. haters gonna hate

Last edited by whittlers (2024-02-23 00:29:49)


sorry for bad english

Offline

#19 2024-02-23 01:56:06

whittlers
Member
Registered: 2024-02-19
Posts: 58

Re: (how to?) track differences using git, but not using diffs

find .. -type f -exec sh -c 'stats=$(stat -c "%.19y\t%.19w\t%.19z" "{}"); echo -e "$stats\t$(cksum "{}" | cut -d " " -f 1 | cut -b1-9)\t$(du -h "{}")"' \;
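One possible further simplification, as a sketch only and assuming GNU `find`, `stat` and `cksum`: pass the file names to the inner shell as arguments instead of pasting `{}` into the script (names containing quotes then stay harmless), and let a single `stat --printf` call emit the whole line. The sample tree and the `listing` variable are made up; the size is printed in bytes (`%s`) instead of going through `du -h`.

```shell
#!/bin/sh
# Throwaway sample tree with an awkward file name.
work=$(mktemp -d)
mkdir "$work/Pics"
printf 'x\n' > "$work/Pics/it's a file.jpg"

# Columns: modify time, birth time, change time, CRC, size in bytes, path.
listing=$(cd "$work" && find . -type f -exec sh -c '
    for f do
        sum=$(cksum < "$f")     # prints "CRC SIZE" when reading stdin
        stat --printf="%.19y\t%.19w\t%.19z\t${sum%% *}\t%s\t%n\n" "$f"
    done
' sh {} + | sort -t "$(printf '\t')" -k 6)

printf '%s\n' "$listing"
```

Sorting by the path column keeps the order stable between runs, so a later `diff` of two listings only shows real changes.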

sorry for bad english

Offline
