
#1 2018-06-12 15:30:01

jjb2016
Member
From: Oxfordshire
Registered: 2016-02-29
Posts: 73

sorting script output ...

Hi folks,

I have a folder full of folders ... these are backup folders which are named in YYYY-MM-DD format (date -I format).  My backup script runs every day and creates a new folder every time.  I am now trying to write a script to list all the backup folders with a date name which is older than a specified number of days (eg:  10 days ago), and I want the list to be sorted using the sort command.  So far I've written this ...

for backup in $(find /mnt/seagate -maxdepth 1 -type d ! -path /mnt/seagate -name 20*); do
        if [ $(date -d $(basename $backup) +%s) -lt $(date -d "10 days ago" +%s) ]; then
                echo $backup;
        fi
done

Feel free to rip this to shreds!  I'm fairly new to bash scripting and I do it so infrequently that I find myself constantly on a steep learning curve.  This script above generates a list in stdout but obviously does not do any sorting yet.  How can I sort this output?

Hopefully this explains in enough detail.  Any help would be appreciated.

JB


#2 2018-06-12 15:33:04

ayekat
Member
Registered: 2011-01-17
Posts: 1,589

Re: sorting script output ...

You can just pipe it into `sort`:

for ... ; do
    ...
done | sort
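
A minimal sketch of that, with made-up folder names standing in for the real loop body. Everything the loop prints goes down the one pipe, and since YYYY-MM-DD names sort chronologically as plain text, sort needs no options:

```shell
# The three names are invented for illustration.
sorted=$(
    for name in 2018-06-03 2018-05-28 2018-06-01; do
        echo "$name"
    done | sort
)
printf '%s\n' "$sorted"
```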



#3 2018-06-12 16:15:33

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,441
Website

Re: sorting script output ...

find /mnt/seagate/ -maxdepth 1 -type d -mtime 10 ! -path /mnt/seagate

Oh, your question was just how to sort the output?  Yeah, just use sort.  But you should get rid of that loop and odd conditionals, just tell find you want things from within the last 10 days and be done with it.

Last edited by Trilby (2018-06-12 16:17:36)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman


#4 2018-06-12 21:33:28

Roken
Member
From: South Wales, UK
Registered: 2012-01-16
Posts: 1,251

Re: sorting script output ...

Or, you could always write your own sort routine if you want to re-invent the wheel. Ahhh, memories of my early programming days and bubble sorts.


Ryzen 5900X 12 core/24 thread - RTX 3090 FE 24 Gb, Asus Prime B450 Plus, 32Gb Corsair DDR4, Cooler Master N300 chassis, 5 HD (1 NvME PCI, 4SSD) + 1 x optical.
Linux user #545703


#5 2018-06-13 01:12:32

mpan
Member
Registered: 2012-08-01
Posts: 1,188
Website

Re: sorting script output ...

jjb2016: if you relax your requirements a bit¹ and can use GNU² ls and head, you could simplify your code and avoid non-robust idioms³:

ls -pt . | grep -F '/' | head -n -10

____
¹ Not an insane idea with your conditions.
² The default for Arch
³ Not an issue with daily-created files, but in a general case blindly expanding glob patterns may lead to painful failures.

Last edited by mpan (2018-06-13 01:19:43)


Sometimes I seem a bit harsh — don’t get offended too easily!


#6 2018-06-13 03:30:53

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: sorting script output ...


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)


#7 2018-06-13 10:40:55

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,441
Website

Re: sorting script output ...

While I agree parsing ls will always be a bad idea, I've just found another reason to love my shell:

$ touch 'a space' $'a\nnewline'

$ echo "don't taze me, bro" > a

$ ls
a          a?newline  a space

$ ls | cat
a
a?newline
a space

$ ls | /bin/grep -Z '\n'
a?newline

$ ls | grep ' '
a space

I don't miss BASH in the slightest.

Last edited by Trilby (2018-06-13 10:43:03)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman


#8 2018-06-13 16:02:14

mpan
Member
Registered: 2012-08-01
Posts: 1,188
Website

Re: sorting script output ...

Eschwartz: if that message was addressed to me and wasn’t just auxiliary info for the OP: where does YYYY-MM-DD contain characters that could make ls fail?

Last edited by mpan (2018-06-13 16:02:34)


Sometimes I seem a bit harsh — don’t get offended too easily!


#9 2018-06-13 16:26:45

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: sorting script output ...

Because if they're genuinely named like that, then glob order will be mtime order already, so why do silly parsing of any sort anyways if you can give me 100% guarantees that this will never, ever, ever, ever see content which isn't 100% what the intent of today declares shall be the case?

If you genuinely need to do such sorting, use find . -printf '%T@ %p\0' and do proper null-separated sorting based on unix epoch seconds, then use mapfile/readarray to array'ize it, then remove the leading epoch and single space character via "${array[@]#* }" when using it.

This also lets you use find's builtin functionality for choosing depth, -name or -regex, -type d instead of insanely grepping for a trailing /, etc. All in *one* command, a second to sort it, and pure bash builtins to process it in a way that actually comprehends each filename as a separate element regardless of what bizarre characters it has to process.

Also encouraging the use of ls on one case "because it doesn't matter" (even if that were true, which it isn't) directly leads to the cargo-culting of *bad code on the internet*, which encourages people to use it when it majorly breaks things. I didn't see you mentioning anything about how parsing ls is even *ever* considered bad, so you can hardly tell me now "oh, I knew in my heart that it's bad, but it's okay here, so therefore it's also okay to say it here". Recommending ls *ever*, without explaining the costs and benefits, is a massive disservice no matter how suitable it is to the specific instance.

There. Is. A. Reason. Why. Everyone. Who. Knows. What. They're. Talking. About. Says. Don't. Parse. Ls.


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)


#10 2018-06-13 19:13:01

mpan
Member
Registered: 2012-08-01
Posts: 1,188
Website

Re: sorting script output ...

First of all: I know about issues with ls, so there is no need to throw asterisked words at me.

I have provided code that works under some conditions (and this is even marked). If those are not met, the code doesn’t work as expected in cases much more probable than YYYY-MM-DD containing a newline. It’s enough that there are fewer than 10 directories or that some are missing. But within the given conditions it will work.

We can argue that one should avoid potentially dangerous constructs even if they work in a particular case. I subscribe to that opinion. But the problem is that no one has provided a better alternative. Glob patterns are even worse than ls. find, as used in this thread, produces output that is as broken as that of ls, just with a much longer command. So what is there to defend? Utterly verbose code that is at least equally broken?


Sometimes I seem a bit harsh — don’t get offended too easily!


#11 2018-06-13 19:22:35

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,441
Website

Re: sorting script output ...

mpan, while I agree with you on almost all points, the benefit of the find command is - certainly - not brevity, but resource minimization.  It's a single process.  Not only is the pipeline several processes, but it also requires a couple additional subshells.  This may not be a concern, but a command that is simply shorter to type is not necessarily better, especially if it would be used in a function/alias anyways.


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman


#12 2018-06-13 20:10:27

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: sorting script output ...

I've been challenged that working code is better than suggestions of pseudocode which are then left as an exercise to the reader to complete. I'm unsure how that makes my find suggestion "broken". So here it is.

mapfile -td '' mybackups < <(find /mnt/seagate/ -mindepth 1 -maxdepth 1 -type d -mtime 10 -printf '%T@ %p\0' | sort -nrz)
printf '%s\n' "${mybackups[@]#* }"

This uses two commands, one pipe and one subshell, and two builtins. It is backed by the full power of GNU find, and will *safely* handle any conceivable filename no matter what you try to use it for.
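
For anyone who wants to see the filename handling in action, here is a sketch of the same pipeline run against a scratch directory. The directory and its awkward names are invented for the demo, and -mtime is left out so that freshly created folders show up. It assumes bash 4.4 or later (for mapfile -d) and GNU find and sort:

```shell
scratch=$(mktemp -d)
mkdir "$scratch/plain" "$scratch/with space" "$scratch/with"$'\n'"newline"

# Same shape as above: "epoch path" records, NUL-terminated, newest first.
mapfile -td '' dirs < <(find "$scratch" -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\0' | sort -nrz)
names=("${dirs[@]#* }")    # strip the leading epoch and the single space

printf 'found %d directories\n' "${#names[@]}"
printf '<%s>\n' "${names[@]##*/}"
rm -r "$scratch"
```

The name containing a newline stays a single array element, which is the whole point of the NUL separators.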

The alternative with ls uses three commands and two pipes, and mangles filenames, in the process of *failing to get the right answer*. The OP wants listings from the last ten days, not the last ten listings, which may break when a day got skipped.

I think I'm justified in saying ls is the wrong solution. I've never yet found a place where it's the right one, and this is not the day either.

mpan wrote:

I have provided code that works under some conditions (and this is even marked).

I don't see where it's marked.

mpan wrote:

you could simplify your code and avoid non-robust idioms³:

³ Not an issue with daily-created files, but in a general case blindly expanding glob patterns may lead to painful failures.

Unless I've dramatically misread this, you're saying that your suggestion is in fact the thing which is robust against weirdness. Which is incorrect.

You've most certainly not explained what "non-robust idioms" you're talking about, since no one said anything about expanding globs. The OP did use find . -name 20* which expands globs, but you seem to assume they're aware of that as a given, then obliquely reference that it's a bad idea.

You've most certainly not explained where ls would be a bad idea, therefore encouraging the idea that ls is a useful tool which it is good to parse.
This is obviously not the expanding globs you're talking about, since ParsingLs is not primarily about expanding globs (though it does that too), but about splitting on whitespace.
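
A small sketch of the glob problem with find -name 20*, in a scratch directory with made-up names. With the pattern unquoted, the shell expands 20* against the current directory before find ever runs, so find is handed whatever happened to match there:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir backups backups/2018-06-01 backups/2018-06-02
touch 2018-notes.txt    # a stray file in the current directory that matches 20*

# Unquoted: the shell turns this into `-name 2018-notes.txt`, which matches no folder.
unquoted=$(find backups -mindepth 1 -maxdepth 1 -type d -name 20* | sort)

# Quoted: find receives the literal pattern and does the matching itself.
quoted=$(find backups -mindepth 1 -maxdepth 1 -type d -name '20*' | sort)

printf 'unquoted: [%s]\nquoted: [%s]\n' "$unquoted" "$quoted"
cd "$OLDPWD" && rm -r "$scratch"
```

With one stray match the unquoted form silently returns nothing; with several, find would reject the extra arguments instead.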


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)


#13 2018-06-14 09:23:58

jjb2016
Member
From: Oxfordshire
Registered: 2016-02-29
Posts: 73

Re: sorting script output ...

Hi everyone,

Thank you all for taking the time to respond to this.  It has been mildly amusing to see how this sort of topic can spark such debate.  I'm a CAD software consultant & instructor and I know that when there are multiple ways of achieving the same end result then you need to refine the requirements to refine the solution.  So here are a few more specific points to consider (this may add some fuel to the fire ...):

  • My question about sorting is a little academic.  Piping the for loop into sort does do what I want.  As I mentioned before I have this backup script which runs each day, and I had an idea for improving it which I thought I may need the sorting for - I'm not convinced I need it yet.

  • My backup drive (mounted at /mnt/seagate) should only ever be used for storing the output of my backup script, which only generates folders named in YYYY-MM-DD format.  The drive is attached to a raspberry pi (running Arch) which I've set up with the sole purpose of driving the backup drive and running my backup script.  So it's very unlikely that /mnt/seagate will contain anything other than folders with this name format.

  • Having said the above, I want the script to work correctly if there ever are any exceptions, so I'm using the find command and -name 20* just to, obviously, make sure that it excludes anything that does not start with 20YY-MM-DD.  Wouldn't it be better if I could somehow test each folder name to see if it is a valid YYYY-MM-DD format?  Is it possible to find folders where the name conforms to a valid date format?

  • Ultimately what I want to do with this bit of code is add it to my backup script so that the backup script will automatically delete any folders with a name that is a date older than say 10 days.  So I want the condition for deletion to be based on the folder name, and not creation/modification metadata.  I know that if the backup and cleanup works correctly each day then each day there should only be one folder to delete (the oldest one) but if a backup ends up taking more than one day, or if the raspberry pi is offline for a day or two, then there may be more than one folder to clean up.

  • It has been suggested here that I could use -mtime 10 in the find command but that only returns the folder created 10 days ago, and does not return all the folders older than that as well.  This is why I've got the for loop to look at the name of each folder and determine if the date represented by the name is older than 10 days ago.
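
The requirements in the list above (pick folders by name, accept only names that are real dates, and catch every folder older than the cutoff) can be sketched roughly like this. It is one possible approach rather than a finished tool: list_old_backups is a hypothetical helper, the scratch directory stands in for /mnt/seagate, and it assumes bash and GNU date. Because a glob expands in sorted order, the output comes out sorted already.

```shell
# list_old_backups DIR DAYS (hypothetical helper): print the subdirectories of
# DIR whose name is a real YYYY-MM-DD date more than DAYS days in the past.
list_old_backups() {
    local dir=$1 days=$2 cutoff path name stamp
    cutoff=$(date -d "$days days ago" +%s)
    for path in "$dir"/*/; do
        [ -d "$path" ] || continue              # the glob matched nothing
        name=$(basename "$path")
        # Check the shape first, then let date reject things like 2018-02-30.
        [[ $name =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || continue
        stamp=$(date -d "$name" +%s 2>/dev/null) || continue
        if [ "$stamp" -lt "$cutoff" ]; then
            printf '%s\n' "${path%/}"
        fi
    done
}

# Try it on a scratch directory instead of the real drive.
root=$(mktemp -d)
old=$(date -d '20 days ago' +%F)
recent=$(date -d '2 days ago' +%F)
mkdir "$root/$old" "$root/$recent" "$root/2018-02-30" "$root/lost+found"
result=$(list_old_backups "$root" 10)
printf '%s\n' "$result"
rm -r "$root"
```

Only the folder dated 20 days ago is printed; the impossible date and the non-date name are skipped no matter how old they are.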

Cheers.


#14 2018-06-14 09:32:38

jjb2016
Member
From: Oxfordshire
Registered: 2016-02-29
Posts: 73

Re: sorting script output ...

Also ... I have no idea what "expanding globs" are but it sounds rather unpleasant!

Last edited by jjb2016 (2018-06-14 09:33:05)


#15 2018-06-14 09:47:23

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: sorting script output ...

find has a -regex / -iregex option to restrict file names.

find . -regextype 'posix-minimal-basic' -type d -regex '.*/[0-9]\{4\}-[0-9][0-9]-[0-9][0-9]'

As for mtime, you want "-mtime +10". Still, if you don't want to rely on the modification timestamp, then this is not for you.

man find wrote:

       Numeric arguments can be specified as
       +n     for greater than n,
       -n     for less than n,
       n      for exactly n.
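
A quick way to see the three forms side by side, as a sketch on a scratch directory with invented folder names (assumes GNU find and GNU touch). Ages are given in hours so that a daylight-saving change cannot skew them:

```shell
scratch=$(mktemp -d)
mkdir "$scratch/d05" "$scratch/d10" "$scratch/d15"
touch -d '132 hours ago' "$scratch/d05"    # 5.5 days old
touch -d '252 hours ago' "$scratch/d10"    # 10.5 days old
touch -d '372 hours ago' "$scratch/d15"    # 15.5 days old

# Print the names matched by a given -mtime argument.
matches() {
    find "$scratch" -mindepth 1 -maxdepth 1 -type d -mtime "$1" -printf '%f\n' | sort
}

exact=$(matches 10)     # d10
older=$(matches +10)    # d15 only: d10 counts as 10 whole days, and 10 is not more than 10
newer=$(matches -10)    # d05
printf 'exact: %s\nolder: %s\nnewer: %s\n' "$exact" "$older" "$newer"
rm -r "$scratch"
```

Note that find drops the fractional part of the age, so a folder has to be at least 11 full days old before -mtime +10 picks it up.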

Edit: Since you already have the date in a sane format (YYYY-MM-DD), you could use string comparison on it instead of converting it to a number with date.
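
A sketch of that string comparison, assuming bash (for [[ ]]) and GNU date; is_older is a hypothetical helper name:

```shell
# is_older NAME CUTOFF (hypothetical helper): true if NAME sorts before CUTOFF.
# For zero-padded YYYY-MM-DD strings, text order and date order are the same.
is_older() {
    [[ $1 < $2 ]]
}

cutoff=$(date -d '10 days ago' +%F)
for name in 2018-06-01 "$(date +%F)"; do
    if is_older "$name" "$cutoff"; then
        echo "$name is older than $cutoff"
    else
        echo "$name is not older than $cutoff"
    fi
done
```

This does no validation at all, so it only makes sense once the name is known to be a date.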

Last edited by progandy (2018-06-14 09:51:42)


| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |


#16 2018-06-14 10:20:11

jjb2016
Member
From: Oxfordshire
Registered: 2016-02-29
Posts: 73

Re: sorting script output ...

Side note ....

http://tldp.org/LDP/abs/html/globbingre … N.AEN17572

OK .... I know what globbing is now ... but why "globbing" !?!?  It is described in the link above as "filename expansion" which seems like a logical descriptive name for it.


#17 2018-06-14 10:31:20

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: sorting script output ...

The first implementation was called "glob", short for "global command", probably referencing the /g global modifier in regular expressions. https://unix.stackexchange.com/question … ommand-huh

Last edited by progandy (2018-06-14 10:32:25)


| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |


#18 2018-06-14 10:48:44

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,441
Website

Re: sorting script output ...

jjb2016 wrote:

Ultimately what I want to do with this bit of code is add it to my backup script so that the backup script will automatically delete any folders with a name that is a date older than say 10 days.  So I want the condition for deletion to be based on the folder name, and not creation/modification metadata.

I'm not sure I understand why mtimes would not be ideal for this.  As noted above, find can filter for mtimes older than a given age and it also has a -delete command, so this could all be handled by find.

If there are reasons to think mtime will be different from the dates, then this would not be the right way.  But if there isn't, it actually contains an added bonus.  Assuming some random other file/directory is added that really shouldn't be there (it doesn't match the YYYY-MM-DD format at all) then a find command based on -mtime would eventually clean that out too which would probably be handy.

But on a different note, are you using incremental backups with rsync?  If so, under most conditions there is little need to remove older backups to save space.  Unless you are frequently creating and deleting many large files, older incremental backups can take up pretty trivial amounts of space (so deleting them buys you very little).

If you are not using incremental backups, then deleting older archives is certainly important ... but you should really consider using incrementals.


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman


#19 2018-06-14 11:11:12

jjb2016
Member
From: Oxfordshire
Registered: 2016-02-29
Posts: 73

Re: sorting script output ...

Hmmm ... maybe I should just use -mtime.  This will work well for the next 82 years ...
To delete all backup folders older than 10 days I could use ...

rm -r $(find /mnt/seagate/ -maxdepth 1 -mtime +10 -regextype 'posix-minimal-basic' -type d ! -path /mnt/seagate -regex '.*/20[0-9]\{2\}-[0-9][0-9]-[0-9][0-9]' | tr '\n' ' ')

Does this ring any alarm bells for anyone?


#20 2018-06-14 11:19:56

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: sorting script output ...

Yes, why do you use command substitution which is fragile and results in the output of find being split on whitespace, when you could just use -exec rm -r {} +

This even lets you skip the subshell from $() since find directly forks and runs rm for you.
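
Roughly what that looks like, as a sketch against a scratch directory rather than the real drive. The folder names and ages are invented, the regex is written in the posix-extended flavour for readability, and GNU find and touch are assumed:

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/2018-05-01/data" "$scratch/2018-06-13/data" "$scratch/other stuff"
touch "$scratch/2018-05-01/data/file"
touch -d '480 hours ago' "$scratch/2018-05-01" "$scratch/other stuff"   # pretend both are 20 days old

# find passes the matching paths straight to rm as separate arguments,
# so nothing is ever split on whitespace.
find "$scratch" -mindepth 1 -maxdepth 1 -type d -mtime +10 \
    -regextype posix-extended -regex '.*/20[0-9]{2}-[0-9]{2}-[0-9]{2}' \
    -exec rm -r {} +

remaining=$(find "$scratch" -mindepth 1 -maxdepth 1 -printf '%f\n' | sort)
printf '%s\n' "$remaining"
rm -r "$scratch"
```

The old dated folder is removed along with its contents, while the recent one and the old folder with a non-date name are left alone.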


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)


#21 2018-06-14 11:21:59

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,441
Website

Re: sorting script output ...

First, there's no need at all for that `tr` command.  As the $() subshell is not quoted, white space is collapsed, and newlines become spaces anyways, for example:

$ printf "first\nsecond\n"
first
second

$ echo $(printf "first\nsecond\n")
first second

$ echo "$(printf "first\nsecond\n")"
first
second

But also there is no need for the subshell at all, nor the `rm` command, just use the -delete flag for `find` (put it *last*, order matters in find's arguments). (edit: see below, -exec does seem necessary)

EDIT: as for -exec rm -r, there's no forking way you should do that :P use -delete.

Last edited by Trilby (2018-06-14 11:28:52)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman


#22 2018-06-14 11:23:24

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: sorting script output ...

-delete does not delete non-empty directories...


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)


#23 2018-06-14 11:25:09

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,441
Website

Re: sorting script output ...

Hmm, I just tested and that seems to be ... then what's all that nonsense in the man page about it implying '-depth'


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman


#24 2018-06-14 11:29:13

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: sorting script output ...

It works if the find command also matches the files and folders below it, but using -maxdepth and matching on the folder name because apparently -mtime is just completely non-trustworthy... prevents -delete from being able to do its thing by just deleting all files and folders, in the proper order.

Edit:

-depth Process each directory's contents before the directory itself.

No magic there, it does exactly what it says on the tin. It's just that nothing inside $directory matches -maxdepth 1 -name '20[0-9]...'
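
A small sketch of the difference, on a scratch directory with an invented folder name (GNU find assumed):

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/2018-05-01/data"
touch "$scratch/2018-05-01/data/file"

# Only the top-level folder matches, so -delete tries to remove a
# non-empty directory and fails (the error message is silenced here).
find "$scratch" -mindepth 1 -maxdepth 1 -type d -name '20*' -delete 2>/dev/null || true
if [ -d "$scratch/2018-05-01" ]; then survived=yes; else survived=no; fi

# With no depth limit the contents match too; -delete implies -depth,
# so they are removed before the directory itself.
find "$scratch/2018-05-01" -delete
if [ -d "$scratch/2018-05-01" ]; then removed=no; else removed=yes; fi

printf 'survived -maxdepth 1 -delete: %s\nremoved by unrestricted -delete: %s\n' "$survived" "$removed"
rm -r "$scratch"
```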

Last edited by eschwartz (2018-06-14 11:31:31)


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)


#25 2018-06-14 11:31:26

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: sorting script output ...

Trilby wrote:

Hmm, I just tested and that seems to be ... then what's all that nonsense in the man page about it implying '-depth'

It is not nonsense. "-depth" just changes the algorithm from breadth-first (or some hybrid form) to pure depth-first. This is needed to process and delete the contents of a directory before the directory itself. -maxdepth/-mindepth are separate options.

Last edited by progandy (2018-06-14 11:33:00)


| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |

