
#1 2010-10-29 15:40:04

jwhendy
Member
Registered: 2010-04-01
Posts: 621

File tagging/management [bash help wanted!]

Update: I have a potential solution some might be interested in but my bash-fu is weak... I've updated this post:
- shortened description very, very much
- summarized the idea
- provided bash "template" to do what I want

Any suggestions/help would be greatly appreciated

---
Hi,

I would like to reorganize my work files into an interlinked, semantic file structure of sorts. Think files that can be tagged and thus found through multiple avenues, just like a tagged music collection. I have a vision of what I'd like to do but lack the bash skills to make it happen.

Here's my vision.


Actual Files
Most of my work is fairly easy to divide into project folders.

I'll be renaming my files in the spirit of oyepa; something like:

projCode_fileName[tag1-tag2-tag3]_yyyy-mm-dd.ext

I will continue to maintain a project hierarchy of files just like I have been. This will be my main "tree." Think something like this:
,---
| ~/proj1
| ---- ../proj1_file1[tag1-tag2].ext
| ---- ../proj1_file2[tag3]_yyyy-mm-dd.ext
| ~/proj2
| ---- ../proj2_file3[tag2-tag4].ext
| ---- ../proj1_file4[tag1].ext
`---

Hopefully that makes sense. I'll only add dates where appropriate and don't think the tags will actually be a very large list.


Semantic Tree
Here's where I need some bash magic. I want to run a script on my main "tree" that builds a second tree made entirely of symlinked files and directories, organized by tag and file type. For example, the tree above would end up like this:

,---
| ~/tag1
| --- ../proj1_file1[tag1-tag2].ext
| --- ../proj1_file4[tag1].ext
| --- ../tag2
| ------- ../proj1_file1[tag1-tag2].ext
|
| ~/tag2
| --- ../proj1_file1[tag1-tag2].ext
| --- ../proj2_file3[tag2-tag4].ext
| --- ../tag1
| ------- ../proj1_file1[tag1-tag2].ext
| --- ../tag4
| ------- ../proj2_file3[tag2-tag4].ext
|
| ~/tag3
| --- ../proj1_file2[tag3]_yyyy-mm-dd.ext
|
| ~/tag4
| --- ../proj2_file3[tag2-tag4].ext
| --- ../tag2
| ------- ../proj2_file3[tag2-tag4].ext
`---

Hope that was clear. Single-tagged files are simple: a folder is created for each tag, containing symlinks to all files with that tag. If a file has multiple tags, a hierarchy is built so that any tag "path" can find the file. For example, a file tagged tag1, tag2, and tag3 creates these directories:
~/tag1/tag2/tag3
~/tag1/tag3/tag2
~/tag2/tag1/tag3
~/tag2/tag3/tag1
~/tag3/tag1/tag2
~/tag3/tag2/tag1
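
The six paths above are simply every ordering of the three tags. Here is a minimal sketch of generating them with a recursive bash function. The function name tag_paths is made up for illustration, and the sketch only prints the paths rather than creating anything.

```bash
#!/usr/bin/env bash
# Print every ordering of the given tags as a directory path,
# one per line, below a root directory.
# Usage: tag_paths ROOT TAG...
tag_paths() {
    local prefix=$1
    shift
    # No tags left: the prefix is a complete path.
    if (( $# == 0 )); then
        printf '%s\n' "$prefix"
        return
    fi
    local i
    for (( i = 1; i <= $#; i++ )); do
        # Use the i-th tag as the next path component, recurse on the rest.
        tag_paths "$prefix/${!i}" "${@:1:i-1}" "${@:i+1}"
    done
}

tag_paths "$HOME" tag1 tag2 tag3
```

Piping the output to `xargs mkdir -p` would create the directories (assuming tags without spaces). Note that n tags produce n! paths, so this only stays manageable while files carry a handful of tags.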

The given file appears in all of these directories as a symlink. For what it's worth, I don't think I'll be

I'd like to do the same based on file type so that each folder also contains a link to the documents, images, and presentations that match the given tags.
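
One possible way to decide the file type is a plain lookup on the extension. This is only a sketch: the function name file_type and the extension lists are assumptions chosen for illustration, not anything settled in this thread.

```bash
#!/usr/bin/env bash
# Map a filename to a coarse type based on its extension.
# The extension lists below are illustrative; adjust to taste.
file_type() {
    local name=${1##*/}          # strip any leading directories
    local ext=
    # Take the text after the last dot, lowercased; no dot means no extension.
    if [[ $name == *.* ]]; then
        ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
    fi
    case $ext in
        txt|odt|doc|docx|pdf|org) echo documents ;;
        png|jpg|jpeg|gif|svg)     echo images ;;
        odp|ppt|pptx)             echo presentations ;;
        *)                        echo other ;;
    esac
}

file_type 'proj1_file1[tag1-tag2].odt'   # prints "documents"
```

The result could then be used as one more directory level, for example linking a file into "$tagdir/$(file_type "$file")/" alongside the plain tag link.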


Bash method
What would need to happen is like so:
- ls -R in the top tree to get all the file names
- get the project name and tags from the file name (treat the proj ID like a "tag" as well)
--- in my version above, I've been able to extract the tags with:

tags=$(ls -1 | sed -e 's/.*\[//' -e 's/\].*$//')

but it took a separate sed step to get the hyphens turned into spaces.
- optional: if easy, sort and strip off all redundant combinations
- build the semantic hierarchy and place the symlinks
--- make each combination of tag dirs
--- ln -s each file inside the various directories
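
For the "get the project name and tags" step, here is a hedged sketch that avoids sed and uses bash's own regex matching instead. It assumes names follow the projCode_fileName[tag1-tag2]_yyyy-mm-dd.ext pattern from earlier in this post; the function name parse_name and the globals proj and tags are invented for the example.

```bash
#!/usr/bin/env bash
# Split a name of the form projCode_fileName[tag1-tag2]_yyyy-mm-dd.ext
# into its project code and tags.
# Sets two globals: proj (string) and tags (array).
# Returns 1 if the name has no project prefix or no [...] tag block.
parse_name() {
    local name=${1##*/}                 # drop leading directories
    local re='^([^_]+)_.*\[([^]]*)\]'
    [[ $name =~ $re ]] || return 1
    proj=${BASH_REMATCH[1]}
    # Split the tag block on hyphens into an array.
    IFS=- read -r -a tags <<< "${BASH_REMATCH[2]}"
}

if parse_name 'proj1_file1[tag1-tag2]_2010-10-29.ext'; then
    echo "project: $proj"               # project: proj1
    echo "tags: ${tags[*]}"             # tags: tag1 tag2
fi
```

Walking the tree with `find ~/proj1 ~/proj2 -type f` and calling the function on each result is likely more robust than parsing `ls -R` output. If sticking with sed, the hyphen-to-space step can ride along in the same call as a third expression: -e 's/-/ /g'.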


Questions
- How difficult would this be?
- Is bash a stupid way to go about it (ridiculously slow)?

Thanks again for any help. I think it could end up being really neat. If no one is able to assist, not a problem. I'll hack my way through it eventually. My main issue is lack of knowledge with bash.

I'm open to any thoughts/suggestions.

Last edited by jwhendy (2010-11-02 18:05:24)


#2 2010-10-29 16:28:58

MadTux
Member
Registered: 2009-09-20
Posts: 553

Re: File tagging/management [bash help wanted!]

What about using some kind of document management tool or wiki software such as Zim?


#3 2010-10-29 16:43:39

jwhendy
Member
Registered: 2010-04-01
Posts: 621

Re: File tagging/management [bash help wanted!]

@MadTux: what do you have in mind by "document management tool"? Would that be akin to the tools I mentioned earlier, like Beagle, Recoll, strigi, or the like? I've heard of others using Zim. I use emacs Org-mode, so that's a possibility as well: some kind of "metadata" tracking file that did this... but it seems a little hard to maintain.


#4 2010-10-29 16:52:30

Bregol
Member
Registered: 2008-08-15
Posts: 175

Re: File tagging/management [bash help wanted!]

jwhendy wrote:

How do you track down dead symlinks or when you move a file?

Well, in my backup script I use "cp -l", which creates hard links. It's a little different from symlinking, and it doesn't make a new copy of the file. Of course, if you need to delete a file, just removing it in one location won't get rid of it in all locations.
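
To make the difference concrete, here is a small sketch comparing the two kinds of link after the original name is removed. It assumes a cp that supports -l (GNU cp does) and works entirely inside a throwaway directory; the file names are made up.

```bash
#!/usr/bin/env bash
# Compare a hard link (what cp -l creates) with a symlink once the
# original name is removed. Everything happens in a temp directory.
demo=$(mktemp -d)
echo 'some notes' > "$demo/original.txt"

cp -l "$demo/original.txt" "$demo/hard.txt"   # hard link: second name, same data
ln -s "$demo/original.txt" "$demo/soft.txt"   # symlink: pointer to the first name

rm "$demo/original.txt"

cat "$demo/hard.txt"                          # still prints "some notes"
[ -e "$demo/soft.txt" ] || echo 'soft.txt is now a dangling symlink'

# Clean up afterwards with: rm -r "$demo"
```

A hard link also survives the original being renamed or moved within the same filesystem, which speaks to the "when you move a file" worry. The trade-off is that hard links cannot cross filesystems.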


Nai haryuvalyë melwa rë


#5 2010-10-29 19:48:29

jwhendy
Member
Registered: 2010-04-01
Posts: 621

Re: File tagging/management [bash help wanted!]

What about some scenario like this:

Main dir set, organized by type (perhaps, though not necessary)
,---
| ~/docs
| ---../file1_tag1_tag2_tag3.txt
| ---../file2_tag4_tag5.odt
|
| ~/videos
| ---../file3_tag1_tag3.mov
| ---../file4_tag4_tag6.avi
|
| ~/images
| ---../file5_tag2_tag4.png
| ---../file6_tag1_tag7.jpg
`---

Something like that. Then what about some type of bash script that would hunt through all of those directories, look for unique tags, create any directories necessary, and then symlink the files into them? You'd end up with the tree below. Only docs, videos, and images contain actual files (not listed again here for the sake of brevity); everything else consists entirely of symlinks:


,---
| ~/docs
| ~/videos
| ~/images
|
| ~/tag1
| ---../file1_tag1_tag2_tag3.txt
| ---../file3_tag1_tag3.mov
| ---../file6_tag1_tag7.jpg
|
| ~/tag2
| ---../file1_tag1_tag2_tag3.txt
| ---../file5_tag2_tag4.png
|
| ~/tag3
| ---../file1_tag1_tag2_tag3.txt
| ---../file3_tag1_tag3.mov
|
| ~/tag4
| ---../file2_tag4_tag5.odt
| ---../file4_tag4_tag6.avi
| ---../file5_tag2_tag4.png
|
| ~/tag5
| ---../file2_tag4_tag5.odt
|
| ~/tag6
| ---../file4_tag4_tag6.avi
|
| ~/tag7
| ---../file6_tag1_tag7.jpg
`---

Is that way too ridiculous? The script could perhaps function like rsync's --delete option so that it would always be up to date:
- delete symlinks for which there is no file anymore
- remove any ~/tag# dirs for which no files are tagged
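
The two cleanup steps above could be sketched with find alone. This assumes GNU find (for -xtype, -empty, and -delete); the function name prune_tag_tree is hypothetical.

```bash
#!/usr/bin/env bash
# Remove dangling symlinks, then any directories left empty, under each
# given tag directory. Assumes GNU find (-xtype, -empty, -delete).
prune_tag_tree() {
    local dir
    for dir in "$@"; do
        [[ -d $dir ]] || continue
        # -xtype l matches symlinks whose target no longer exists.
        find "$dir" -xtype l -delete
        # -delete works depth-first, so nested empty dirs go first and the
        # tag directory itself is removed if nothing is left inside it.
        find "$dir" -type d -empty -delete
    done
}
```

It would be called on the tag directories only, for example `prune_tag_tree ~/tag1 ~/tag2`, never on the directories holding the real files, since empty directories there would be removed as well.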

Rather than get all complicated with checking, I wonder how long it would take to rebuild such a symlink hierarchy from scratch each time.

What do people think of this? Reinventing the wheel? Waaaay too much work? Stupid? The only programming I know is Java, from the one class I took during engineering. This seems better suited to something like a bash script, since nothing should be all that complex (at least in how I'm thinking of it). Essentially it just needs to run through a list of directories to check (set in a config file), build a list of unique tags, make sure the tag directories exist, and then begin symlinking. A bonus would be to simply update (delete stale links/dirs, create new ones) rather than rebuild each time, since then only the first run would be the mammoth one.
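
As a rough sketch of the steps just described (scan the directories, pick out the tags, make the tag directories, symlink), something like the following might do. It assumes the name_tag1_tag2.ext naming from the example above, where the first underscore-separated field is the file's own name, and the function name build_tag_tree is invented here.

```bash
#!/usr/bin/env bash
# Build a flat tag tree: for every file named name_tagA_tagB.ext found
# under the source directories, symlink it into DEST/tagA and DEST/tagB.
# Usage: build_tag_tree DEST SRC_DIR...
build_tag_tree() {
    local dest=$1
    shift
    local d file base stem tag
    local -a srcs=() fields
    # Resolve sources to absolute paths so the links work from anywhere.
    for d in "$@"; do
        d=$(cd "$d" && pwd) || return 1
        srcs+=("$d")
    done
    while IFS= read -r -d '' file; do
        base=${file##*/}
        stem=${base%.*}                       # drop the extension
        IFS=_ read -r -a fields <<< "$stem"   # name first, then tags
        # Skip the first field (the file's own name); the rest are tags.
        for tag in "${fields[@]:1}"; do
            mkdir -p "$dest/$tag"
            ln -sfn "$file" "$dest/$tag/$base"
        done
    done < <(find "${srcs[@]}" -type f -print0)
}
```

A call such as `build_tag_tree ~ ~/docs ~/videos ~/images` would give the layout shown earlier, and prefixing it with `time` would answer the question of how long a full rebuild takes. Two files with the same name in different source directories would overwrite each other's links in this sketch.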

Those are my developing thoughts/vision.

Last edited by jwhendy (2010-10-29 19:49:46)


#6 2010-11-02 18:09:04

jwhendy
Member
Registered: 2010-04-01
Posts: 621

Re: File tagging/management [bash help wanted!]

I updated my thoughts with a request for bash assistance in the original post. See post #1 for the full and updated description/vision.

