#1 2013-01-10 05:10:42

darkfeline
Member
Registered: 2012-02-14
Posts: 94

dantalian — Transparent tag-based file organization system

dantalian is a set of Python scripts that simplifies organizing files on any file system that supports hard links.  It provides a simple yet useful metaphor for existing directory structures that allows tag-based organization.

Website: http://darkfeline.github.io/dantalian/
Github: https://github.com/darkfeline/dantalian

NOTE: The following is out of date.  Check the website for up to date info.

Features:

  • Simple implementation

  • All files (including symlinks to directories) can be tagged indiscriminately.

  • Libraries are transparent. You can interact with them on a basic level with coreutils, e.g. mv, ls, ln.

  • Libraries are portable. Moving is as simple as rsyncing it over and running dantalian fix.

  • Files are tagged on an inode basis.

  • Metadata is "stored" in the directory structure.

  • Files can be moved and/or linked elsewhere without breaking anything.

  • Almost no restrictions on tagged files’ names.

  • FUSE mounting allows a dynamic virtual view of the library, with arbitrary logical tag combinations in an arbitrary directory structure.
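A minimal sketch of what "tagged on an inode basis" and "metadata is stored in the directory structure" can mean in practice, assuming tags are plain directories and a tagged file is a hard link inside them. The names `tags_of` and `demo` are hypothetical and this is not dantalian's own code; it only uses standard `os` calls.

```python
import os
import tempfile


def tags_of(library_root, path):
    """Return tag directories (relative to library_root) holding a link to path."""
    target = os.stat(path)
    found = []
    for dirpath, _dirnames, filenames in os.walk(library_root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # Same device and inode number means the same underlying file.
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                found.append(os.path.relpath(dirpath, library_root))
    return sorted(set(found))


def demo():
    with tempfile.TemporaryDirectory() as root:
        for tag in ("art", "inked"):
            os.mkdir(os.path.join(root, tag))
        original = os.path.join(root, "art", "sketch.png")
        with open(original, "w") as f:
            f.write("pixels")
        # Tagging is just another hard link under a different directory.
        os.link(original, os.path.join(root, "inked", "sketch.png"))
        # Renaming one link does not disturb the other link.
        renamed = os.path.join(root, "art", "renamed.png")
        os.rename(original, renamed)
        return tags_of(root, renamed)


if __name__ == "__main__":
    print(demo())
```

Because the lookup compares inode numbers rather than names, the rename in `demo` does not lose the second tag.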

News:

I'm back from hiatus, after working on various things and self-improvement.  I've released 0.5 after brief testing making sure things work as advertised (mostly).  Better documentation, and setting up a mailing list and a website will follow (0.6 planned).

Version 0.5 changelog:

  • New FUSE mount tree/node system. Nodes are made/deleted dynamically in a FUSE mounted library. Changes are saved on unmount and loaded on mount. Tree is dumped as a JSON file, so is editable by hand if necessary.

  • Wrote FUSE syscall specifications.

  • Added rmnode socket command.

  • Added unit tests.

  • Bugfixes.

  • Documentation improvements.

  • Added mktag and rmtag commands.

Please try it out!  Bug reports, feature requests, and other thoughts are all welcome.

Last edited by darkfeline (2014-07-30 05:35:56)

Offline

#2 2013-04-26 01:29:09

Jackson Jia
Member
Registered: 2013-04-25
Posts: 1

Re: dantalian — Transparent tag-based file organization system

I am really interested in the tag-based filesystem. But first, I want to know why the name of the python script is dantalian?

Offline

#3 2013-07-22 01:49:20

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

Jackson Jia wrote:

I am really interested in the tag-based filesystem. But first, I want to know why the name of the python script is dantalian?

It's a reference to Dantalion, the 71st demon listed in the Ars Goetia.  A short quote from Wikipedia: "He teaches all arts and sciences, and also declares the secret counsel of anyone, given that he knows the thoughts of all people and can change them at his will. He can also cause love and show the similitude of any person, show the same by means of a vision, and let them be in any part of the world they will."

It's also a reference to The Mystic Archives of Dantalian.  In both cases, Dantalian represents limitless knowledge, all of the knowledge in the world at your fingertips.  I like to think of this as such a library, where you can put anything in it, organize it how you will, and retrieve it later.  I'm writing this for my own use, to store and organize all manner of things on my computer :)

Offline

#4 2013-07-22 05:16:11

Diaz
Member
From: Portugal
Registered: 2008-04-16
Posts: 366

Re: dantalian — Transparent tag-based file organization system

I might look at this just because of the name :P ( http://myanimelist.net/anime/8915/Dantalian_no_Shoka ).
Actually, a tag-based filesystem might be something interesting to try :)

Offline

#5 2013-08-05 11:24:27

orschiro
Member
Registered: 2009-06-04
Posts: 2,136
Website

Re: dantalian — Transparent tag-based file organization system

I am quite interested in a tag based file system as opposed to the traditional structure where I often do not find my files again *sigh*.

Since I am using Gnome and Nautilus, could you please explain the best way to combine your tag-based file system approach with Gnome and Nautilus in daily use?

Thanks!

Offline

#6 2013-08-12 10:09:44

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

@Diaz

Glad to hear that.  If you try it out, I hope you keep using it because it is good.

@orschiro

I'm not *entirely* sure if dantalian will help with your problem and use case, since dantalian is aimed more toward finding subsets of files than specific single files, although that certainly depends on how you use it.  I will certainly be writing a few guides and tutorials in the documentation as time goes by.  Right now, I want to add the last few absolutely essential features and make sure things are stable first, along with some more documentation improvements.

Here's a quick-and-dirty guide though, and see if you can get this working:

Make two folders, .library and Library in your home directory.  Create a library in .library, and set up scripts (in .profile, etc.) to mount it in Library on login.  You can then simply create folders in .library for tags, and in Nautilus, you can drag and hold Ctrl (I think, don't remember the exact key) and drop files in the folder/tag you want to create hard links.

That might seem a bit incomprehensible if you're not familiar with how dantalian works basically, but I don't have time to write a comprehensive guide at the moment.

Offline

#7 2013-09-15 20:56:12

mar04
Member
From: Poland
Registered: 2010-02-08
Posts: 117

Re: dantalian — Transparent tag-based file organization system

Interesting project, I would like to see some integration with ranger. Also wanted to let you know I submitted PKGBUILD to AUR: https://aur.archlinux.org/packages/dantalian/

Offline

#8 2013-09-19 05:47:07

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

mar04 wrote:

Interesting project, I would like to see some integration with ranger. Also wanted to let you know I submitted PKGBUILD to AUR: https://aur.archlinux.org/packages/dantalian/

I use ranger as well, so that is definitely on my to-do list.  Because of the way dantalian works, a lot of the basic "features" can already be done in ranger since they are just regular file operations, although I'll look into adding support for the more advanced features too.

Offline

#9 2013-10-06 19:43:02

yulan6248
Member
Registered: 2013-04-06
Posts: 28

Re: dantalian — Transparent tag-based file organization system

This looks awesome! I've been searching on the net for a while for a tag system built upon hard-links, and this looks just like what I need. I'm using ranger as well and would look forward to any update on your project :)

Offline

#10 2013-10-27 07:08:51

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

I've been busy with stuff and classes, so it's not that I abandoned this.  0.5 will be out once I get some more time.  Don't remember exactly how much code I've pushed to the dev branch, but preview for 0.5:

  • Virtual fuse tree is now saved and loaded between mounts and can be modified dynamically while mounted

  • Better documentation and tutorials

  • Various bugfixes, namely clarifying behavior of everything and in edge cases, building on the last point

Planned for future releases, in no particular order:

  • Caching to improve performance for slow things/huge number of files

  • ranger interoperability (mainly guides and tutorials I think, with some short snippets to add to config files)

  • Media metadata tag support

  • Alternate backend support?  Like a database or flat text files and stuff (though I don't think I will write these, the option will be there if others would like to do so)

Offline

#11 2013-12-21 16:32:20

godblessfq
Member
Registered: 2013-12-21
Posts: 1

Re: dantalian — Transparent tag-based file organization system

Looks very interesting!
IMHO, the hardlink approach is less portable than the sql database approach like in the following software.
tmsu
tagsistant
Because it is not possible to hardlink to a portable disk, and most of my files are stored on an external hard drive. On Windows, one of my favorite file managers is xyplorer; it also has tagging ability. Of course it doesn't use FUSE; instead it stores tagging info in a text file. The drawback is that file operations outside the file manager aren't tracked, which leads to a broken database. But it has one good feature: it also stores comments in the database. We need comments because some info is too long to be put in tags.
I am still learning to program. But for now, as a user, I would suggest somebody develop tagging software that could handle several databases at the same time. For example, I have several external hard disks; I want one tagging database on each of them, all loaded on my desktop. The database of a portable disk can only be modified when the disk is actually mounted; otherwise it is read-only. That way I can search for a file on the desktop, and mount the portable disk if I find the file is stored there. There may be duplicate items in two separate databases, but the user doesn't need to worry about that; the program merges the duplicates in the background.
Sorry about my English, I hope I have managed to make my idea clear.  Basically I want a tagging file manager that can manage all my files without all the external disks plugged in.
Thank you very much for your time!

(This may be off topic, but I think Zotero is perhaps the ultimate information manager.)

Last edited by godblessfq (2013-12-21 16:34:07)

Offline

#12 2013-12-31 11:36:28

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

@godblessfq
There are database taggers out there (e.g., tagsistant), but dantalian was created specifically to overcome shortcomings inherent to database implementations of tagging.  The most significant is that if you move or even rename a file under a database tagging scheme, everything breaks.

There are a lot of advantages to dantalian, such as transparency.  Want to find a file tagged with 'foo'?  Just open up a browser and go to the 'foo' directory.  No software, no database, no searching, use any browser, whether you're uploading a file in Firefox, saving a file in Chrome, or looking for something in Thunar.  I can't remember everything at the moment, though.

Offline

#13 2013-12-31 13:06:38

likytau
Member
Registered: 2012-09-02
Posts: 142

Re: dantalian — Transparent tag-based file organization system

Hi,
As a current TMSU (+ my own custom filelists system) user I am looking at this and thinking 'separate tag set for each filesystem could be okay.. but I'd like more specifics.'

* Each hardlink occupies some space on disk, obviously. How much does this amount to per tagging, assuming an ext4 partition? Utilities such as 'du' don't help for this, unfortunately :)  (I'm comparing to the -- generous -- estimate of my current TMSU DB: 1482 bytes / tagged file "on average")

* How does it perform when the number of files in a tag rises very high?  (for example, I have one tag here in my TMSU db that's applied to 5000-odd files.). In particular, I am familiar with the experience of filling up a directory with many files, and the read time for that directory promptly ballooning. Does those 'files' only being hardlinks ameliorate this effect?

Offline

#14 2013-12-31 13:50:28

progandy
Member
Registered: 2012-05-17
Posts: 5,184

Re: dantalian — Transparent tag-based file organization system

likytau wrote:

Hi,
As a current TMSU (+ my own custom filelists system) user I am looking at this and thinking 'separate tag set for each filesystem could be okay.. but I'd like more specifics.'

* Each hardlink occupies some space on disk, obviously. How much does this amount to per tagging, assuming an ext4 partition? Utilities such as 'du' don't help for this, unfortunately :)  (I'm comparing to the -- generous -- estimate of my current TMSU DB: 1482 bytes / tagged file "on average")

A hardlink on ext4 requires up to 263 bytes (size of one linear directory entry, I did not count the hashtable).
https://ext4.wiki.kernel.org/index.php/ … ry_Entries

* How does it perform when the number of files in a tag rises very high?  (for example, I have one tag here in my TMSU db that's applied to 5000-odd files.). In particular, I am familiar with the experience of filling up a directory with many files, and the read time for that directory promptly ballooning. Does those 'files' only being hardlinks ameliorate this effect?

Hardlinks behave the same as files, since they are essentially the same. You could also say that two files share the same space on the disk instead of different areas.

Last edited by progandy (2014-01-01 01:16:36)


| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |

Offline

#15 2014-01-01 00:46:31

likytau
Member
Registered: 2012-09-02
Posts: 142

Re: dantalian — Transparent tag-based file organization system

progandy wrote:

A hardlink on ext4 requires up to 263 bytes (size of one directory entry).

Hah, wow. That is good design.
263 - 255 (max filename length) == 8 bytes of header. Probably 4 bytes misc (type, etc.) plus 4 bytes inode number.

That makes dantalian competitive for <= 5 tags per file.
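The arithmetic above can be checked with a short sketch. It assumes the fixed header of an ext4 linear directory entry is a 32-bit inode number, a 16-bit record length, an 8-bit name length and an 8-bit file type, and that records are padded to a 4-byte boundary; the helper names are made up for illustration, and hash-tree overhead is ignored, as in the post.

```python
import struct

# Assumed fixed header of an ext4 linear directory entry:
# 32-bit inode number, 16-bit record length, 8-bit name length, 8-bit file type.
HEADER = struct.Struct("<IHBB")
MAX_NAME = 255


def dirent_size(name):
    """Bytes used by one entry, with the record padded to a 4-byte boundary."""
    raw = HEADER.size + len(name.encode("utf-8"))
    return (raw + 3) // 4 * 4


def break_even_tags(db_bytes_per_file, entry_bytes):
    """Largest tag count at which hard links cost no more than the database."""
    return db_bytes_per_file // entry_bytes


if __name__ == "__main__":
    worst = HEADER.size + MAX_NAME  # the unpadded worst case quoted above
    print(worst, break_even_tags(1482, worst))
    print(dirent_size("sketch.png"))
```

With typical short file names the per-tag cost is far below the worst case, so the break-even point moves well past five tags.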


* How does it perform when the number of files in a tag rises very high?  (for example, I have one tag here in my TMSU db that's applied to 5000-odd files.). In particular, I am familiar with the experience of filling up a directory with many files, and the read time for that directory promptly ballooning. Does those 'files' only being hardlinks ameliorate this effect?

Hardlinks behave the same as files, since they are essentially the same. You could also say that two files share the same space on the disk instead of different memory areas.

Thanks. BTW, disk space != memory (arguably even if you're talking about an SSD, SD card, or USB key)

It sounds to me like I should use Dantalian for my art (low # of tags, medium number of files, file manager access is useful, don't often need to tag directories), and TMSU for everything else.


BTW, git master seems broken: 'mkdir -p library;dantalian init library' fails with the following message:

Traceback (most recent call last):
  File "/usr/bin/dantalian", line 37, in <module>
    getattr(commands, args.command)(*args.args)
  File "/usr/lib/python3.3/site-packages/dantalian/commands.py", line 328, in init
    library.init_library(args.root)
NameError: global name 'library' is not defined

Offline

#16 2014-01-01 06:59:14

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

likytau wrote:

Hah, wow. That is good design.
263 - 255 (max filename length) == 8 bytes of header. Probably 4 bytes misc (type, etc.) plus 4 bytes inode number.

That makes dantalian competitive for <= 5 tags per file.

dantalian relies on the file system for the specific space requirement, but as a rule of thumb half a dozen bytes for the inode number plus however long the name of the file is per file per tag is a good size estimate.  Reasonably, that lies around 20-40 bytes per file per tag.  However, a directory by default takes up a certain amount of space (~4kB?), so the first 4kB or so of files per tag is technically free, since that space would just be reserved empty space in the directory file anyway.

The flip side is that each tag (directory) costs 4kB by default, but again, that ends up being a front-end cost for future files with that tag.

A database (depending on implementation), would cost some 50-200B per file (average/regular case), 20-40B per tag, and 8-16B per tag-file relation.
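A rough calculator for the two estimates above, as a sketch only. It assumes a 4096-byte directory block (the "~4kB?" guess), 30 bytes per directory entry, and the midpoints of the quoted database ranges; every default here is an assumption taken from this post, not a measurement.

```python
import math

BLOCK = 4096  # assumed directory block size, per the "~4kB?" guess above


def hardlink_bytes(entries_per_tag, entry_bytes=30):
    """Directory space for a list of tags, given the entry count in each."""
    total = 0
    for entries in entries_per_tag:
        # A directory occupies at least one block even when empty.
        blocks = max(1, math.ceil(entries * entry_bytes / BLOCK))
        total += blocks * BLOCK
    return total


def database_bytes(files, tags, relations,
                   per_file=125, per_tag=30, per_relation=12):
    """Row storage using midpoints of the ranges quoted above."""
    return files * per_file + tags * per_tag + relations * per_relation


if __name__ == "__main__":
    # Three tags holding 100, 5000 and 0 files, over 5000 distinct files.
    print(hardlink_bytes([100, 5000, 0]))
    print(database_bytes(files=5000, tags=3, relations=5100))
```

The model charges each tag a whole block up front, which is the "front-end cost" described above.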

The tradeoff, of course, is that in dantalian, basic operations, queries, and access are instant and handled directly by the file system/kernel, and you can move or rename files freely :)

* How does it perform when the number of files in a tag rises very high?  (for example, I have one tag here in my TMSU db that's applied to 5000-odd files.). In particular, I am familiar with the experience of filling up a directory with many files, and the read time for that directory promptly ballooning. Does those 'files' only being hardlinks ameliorate this effect?

I'm working on dantalian to organize some 50GB of images, so performance is a concern for me as well.  I made a note to include a performance section in the documentation, but let me summarize here:

'ls'ing a directory (getting all files with tag foo) is linear (O(n)) in the number of files in the directory, that is, the number of files with the given tag.  This is unavoidable, since each file has to be printed anyway.  The number of tags doesn't matter.

(Although, I am thinking about implementing a feature in the FUSE mount for dantalian that limits the number of files in each directory to ~500, say, and to view more you access a MORE directory or somesuch.  So opening such a directory in Thunar/your file browser doesn't hose the system.)

Access is constant (O(1)), regardless of number of tags, number of files, etc.
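To make the cost model concrete, here is a sketch of a multi-tag query done directly on the file system, assuming each tag is a directory of hard links. Listing each tag is linear in its size and the intersection is done on (device, inode) pairs. `files_with_tag` and `find_all` are hypothetical names, not dantalian's API, and the inode comparison assumes a POSIX file system.

```python
import os
import tempfile


def files_with_tag(tag_dir):
    """Map (device, inode) -> path for every non-directory entry in a tag."""
    result = {}
    with os.scandir(tag_dir) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                result[(st.st_dev, st.st_ino)] = entry.path
    return result


def find_all(tag_dirs):
    """Paths (taken from the first tag) of files carrying every given tag."""
    first, *rest = [files_with_tag(d) for d in tag_dirs]
    common = set(first)
    for other in rest:
        common &= set(other)
    return sorted(first[key] for key in common)


def demo():
    with tempfile.TemporaryDirectory() as root:
        red = os.path.join(root, "red")
        round_ = os.path.join(root, "round")
        os.mkdir(red)
        os.mkdir(round_)
        for name in ("apple", "brick"):
            open(os.path.join(red, name), "w").close()
        open(os.path.join(round_, "ball"), "w").close()
        # "apple" carries both tags, under a different name in the second one.
        os.link(os.path.join(red, "apple"), os.path.join(round_, "fruit"))
        return [os.path.basename(p) for p in find_all([red, round_])]


if __name__ == "__main__":
    print(demo())
```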

It sounds to me like I should use Dantalian for my art (low # of tags, medium number of files, file manager access is useful, don't often need to tag directories), and TMSU for everything else.

I'd be interested in hearing why you would choose TMSU over dantalian for your other files.  I originally researched a bunch of file tagging programs, including TMSU and tagsistant, before deciding to work on dantalian.  I would be interested in adding features or modifying the design if there are any significant advantages of database-driven tagging that I overlooked.

BTW, git master seems broken: 'mkdir -p library;dantalian init library' fails with the following message:

Traceback (most recent call last):
  File "/usr/bin/dantalian", line 37, in <module>
    getattr(commands, args.command)(*args.args)
  File "/usr/lib/python3.3/site-packages/dantalian/commands.py", line 328, in init
    library.init_library(args.root)
NameError: global name 'library' is not defined

... Well, that's embarrassing.  Fix pushed.  If you find any other bugs, make a bug report on the tracker (https://github.com/darkfeline/dantalian … state=open) so I'll see it quickly.

Offline

#17 2014-01-01 10:44:54

likytau
Member
Registered: 2012-09-02
Posts: 142

Re: dantalian — Transparent tag-based file organization system

darkfeline wrote:
likytau wrote:

Hah, wow. That is good design.
263 - 255 (max filename length) == 8 bytes of header. Probably 4 bytes misc (type, etc.) plus 4 bytes inode number.

That makes dantalian competitive for <= 5 tags per file.

dantalian relies on the file system for the specific space requirement, but as a rule of thumb half a dozen bytes for the inode number plus however long the name of the file is per file per tag is a good size estimate.  Reasonably, that lies around 20-40 bytes per file per tag.  However, a directory by default takes up a certain amount of space (~4kB?), so the first 4kB or so of files per tag is technically free, since that space would just be reserved empty space in the directory file anyway.

Well, I'm about to rebuild my system with a btrfs-based /home, so it's quite likely that that will be more competitive.
BTW I didn't intend "<=5 tags per file" to be a criticism -- I don't have time to chuck lots of tags on stuff, so only automated tagging will produce heavy tagsets. OTOH, automated tagging has produced up to 120 tags / file in my current TMSU db.

The flip side is that each tag (directory) costs 4kB by default, but again, that ends up being a front-end cost for future files with that tag.

A database (depending on implementation), would cost some 50-200B per file (average/regular case), 20-40B per tag, and 8-16B per tag-file relation.

The tradeoff, of course, is that in dantalian, basic operations, queries, and access are instant and handled directly by the file system/kernel, and you can move or rename files freely :)

* How does it perform when the number of files in a tag rises very high?  (for example, I have one tag here in my TMSU db that's applied to 5000-odd files.). In particular, I am familiar with the experience of filling up a directory with many files, and the read time for that directory promptly ballooning. Does those 'files' only being hardlinks ameliorate this effect?

I'm working on dantalian to organize some 50GB of images, so performance is a concern for me as well.  I made a note to include a performance section in the documentation, but let me summarize here:

'ls'ing a directory (getting all files with tag foo) is linear (O(n)) in the number of files in the directory, that is, the number of files with the given tag.  This is unavoidable, since each file has to be printed anyway.  The number of tags doesn't matter.

(Although, I am thinking about implementing a feature in the FUSE mount for dantalian that limits the number of files in each directory to ~500, say, and to view more you access a MORE directory or somesuch.  So opening such a directory in Thunar/your file browser doesn't hose the system.)

Access is constant (O(1)), regardless of number of tags, number of files, etc.

Thanks for the detailed info.


It sounds to me like I should use Dantalian for my art (low # of tags, medium number of files, file manager access is useful, don't often need to tag directories), and TMSU for everything else.

I'd be interested in hearing why you would choose TMSU over dantalian for your other files.  I originally researched a bunch of file tagging programs, including TMSU and tagsistant, before deciding to work on dantalian.  I would be interested in adding features or modifying the design if there are any significant advantages of database-driven tagging that I overlooked.

* Cross-filesystem tagging [this is not a hard requirement, but how I've currently got it setup, I have unified tagging for my external hard drive and home directory]. Could be ameliorated by providing a simple 'synchronize tag lists' command (basically just 'look at library A's tagset,  make any tag-directories needed so that library B has all of library A's tags available.')

* Also speed. I'm a prolific user of Python, but it's not great for a CLI program that should have a quick response time[1]. `time dantalian` tells me dantalian spends 0.307s (warm start) just to startup and realize 'oh, the commandline doesn't make sense, I'd better tell the user about that'.
Fortunately this can be somewhat ameliorated by the ability to write queries using just find + sort + comm, no dependencies.

By comparison:
* TMSU does this 'oops, better print help' task in 0.008s
* TMSU can return the tags of a file in 0.021s (3-tag case) to
* TMSU can answer a 2-tag query covering 5377 total files, resolving to 3184 files actually matching the query, in 0.237s. That's still only 66% of the time it takes dantalian to simply start up.

To put this in context, there are three things I think absolutely need to be fast: tag a file, untag a file, run a query. They can currently be fast, but not currently through Dantalian:

* tagging requires up to two operations (mktag, tag) per tag, mktag == 0.311s on my system, tag == 0.307s. That makes a tagging take .3 ... .6 of a second. Marginal cost not really much reduced for multiple tags, because mktag only allows you to add one tag at once.
* untagging -- 0.301s
* query -- unknown, throws an AssertionError:

time dantalian --root library find python buildfile
Traceback (most recent call last):
  File "/usr/bin/dantalian", line 40, in <module>
    getattr(commands, args.command)(lib, *args.args)
  File "/usr/lib/python3.3/site-packages/dantalian/commands.py", line 183, in find
    r = lib.find(args.tags)
  File "/usr/lib/python3.3/site-packages/dantalian/library.py", line 391, in find
    dpath.pathfromtag(tags[0], self.root)))
  File "/usr/lib/python3.3/site-packages/dantalian/path.py", line 35, in pathfromtag
    assert istag(tag)
AssertionError

Btw, tag dirs are not going into library/, in fact they're being chucked into whatever directory I'm in! I guess this is a bug too? Maybe it's the cause of the above assert?

* hardlinks make my head explode when backing up. You just can't copy them cross-filesystem without some careful trickery.

* finally, the ability to accidentally put things only-in-the-library scares me (I'm sure dantalian has safeguards against this, but I don't want to be able to do it). Consider if I load in a piece of art, ink it, and then save in the 'inked' tag directory. This feels (maybe mistakenly) like an accident waiting to happen.

Things I do like in dantalian:

* The design. The directory-tagging handling scares me, but otherwise, brilliant.
* Hierarchical tags. Yes, more of this!!! Absolutely YES!
* Stuff being just there in the filesystem, without needing to do anything.
* Files don't have 'not in the db' status. They just aren't tagged yet. This makes intersection(filelist, X tag) easy to implement.

Offline

#18 2014-01-02 00:36:36

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

likytau wrote:

Well, I'm about to rebuild my system with a btrfs-based /home, so it's quite likely that that will be more competitive.
BTW I didn't intend "<=5 tags per file" to be a criticism -- I don't have time to chuck lots of tags on stuff, so only automated tagging will produce heavy tagsets. OTOH, automated tagging has produced up to 120 tags / file in my current TMSU db.

Ah yes, automated tagging.  I'm planning on adding some kind of script for automating tagging easy/common stuff, like mp3 files, but my current position is that dantalian provides the tagging system, and automation can be handled separately, by other scripts and programs.  (For example, I'm doing deduplication with fslint separately instead of doing it "silently" within dantalian)

Thanks for the detailed info.

No problem.

* Cross-filesystem tagging [this is not a hard requirement, but how I've currently got it setup, I have unified tagging for my external hard drive and home directory]. Could be ameliorated by providing a simple 'synchronize tag lists' command (basically just 'look at library A's tagset,  make any tag-directories needed so that library B has all of library A's tags available.')

Yes, by design dantalian doesn't handle that well.  Cross-filesystem tagging is definitely a valid use case, but I'll have to go back to the design board to piece together a good solution.
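The 'synchronize tag lists' idea quoted above could be sketched roughly like this, assuming tags are ordinary directories under each library root. `sync_tags` is a hypothetical helper, not an existing dantalian command; skipping hidden directories and symlinked directories is an assumption made so that library metadata and tagged directories are not mistaken for tags.

```python
import os
import tempfile


def sync_tags(src_library, dst_library):
    """Create in dst_library every tag directory that exists in src_library."""
    created = []
    for dirpath, dirnames, _filenames in os.walk(src_library):
        # Assumption: hidden and symlinked directories are not tags.
        dirnames[:] = [
            d for d in dirnames
            if not d.startswith(".")
            and not os.path.islink(os.path.join(dirpath, d))
        ]
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), src_library)
            target = os.path.join(dst_library, rel)
            if not os.path.isdir(target):
                os.makedirs(target)
                created.append(rel)
    return sorted(created)


def demo():
    with tempfile.TemporaryDirectory() as base:
        lib_a = os.path.join(base, "A")
        lib_b = os.path.join(base, "B")
        os.makedirs(os.path.join(lib_a, "art", "inked"))
        os.makedirs(os.path.join(lib_a, "music"))
        os.makedirs(os.path.join(lib_b, "art"))
        return sync_tags(lib_a, lib_b)


if __name__ == "__main__":
    print(demo())
```

This only copies the tag hierarchy; the files themselves still cannot share hard links across file systems.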

* Also speed. I'm a prolific user of Python, but it's not great for a CLI program that should have a quick response time[1]. `time dantalian` tells me dantalian spends 0.307s (warm start) just to startup and realize 'oh, the commandline doesn't make sense, I'd better tell the user about that'.
Fortunately this can be somewhat ameliorated by the ability to write queries using just find + sort + comm, no dependencies.

By comparison:
* TMSU does this 'oops, better print help' task in 0.008s
* TMSU can return the tags of a file in 0.021s (3-tag case) to
* TMSU can answer a 2-tag query covering 5377 total files, resolving to 3184 files actually matching the query, in 0.237s. That's still only 66% of the time it takes dantalian to simply start up.

To put this in context, there are three things I think absolutely need to be fast: tag a file, untag a file, run a query. They can currently be fast, but not currently through Dantalian:

* tagging requires up to two operations (mktag, tag) per tag, mktag == 0.311s on my system, tag == 0.307s. That makes a tagging take .3 ... .6 of a second. Marginal cost not really much reduced for multiple tags, because mktag only allows you to add one tag at once.
* untagging -- 0.301s
* query -- unknown, throws an AssertionError:

time dantalian --root library find python buildfile
Traceback (most recent call last):
  File "/usr/bin/dantalian", line 40, in <module>
    getattr(commands, args.command)(lib, *args.args)
  File "/usr/lib/python3.3/site-packages/dantalian/commands.py", line 183, in find
    r = lib.find(args.tags)
  File "/usr/lib/python3.3/site-packages/dantalian/library.py", line 391, in find
    dpath.pathfromtag(tags[0], self.root)))
  File "/usr/lib/python3.3/site-packages/dantalian/path.py", line 35, in pathfromtag
    assert istag(tag)
AssertionError

Thanks for the detailed info.  Let me see if I can lay everything out.

First, Python has a pretty heavy time cost just starting up the Python interpreter.  Unavoidable, unless ported to C.

Second, dantalian's operations are comparatively more expensive due to path computations (string manipulation), whereas TMSU (I'm guessing), can use most user input directly.

Third, dantalian is completely unoptimized.  I'm sure there's room for improvement there.
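To separate the first point from the other two, the bare interpreter startup cost can be measured on its own. This is a small sketch using only the standard library; `startup_seconds` is a made-up helper, and the number it reports depends entirely on the machine.

```python
import subprocess
import sys
import time


def startup_seconds(runs=5):
    """Best-of-N wall time to launch the interpreter and exit immediately."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" runs an empty program, so this times startup alone.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("interpreter startup: %.3fs" % startup_seconds())
```

Whatever remains of a command's run time after subtracting this figure is spent on imports, argument parsing and the operation itself.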

There are a couple of solutions/workarounds here:

dantalian makes it a point to be transparent.  All of its operations can be done with the standard utilities:

mktag = mkdir
tag = ln
untag = rm

and so on.  dantalian merely provides a prettier/smarter interface.
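For use from a script, the same three equivalences can be written with the standard `os` module. This is a sketch under the assumption that a tag is a directory inside the library root; the function names mirror the commands but are not dantalian's Python API, and `untag` here simply removes the link of the same name rather than searching by inode.

```python
import os
import tempfile


def mktag(library_root, tag_name):
    """mktag = mkdir (with -p): a tag is a directory under the library root."""
    os.makedirs(os.path.join(library_root, tag_name), exist_ok=True)


def tag(library_root, path, tag_name):
    """tag = ln: add a hard link to the file inside the tag directory."""
    dest = os.path.join(library_root, tag_name, os.path.basename(path))
    os.link(path, dest)
    return dest


def untag(library_root, path, tag_name):
    """untag = rm: remove the link of the same name from the tag directory."""
    os.unlink(os.path.join(library_root, tag_name, os.path.basename(path)))


def demo():
    with tempfile.TemporaryDirectory() as base:
        library = os.path.join(base, "library")
        os.mkdir(library)
        photo = os.path.join(base, "photo.jpg")
        with open(photo, "w") as f:
            f.write("data")
        mktag(library, "holiday")
        tag(library, photo, "holiday")
        while_tagged = os.stat(photo).st_nlink
        untag(library, photo, "holiday")
        return while_tagged, os.stat(photo).st_nlink


if __name__ == "__main__":
    print(demo())
```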

The dantalian script commands were intended for interactive use from a shell.  I never timed it, but when I use it, I don't notice any lag.  (This is the old, 0.0001s is the same as 0.01s spiel)  (Still, I'll take your observations into account.)

dantalian exposes a Python API, so for heavy automation/plugins for other applications, that is preferred.  At any rate, it will remove the repeated startup time of the Python interpreter every single time you perform an operation.

Also, since I've strived to make dantalian so transparent, it shouldn't be too hard to write a C port for the basic tagging operations.

In any case, I'll make sure to keep timing in mind.

Btw, tag dirs are not going into library/, in fact they're being chucked into whatever directory I'm in! I guess this is a bug too? Maybe it's the cause of the above assert?

* query -- unknown, throws an AssertionError:

time dantalian --root library find python buildfile
Traceback (most recent call last):
  File "/usr/bin/dantalian", line 40, in <module>
    getattr(commands, args.command)(lib, *args.args)
  File "/usr/lib/python3.3/site-packages/dantalian/commands.py", line 183, in find
    r = lib.find(args.tags)
  File "/usr/lib/python3.3/site-packages/dantalian/library.py", line 391, in find
    dpath.pathfromtag(tags[0], self.root)))
  File "/usr/lib/python3.3/site-packages/dantalian/path.py", line 35, in pathfromtag
    assert istag(tag)
AssertionError

This is probably due to inadequate documentation on my part, since I wanted to push out 0.5 quickly.

For the former, dantalian handles absolute and relative paths as paths to directories (tags indirectly), whereas tags are prefixed with '//':

From within library/foo:

  • d mktag bar makes library/foo/bar

  • d mktag ../bar makes library/bar

  • d mktag //bar makes library/bar

  • d mktag //foo/bar makes library/foo/bar
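The rule behind those four cases can be sketched as a small resolver: an argument starting with '//' is taken relative to the library root, and anything else is an ordinary path relative to the current directory. `resolve` is a hypothetical function written for illustration and is not taken from dantalian's source.

```python
import os


def resolve(arg, library_root, cwd):
    """Turn a command argument into a directory path."""
    if arg.startswith("//"):
        # '//' marks a tag, which is rooted at the library.
        return os.path.normpath(os.path.join(library_root, arg.lstrip("/")))
    # Anything else is an ordinary path, relative to the current directory.
    return os.path.normpath(os.path.join(cwd, arg))


if __name__ == "__main__":
    for arg in ("bar", "../bar", "//bar", "//foo/bar"):
        print(arg, "->", resolve(arg, "library", "library/foo"))
```

Under this rule a bare tag name typed outside the library lands in the current directory, which matches the behavior reported above.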

(Note that I alias d to dantalian.  I don't do this explicitly and leave it to the user whether they want an alias or not)

The latter is probably due to a tag not found due to the above.  Yes, I have improving user-friendliness of scripts on my todo list.

* hardlinks make my head explode when backing up. You just can't copy them cross-filesystem without some careful trickery.

It takes the same kind of mindset as working with pointers in C.  Copying is easy, though: rsync -H copies everything and preserves hard links.
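For anyone curious what preserving hard links involves, here is a simplified sketch of the idea in Python: remember the first destination path written for each source inode, and link to it instead of copying again. It is an illustration of the technique, not a replacement for rsync, and for brevity it skips symlinked directories and assumes the destination lies outside the source tree.

```python
import os
import shutil
import tempfile


def copy_preserving_links(src, dst):
    """Copy the tree at src to dst, recreating hard links between files."""
    seen = {}  # (device, inode) in the source -> first path written in dst
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        out_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(out_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(out_dir, name)
            st = os.lstat(source)
            key = (st.st_dev, st.st_ino)
            if key in seen:
                # Already copied under another name: link, do not duplicate.
                os.link(seen[key], target)
            else:
                shutil.copy2(source, target, follow_symlinks=False)
                seen[key] = target


def demo():
    with tempfile.TemporaryDirectory() as base:
        src = os.path.join(base, "src")
        dst = os.path.join(base, "dst")
        os.makedirs(os.path.join(src, "a"))
        os.makedirs(os.path.join(src, "b"))
        first = os.path.join(src, "a", "x")
        with open(first, "w") as f:
            f.write("data")
        os.link(first, os.path.join(src, "b", "x"))
        copy_preserving_links(src, dst)
        return os.path.samefile(os.path.join(dst, "a", "x"),
                                os.path.join(dst, "b", "x"))


if __name__ == "__main__":
    print(demo())
```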

* finally, the ability to accidentally put things only-in-the-library scares me (I'm sure dantalian has safeguards against this, but I don't want to be able to do it). Consider if I load in a piece of art, ink it, and then save in the 'inked' tag directory. This feels (maybe mistakenly) like an accident waiting to happen.

Actually, dantalian doesn't have a safeguard against this, because that is how dantalian is meant to be used! In a traditional (database-driven) tagging system, you have the file stored in the file system and the tagging metadata separate.

However, in dantalian, storing the file in the file system and organizing it with tags is the same thing!  There's no need to organize your files in the file system AND use tags; you do both at the same time in the same way.

File systems were designed to store and organize files.  My line of thinking is: take full advantage of the file system, which was designed for exactly that purpose, instead of organizing files one way and maintaining a separate tagging system as well.

I can understand if that is a little hard to grasp or seems scary.  The feeling of lacking "certainty" is mostly illusory: the files won't suddenly evaporate.  The main danger is saving/tagging a file with only one tag, then untagging it before tagging it as something else.  If you tag first and then untag, or simply 'mv' the file, you're safe.  However, dantalian gives you the flexibility of storing all your files separately and then tagging them in a library, if you so choose.  But it offers no safeguards for doing so, and cannot reasonably do so, because (remember) everything happens directly at the file system level, with minimal tracking data stored by dantalian.
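The "tag first, then untag" order can be sketched in a few lines of Python using plain hard links. The `retag` helper and the directory names are invented for this example and are not part of dantalian.

```python
import os
import tempfile

def retag(old_path, new_tag_dir):
    """Move a file from one tag directory to another without ever
    letting its link count drop to zero."""
    new_path = os.path.join(new_tag_dir, os.path.basename(old_path))
    os.link(old_path, new_path)   # tag first: the file now has two names
    os.unlink(old_path)           # then untag: one name is still left
    return new_path

with tempfile.TemporaryDirectory() as library:
    sketches = os.path.join(library, "sketches")
    inked = os.path.join(library, "inked")
    os.mkdir(sketches)
    os.mkdir(inked)
    original = os.path.join(sketches, "cat.png")
    with open(original, "w") as f:
        f.write("pixels")

    moved = retag(original, inked)
    still_there = os.path.exists(moved)
    old_gone = not os.path.exists(original)
    links = os.stat(moved).st_nlink
    print(still_there, old_gone, links)
```

Doing the two steps in the opposite order on a file with a single tag would remove its last name, and the data would be gone before the new link could be made.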

Things I do like in dantalian:

* The design. The directory-tagging handling scares me, but otherwise, brilliant.
* Hierarchical tags. Yes, more of this!!! Absolutely YES!
* Stuff being just there in the filesystem, without needing to do anything.
* Files don't have 'not in the db' status. They just aren't tagged yet. This makes intersection(filelist, X tag) easy to implement.
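Since files are tagged on an inode basis, the intersection mentioned in the last point can be sketched by comparing inode numbers across tag directories. This is a minimal Python illustration with made-up helper names; it only looks at regular files directly inside each tag directory and is not dantalian's own find implementation.

```python
import os
import tempfile

def files_by_inode(tag_dir):
    """Map (device, inode) -> path for regular files directly inside tag_dir."""
    result = {}
    with os.scandir(tag_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                st = os.lstat(entry.path)
                result[(st.st_dev, st.st_ino)] = entry.path
    return result

def intersect_tags(first_dir, *other_dirs):
    """Return paths (as named in first_dir) of files present in every tag dir."""
    first = files_by_inode(first_dir)
    common = set(first)
    for other in other_dirs:
        common &= set(files_by_inode(other))
    return sorted(first[key] for key in common)

with tempfile.TemporaryDirectory() as library:
    python_tag = os.path.join(library, "python")
    build_tag = os.path.join(library, "buildfile")
    os.mkdir(python_tag)
    os.mkdir(build_tag)
    both = os.path.join(python_tag, "setup.py")
    with open(both, "w") as f:
        f.write("# tagged twice")
    os.link(both, os.path.join(build_tag, "setup.py"))
    with open(os.path.join(python_tag, "notes.py"), "w") as f:
        f.write("# tagged once")

    matches = [os.path.basename(p) for p in intersect_tags(python_tag, build_tag)]
    print(matches)
```

Comparing inodes rather than file names means a file still matches even if it has a different name in each tag directory.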

Thanks, I like it too, and I want it to be the best it can be!

Offline

#19 2014-01-02 05:38:34

likytau
Member
Registered: 2012-09-02
Posts: 142

Re: dantalian — Transparent tag-based file organization system

darkfeline wrote:

Ah yes, automated tagging.  I'm planning on adding some kind of script for automating tagging easy/common stuff, like mp3 files, but my current position is that dantalian provides the tagging system, and automation can be handled separately, by other scripts and programs.  (For example, I'm doing deduplication with fslint separately instead of doing it "silently" within dantalian)

I'd be happy to contribute any shell scripts I put together for autotagging, which I will if I get into using dantalian seriously.

First, Python has a pretty heavy time cost just starting up the Python interpreter.  Unavoidable, unless ported to C.

Second, dantalian's operations are comparatively more expensive due to path computations (string manipulation), whereas TMSU (I'm guessing) can use most user input directly.

Third, dantalian is completely unoptimized.  I'm sure there's room for improvement there.

True: on my system, 'time python -c "print()"' reports 0.095s total, so there's a margin of about 0.200s that could be improved on.
Using -O when generating bytecode will probably give some gains.
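For anyone who wants to repeat that measurement without the shell's time builtin, here is a small Python sketch that times a bare interpreter start. The function name is made up, and the numbers will of course differ from system to system.

```python
import subprocess
import sys
import time

def interpreter_startup_seconds(runs=5):
    """Return the best wall-clock time for starting Python and running print()."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "print()"],
                       stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best

print("startup: {:.3f}s".format(interpreter_startup_seconds()))
```

Subtracting this figure from the time of a full dantalian command gives a rough idea of how much is startup overhead and how much is dantalian's own work.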

There are a couple of solutions/workarounds here:

dantalian makes it a point to be transparent.  All of its operations can be done with the standard utilities:

mktag = mkdir
tag = ln
untag = rm

and so on.  dantalian merely provides a prettier/smarter interface.
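The same correspondence can be written down in Python with nothing but the os module. This is only a sketch of the equivalence listed above: the function names mirror the commands but are defined here for illustration, and any extra bookkeeping dantalian itself does is left out.

```python
import os
import tempfile

def mktag(tag_dir):
    os.mkdir(tag_dir)                                   # mktag = mkdir

def tag(file_path, tag_dir):
    name = os.path.basename(file_path)
    os.link(file_path, os.path.join(tag_dir, name))     # tag = ln

def untag(file_path, tag_dir):
    name = os.path.basename(file_path)
    os.unlink(os.path.join(tag_dir, name))              # untag = rm

with tempfile.TemporaryDirectory() as library:
    photo = os.path.join(library, "photo.jpg")
    with open(photo, "w") as f:
        f.write("data")
    large = os.path.join(library, "large")

    mktag(large)
    tag(photo, large)
    links_after_tag = os.stat(photo).st_nlink
    untag(photo, large)
    links_after_untag = os.stat(photo).st_nlink
    print(links_after_tag, links_after_untag)
```

The link count going up by one on tag and down by one on untag is all the "metadata" there is.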

The dantalian script commands were intended for interactive use from a shell.  I never timed it, but when I use it, I don't notice any lag.  (This is the old "0.0001s is the same as 0.01s" spiel.)  (Still, I'll take your observations into account.)

In my observations, about .150 to .200 s is tolerable, and I can't tell the difference between .100 and anything faster.
Speed concerns could also be ameliorated with better batching (for example, 'mktag' could accept multiple arguments. 'tag' and 'untag' already do, so this would also improve consistency.)

dantalian exposes a Python API, which is preferred for heavy automation or plugins for other applications.  At any rate, it avoids paying the Python interpreter's startup time on every single operation.

That's a good point, and I do have applications in mind for that.

Also, since I've strived to make dantalian so transparent, it shouldn't be too hard to write a C port for the basic tagging operations.

In any case, I'll make sure to keep timing in mind.

In library/foo:

d mktag bar makes library/foo/bar
d mktag ../bar makes library/bar
d mktag //bar makes library/bar
d mktag //foo/bar makes library/foo/bar

Yes, this is exactly what I needed to understand, thanks!

(I now find myself doing ls //<tab> .. and of course it fails, but maybe I'll write a bash/zsh completer for that.)

It takes the same kind of mindset as working with pointers in C.  Copying is easy, though: rsync -H copies everything and preserves hard links.

Good tip, thanks!

* finally, the ability to accidentally put things only-in-the-library scares me (I'm sure dantalian has safeguards against this, but I don't want to be able to do it). Consider if I load in a piece of art, ink it, and then save in the 'inked' tag directory. This feels (maybe mistakenly) like an accident waiting to happen.

Actually, dantalian doesn't have a safeguard against this, because that is how dantalian is meant to be used! In a traditional (database-driven) tagging system, you have the file stored in the file system and the tagging metadata separate.

However, in dantalian, storing the file in the file system and organizing it with tags is the same thing!  There's no need to organize your files in the file system AND use tags; you do both at the same time in the same way.

File systems were designed to store and organize files.  My line of thinking is: take full advantage of the file system, which was designed for exactly that purpose, instead of organizing files one way and maintaining a separate tagging system as well.

I can understand if that is a little hard to grasp or seems scary.  The feeling of lacking "certainty" is mostly illusory: the files won't suddenly evaporate.  The main danger is saving/tagging a file with only one tag, then untagging it before tagging it as something else.  If you tag first and then untag, or simply 'mv' the file, you're safe.  However, dantalian gives you the flexibility of storing all your files separately and then tagging them in a library, if you so choose.  But it offers no safeguards for doing so, and cannot reasonably do so, because (remember) everything happens directly at the file system level, with minimal tracking data stored by dantalian.

Yes, I understand that hardlinks are not special and in fact all ordinary files are hardlinks (usually with only a single hardlink per inode, as opposed to the many you can acquire through using a system like dantalian). I guess it just feels like, when it has no 'official' location, the file is lost. Not anything really rational, it's probably just a process of acclimatization I'll have to do.

Things I do like in dantalian:

* The design. The directory-tagging handling scares me, but otherwise, brilliant.
* Hierarchical tags. Yes, more of this!!! Absolutely YES!
* Stuff being just there in the filesystem, without needing to do anything.
* Files don't have 'not in the db' status. They just aren't tagged yet. This makes intersection(filelist, X tag) easy to implement.

Thanks, I like it too, and I want it to be the best it can be!

Yes, that really comes through :)

Last edited by likytau (2014-01-02 05:43:35)

Offline

#20 2014-01-07 12:34:24

likytau
Member
Registered: 2012-09-02
Posts: 142

Re: dantalian — Transparent tag-based file organization system

You may want to be aware that certain apps are extremely unfriendly to 'dantalian tags':

find: File system loop detected; `/home/kau/.googleearth/instance-running-lock/task/4236/root/sys/bus/cpu/devices/cpu0/node0/cpu1/firmware_node/subsystem/devices/PNP0400:00/physical_node/subsystem/drivers/serial/00:06/tty/ttyS0/subsystem/ttyS3/device/subsystem/drivers/iTCO_wdt/iTCO_wdt/subsystem' is part of the same file system loop as `/home/kau/.googleearth/instance-running-lock/task/4236/root/sys/bus/cpu/devices/cpu0/node0/cpu1/firmware_node/subsystem/devices/PNP0400:00/physical_node/subsystem/drivers/serial/00:06/tty/ttyS0/subsystem/ttyS3/device/subsystem'

(repeated with longer/shorter variants for EVER.. or at least until you Ctrl+C)

This is because they effectively symlink to /. WineStuff does that directly. Google Earth seems to link 'instance-running-lock' to /proc/4236/ , which contains task which contains root.. which is /.

Lesson: Don't init in your home directory. It works okay until you want to check the list of tags of something.

(other info: WineStuff also points symlinks at the content of your home directory. In this case 'find' does not get confused though, it just notes that there is a loop.)

(fwiw, the command failed, ultimately, because I had some files that were chowned root.root in my homedir.. this caused find to return status 1, AFAICS.)
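For what it's worth, a traversal can protect itself against this kind of loop by remembering which real directories it has already visited. Below is a minimal Python sketch of that idea; the function name is made up, and this is not what dantalian or find actually do internally.

```python
import os
import tempfile

def walk_without_loops(top):
    """Yield directories under top, following symlinks but visiting
    each real directory (device + inode) only once."""
    visited = set()
    stack = [top]
    while stack:
        current = stack.pop()
        try:
            st = os.stat(current)            # follows symlinks
        except OSError:
            continue                          # broken link or no permission
        key = (st.st_dev, st.st_ino)
        if key in visited:
            continue                          # seen before: would be a loop
        visited.add(key)
        yield current
        try:
            with os.scandir(current) as entries:
                stack.extend(e.path for e in entries if e.is_dir())
        except OSError:
            continue                          # unreadable directory

with tempfile.TemporaryDirectory() as tmp:
    home = os.path.join(tmp, "home")
    sub = os.path.join(home, "sub")
    os.makedirs(sub)
    # A symlink pointing back at an ancestor, like the cases above.
    os.symlink(home, os.path.join(sub, "loop"))
    found = list(walk_without_loops(home))
    print(len(found), "directories visited")
```

Unreadable directories (such as the root-owned ones mentioned above) are simply skipped here instead of making the whole walk fail.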

Offline

#21 2014-01-17 09:28:55

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

likytau wrote:

You may want to be aware that certain apps are extremely unfriendly to 'dantalian tags':

find: File system loop detected; `/home/kau/.googleearth/instance-running-lock/task/4236/root/sys/bus/cpu/devices/cpu0/node0/cpu1/firmware_node/subsystem/devices/PNP0400:00/physical_node/subsystem/drivers/serial/00:06/tty/ttyS0/subsystem/ttyS3/device/subsystem/drivers/iTCO_wdt/iTCO_wdt/subsystem' is part of the same file system loop as `/home/kau/.googleearth/instance-running-lock/task/4236/root/sys/bus/cpu/devices/cpu0/node0/cpu1/firmware_node/subsystem/devices/PNP0400:00/physical_node/subsystem/drivers/serial/00:06/tty/ttyS0/subsystem/ttyS3/device/subsystem'

(repeated with longer/shorter variants for EVER.. or at least until you Ctrl+C)

This is because they effectively symlink to /. WineStuff does that directly. Google Earth seems to link 'instance-running-lock' to /proc/4236/ , which contains task which contains root.. which is /.

Lesson: Don't init in your home directory. It works okay until you want to check the list of tags of something.

(other info: WineStuff also points symlinks at the content of your home directory. In this case 'find' does not get confused though, it just notes that there is a loop.)

(fwiw, the command failed, ultimately, because I had some files that were chowned root.root in my homedir.. this caused find to return status 1, AFAICS.)

Good catch.  I use dantalian as an archive (e.g., music archive in Music, picture archive in Pictures), so I wouldn't have hit this... bug?

The obvious solution would be to use whitelists/blacklists.  Maybe a sane default to blacklist dotfiles...
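A dotfile blacklist of that kind could look roughly like the following Python sketch. The function name is hypothetical and nothing like this exists in dantalian at the moment.

```python
import os
import tempfile

def walk_skipping_dotfiles(top):
    """Yield file paths under top, ignoring every file or directory
    whose name starts with a dot."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if not name.startswith("."):
                yield os.path.join(dirpath, name)

with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, ".googleearth"))
    os.makedirs(os.path.join(home, "Music"))
    for path in ((".googleearth", "lock"), ("Music", "song.mp3"), (".bashrc",)):
        with open(os.path.join(home, *path), "w") as f:
            f.write("x")
    kept = [os.path.basename(p) for p in walk_skipping_dotfiles(home)]
    print(kept)
```

Because os.walk does not follow symlinked directories by default, this also avoids descending into links like the ones described above.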

However, you shouldn't be running into any problems with dantalian init.  All init does is create .dantalian and some tracking text files.

Offline

#22 2014-01-17 18:42:48

ANOKNUSA
Member
Registered: 2010-10-22
Posts: 2,141

Re: dantalian — Transparent tag-based file organization system

I discovered dantalian a while back and installed it, but only recently got around to using it. I've organized my ebook library, and like how it works a lot. I'll be working on my music library next. Thanks for the work; I'm looking forward to what the future might hold for this project.

Offline

#23 2014-05-16 10:02:53

maikelus
Member
Registered: 2014-05-16
Posts: 2

Re: dantalian — Transparent tag-based file organization system

I had been looking for a file-tagging system for a long time and tried quite a few, like tagsistant and tmsu, but I didn't like that they depend on a database. Then I found dantalian, whose approach is really interesting and which is already a quite mature library. Congratulations!

I want to build a GUI to make file organisation easier, and I've been playing with the possibilities of dantalian. I've found some things that haven't worked for me, and I'd like to know whether I'm doing something wrong, misunderstanding something, or whether they are untested.

The first one is the library command for intersection tag search. According to the manual, "dantalian find TAG1 TAG2" should list the file(s) that have both tags TAG1 and TAG2, but instead it gives me an error in module path.py: "/usr/local/lib/python3.2/dist-packages/dantalian/path.py"

The second one is related to FUSE. I couldn't get "dantalian mknode PATH TAGS ..." to work. Isn't it supposed to be the equivalent of "dantalian tag TAG FILE ..." in a FUSE-mounted library? It returns an error in "/usr/local/lib/python3.2/dist-packages/dantalian/commands.py"

The third one is also FUSE-related and concerns the use of "virtual tag groups" when mounting a file system, according to the JSON file ./dantalian/mount. As an example:

[
   {"mount": "home/maikelus/my_library",
   "tags": ["green", "round", "large"]},
]

When I do "mount /home/maikelus/my_library" it mounts correctly, but no new "tags" tag group is created.

Thanks a lot for your patience and thanks again for this useful library.

Regards

Offline

#24 2014-05-19 04:27:34

darkfeline
Member
Registered: 2012-02-14
Posts: 94

Re: dantalian — Transparent tag-based file organization system

Sorry, your English is a little hard to parse for me.

maikelus wrote:

The first one is the library command for intersection tag search. According to the manual, "dantalian find TAG1 TAG2" should list the file(s) that have both tags TAG1 and TAG2, but instead it gives me an error in module path.py: "/usr/local/lib/python3.2/dist-packages/dantalian/path.py"

Can you post the entire error message/stack trace?  The tag search code is pretty simple, so it's likely a typo or similar error on your part.  I really do need to rewrite the UI though, leaking errors to the user is bad design on my part.

The second one is related to FUSE. I couldn't get "dantalian mknode PATH TAGS ..." to work. Isn't it supposed to be the equivalent of "dantalian tag TAG FILE ..." in a FUSE-mounted library? It returns an error in "/usr/local/lib/python3.2/dist-packages/dantalian/commands.py"

It's not.  `mknode` is for creating virtual tag groups, the subject of your next question.  Don't be shy about posting the entire error message, otherwise it's hard for me to tell what's going on.

The third one is also FUSE-related and concerns the use of "virtual tag groups" when mounting a file system, according to the JSON file ./dantalian/mount. As an example:

[
   {"mount": "home/maikelus/my_library",
   "tags": ["green", "round", "large"]},
]

When I do "mount /home/maikelus/my_library" it mounts correctly, but no new "tags" tag group is created.

Hmm, I removed the documentation for .dantalian/mount quite a long time ago, I think.  Now you use `mknode` and `rmnode` to create virtual tag groups, and those groups will be saved between mounts.  Instead of manually editing JSON, it sounds like what you are trying to do would be accomplished in the latest version of dantalian with something like the following:

$ dantalian mount <paths>
$ dantalian mknode green-and-round-and-large //green //round //large

I encourage you to read the documentation for dantalian.  If anything in there is unclear or needs further explanation, ask me so I can improve the docs.

Offline

#25 2014-05-23 11:00:26

maikelus
Member
Registered: 2014-05-16
Posts: 2

Re: dantalian — Transparent tag-based file organization system

(First of all, sorry for my English)

My first question, about intersection tag search with the command "dantalian find tag1 tag2", is solved. The tags have to be written with a double slash at the beginning:

$ dantalian find //images/large //images/color        # Correct

Maybe this should be fixed in the documentation.

It's still a little confusing to me, because when tagging files you can write the tag either with or without the double slash, but not with a single slash at the beginning:

$ dantalian tag images/large ../files/photo1.jpg      # Correct
$ dantalian tag //images/large ../files/photo1.jpg      # Correct
$ dantalian tag /images/large ../files/photo1.jpg      # Not correct

But this is probably something I'm missing.

The second question is about "virtual tag groups" when mounting a file system with FUSE.

References to ./dantalian/mount are still in the documentation (in the dantalian User Reference), so they should probably be removed.

I'm a novice with FUSE, so I can sense how powerful it is but not how it works.

After mounting a library ...

$ dantalian mount ~/my_library &

... I try to make a virtual tag group

$ dantalian mknode ~/my_library/large-color //images/large //images/color

but an error message appears:
"Socket command can only be used with fuse"

Is the syntax correct?

Thanks again.

Offline
