#1 2015-01-08 14:53:53

sas
Member
Registered: 2009-11-24
Posts: 155

Unicode-aware LC_COLLATE sort order that does not ignore dots?

I'm currently using LC_COLLATE=en_US.utf8 as part of my system locale settings, and it's great that, unlike LC_COLLATE=C, it understands Unicode (both in the sense that it doesn't break Unicode characters, and in that it ignores case differences and diacritical marks when sorting).
The problem is that it also ignores punctuation characters (such as dots) entirely, which leads to counter-intuitive and annoying sort orders.

For example:

$ touch foo.txt foo2.txt foó3.txt foo4.txt

$ LC_COLLATE=en_US.utf8 ls
foo2.txt  foó3.txt  foo4.txt  foo.txt

$ LC_COLLATE=C ls
foo.txt  foo2.txt  foo4.txt  fo??3.txt

Neither is satisfactory. This is how those files should be sorted imo:

foo.txt  foo2.txt  foó3.txt  foo4.txt

Is there any LC_COLLATE value which does this?
Surely I can't be the first person to want a sane, Unicode-aware alphabetical sort order for files?
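
To pin down the ordering I mean, here is a rough Python sketch of the rule: ignore case and diacritics, but keep punctuation significant. This is not an LC_COLLATE value, just an illustration. The helper name collation_key is made up for this example, and comparing the remaining characters by code point is my own assumption about how dots and digits should rank.

```python
import unicodedata

def collation_key(name):
    # Hypothetical helper: builds a sort key that ignores case and
    # diacritics but keeps punctuation such as "." significant.
    # Decompose accented characters (e.g. "ó" -> "o" + combining accent).
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks so "ó" compares like "o".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: remaining characters compare by code point, so "." sorts
    # before digits and letters. The original name only breaks ties.
    return (base.casefold(), name)

names = ["foo2.txt", "foó3.txt", "foo4.txt", "foo.txt"]
for name in sorted(names, key=collation_key):
    print(name)
```

With the four files from the example, this prints foo.txt, foo2.txt, foó3.txt, foo4.txt, which is the order I'm after.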

EDIT: It looks like the same question was already asked in 2012, but it got no answers...

Last edited by sas (2015-01-08 15:07:47)
