I'm trying to find a setting for LC_COLLATE that is:
1. Case insensitive
2. Accent/diacritical insensitive
3. Punctuation sensitive
If I use the default based on my LANG setting, I can achieve both 1 & 2. But I can only get 3 if I use LC_COLLATE=C (which negates both 1 & 2).
Here's a silly example which illustrates the issue for me. Let's assume I have files named for people which should be sorted first by honorific, then last name:
$ touch mr.richards Mr.Rogers Mr.Stevens mrs.robinson Mrs.Robinson Mrs.Stephens Mrs.Renee Mrs.Renée Mrs.Reno
$ LC_COLLATE=C ls -x
Mr.Rogers
Mr.Stevens
Mrs.Renee
Mrs.Reno
Mrs.Renée
Mrs.Robinson
Mrs.Stephens
mr.richards
mrs.robinson
$ LC_COLLATE=en_US.UTF-8 ls -x
mr.richards
Mr.Rogers
Mrs.Renee
Mrs.Renée
Mrs.Reno
mrs.robinson
Mrs.Robinson
Mrs.Stephens
Mr.Stevens
What I want in this example is:
mr.richards
Mr.Rogers
Mr.Stevens
Mrs.Renee
Mrs.Renée
Mrs.Reno
mrs.robinson
Mrs.Robinson
Mrs.Stephens
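As a stopgap, that ordering can be produced outside the locale machinery with a decorate-sort-undecorate pipeline. This is only a rough sketch under a few assumptions: sort_names is a hypothetical helper name, the names contain no tabs or newlines, and the sed list covers only é/É, so it would need extending for other accented letters. It does not change what ls or any other locale-aware program does.

```shell
#!/bin/sh
# sort_names: hypothetical helper, not part of coreutils or glibc.
# Reads one name per line on stdin and prints the names sorted so that
# case and accents are ignored but punctuation still counts.
# Assumptions: no tabs or newlines in names; only é/É need unaccenting.
sort_names() {
    tab=$(printf '\t')
    while IFS= read -r name; do
        # Primary key: ASCII case folded, accented letters mapped to base letters.
        key=$(printf '%s\n' "$name" | tr 'A-Z' 'a-z' | sed -e 's/é/e/g' -e 's/É/e/g')
        # Tie-breaker: ASCII case swapped, so that for equal primary keys
        # lowercase sorts before uppercase and unaccented before accented.
        tie=$(printf '%s\n' "$name" | tr 'A-Za-z' 'a-zA-Z')
        printf '%s\t%s\t%s\n' "$key" "$tie" "$name"
    done |
        # Byte-order sort on both keys; '.' (0x2E) sorts before every letter.
        LC_ALL=C sort -t "$tab" -k1,1 -k2,2 |
        # Drop the two keys and keep the original name.
        cut -f3-
}

# Usage: feed it the file names from the example.
printf '%s\n' Mrs.Stephens Mrs.Robinson mrs.robinson Mrs.Reno Mrs.Renée \
    Mrs.Renee Mr.Stevens Mr.Rogers mr.richards | sort_names
```

Punctuation stays significant because the final comparison is a plain byte-order sort, while case and accents are removed only from the primary key. In a real directory this would be used as ls | sort_names.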
This old PostgreSQL post accurately summarizes what I've discovered via Google and man pages: http://archives.postgresql.org/pgsql-sq … g00078.php
Currently, the relevant collation source is /usr/share/i18n/locales/iso14651_t1_common (which is sourced by /usr/share/i18n/locales/iso14651_t1).
Any suggestions? Is there an additional locale I could enable to give me the features I want? Or am I looking at creating a new locale based on en_US.utf8 that doesn't ignore the characters in the UNDEFINED block of /usr/share/i18n/locales/iso14651_t1_common?
For now, at least, I guess I'm going back to LC_COLLATE=C--if only because I'm accustomed to its quirks after all these years.
Thanks for looking!
Barthel
For the record, here is my current setup:
$ locale -a
C
en_US
en_US.iso88591
en_US.utf8
POSIX
/etc/rc.conf snippet
# LOCALIZATION
# ------------
HARDWARECLOCK="UTC"
TIMEZONE="America/Chicago"
KEYMAP="us"
CONSOLEFONT=
CONSOLEMAP=
LOCALE=
DAEMON_LOCALE="no"
USECOLOR="yes"
$ locale (also the contents of the new /etc/locale.conf)
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Any technology distinguishable from magic is insufficiently advanced.
- Cleon, _Foundation's Fear_