
#1 2007-08-28 20:29:59

msoltyspl
Member
From: Poland
Registered: 2007-07-09
Posts: 11

non 8859-1 charset will cause improper unicode codes (fix included)

I already filed a bug report, but it has been a while since anyone commented on it, so I decided to post it here as well, since some people may need it.

The rationale behind the change is explained in that bug report, but here is a short recap:

When the dumpkeys / loadkeys combination is used to reload the keymap in unicode mode, one thing was forgotten in rc.sysinit: dumpkeys without the -c option dumps assuming iso-8859-1 by default. After all, no information about the charset is kept after the earlier loadkeys.

The net effect is subtle but potentially troublesome. For example, if you use the "pl" keymap (iso-8859-2), it will be dumped as iso-8859-1 (so more or less one to one) and reloaded as such, giving you iso-8859-2 codes in what is supposed to be a unicode keymap.

This is probably NOT what you want. In unicode mode you expect file names to be encoded in utf-8 and to use proper unicode codes. Take the letter 'Ą': its unicode code is 0x104, while in iso-8859-2 it is 0xA1. So when operating in unicode, you expect filenames to use 0x104 (encoded in utf-8). Currently in Arch you end up with 0xA1 (encoded in utf-8). Another caveat: if, for example, you mount an ntfs partition through ntfs-3g, you will be unable to reference any files with national characters. And so on...
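To make the 'Ą' example concrete, here is a small sketch that converts the single byte 0xA1 to utf-8 twice: once treating it as iso-8859-2 (what you want) and once treating it as iso-8859-1 (what the unpatched dumpkeys call effectively does). It assumes iconv, od and tr are available; the hex helper is just a name made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print stdin as a compact hex string, e.g. "c484".
hex() { od -An -tx1 | tr -d ' \n'; }

# Byte 0xA1 (octal 241) is 'Ą' in iso-8859-2.
# Correct path: read the byte as iso-8859-2, then encode as utf-8.
correct=$(printf '\241' | iconv -f ISO-8859-2 -t UTF-8 | hex)

# Buggy path: read the very same byte as iso-8859-1 instead.
wrong=$(printf '\241' | iconv -f ISO-8859-1 -t UTF-8 | hex)

echo "expected, unicode 0x104: $correct"   # c484
echo "actual,   unicode 0xA1:  $wrong"     # c2a1
```

Both results are valid utf-8, which is exactly why the problem is easy to miss: only the code point behind the bytes differs.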

It is also hard to spot, as plenty of fonts work properly in both situations, for example iso-8859-2 specific fonts with a proper unicode map. But if you happen to use a pure unicode font, you will notice the problem immediately. The same goes for logging in from Windows using putty in utf-8, and so on.

Be advised that if you currently have filenames with national characters, you have a bit of a "mess" in them: as mentioned above, they are utf-8 encoded, but most likely not with the proper unicode codes.
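If you want to check what such a "messed up" name would look like after repair, the following is a rough sketch, not a tested migration tool. It assumes the keymap charset was iso-8859-2 and that iconv is available; fix_name is a hypothetical helper name. It only prints the corrected bytes and does not rename anything.

```shell
#!/bin/sh
# Hypothetical helper: undo the iso-8859-1 misreading of an iso-8859-2 keymap.
# Step 1 turns the utf-8 name back into the raw single-byte codes,
# step 2 reads those bytes as iso-8859-2 and encodes them as utf-8 again.
fix_name() {
    printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-1 | iconv -f ISO-8859-2 -t UTF-8
}

# A name typed with the broken keymap: 0xA1 encoded in utf-8 (bytes c2 a1).
bad=$(printf '\302\241')
fixed=$(fix_name "$bad")

# Show the repaired bytes; for this input they are c4 84 ('Ą', unicode 0x104).
printf '%s' "$fixed" | od -An -tx1
```

Note that the first iconv step fails on names that already contain characters outside iso-8859-1 (for instance names that were created correctly from putty), so those would have to be skipped.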

Concept patches:

--- rc.sysinit    2007-08-15 06:01:59.000000000 +0200
+++ rc.sysinit.new    2007-08-28 22:08:46.000000000 +0200
@@ -339,7 +339,7 @@
 if [ "$(echo $LOCALE | /bin/grep -i utf)" ]; then
     stat_busy "Setting Consoles to UTF-8"
     /usr/bin/kbd_mode -u
-    /usr/bin/dumpkeys | /bin/loadkeys --unicode
+    /usr/bin/dumpkeys ${KEYMAP_CHARSET:+"-c${KEYMAP_CHARSET}"} | /bin/loadkeys --unicode
     # the $CONSOLE check helps us avoid this when running scripts from cron
     echo 'if [ "$CONSOLE" = "" -a "$TERM" = "linux" -a -t 1 ]; then echo -ne "\e%G"; fi' >>/etc/profile.d/locale.sh
     stat_done
--- rc.conf    2007-01-29 22:51:35.000000000 +0100
+++ rc.conf.new    2007-08-28 22:07:18.000000000 +0200
@@ -19,6 +19,7 @@
 HARDWARECLOCK="localtime"
 TIMEZONE="Canada/Pacific"
 KEYMAP="us"
+KEYMAP_CHARSET=
 CONSOLEFONT=
 CONSOLEMAP=
 USECOLOR="yes"

Analogous changes should be made in initcpio's scripts if you use the keymap hook.
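For anyone unsure what the ${KEYMAP_CHARSET:+...} construct in the rc.sysinit patch does: it expands to nothing when KEYMAP_CHARSET is empty or unset, so the default behaviour is unchanged, and to -c followed by the charset otherwise. A small sketch that prints the resulting command line instead of running it (show_cmd is a hypothetical name used only here):

```shell
#!/bin/sh
# Build the dumpkeys command line the same way the patched rc.sysinit does,
# but echo it instead of executing it.
show_cmd() {
    KEYMAP_CHARSET=$1
    echo /usr/bin/dumpkeys ${KEYMAP_CHARSET:+"-c${KEYMAP_CHARSET}"}
}

show_cmd ""             # prints: /usr/bin/dumpkeys
show_cmd "iso-8859-2"   # prints: /usr/bin/dumpkeys -ciso-8859-2
```

So leaving KEYMAP_CHARSET= empty in rc.conf, as in the patch, keeps the old dumpkeys call exactly as it was.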


#2 2007-09-13 12:18:08

hoppik
Member
From: Czech Republic
Registered: 2007-09-12
Posts: 7

Re: non 8859-1 charset will cause improper unicode codes (fix included)

Hello,

I use this patch to fix my problem:

rc.conf:

LOCALE="cs_CZ.UTF-8"
HARDWARECLOCK="UTC"
TIMEZONE="Europe/Prague"
KEYMAP="cz"
KEYMAP_CHARSET="iso-8859-2"
CONSOLEFONT="lat2a-16.psfu"
CONSOLEMAP=
USECOLOR="yes"

/etc/profile:

export LANG="cs_CZ.UTF-8"
export LANGUAGE="cs_CZ.UTF-8"
export LC_CTYPE="cs_CZ.UTF-8"
export LC_MESSAGES="cs_CZ.UTF-8"
export LC_ALL="cs_CZ.UTF-8"
export LC_COLLATE="C"

Diacritics from putty are OK. I can write

ěščřžýáíé
ĚŠČŘŽÝÁÍÉ

and diacritics in all applications are OK.

But, for example, /etc/motd with diacritics is displayed correctly from putty,
but incorrectly on the console in Arch.

If I save a file containing characters with diacritics from putty and
type on Arch: cat file.txt, the diacritics are displayed OK.

When I am on Arch Linux, I can type all small letters with diacritics
without problems.
But capital letters with diacritics are still a problem:
if I activate CAPS LOCK and type letters with diacritics,
small letters with diacritics appear on the terminal. If I type the
diacritical mark first and then the capital letter, the diacritics are wrong
only for the letters ĚŠČŘŽ; the letters ÝÁÍÉ are displayed correctly.

Thank you very much for your help, and sorry for my English!


#3 2008-06-25 14:12:16

ConnorBehan
Package Maintainer (PM)
From: Long Island NY
Registered: 2007-07-05
Posts: 1,359

Re: non 8859-1 charset will cause improper unicode codes (fix included)

Incidentally, this solved the problem I was having where "Setting consoles to UTF-8 [BUSY]" resulted in a freeze half the time. When I checked rc.sysinit, it had a for loop instead of the line with /usr/bin/dumpkeys, but changing it over to your line still worked.


6EA3 F3F3 B908 2632 A9CB E931 D53A 0445 B47A 0DAB
Great things come in tar.xz packages.
