#1 2009-07-23 10:42:12

antis
Member
From: sweden
Registered: 2007-05-18
Posts: 108

locale and character encoding. What to do about these dreadful ÅÄÖ??

It's time for me to get it into my head how this works. Please, help me understand before I go nuts.
I'm from Sweden and we use a few of these weird characters like ÅÄÖ.

If I create a file called "övrigt.txt" in Windows, the file will show up as "?vrigt.txt" on my Linux PC (at least in the console; sometimes it looks OK in other apps in X). The same is true if I create the file in Linux and copy it to Windows: it looks just as weird on the other side.

As I (probably) can't change the way Windows works, my question is what I have to do to make these two systems play nicely with each other?

This is the output from locale:

LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE=C
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

Is there anything here I should change? I have tried using ISO-8859-1 with no luck. Mind you, I want to keep the system-wide language set to English. The only thing I want to achieve is that "Ö" on Windows turns up as "Ö" in Linux as well, and vice versa.

Please save my hair from being torn off, I'm going bald here...

#2 2009-07-26 11:45:59

thisoldman
Member
From: Pittsburgh
Registered: 2009-04-25
Posts: 1,172

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

I've already gone bald. I have very little hair remaining to tear out.

The problem you have is due to the different character encodings: Windows-1252 and UTF-8. The utilities to change the encodings are iconv, recode and convmv.
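
To make the mismatch concrete, here is a small Python sketch (not from the original post; it assumes Python 3 and borrows the file name from the first post). It prints the bytes of the same name in both encodings, and what a UTF-8 reader makes of the Windows-1252 bytes:

```python
name = "övrigt.txt"

cp1252_bytes = name.encode("cp1252")   # what a Windows-1252 system would write
utf8_bytes = name.encode("utf-8")      # what a UTF-8 system expects to read

print(list(cp1252_bytes[:2]))  # "ö" is a single byte in Windows-1252
print(list(utf8_bytes[:3]))    # "ö" is two bytes in UTF-8

# A UTF-8 reader cannot decode the lone Windows-1252 byte, so it falls back
# to a placeholder character instead of showing the letter.
shown = cp1252_bytes.decode("utf-8", errors="replace")
print(shown)
```

The placeholder is what typically ends up as the "?" in the console listing.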

iconv is probably already installed. The other two are available in extra. iconv is older and there are many tutorials for it on the internet. recode was designed to replace iconv; its info pages include a tutorial. convmv is used to translate just the filenames from one encoding to another.
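
As a rough illustration of what a filename conversion involves, here is a minimal Python 3 sketch. It is not how convmv itself is implemented; the helper names `utf8_name` and `plan_renames` are made up for this example, and the source encoding is assumed to be Windows-1252. It only reports what would change and renames nothing:

```python
import os


def utf8_name(raw_name: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return raw_name re-encoded as UTF-8.

    Names that already decode as UTF-8 are returned unchanged. This is a
    heuristic: it assumes a legacy name does not happen to be valid UTF-8.
    """
    try:
        raw_name.decode("utf-8")
        return raw_name
    except UnicodeDecodeError:
        # May itself raise if the bytes are not valid in source_encoding.
        return raw_name.decode(source_encoding).encode("utf-8")


def plan_renames(directory: bytes, source_encoding: str = "cp1252"):
    """List (old, new) byte names that would change; nothing is renamed."""
    plan = []
    for raw_name in os.listdir(directory):
        new_name = utf8_name(raw_name, source_encoding)
        if new_name != raw_name:
            plan.append((raw_name, new_name))
    return plan


print(utf8_name(b"\xf6vrigt.txt"))
```

Calling `plan_renames(b".")` first and reviewing the list before renaming anything is the cautious way to use something like this.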

How to get the encoding changes to be automatic is beyond me. You might try a small vfat partition to store the problem files, using the proper codepage:

/dev/sda1  /dos/c  vfat  umask=002,gid=35,codepage=850

That might work. Someone probably has an elegant solution; I'm sorry that this is just a kludge.

#3 2009-07-26 12:33:04

schuay
Package Maintainer (PM)
From: Austria
Registered: 2008-08-19
Posts: 564

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Hi, this bug report might be relevant for you: http://bugs.archlinux.org/task/7549
It says "gnome-mount" in the title but afaik it applies globally, not only to gnome.

To preserve special characters, mounting with "-o iocharset=utf8" works for me.

Last edited by schuay (2009-07-26 12:34:01)

#4 2009-07-27 18:50:50

quarkup
Member
From: Portugal
Registered: 2008-09-07
Posts: 497

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Have you edited

/etc/locale.gen
and regenerated the locales (there is an executable for that, read the wiki)?


If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.
Simplicity is the ultimate sophistication.

#5 2009-07-27 22:19:50

pkerwien
Member
From: Sweden
Registered: 2009-07-06
Posts: 14

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

How do you share your files between the systems; Samba, mounted disk, FTP or something else?

PS. I'm from Sweden too, so maybe I can help a fellow Swede in need ;-)


Linux is just like an indian tent: no Gates, no Windows and an Apache inside.

#6 2009-07-28 07:11:03

EVRAMP
Member
From: Czech Republic
Registered: 2008-10-03
Posts: 173

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Hi antis, if you are talking about a USB flash disk, then the problem is that some file managers refuse to mount a FAT32 file system with UTF-8 encoding, since there is no official support for this in the kernel. If you want the correct encoding, either a) use another file manager (PCManFM works for me) or b) make your file manager mount the disk with utf8 encoding. I tried the second option first (http://bbs.archlinux.org/viewtopic.php?id=73804) but could not get it to work.

#7 2009-07-28 08:15:45

antis
Member
From: sweden
Registered: 2007-05-18
Posts: 108

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Hey, thanks for all the answers!

I share my files in a number of ways, but mainly through a web application called AjaXplorer (very nice btw...). The thing is that as soon as a Windows user uploads anything with special characters in the file name, my programs (xbmc, console etc.) refuse to read it correctly. Other ways of sharing are file copying with USB sticks, ssh etc. I don't think the way of sharing is really the problem, but rather the special characters that are sometimes used.

I could probably convert the filenames with the suggested applications, but then I'll get the Windows users into trouble when they want to download them again, won't I?

I realize that it's cp1252 that is the bad guy in this drama. Is there no way to set/use cp1252 as a character encoding in Linux? It's probably a bad idea, as utf8 seems like the way of the future, but the fact that these two OSes can't communicate too well in this area is pretty useless if you ask me.

To wrap this up I'll answer some questions...
@EVRAMP: I'm actually using PCManFM, but that is only for me, and I'm not dealing with vfat partitions very often, to be honest.
@pkerwien: Well, I think I mentioned my forms of sharing above. (Nice to see some Arch Swedes here!)
@quarkup: locale.gen is edited, and both sv_SE and en_US have UTF-8 and ISO-8859-1 enabled and generated.

...and to clarify things even further: it doesn't matter if I get or provide a file via a USB stick, Samba, FTP or on paper. All I want is for "Ö" to always be "Ö", everywhere.

I can't believe how hard this is to get around. Linus is Finnish, for crying out loud. I thought he'd have sorted this out first thing. Maybe he doesn't deal with Windows or its users at all ;)

#8 2009-07-28 10:08:55

pkerwien
Member
From: Sweden
Registered: 2009-07-06
Posts: 14

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

I installed AjaXplorer (nice to learn about new software!) and created a file called 'testar åäö.txt' in Windows Vista. I uploaded this file using AjaXplorer and got this on the server side:

[root@pc2 files]# ls
INSTALL-SelectMeAndClickEdit.txt  recycle_bin  testar åäö.txt

I'm using Firefox 3.5.1 in Windows with character encoding set to UTF-8. I have the same locale as you. So unfortunately, I cannot reproduce your problem :|


#9 2009-07-28 18:07:56

Neheb
Member
From: Norway
Registered: 2009-05-23
Posts: 39

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

The font you are using probably doesn't support the special characters. I had the same problem the first time I installed Arch and tried to get a Norwegian keyboard to work.

Anyway, just try changing the font the terminal is using and it should work.

#10 2009-07-28 23:48:14

putte_xvi
Member
From: Sweden
Registered: 2009-04-10
Posts: 22

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

The Windows and Linux systems don't have to use the same encoding, as long as they agree on the encoding used to communicate. How this is configured is different for each service.

Have you made sure to set LOCALE in /etc/rc.conf and rebooted (or restarted the relevant daemons)? Otherwise the locale might be correct for your regular user but different for your daemons.
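
One way to see which encoding a given process actually ends up with is to ask from inside that process. The following is a small Python 3 sketch (the function name `encoding_report` is made up for this example). It only shows what a Python process started in that environment sees, which is an approximation of what a daemon started the same way would see:

```python
import locale
import os
import sys


def encoding_report() -> dict:
    """Collect the locale-related settings visible to the current process."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding the C library / locale settings suggest for text data.
        "preferred": locale.getpreferredencoding(False),
        # Encoding Python uses to convert file names to and from str.
        "filesystem": sys.getfilesystemencoding(),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```

Running it once from a login shell and once from the environment a service is started in makes any difference between the two visible.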

antis wrote:

...and to clarify things even further: it doesn't matter if I get or provide a file via a USB stick, Samba, FTP or on paper. All I want is for "Ö" to always be "Ö", everywhere.

That didn't really clarify things much. :) Both systems have to agree on how "Ö" is represented on that particular USB stick, Samba share or FTP server.
* NTFS partitions should be mounted with "nls=utf8".
* Samba shares need "iocharset=utf8".
* FAT partitions should probably be mounted with some combination of codepage and iocharset as suggested in other posts.
* FTP clients must be configured to use the encoding used by the FTP server software.
... and so on for other methods of exchange.

Last edited by putte_xvi (2009-07-28 23:50:01)

#11 2009-07-29 00:27:01

anrxc
Member
From: Croatia
Registered: 2008-03-22
Posts: 834

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

...and when you have set everything up they might still be displayed improperly. You need a terminal emulator that can display the characters, a font that contains them and a proper locale.


You need to install an RTFM interface.

#12 2009-07-30 07:31:52

antis
Member
From: sweden
Registered: 2007-05-18
Posts: 108

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Maybe it is in fact just a display issue. I have never thought about that.
I will look into changing the font and also the different mount options for the various ways of mounting.

Any suggestions for a font that would work ok? Especially for the console.

#13 2009-07-30 08:15:39

schuay
Package Maintainer (PM)
From: Austria
Registered: 2008-08-19
Posts: 564

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

antis wrote:

Maybe it is in fact just a display issue. I have never thought about that.
I will look into changing the font and also the different mount options for the various ways of mounting.

Any suggestions for a font that would work ok? Especially for the console.

Did you look at the bug report / try the suggested mount options?

#14 2009-07-30 19:36:21

putte_xvi
Member
From: Sweden
Registered: 2009-04-10
Posts: 22

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

antis wrote:

Any suggestions for a font that would work ok? Especially for the console.

The defaults ought to be OK. UTF-8 with Swedish characters works for me in the console with a UTF-8 locale and no special fonts.

But I think we should narrow down the problem first, so try this:

1. Create an empty directory.
2. Put one of the problematic files in it ("åäö.txt" or something).
3. In the directory, run this command:

python -c 'import os; print [ord(c) for c in os.listdir(".")[0]]'

This will reveal how the filename is actually encoded. If the encoding is correct for your locale it must be a font issue. Otherwise you can turn the investigation to how the files are transferred.
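
The one-liner above uses Python 2 syntax. For anyone whose python is Python 3, here is a sketch of an equivalent (the function name `filename_byte_values` is made up for this example). It passes a bytes path to os.listdir so that the names come back as raw bytes; with a str path, Python 3 decodes the names first and undecodable bytes would show up as surrogate code points rather than their real values:

```python
import os


def filename_byte_values(directory: bytes = b".") -> dict:
    """Map each raw file name in directory to the list of its byte values."""
    # A bytes argument makes os.listdir return undecoded bytes names.
    return {raw_name: list(raw_name) for raw_name in os.listdir(directory)}


for raw_name, values in filename_byte_values().items():
    print(raw_name, values)
```

Run it inside the test directory from step 1; every file name is printed together with its byte values.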

#15 2009-07-31 09:05:41

antis
Member
From: sweden
Registered: 2007-05-18
Posts: 108

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

I found a directory on my current computer which is called "Lär dig spanska-Grundkurs". Of course, in the terminal (and in pcmanfm) it says "L?r dig spanska-Grundkurs"

Your python script outputs the following:

[76, 228, 114, 32, 100, 105, 103, 32, 115, 112, 97, 110, 115, 107, 97, 45, 71, 114, 117, 110, 100, 107, 117, 114, 115]

I reckon these are ASCII values, and the "ä" of value 228 should be displayed correctly if I have utf-8 set up correctly, right?

I have rc.conf set to LOCALE="en_US.utf8" and the CONSOLEFONT set to nothing. I guess that makes it the default font?

Other than that I have four locales generated from locale-gen:
en_US.UTF-8
en_US.ISO-8859-1
sv_SE.UTF-8
sv_SE.ISO-8859-1

Have I missed anything?

#16 2009-07-31 12:39:45

putte_xvi
Member
From: Sweden
Registered: 2009-04-10
Posts: 22

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

antis wrote:

I reckon these are ASCII values, and the "ä" of value 228 should be displayed correctly if I have utf-8 set up correctly, right?

Nope, that's latin-1 encoded. ASCII only covers the characters up to 127, and UTF-8 uses two or more bytes for characters outside of ASCII; "ä" would be [195, 164].

So the issue is not with the fonts or the locale, but with whatever you used to transfer the file.
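
The same check can be done mechanically. Below is a Python 3 sketch that takes the byte values posted above and tries to decode them; the function name `classify` and its three labels are made up for this example, and "legacy 8-bit" simply means "not ASCII and not valid UTF-8":

```python
# Byte values copied from the output posted above.
observed = bytes([76, 228, 114, 32, 100, 105, 103, 32, 115, 112, 97, 110, 115,
                  107, 97, 45, 71, 114, 117, 110, 100, 107, 117, 114, 115])


def classify(raw: bytes) -> str:
    """Roughly classify a file name as ascii, utf-8 or legacy 8-bit."""
    if all(value < 128 for value in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "legacy 8-bit"


print(classify(observed))          # not valid UTF-8
print(observed.decode("latin-1"))  # reads correctly as latin-1
print(list("ä".encode("utf-8")))   # the two bytes UTF-8 uses for "ä"
```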
