
#1 2010-01-07 13:16:29

rekado
Member
From: Shanghai, China
Registered: 2009-01-13
Posts: 98

Unicode characters in text file on flash drive (FAT) scrambled

Hi Archers,
I have a cell phone which I use as a temporary storage device. The file system of the cell's disk is FAT. The disk has its own entry in /etc/fstab:

/dev/disk/by-label/phone   /home/fluffy/phone   vfat   defaults,umask=0000,users    0    0

On the disk there is a C++ source code file "sourcecode.cpp", which contains comments in Simplified Chinese.

$ file ~/phone/sourcecode.cpp
$ file ~/local_backup/sourcecode.cpp

both yield:

sourcecode.cpp: ISO-8859 C program text, with CRLF line terminators
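As far as I can tell, `file` (libmagic) does not try GB2312 at all. GB2312 stores each Chinese character as two bytes in the 0xA1-0xFE range. Those bytes are not valid UTF-8, but they do look like plain ISO-8859 text, so that is the label `file` falls back to. Below is a minimal Python sketch of an ordered guess. It assumes the only candidates are ASCII, UTF-8, GB2312 and Latin-1, and the function name `guess_encoding` is hypothetical:

```python
def guess_encoding(data: bytes) -> str:
    """Heuristic guess: try strict decoders from most to least specific.

    Latin-1 (ISO-8859-1) maps every byte to a character, so it can never
    fail and is kept as the last resort. GB2312 bytes are also valid
    Latin-1, which is why a Latin-1 check alone cannot tell them apart.
    """
    if data.isascii():
        return "ascii"
    for name in ("utf-8", "gb2312"):
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return "iso-8859-1"


if __name__ == "__main__":
    sample = "// 中文\r\n".encode("gb2312")
    print(guess_encoding(sample))
```

This is only a heuristic. Short Latin-1 strings can happen to be valid GB2312 as well, so the order of the candidates matters.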

My system's locale is set to en_US.UTF-8.

The Problem
While the file contents are displayed correctly on the phone itself (a modified version of WinCE), the Chinese comments are garbled when I cat the file on either the console (running fbterm) or a terminal emulator in X (urxvt). The result is the same whether I access the original file on the phone or the local copy. I also have a number of files with Chinese filenames, and those are displayed correctly. I've noticed, however, that copying files with Chinese names from the phone to the local disk causes their names to be displayed incorrectly.

I've tried different mount options for the disk, but to no avail:

/dev/disk/by-label/phone   /home/fluffy/phone   vfat   noauto,umask=0000,users,iocharset=utf8,codepage=936    0    0

Also, I've tried using iconv to convert the Chinese comments in the file from garbage back to Chinese:

iconv -f ISO-8859-1 -t UTF8 sourcecode.cpp

This changes the type of garbage, yet it stays garbage (GIGO).
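This explains why the ISO-8859-1 conversion only swaps one kind of garbage for another. Latin-1 maps every byte to exactly one character, so each two-byte GB2312 character turns into two accented Latin letters. iconv then faithfully writes those letters out as UTF-8. The small Python sketch below shows the misreading, and also shows that it can be reversed. The function names are hypothetical:

```python
def misread_as_latin1(raw: bytes) -> str:
    """What `iconv -f ISO-8859-1` effectively does: one byte -> one character."""
    return raw.decode("iso-8859-1")


def undo_latin1_misread(garbled: str, real_encoding: str = "gb2312") -> str:
    """Latin-1 is a lossless byte<->character mapping, so the original bytes
    can be recovered and decoded with the encoding they really use."""
    return garbled.encode("iso-8859-1").decode(real_encoding)


if __name__ == "__main__":
    raw = "中文".encode("gb2312")            # bytes as stored on the phone
    garbled = misread_as_latin1(raw)
    print(garbled)                           # ÖÐÎÄ
    print(undo_latin1_misread(garbled))      # 中文
```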

Any ideas how to get rid of this severe encoding irritation?


#2 2010-01-07 15:02:22

rekado
Member
From: Shanghai, China
Registered: 2009-01-13
Posts: 98

Re: Unicode characters in text file on flash drive (FAT) scrambled

Hmpf... I gave up just a tad too early.
The following manages to change garbage into valid Simplified Chinese:

iconv -f GB2312 sourcecode.cpp

Now the only question is how to make that happen automatically and dynamically. How come using cat on a file on a FAT drive leads to partially scrambled output? I'd rather not have to care about file encodings at all.
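The closest thing to "not having to care" is probably to convert once, when copying. Below is a minimal Python sketch. It assumes every file is either UTF-8 or GB2312, and the function name `convert_to_utf8` and the output file name are hypothetical. It does roughly the same job as `iconv -f GB2312 -t UTF-8 sourcecode.cpp > sourcecode.utf8.cpp`, except that it leaves files that are already UTF-8 untouched:

```python
import sys
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, fallback: str = "gb2312") -> str:
    """Write a UTF-8 copy of src to dst and return the source encoding used.

    Files that already decode as UTF-8 are copied unchanged; anything else
    is assumed to be in `fallback`. Working on raw bytes (no text-mode
    newline translation) keeps the CRLF line terminators intact.
    """
    raw = src.read_bytes()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError again if the fallback guess is wrong too.
        text = raw.decode(fallback)
        dst.write_bytes(text.encode("utf-8"))
        return fallback
    dst.write_bytes(raw)
    return "utf-8"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python convert.py sourcecode.cpp sourcecode.utf8.cpp
    used = convert_to_utf8(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"converted from {used}")
```

GBK is a superset of GB2312. If a file contains characters outside GB2312, passing `fallback="gbk"` may be needed.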

I know that FAT has many issues, but how come the files are correctly displayed in Windows (also an English locale)?

Last edited by rekado (2010-01-07 15:02:41)


#3 2010-01-07 16:33:42

jxy
Member
Registered: 2008-12-03
Posts: 133

Re: Unicode characters in text file on flash drive (FAT) scrambled

It is not a file system issue. Your phone's OS saves Chinese text in the GB2312 encoding, while your Linux locale is UTF-8. Read more at http://en.wikipedia.org/wiki/Character_encoding.
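To make this concrete: the same two characters are stored as completely different bytes in the two encodings, and a UTF-8 terminal cannot make sense of the GB2312 ones. Here is a short Python illustration. The helper names are hypothetical:

```python
def byte_view(text: str, encodings=("gb2312", "utf-8")) -> dict:
    """Map each encoding name to the hex bytes it produces for `text`."""
    return {enc: text.encode(enc).hex(" ") for enc in encodings}


def as_utf8_terminal_sees_it(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD;
    roughly what a UTF-8 terminal shows for non-UTF-8 input."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for enc, hexbytes in byte_view("中文").items():
        print(f"{enc:>7}: {hexbytes}")
    print(as_utf8_terminal_sees_it("中文".encode("gb2312")))
```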


#4 2010-01-07 16:41:47

rekado
Member
From: Shanghai, China
Registered: 2009-01-13
Posts: 98

Re: Unicode characters in text file on flash drive (FAT) scrambled

I know about character encodings. After reading `man mount`, I assumed that the iocharset option in /etc/fstab would automatically convert charsets to UTF8.
Still, why does opening the file from Windows render the contents correctly? As far as I know, the locale of my Windows installation at work is set to ISO8859-1, not to the Chinese GB2312.

