#1 2011-11-06 12:16:08

Aegidius
Member
From: Italy
Registered: 2011-06-29
Posts: 288
Website

[Solved] Convert a binary file to unicode file

How can I convert a binary-encoded file to a Unicode-encoded file?

Last edited by Aegidius (2011-11-06 20:13:55)

Offline

#2 2011-11-06 12:20:44

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: [Solved] Convert a binary file to unicode file

Your source file must already be in some text encoding.

You can use iconv.

Offline

#3 2011-11-06 12:36:56

Aegidius
Member
From: Italy
Registered: 2011-06-29
Posts: 288
Website

Re: [Solved] Convert a binary file to unicode file

I tried with iconv -f binary -t uff-8 worldcitiespop.txt > output.txt but it says

iconv: le conversioni da "binary" e verso "uff-8" non sono supportate

which means that conversions from "binary" and to "uff-8" are not supported.

Offline

#4 2011-11-06 14:10:01

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: [Solved] Convert a binary file to unicode file

Run

LC_ALL=C <command>

to get output in English.
If you enabled C locale, it should work.


How did you get this file, and why is such a conversion necessary?

Offline

#5 2011-11-06 14:35:46

Aegidius
Member
From: Italy
Registered: 2011-06-29
Posts: 288
Website

Re: [Solved] Convert a binary file to unicode file

The file is located here.

I need to convert it so I can process it with Java and create an SQL query for my table.

Copying and pasting with KWrite works, but I can't do it because the file has a huge number of lines and my PC is not powerful enough to handle it. Would you please do that for me and post the result? :(

Offline

#6 2011-11-06 14:48:36

Vain
Member
Registered: 2008-10-19
Posts: 179
Website

Re: [Solved] Convert a binary file to unicode file

Hi,

that file is compressed (with gzip). Twice. Compressing a file twice usually is a mistake.

$ file worldcitiespop.txt.gz
worldcitiespop.txt.gz: gzip compressed data, from Unix
$ zcat worldcitiespop.txt.gz > worldcitiespop.txt
$ file worldcitiespop.txt
worldcitiespop.txt: gzip compressed data, was "worldcitiespop.txt", from Unix, last modified: Wed Apr 27 22:54:01 2011
$ zcat worldcitiespop.txt > worldcitiespop2.txt
$ file worldcitiespop2.txt 
worldcitiespop2.txt: ISO-8859 text

This has nothing to do with encodings like unicode, you simply have to uncompress it (twice).
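
For what it's worth, both layers can also be removed in a single pipeline, without the intermediate file. Below is a minimal sketch under stated assumptions: sample.txt and out.txt are made-up names standing in for the real file, and `gzip -dc` is used as the portable spelling of zcat.

```shell
#!/bin/sh
set -eu

# Hypothetical demo data standing in for worldcitiespop.txt.
tmp=$(mktemp -d)
printf 'Country,City\nit,roma\n' > "$tmp/sample.txt"

# Compress twice to reproduce the double-gzip situation.
gzip -c "$tmp/sample.txt" | gzip -c > "$tmp/sample.txt.gz"

# Remove both gzip layers in one pass; gzip -dc behaves like zcat.
gzip -dc "$tmp/sample.txt.gz" | gzip -dc > "$tmp/out.txt"

# cmp is silent and returns 0 when the two files are byte-identical.
cmp "$tmp/sample.txt" "$tmp/out.txt"
```

On the real file the equivalent would be something like zcat worldcitiespop.txt.gz | zcat > worldcitiespop2.txt.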

Offline

#7 2011-11-06 14:56:34

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: [Solved] Convert a binary file to unicode file

You can try 'split' to cut the big file into manageable pieces.

[karol@black test]$ wc -l < worldcitiespop.txt 
2797246
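
A rough sketch of how split could be used, on a tiny made-up file so it is easy to follow. The names big.txt and part_ and the chunk size of 4 lines are only assumptions for the demo; for the real file you would pick something much larger, e.g. a few hundred thousand lines per piece.

```shell
#!/bin/sh
set -eu

# Hypothetical 10-line demo file standing in for the big text file.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > "$tmp/big.txt"

# Cut it into pieces of at most 4 lines each.
# The pieces are named part_aa, part_ab, part_ac, ... in order.
split -l 4 "$tmp/big.txt" "$tmp/part_"

# Show how many lines ended up in each piece.
wc -l "$tmp"/part_*
```

Concatenating the pieces in name order (cat part_*) gives back the original file, so nothing is lost by splitting.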

Last edited by karol (2011-11-06 14:59:28)

Offline

#8 2011-11-06 15:05:47

Aegidius
Member
From: Italy
Registered: 2011-06-29
Posts: 288
Website

Re: [Solved] Convert a binary file to unicode file

Vain wrote:

Hi,

that file is compressed (with gzip). Twice. Compressing a file twice usually is a mistake.

$ file worldcitiespop.txt.gz
worldcitiespop.txt.gz: gzip compressed data, from Unix
$ zcat worldcitiespop.txt.gz > worldcitiespop.txt
$ file worldcitiespop.txt
worldcitiespop.txt: gzip compressed data, was "worldcitiespop.txt", from Unix, last modified: Wed Apr 27 22:54:01 2011
$ zcat worldcitiespop.txt > worldcitiespop2.txt
$ file worldcitiespop2.txt 
worldcitiespop2.txt: ISO-8859 text

This has nothing to do with encodings like unicode, you simply have to uncompress it (twice).

Don't tell it to me. The text file is in the archive.

Offline

#9 2011-11-06 15:36:48

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: [Solved] Convert a binary file to unicode file

Solved?

Offline

#10 2011-11-06 15:57:42

Aegidius
Member
From: Italy
Registered: 2011-06-29
Posts: 288
Website

Re: [Solved] Convert a binary file to unicode file

I tried running the wc -l < worldcitiespop.txt command, but nothing changed :(

Offline

#11 2011-11-06 16:00:46

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: [Solved] Convert a binary file to unicode file

Aegidius wrote:

I tried running the wc -l < worldcitiespop.txt command, but nothing changed :(

Errr ... this command tells you how many lines there are in the file.

You need to use zcat twice, like Vain wrote.

[karol@black test]$ zcat worldcitiespop.txt.gz > worldcitiespop.txt
[karol@black test]$ zcat worldcitiespop.txt > foo.txt
[karol@black test]$ file foo.txt
foo.txt: ISO-8859 text

Offline

#12 2011-11-06 16:57:58

Aegidius
Member
From: Italy
Registered: 2011-06-29
Posts: 288
Website

Re: [Solved] Convert a binary file to unicode file

Thank you very much :) It worked, except that the character encoding is ISO-8859 :(

I need unicode encoding. What can I do?

Last edited by Aegidius (2011-11-06 17:04:33)

Offline

#13 2011-11-06 17:21:16

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: [Solved] Convert a binary file to unicode file

Aegidius wrote:

I need unicode encoding. What can I do?

Have you tried iconv?

Offline

#14 2011-11-06 19:59:57

Aegidius
Member
From: Italy
Registered: 2011-06-29
Posts: 288
Website

Re: [Solved] Convert a binary file to unicode file

I tried iconv -f ISO-8859 -t UTF-8 foo2.txt but I get this error message

iconv: la conversione da "ISO-8859" non è supportata

It says that conversion from "ISO-8859" is not supported :(

Offline

#15 2011-11-06 20:08:00

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: [Solved] Convert a binary file to unicode file

Please read the man page:

[karol@black test]$ iconv -f ISO-8859-1 -t UTF-8 foo2.txt > foo3.txt
[karol@black test]$ file foo3.txt 
foo3.txt: UTF-8 Unicode text
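
Some background on why the first attempt failed: "ISO-8859 text" in the output of file names a whole family of encodings, so iconv needs a specific member such as ISO-8859-1 (iconv -l lists the names it accepts). ISO-8859-1 is a guess here, since file cannot tell the parts apart. A small sketch with made-up file names (latin1.txt, utf8.txt) to check that the conversion does what is expected:

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)

# "caf" followed by byte 0xE9 (octal 351), which is e-acute in ISO-8859-1.
printf 'caf\351\n' > "$tmp/latin1.txt"

# Convert from ISO-8859-1 (assumed source encoding) to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 "$tmp/latin1.txt" > "$tmp/utf8.txt"

# Dump the bytes: in UTF-8 the single byte e9 becomes the pair c3 a9.
od -An -tx1 "$tmp/utf8.txt"
```

If the accented characters look wrong after converting the real file, the source is probably a different part of the family (ISO-8859-15, for example) and the -f argument needs changing.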

Offline

#16 2011-11-06 20:13:34

Aegidius
Member
From: Italy
Registered: 2011-06-29
Posts: 288
Website

Re: [Solved] Convert a binary file to unicode file

Thank you very much :) You are so kind :)

That worked :D

Offline
