Your source file must already be in some encoding.
You can use iconv.
I tried iconv -f binary -t uff-8 worldcitiespop.txt > output.txt, but it says that the conversion is not supported:
iconv: conversions from "binary" and to "uff-8" are not supported
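A quick way to see which name iconv rejects is to test the pair on its own. "binary" is not a character encoding, and "uff-8" looks like a typo for UTF-8; iconv -l prints the names your iconv does accept. The function below is a hypothetical helper, only a sketch:

```shell
# Hypothetical helper (not part of iconv): succeeds if iconv accepts
# both charset names. iconv checks the names before reading any input,
# so converting an empty string is enough to test them.
charset_pair_supported() {
    printf '' | iconv -f "$1" -t "$2" > /dev/null 2>&1
}
```

For example, charset_pair_supported ISO-8859-1 UTF-8 should succeed, while charset_pair_supported binary uff-8 should fail.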
Run
LC_ALL=C <command>
to get output in English.
With the C locale enabled, the message should appear in English.
How did you get this file, and why is such a conversion necessary?
The file is located here.
I need to convert it so I can process it with Java and create SQL queries for my table.
Copying and pasting with KWrite works, but I can't do it here because the file has a huge number of lines and my PC can't handle it. Would you please do it for me and post the result?
Hi,
that file is compressed (with gzip). Twice. Compressing a file twice usually is a mistake.
$ file worldcitiespop.txt.gz
worldcitiespop.txt.gz: gzip compressed data, from Unix
$ zcat worldcitiespop.txt.gz > worldcitiespop.txt
$ file worldcitiespop.txt
worldcitiespop.txt: gzip compressed data, was "worldcitiespop.txt", from Unix, last modified: Wed Apr 27 22:54:01 2011
$ zcat worldcitiespop.txt > worldcitiespop2.txt
$ file worldcitiespop2.txt
worldcitiespop2.txt: ISO-8859 text
This has nothing to do with encodings like Unicode; you simply have to uncompress it (twice).
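If you are not sure how many times a file was compressed, a small loop can keep decompressing until gzip no longer recognises the data. This is a sketch; unwrap_gzip and its temporary .tmp/.next files are hypothetical names, not an existing tool:

```shell
# Hypothetical helper: strip every gzip layer from IN and write the
# result to OUT. `gzip -t` exits 0 only for valid gzip data, so the
# loop stops as soon as the content is no longer compressed.
unwrap_gzip() {
    in=$1
    out=$2
    cp -- "$in" "$out.tmp" || return 1
    while gzip -t < "$out.tmp" 2>/dev/null; do
        # Decompress one layer into a scratch file, then swap it in.
        gzip -cd < "$out.tmp" > "$out.next" || return 1
        mv -- "$out.next" "$out.tmp"
    done
    mv -- "$out.tmp" "$out"
}
```

For example: unwrap_gzip worldcitiespop.txt.gz worldcitiespop-plain.txt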
You can try 'split' to cut the big file into manageable pieces.
[karol@black test]$ wc -l < worldcitiespop.txt
2797246
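A minimal sketch of that idea, assuming a POSIX split; split_checked is a hypothetical helper that also confirms the pieces add up to the original line count:

```shell
# Hypothetical helper: split FILE into pieces of at most LINES lines,
# named PREFIXaa, PREFIXab, ... (split's default suffixes), then check
# that no lines were lost. PREFIX should be unique so the glob below
# does not match unrelated files.
split_checked() {
    file=$1
    lines=$2
    prefix=$3
    split -l "$lines" -- "$file" "$prefix" || return 1
    total=$(wc -l < "$file")
    pieces=$(cat -- "$prefix"* | wc -l)
    [ "$total" -eq "$pieces" ]
}
```

With 500000 lines per piece, split_checked worldcitiespop.txt 500000 worldcitiespop_part_ would give six files (worldcitiespop_part_aa to worldcitiespop_part_af) for the 2797246 lines counted above.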
Last edited by karol (2011-11-06 14:59:28)
I know that. The text file is inside the archive.
Solved?
I tried running the wc -l < worldcitiespop.txt command, but nothing changed.
Errr ... this command tells you how many lines there are in the file.
You need to use zcat twice, like Vain wrote.
[karol@black test]$ zcat worldcitiespop.txt.gz > worldcitiespop.txt
[karol@black test]$ zcat worldcitiespop.txt > foo.txt
[karol@black test]$ file foo.txt
foo.txt: ISO-8859 text
I need a Unicode encoding. What can I do?
Have you tried iconv?
Please read the man page:
[karol@black test]$ iconv -f ISO-8859-1 -t UTF-8 foo2.txt > foo3.txt
[karol@black test]$ file foo3.txt
foo3.txt: UTF-8 Unicode text
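Since the PC struggles with large files, the two zcat steps and the iconv step can also be chained in one pipeline so no intermediate copies end up on disk. This sketch assumes the inner text really is ISO-8859-1, as in the command above; gz2_latin1_to_utf8 is a hypothetical name:

```shell
# Hypothetical helper: undo both gzip layers and convert the text from
# ISO-8859-1 to UTF-8 in one pipeline, without intermediate files.
# Assumes the inner text is ISO-8859-1 (Latin-1); `file` itself only
# reports "ISO-8859".
gz2_latin1_to_utf8() {
    gzip -cd < "$1" | gzip -cd | iconv -f ISO-8859-1 -t UTF-8 > "$2"
}
```

For example: gz2_latin1_to_utf8 worldcitiespop.txt.gz worldcitiespop-utf8.txt, after which file should report UTF-8 text.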