When I run the file command, it fails to display the encoding of the text files.
For example, I have this folder:
Booklet-1.jpg: image/jpeg; charset=binary
Booklet-2.jpg: image/jpeg; charset=binary
Booklet-3.jpg: image/jpeg; charset=binary
Booklet-4.jpg: image/jpeg; charset=binary
Booklet-5.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - cd 1.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - cd 2.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - cd 3.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - cd 4.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - front and inside.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - front.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - inside 2.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - inside 3.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - inside 4.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - inside 5.jpg: image/jpeg; charset=binary
Roland Kayn - Tektra - inside 6.jpg: image/jpeg; charset=binary
Roland Kayn-Tektra.txt: text/plain; charset=us-ascii
Tektra-cover.jpg: image/jpeg; charset=binary
nuovo.txt: text/plain; charset=us-ascii
It says that the encoding of the text files is us-ascii, but it should display UTF-8, at least for nuovo.txt, which I encoded in UTF-8 to see whether the file command worked properly.
Last edited by alma ata (2017-09-06 08:45:38)
Offline
Does nuovo.txt contain any non-ASCII characters? If it only contains characters from the original ASCII set (UTF-8 is identical to ASCII for the first 128 code points, 0 to 127), `file` will simply detect it as ASCII.
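One way to check this premise is to count the bytes that have the high bit set: a file with none of them is pure ASCII, and therefore also valid UTF-8. A minimal sketch, assuming a POSIX shell with `tr`, `wc` and `mktemp`; the helper name `count_non_ascii` and the sample files are made up for illustration:

```shell
# count_non_ascii FILE: print the number of bytes >= 0x80 in FILE.
# Hypothetical helper name, not part of `file` or any package.
count_non_ascii() {
    # Delete every 7-bit byte (octal 000-177); whatever remains is non-ASCII.
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

tmpdir=$(mktemp -d)
printf 'CD 1:\r1. Tanar 1\r' > "$tmpdir/ascii.txt"   # ASCII only
printf 'caf\303\251\n' > "$tmpdir/utf8.txt"          # e-acute takes 2 bytes in UTF-8

count_non_ascii "$tmpdir/ascii.txt"   # prints 0
count_non_ascii "$tmpdir/utf8.txt"    # prints 2
rm -r "$tmpdir"
```

A count of 0 means there is nothing in the file that could distinguish UTF-8 from ASCII.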
it contains these characters
Roland Kayn - Tektra (1980-82), Cybernetic Music
4-CD-Box, Label: Barooni
I scanned all the booklet pages for your pleasure in very good quality,
so you can read/print them out (in b/w to keep size small). theres lots of information about his
works starting in 1950, and what his cybernetic music is about...
CD 1:
1. Tanar 1
2. Tanar 2
3. Etoral
CD 2:
1. Khyra 1
2. Khyra 2
3. Khyra 3
CD 3:
1. Tarego 1
2. Tarego 2
3. Tarego 3
4. Rhenit
CD 4:
1. Amarun 1
2. Amarun 2-I
3. Amarun 2-II
uploaded in 10
file under 20c-electroaccustic
from vogel
enjoy it...
But I have some doubts, because gvim doesn't display it well: it doesn't start new lines, and where it should, it shows "^M" characters instead.
Last edited by alma ata (2017-09-05 13:43:22)
Offline
UTF-8 and ASCII are identical on the lower 7 bits, so there's nothing to worry about here. Your newline issue is unrelated to that.
(If you want it to be detected as UTF-8, you need to add a Unicode BOM, but that will knock out some editors.)
"^M" means CR/LF (and no NL) - very likely the result of a windows editor.
Try "pacman -S dos2unix".
Offline
That file looks like ASCII-only.
About the ^M, please provide the output of this command:
xxd nuovo.txt | head
00000000: 526f 6c61 6e64 204b 6179 6e20 2d20 5465 Roland Kayn - Te
00000010: 6b74 7261 2028 3139 3830 2d38 3229 2c20 ktra (1980-82),
00000020: 4379 6265 726e 6574 6963 204d 7573 6963 Cybernetic Music
00000030: 0d34 2d43 442d 426f 782c 204c 6162 656c .4-CD-Box, Label
00000040: 3a20 4261 726f 6f6e 690d 4920 7363 616e : Barooni.I scan
00000050: 6e65 6420 616c 6c20 7468 6520 626f 6f6b ned all the book
00000060: 6c65 7420 7061 6765 7320 666f 7220 796f let pages for yo
00000070: 7572 2070 6c65 6173 7572 6520 696e 2076 ur pleasure in v
00000080: 6572 7920 676f 6f64 2071 7561 6c69 7479 ery good quality
00000090: 2c0d 736f 2079 6f75 2063 616e 2072 6561 ,.so you can rea
Your newline issue is unrelated to that.
It is related to gvim:
other text editors read the file properly.
"^M" means CR/LF (and no NL) - very likely the result of a windows editor.
yes, i think so
Offline
Nope, MacOS - there's only a CR, no LF
You can sed or tr \r to \n to fix this.
I'm pretty sure gvim can handle CR-only - at least vim can.
http://vim.wikia.com/wiki/File_format
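The `tr` variant of that fix could look roughly like this. It is only a sketch, assuming a CR-only file (a CRLF file would end up with doubled newlines); the helper name `cr_to_lf` and the sample file are invented for the example. It writes to a second file, because redirecting output onto the input file would truncate it:

```shell
# cr_to_lf SRC DST: copy SRC to DST, turning every CR (0x0d) into LF (0x0a).
# Hypothetical helper; intended for CR-only files.
cr_to_lf() {
    tr '\r' '\n' < "$1" > "$2"
}

tmpdir=$(mktemp -d)
printf 'CD 4:\r1. Amarun 1\r2. Amarun 2-I\r' > "$tmpdir/sample.txt"
cr_to_lf "$tmpdir/sample.txt" "$tmpdir/sample-unix.txt"
cat "$tmpdir/sample-unix.txt"   # now shows three separate lines
rm -r "$tmpdir"
```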
Offline
Actually, ^M means just CR. You can see ^M at the end of a line if it's DOS-formatted (CRLF), but in your case (no linebreak), it's just CR (as can be seen in the xxd output).
That's pretty odd - if I'm not mistaken, old Mac OS versions (<=9) used to have that, but AFAIK they switched to UNIX-style linebreaks (LF) from OS X on. How was that file created, exactly?
The dos2unix package seth mentioned earlier comes with a `mac2unix` command; you could try to fix it with that.
--edit: Before you fix the file, what's the output of this?
xxd nuovo.txt | tail
Perhaps if the file is only CR or only CRLF throughout, (g)vim handles it correctly, otherwise it will start doing weird stuff as can be seen here.
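To see which terminators a file actually contains before converting anything, one can count the CR and LF bytes. This is a rough sketch, assuming a POSIX shell; `line_endings` is a made-up helper name, and it does not tell a clean CRLF file apart from one that mixes styles:

```shell
# line_endings FILE: report which line-terminator bytes occur in FILE.
# Hypothetical helper; it counts bytes only, so CRLF and mixed files
# both show up as "CR and LF".
line_endings() {
    cr=$(LC_ALL=C tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(LC_ALL=C tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
        echo "none"
    elif [ "$lf" -eq 0 ]; then
        echo "CR"
    elif [ "$cr" -eq 0 ]; then
        echo "LF"
    else
        echo "CR and LF"
    fi
}

tmpdir=$(mktemp -d)
printf 'Hello\rWorld\r' > "$tmpdir/test.txt"
line_endings "$tmpdir/test.txt"   # prints CR
rm -r "$tmpdir"
```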
Last edited by ayekat (2017-09-05 15:22:43)
Offline
--edit: Before you fix the file, what's the output of this?
xxd nuovo.txt | tail
00000190: 6f20 310d 322e 2054 6172 6567 6f20 320d o 1.2. Tarego 2.
000001a0: 332e 2054 6172 6567 6f20 330d 342e 2052 3. Tarego 3.4. R
000001b0: 6865 6e69 740d 4344 2034 3a0d 312e 2041 henit.CD 4:.1. A
000001c0: 6d61 7275 6e20 310d 322e 2041 6d61 7275 marun 1.2. Amaru
000001d0: 6e20 322d 490d 332e 2041 6d61 7275 6e20 n 2-I.3. Amarun
000001e0: 322d 4949 0d75 706c 6f61 6465 6420 696e 2-II.uploaded in
000001f0: 2031 300d 6669 6c65 2075 6e64 6572 2032 10.file under 2
00000200: 3063 2d65 6c65 6374 726f 6163 6375 7374 0c-electroaccust
00000210: 6963 0d66 726f 6d20 766f 6765 6c0d 656e ic.from vogel.en
00000220: 6a6f 7920 6974 2e2e 2e joy it...
yes vi and vim don't show this file correctly
i don't know how they created this file, i found the file when i downloaded a music album
as i said other text editors like mousepad have no problem with this file, gvim does
Offline
Alright, it kind of makes sense now: you used the `--mime` option, so you couldn't see the comment of `file`:
$ printf 'Hello\rWorld' > test.txt
$ xxd test.txt
00000000: 4865 6c6c 6f0d 576f 726c 64 Hello.World
$ file test.txt
test.txt: ASCII text, with CR line terminators
$ file --mime test.txt
test.txt: text/plain; charset=us-ascii
Opened in vim, it does indeed show the carriage returns as ^M.
However, as explained in the article linked by seth, if the `ffs` option contains `mac` (not by default), the line endings should be displayed correctly:
Run vim
:set ffs=unix,dos,mac
:e nuovo.txt
tadaaa!
Offline
but in your case (no linebreak), it's just CR (as can be seen in the xxd output).
how did you see it? i'm not into xxd
Alright, it kind of makes sense now: you used the `--mime` option, so you couldn't see the comment of `file`:
i didn't use the -mime option
where did you see that i used that option?
Run vim
:set ffs=unix,dos,mac
:e nuovo.txt
tadaaa!
this method works
other text editors don't need anything more
vim needs this workaround
tomorrow i will read the article posted above
Last edited by alma ata (2017-09-05 20:52:52)
Offline
setting ffs is required because it's set through the "nocompatible" call in /usr/share/vim/vimfiles/archlinux.vim
CR is 0d, LF is 0a
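These byte values can be checked without xxd. A small sketch using POSIX `od`; the helper name `hex_bytes` is made up for this example:

```shell
# hex_bytes: read stdin and print its bytes as one run of lowercase hex digits.
# Hypothetical helper built on POSIX od.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# 41 = A, 0d = CR, 42 = B, 0a = LF
printf 'A\rB\n' | hex_bytes   # prints 410d420a
```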
Offline
how did you see it? i'm not into xxd
xxd prints a hexadecimal representation of data alongside with a "human-readable" representation next to it. When looking at the "newlines", we can see the following:
00000210: 6963 0d66 726f 6d20 766f 6765 6c0d 656e ic.from vogel.en
               ^^                         ^^        ^          ^
There is also the `hexdump` command, which performs a similar task.
i didn't use the -mime option
See my invocation of `file` again: I need to pass --mime in order to get an output like
nuovo.txt: text/plain; charset=us-ascii
Check the output of
which file
pacman -Qo $(which file)
I suspect you have either defined an alias somewhere, or you are using a different version of `file`.
setting ffs is required because it's set through the "nocompatible" call in /usr/share/vim/vimfiles/archlinux.vim
When using nocompatible, ffs is set to `unix,dos` on my machine by default (and the file is not properly displayed either).
ayekat goes and fixes this in his vimrc. --edit done
Last edited by ayekat (2017-09-06 07:20:47)
Offline
See my invocation of `file` again: I need to pass --mime in order to get an output like
nuovo.txt: text/plain; charset=us-ascii
Check the output of
which file
pacman -Qo $(which file)
I suspect you have either defined an alias somewhere, or you are using a different version of `file`.
Yes, now I realize that I did use the --mime option. I needed it to see the encoding.
I expected to find something like UTF-8 or ISO-etc... but I only found us-ascii.
Last edited by alma ata (2017-09-06 08:05:37)
Offline