My /etc/vimrc has set encoding=utf-8, i.e. it switches Vim from a native one-byte editor to a multi-byte editor.
My locale is LANG=en_US.UTF-8.
e.g.:
touch whatever.txt
file -bi whatever.txt ... inode/x-empty; charset=binary ... as expected
echo "один два три" > whatever.txt
file -bi whatever.txt ... text/plain; charset=utf-8 ... as expected
echo "one two three" >> whatever.txt
file -bi whatever.txt ... text/plain; charset=utf-8 ... as expected
vim whatever.txt ... deleting the first line (i.e. the Russian one) with dd and saving with :wq
file -bi whatever.txt ... text/plain; charset=us-ascii ... not expected: I want to keep the same UTF-8 encoding regardless of the actual content of the file
Last edited by ivanborodin (2016-07-25 00:52:20)
The `file` command tries to guess the encoding based on the contents of the file, but us-ascii is a strict subset of utf-8: a file that contains only ASCII characters is byte-for-byte identical in both encodings. So unless there are characters outside the ASCII range, it can't tell the difference, and it reports the narrower charset.
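To see this for yourself, here is a minimal sketch (the file names ascii.txt and utf8.txt are made up for illustration, and it assumes `iconv` is installed). It re-encodes an ASCII-only file as UTF-8 and compares the bytes:

```shell
# Hypothetical file names, for illustration only.
# Write an ASCII-only line, then convert it from US-ASCII to UTF-8.
printf 'one two three\n' > ascii.txt
iconv -f US-ASCII -t UTF-8 ascii.txt > utf8.txt

# cmp -s is silent and exits 0 only when both files are byte-for-byte identical.
if cmp -s ascii.txt utf8.txt; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

Since the conversion changes nothing, there is no information left in the file that `file` could use to pick utf-8 over us-ascii.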
More advanced programs, e.g. editors like vim, can use modeline metadata to declare the preferred (save-as) fileencoding, but the `file` command won't test for that.
However, you could add a byte order mark (BOM) if it really really bothers you.
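A rough sketch of how a BOM could be added from the shell, in case it helps (whatever-bom.txt is a made-up name, and `head -c` is assumed to be available, as it is with GNU coreutils). The UTF-8 BOM is the three bytes EF BB BF at the very start of the file:

```shell
# Hypothetical file names, for illustration only.
printf 'one two three\n' > whatever.txt

# Prepend the three-byte UTF-8 BOM (EF BB BF) to a copy of the file.
# Octal escapes are used because POSIX printf does not guarantee \x support.
{ printf '\357\273\277'; cat whatever.txt; } > whatever-bom.txt

# Dump the first three bytes in hex to confirm the BOM is present.
head -c 3 whatever-bom.txt | od -An -tx1
```

With the BOM in place, `file -bi` should report charset=utf-8 even when the rest of the content is pure ASCII. Inside Vim, `:set bomb` before saving writes the BOM, and `:set nobomb` removes it again.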
You are right. I had to review some notes on UTF-8. Coming from Windows, I used to always save my source files in native UTF-16, and I forgot that a BOM is always present in those UTF-16 files, which is not the case for UTF-8. UTF-8 can be written with or without a BOM; the BOM is not required and, furthermore, not recommended at all. E.g. I tried a UTF-8 Bash script with a BOM ... it won't run.
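The script fails because the kernel only recognizes `#!` when it is the first two bytes of the file, and a BOM pushes it three bytes in. A minimal sketch of checking for and stripping the BOM (script-bom.sh and script.sh are made-up names, and `head -c` / `tail -c +N` are assumed to be available):

```shell
# Hypothetical file names, for illustration only.
# Build a small script that starts with a UTF-8 BOM (EF BB BF).
{ printf '\357\273\277'; printf '#!/bin/bash\necho ok\n'; } > script-bom.sh

# If the first three bytes are the BOM, copy everything from byte 4 onward;
# otherwise copy the file unchanged.
if [ "$(head -c 3 script-bom.sh | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    tail -c +4 script-bom.sh > script.sh
else
    cp script-bom.sh script.sh
fi

# The first line is now a plain shebang again.
head -n 1 script.sh
```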
Thank you very much for your answer.