Hey folks:
So, I have a Perl script that sends UTF-8 encoded text to stdout. When I call the script from the shell and redirect the output to a file, however, the file ends up encoded as ASCII. Specifically, when I call
./nameofscript.pl > outfile.txt
and then
file outfile.txt
I get output:
outfile.txt: ASCII text
Some questions about this:
(1) Is there a way to force the shell to write to files in UTF-8? (by the way, my locale is set to en_US.UTF-8) Can this be done in some configuration file at the system level? (by the way, I'm using urxvtperl)
(2) Is it possible that this is in fact a non-problem? Specifically, could it be the case that if you give the shell a stream of UTF-8 that consists of ONLY the ASCII subset, then it automatically writes the stuff to a file in ASCII, but if the stream contains non-ASCII characters then the shell writes in UTF-8?
(3) Yes, I could use iconv like so:
./perlscript.pl | iconv > outfile.txt
(without options, iconv converts from the locale's encoding to the locale's encoding, which here means UTF-8 to UTF-8) But this seems kind of silly -- my script is already outputting UTF-8. I would be so much happier if good old '>' just wrote UTF-8.
Finally, I apologize if this is really the wrong forum for this post. I've been googling all over the place about this for a few hours, and just cannot find any good answers. (and by the way, I am running Arch!)
Thanks!
Offline
I never knew '>' filters streams.
mod action: Moving from Newbie Corner to Programming & Scripting as this is likely to go deep in that direction.
Edit: Sarcasm aside, I don't think the stream is being modified in any way. Either your script is outputting wrong or the file type is being detected wrong.
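To illustrate the point that redirection does not touch the bytes, here is a minimal Python sketch (not the original Perl script). It starts a child process with its stdout pointed at a temporary file, which is roughly what '>' does, and compares what lands in the file with what was written. The helper name `redirect_to_file` and the sample payload are made up for this illustration.

```python
import os
import subprocess
import sys
import tempfile

# Sample payload (an assumption for the demo): UTF-8 text with one non-ASCII character.
PAYLOAD = "h\u00e9llo\n".encode("utf-8")


def redirect_to_file(payload: bytes) -> bytes:
    """Run a child that writes payload to stdout, with stdout attached to a file.

    Hypothetical helper: it mimics `./script > outfile` and returns the file's bytes.
    """
    # The child receives the payload as hex on its command line and writes raw bytes.
    child_code = "import sys; sys.stdout.buffer.write(bytes.fromhex(sys.argv[1]))"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            subprocess.run(
                [sys.executable, "-c", child_code, payload.hex()],
                stdout=out,
                check=True,
            )
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# The file holds exactly the bytes the child wrote; nothing was re-encoded.
print(redirect_to_file(PAYLOAD) == PAYLOAD)  # True
```

The file ends up with whatever bytes the program emitted, so any encoding question is about the program's output or about how the file is later inspected.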
Last edited by fsckd (2011-12-15 19:42:34)
Offline
(3) Yes, I could use iconv like so:
./perlscript.pl | iconv > outfile.txt
Because when you do that, file then reports the file as UTF-8??
I think it's a non-issue; file just reports your file as ASCII because it only contains ASCII characters, so there's no possible way to tell it apart from anything else, but as soon as there is a non-ASCII character it will say UTF-8.
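A rough Python sketch of why this happens. The function `looks_like` is a made-up stand-in for the kind of guess `file` makes, not its real detection logic, and the labels it returns are only illustrative. The point is that ASCII-only text produces the same bytes under both encodings, so a detector has nothing to go on until a non-ASCII character shows up.

```python
def looks_like(data: bytes) -> str:
    """Crude content-based guess (hypothetical; not how `file` is implemented)."""
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8 Unicode text"
    except UnicodeDecodeError:
        return "data"


plain = "hello world\n"
accented = "h\u00e9llo world\n"  # contains U+00E9, which is outside ASCII

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
assert plain.encode("utf-8") == plain.encode("ascii")

print(looks_like(plain.encode("utf-8")))     # ASCII text
print(looks_like(accented.encode("utf-8")))  # UTF-8 Unicode text
```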
Offline
good point jjacky, that was stupid of me not to check. Indeed, when I pipe the result through iconv, 'file' still says the result is ASCII.
And now that I think about it, this surely doesn't matter, since, as I understand it, ASCII = utf8 for a file that only contains the first 255 characters.
thanks all!
Offline
Actually ASCII == UTF-8 only for the first 128 characters (code points 0 to 127). Bytes with values between 128 and 255 are not valid UTF-8 on their own; they only appear as part of multi-byte sequences.
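This can be checked with a few lines of Python, offered as a quick sketch rather than anything authoritative. The helper name `valid_alone` is made up for the illustration.

```python
# Code points 0..127 encode to the same single byte in ASCII and UTF-8.
assert all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(128))

# Code points 128..255 each take two bytes in UTF-8.
assert all(len(chr(cp).encode("utf-8")) == 2 for cp in range(128, 256))


def valid_alone(byte_value: int) -> bool:
    """Return True if this single byte is, by itself, valid UTF-8 (hypothetical helper)."""
    try:
        bytes([byte_value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Only bytes 0..127 decode on their own; every byte from 128 to 255 fails.
lone_valid = [b for b in range(256) if valid_alone(b)]
print(lone_valid == list(range(128)))  # True

# U+00E9 (e with acute accent) becomes the two-byte sequence C3 A9.
print("\u00e9".encode("utf-8"))  # b'\xc3\xa9'
```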
Offline