#1 2011-12-15 19:00:14

stuvjordan
Member
Registered: 2011-08-09
Posts: 26

getting bash to output utf-8

Hey folks:

So, I have a Perl script that sends UTF-8 encoded text to stdout.  When I call the script from the shell and try to redirect the output to a file, however, the file ends up encoded as ASCII.  Specifically, when I call

./nameofscript.pl > outfile.txt

and then

file outfile.txt

I get output:

outfile.txt: ASCII text

Some questions about this:


(1) Is there a way to force the shell to write to files in UTF-8?  (by the way, my locale is set to en_US.UTF-8)  Can this be done in some configuration file at the system level?  (by the way, I'm using urxvtperl)

(2) Is it possible that this is in fact a non-problem?  Specifically, could it be the case that if you give the shell a stream of UTF-8 that consists of ONLY the ascii subset, then it automatically writes the stuff to a file in ascii, but if the stream contains non-ascii characters then the shell writes in UTF-8?

(3) Yes, I could use iconv like so:

./perlscript.pl | iconv > outfile.txt

(without options, iconv converts from the locale's encoding to the locale's encoding, so here UTF-8 to UTF-8)  But this seems kind of silly -- my script is already outputting UTF-8.  I would be so much happier if good old '>' just wrote UTF-8.


Finally, I apologize if this is really the wrong forum for this post.  I've been googling all over the place about this for a few hours, and just cannot find any good answers. (and by the way, I am running Arch!)

Thanks!

#2 2011-12-15 19:40:28

fsckd
Forum Fellow
Registered: 2009-06-15
Posts: 4,173

Re: getting bash to output utf-8

I never knew '>' filters streams.

mod action: Moving from Newbie Corner to Programming & Scripting as this is likely to go deep in that direction.

Edit: Sarcasm aside, I don't think the stream is being modified in any way. Either your script is outputting wrong or the file type is being detected wrong.
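
One way to check that '>' leaves the stream alone is to compare a hex dump of the piped output with a hex dump of the redirected file. This is only a sketch: emit is a hypothetical stand-in for the Perl script, and the sample word is made up.

```shell
# Hypothetical stand-in for the Perl script: prints "naïve" encoded as UTF-8.
emit() { printf 'na\303\257ve\n'; }   # \303\257 is the two-byte UTF-8 form of ï

tmpfile=$(mktemp)
emit > "$tmpfile"

# Hex dump of the bytes as they leave the command, and as they sit in the file.
piped=$(emit | od -An -tx1)
stored=$(od -An -tx1 < "$tmpfile")

if [ "$piped" = "$stored" ]; then
    echo "redirection left the bytes untouched"
fi

rm -f "$tmpfile"
```

If the two dumps match, whatever encoding question remains is about what the script writes or what file guesses, not about the shell.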

Last edited by fsckd (2011-12-15 19:42:34)

#3 2011-12-15 19:57:23

jjacky
Member
Registered: 2011-11-09
Posts: 347

Re: getting bash to output utf-8

stuvjordan wrote:

(3) Yes, I could use iconv like so:

./perlscript.pl | iconv > outfile.txt

Because when you do that, file then reports the file as UTF-8??
I think it's a non-issue; file reports your file as ASCII because it only contains ASCII characters, so there is no way to tell it apart from anything else. As soon as the output contains a non-ASCII character, file will say UTF-8.
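
A small sketch of that behaviour, assuming a POSIX shell; count_non_ascii_bytes is a made-up helper name and the sample strings are only illustrations. The exact wording file prints differs between versions, so it is not shown here.

```shell
# Count the bytes in a file that fall outside the 7-bit ASCII range (0-127).
count_non_ascii_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

tmpdir=$(mktemp -d)

# Same redirection operator, two different payloads.
printf 'hello\n' > "$tmpdir/ascii_only.txt"
printf 'h\303\251llo\n' > "$tmpdir/with_accent.txt"   # \303\251 is UTF-8 for é

count_non_ascii_bytes "$tmpdir/ascii_only.txt"    # prints 0
count_non_ascii_bytes "$tmpdir/with_accent.txt"   # prints 2

# file(1) only has the bytes to go on, so it can only guess UTF-8 for the second file.
if command -v file >/dev/null 2>&1; then
    file "$tmpdir/ascii_only.txt" "$tmpdir/with_accent.txt"
fi

rm -rf "$tmpdir"
```

With zero bytes above 127 there is nothing in the file that distinguishes ASCII from UTF-8, which is why file picks the narrower label.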

#4 2011-12-15 20:11:29

stuvjordan
Member
Registered: 2011-08-09
Posts: 26

Re: getting bash to output utf-8

good point jjacky, that was stupid of me not to check.  Indeed, when I pipe the result through iconv, 'file' still says the result is ASCII.

And now that I think about it, this surely doesn't matter, since, as I understand it, ASCII = utf8 for a file that only contains the first 255 characters.

thanks all!

#5 2011-12-15 22:46:24

rockin turtle
Member
From: Montana, USA
Registered: 2009-10-22
Posts: 227

Re: getting bash to output utf-8

Actually ASCII == UTF-8 only for the first 128 characters (code points 0 through 127).  A single byte between 128 and 255 is not valid UTF-8 on its own; such bytes only appear as part of multi-byte sequences.
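
A rough way to see this from the shell, assuming iconv is installed; is_valid_utf8 is a made-up helper name. Converting UTF-8 to UTF-8 makes iconv act as a validator, since it exits non-zero on input it cannot decode.

```shell
# Succeeds only if stdin decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Pure ASCII bytes are also valid UTF-8.
printf 'cafe\n' | is_valid_utf8 && echo "ASCII bytes: valid UTF-8"

# é encoded as the two-byte UTF-8 sequence 0xC3 0xA9.
printf 'caf\303\251\n' | is_valid_utf8 && echo "two-byte sequence: valid UTF-8"

# é as the single Latin-1 byte 0xE9, which is not a complete UTF-8 sequence.
printf 'caf\351\n' | is_valid_utf8 || echo "lone byte 0xE9: not valid UTF-8"
```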
