#1 2008-09-21 08:15:38

aardwolf
Member
From: Belgium
Registered: 2005-07-23
Posts: 304

Binary data in terminal

If you view binary data in the terminal, it'll typically behave strangely.

Example:

[screenshot: 2pocu9t.jpg]

In the above example, two things act "strange" after viewing the binary data:

-the word "lode" in red became garbage
-any input I type becomes garbage

Typing reset fixes it.

Since this appears to be normal behavior in Linux (nobody appears to consider this a bug), but the terminal still allows you to view binary data, I wonder: is this specified? Is it specified which binary data symbol makes the terminal start displaying your input as weird characters? If nothing is specified about it, what's the reason why the makers of bash chose to let their terminal behave like this and not do "normal" after displaying binary data?


#2 2008-09-21 21:49:58

carlocci
Member
From: Padova - Italy
Registered: 2008-02-12
Posts: 368

Re: Binary data in terminal

Usually a C program manipulating strings goes haywire if it processes a sequence with NUL in it (because NUL is the string terminator in C).
Bash is written in C.
Feel free to end the syllogism

If nothing is specified about it, what's the reason why the makers of bash chose to let their terminal behave like this and not do "normal" after displaying binary data?

It's due to the way C behaves with strings.

Last edited by carlocci (2008-09-21 21:51:57)


#3 2008-09-22 05:53:07

dav7
Member
From: Australia
Registered: 2008-02-08
Posts: 674

Re: Binary data in terminal

Uh... no. No no no. C quirks have nothing to do with this - I don't think so, at least.

Within terminal emulators, there are 3 character table slots, G0, G1 and G2, each of which can contain a set of character glyphs, and one of these is loaded into GL so that it's the one actively used. G0 is the set you typically have loaded. Certain escape sequences can switch the character table into and out of different modes; the mode you're describing above is called the "box drawing", "line drawing" or "special characters" mode. This mode remaps certain characters to different glyphs so that box characters can be drawn by simply echoing ASCII characters to the screen.

Different environments require different steps to select these drawing characters and put them into play. In X11, one needs to load the desired set into G0. At the console, G1 is typically already loaded with the box drawing character set, so all that's required is to switch GL to G1.

In what follows, ^[ stands for the escape character, ASCII code 27.

The sequence ^[(0 switches G0 into line drawing mode, as applicable for X11, and can be sent to the terminal via

echo -e '\e(0'

The sequence ^[(B switches G0 back into standard or normal mode, and can similarly be sent via

echo -e '\e(B'

In console mode, the key combination CTRL+N produces the ^N character (defined in ANSI as SO, defined in POSIX to mean LS1), which switches the terminal to the G1 character set, and CTRL+O produces ^O (defined in ANSI as SI, defined in POSIX as LS0), which switches it back to G0. However, these key combinations are interpreted to mean other things when pressed at a shell prompt, so they only take effect when their character code counterparts (SO or LS1, which is 14 decimal, 0xE hexadecimal, or 016 octal, and SI or LS0, which is 15 / 0xF / 017) are echoed to the terminal. To achieve this, try the commands below (where words enclosed in < and > are meant to be pressed as key combos). To echo SO, send

echo <CTRL+V><CTRL+N>

And to echo SI, send:

echo <CTRL+V><CTRL+O>

Therefore, when you cat a binary stream of data, there is a good chance that one of the sequences 27, 40, 48 (^[(0, load the line drawing set into G0), 27, 40, 66 (^[(B, load the normal set back into G0), 14 (SO) or 15 (SI) will occur, in which case your terminal obeys these perfectly valid escape sequences, switching into or out of the box drawing character set and displaying whatever data follows these commands using that character set.

The chances of this occurring are considerably higher in console mode, since that mode needs only a single byte (SO) to switch to the box drawing character set, whereas X11 terminal emulators need a specific three-byte sequence, so console mode is more susceptible to this issue.

It is for this reason that terminals may also clear the screen or exhibit other odd side effects: these are the result of other escape sequences being sent to the terminal. (The UNIX 'clear' command typically sends a longer sequence, but the sequence 27 99 appears to erase the display, and I have personally seen this sequence sent to my terminal at least once.)
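
To get a feel for how often these bytes actually turn up in a file, something like the following could be used. It is only a rough sketch: the function name count_charset_switches and its output format are made up for illustration, and it only looks for the handful of sequences mentioned in this post (SO, SI, ^[(0, ^[(B and ^[c), not every escape sequence a terminal understands.

```shell
#!/bin/sh
# count_charset_switches: read bytes on stdin and report how often each of
# the sequences discussed in this post occurs.
# (Hypothetical helper; name and output format are assumptions.)
count_charset_switches() {
    # od prints every byte as a decimal number; tr puts one number per line.
    od -An -v -tu1 | tr -s ' ' '\n' | awk '
        NF == 0 { next }
        {
            b = $1
            if (b == 14) so++                                 # SO / LS1
            if (b == 15) si++                                 # SI / LS0
            if (p2 == 27 && p1 == 40 && b == 48) g0line++     # ESC ( 0
            if (p2 == 27 && p1 == 40 && b == 66) g0ascii++    # ESC ( B
            if (p1 == 27 && b == 99) ris++                    # ESC c
            p2 = p1; p1 = b                                   # remember last two bytes
        }
        END {
            printf "SO=%d SI=%d ESC(0=%d ESC(B=%d ESCc=%d\n", so, si, g0line, g0ascii, ris
        }'
}
```

Usage would be along the lines of "count_charset_switches < /bin/ls". Since nothing but the summary line is written to the terminal, running it cannot garble the display.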

A fun example:

When in box drawing mode, the following characters are mapped to the following box drawing glyphs:

j   bottom right
m   bottom left

k   top right
l   top left

q   horiz line
x   vert line

With that knowledge, run the following code and observe how a square box is drawn on the screen.

echo -e '\e(0lqqqqqqqqqqk\nx          x\nx          x\nx          x\nmqqqqqqqqqqj\e(B'

Or, a little more spaced out (indentation added purely for readability):

echo -e '\e(0'
   echo lqqqqqqqqqqk
   echo x          x
   echo x          x
   echo x          x
   echo mqqqqqqqqqqj
echo -e '\e(B'
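
The same idea can be wrapped in a small function that takes the box size as arguments. This is just a sketch: draw_box and repeat_char are made-up names, and it uses printf with the octal code \033 instead of echo -e so that it does not depend on bash's echo.

```shell
#!/bin/sh
# repeat_char CHAR COUNT: print CHAR repeated COUNT times (hypothetical helper).
repeat_char() {
    n=$2
    out=
    while [ "$n" -gt 0 ]; do
        out="$out$1"
        n=$((n - 1))
    done
    printf '%s' "$out"
}

# draw_box WIDTH HEIGHT: draw a box whose inside is WIDTH x HEIGHT blank cells,
# using the line drawing set selected by ESC ( 0 and restored by ESC ( B.
# (Hypothetical helper; name and arguments are assumptions.)
draw_box() {
    width=$1
    height=$2
    edge=$(repeat_char q "$width")      # q = horizontal line
    blank=$(repeat_char ' ' "$width")   # empty interior
    printf '\033(0'                     # load the line drawing set into G0
    printf 'l%sk\n' "$edge"             # top left, top edge, top right
    i=0
    while [ "$i" -lt "$height" ]; do
        printf 'x%sx\n' "$blank"        # vertical sides
        i=$((i + 1))
    done
    printf 'm%sj\n' "$edge"             # bottom left, bottom edge, bottom right
    printf '\033(B'                     # load the normal set back into G0
}
```

Running "draw_box 10 3" should produce the same box as the echo commands above.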

Yes, that's how ncurses works too, in case you wondered.
Also, "reset" can be replaced by "tput reset" - this sends the same escape sequences (more escape sequences!) as "reset", but doesn't delay at all.
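
If only the character set got switched, a full reset may not even be needed: sending the two "switch back" codes from earlier in this post should be enough. A minimal sketch, assuming nothing else about the terminal state was changed (fix_charset is a made-up name):

```shell
#!/bin/sh
# fix_charset: undo only the character set switches described in this post,
# without clearing the screen. (Hypothetical helper name.)
fix_charset() {
    printf '\033(B'   # ESC ( B: put the normal set back into G0
    printf '\017'     # SI / LS0: make G0 the active set again
}
```

Typed input still reaches the shell even while it is displayed as garbage, so the function can be run from a garbled prompt. It will not undo anything else the binary data may have changed, in which case "reset" or "tput reset" is still the way to go.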

References:

- the "console_codes" manpage
- man dtterm(5), not available in Arch but mirrored at this ancient HP web url: http://h30097.www3.hp.com/docs/base_doc … 00____.HTM


For curious minds:

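- Piping the output of something that uses escape sequences through a script like this PHP code (where " \\e" produces the text you want to replace the escape character with, here a space followed by a literal \e; the backslash is doubled because newer PHP versions treat "\e" inside double quotes as the escape character itself):

php -r 'echo str_replace(chr(27), " \\e", `tput reset`);'; echo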

is an easy way to investigate what sequences are being sent to the terminal. Simple commands like "clear", "tput reset" ("reset" sends some signals directly to the terminal emulator process itself I think, NOT escape sequences, so is beyond the scope of this post), and so on are good starting points.

- Additionally, you can substitute editing of the output stream of an application via PHP or another language with redirection. Redirecting the output of a program to a file is a good way to poke about inside the file, although beware that the files created with this method should only be looked at with simple editors like nano or e3, which don't try to have a go at figuring out the file content and/or "cleaning" it, like vi(m) or emacs might. NOTE that even when you're redirecting the output of a program to a file, it still accepts input! If your application for example accepts the key F10 to quit it, press F10 after you think your app is done loading, and the command you ran to start the app (for example "mc > mc-output") should quit.

- Ncurses applications are a good place to learn about escape sequences without diving into source code, because the ncurses library uses a lot of escape sequences which do interesting things. You'll almost certainly want to redirect the output of the command to a file as per the method above, and additionally be prepared to do a LOT of digging around inside the output, as the escape sequences will be interspersed with a LOT of other characters, those being the program's perfectly normal output.
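
For those without PHP at hand, the replacement trick from the first point above can be sketched in plain shell as well. The name show_escapes is made up, and the LC_ALL=C setting is an assumption on my part to keep sed from choking on bytes that are not valid in the current locale.

```shell
#!/bin/sh
# show_escapes: copy stdin to stdout, replacing every ESC byte (27) with the
# visible text " \e". (Hypothetical helper, same idea as the PHP one-liner.)
show_escapes() {
    esc=$(printf '\033')              # a literal ESC character
    LC_ALL=C sed "s/$esc/ \\\\e/g"    # sed receives: s/<ESC>/ \\e/g
}
```

It can be used the same way as the PHP version, for example "tput reset | show_escapes; echo" or "clear | show_escapes; echo". On most systems "cat -v" gives a similar view, showing the escape character as ^[.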


Whew, that was quite a post. I'm glad that I had a good memory when I was somewhere between 8 and 11, when I learnt about DOS escape sequences. That was 6 to 9 years ago, and the knowledge has helped me handle UNIX's infinitely more complex escape sequences and find them actually learnable. :P

EDIT: Updated a bit of the text, fixed a typo, added more info

-dav7

Last edited by dav7 (2008-09-23 23:47:25)


Windows was made for looking at success from a distance through a wall of oversimplicity. Linux removes the wall, so you can just walk up to success and make it your own.
--
Reinventing the wheel is fun. You get to redefine pi.


#4 2008-09-22 07:27:23

SiC
Member
From: Liverpool, England
Registered: 2008-01-10
Posts: 430

Re: Binary data in terminal

Thanks dav7, I've used Unix-based systems for the better part of 13 years, and I never knew that :) Learn something every day :D


#5 2008-09-22 08:32:18

pauldonnelly
Member
Registered: 2006-06-19
Posts: 776

Re: Binary data in terminal

aardwolf wrote:

If nothing is specified about it, what's the reason why the makers of bash chose to let their terminal behave like this and not do "normal" after displaying binary data?

It's because there's no way for the terminal to know you're displaying "binary data". It's just letters and characters, and how is the terminal supposed to know whether you're looking at ASCII art, text, or gibberish?


#6 2008-09-22 11:35:38

aardwolf
Member
From: Belgium
Registered: 2005-07-23
Posts: 304

Re: Binary data in terminal

Thanks a lot dav7, that's the best explanation I could have imagined :)


#7 2008-09-22 15:28:17

carlocci
Member
From: Padova - Italy
Registered: 2008-02-12
Posts: 368

Re: Binary data in terminal

Thank you dav7 for correcting my wrong explanation: that was amusing to read


#8 2008-09-22 19:40:37

FreakGuard
Member
Registered: 2008-04-27
Posts: 103

Re: Binary data in terminal

Good information, that explains some behaviour on the serial console :)


#9 2008-09-23 05:22:49

B-Con
Member
From: USA
Registered: 2007-12-17
Posts: 554

Re: Binary data in terminal

I've wondered this myself. Thanks for all the info -- it makes an excellent bookmark.


#10 2008-09-23 07:48:14

wuischke
Member
From: Suisse Romande
Registered: 2007-01-06
Posts: 630

Re: Binary data in terminal

Thanks for this explanation, I didn't know about this. (I, too, made a bookmark.)


#11 2008-09-23 12:45:09

dav7
Member
From: Australia
Registered: 2008-02-08
Posts: 674

Re: Binary data in terminal

Wow, cool :D

I just came back to fish the box drawing chars out of my own post and found all these unexpected responses :D

-dav7

Last edited by dav7 (2008-09-23 12:45:34)



#12 2008-09-23 14:17:56

Onwards
Member
From: Pakistan
Registered: 2007-04-18
Posts: 108

Re: Binary data in terminal

Wow, it's on reddit/linux's main page!! :)

Last edited by Onwards (2008-09-23 14:18:51)


#13 2008-09-23 14:26:01

moljac024
Member
From: Serbia
Registered: 2008-01-29
Posts: 2,676

Re: Binary data in terminal

So why doesn't this happen with xterm, urxvt or gnome-terminal ?


The day Microsoft makes a product that doesn't suck, is the day they make a vacuum cleaner.
--------------------------------------------------------------------------------------------------------------
But if they tell you that I've lost my mind, maybe it's not gone just a little hard to find...


#14 2008-09-23 15:18:15

dav7
Member
From: Australia
Registered: 2008-02-08
Posts: 674

Re: Binary data in terminal

Onwards: Wow :D

moljac024: It does. All terminal emulators support escape sequences, and only very few (read: early video display-based systems from around the 1950s to the 1970s, maybe a few rare others) didn't support the box drawing character set. But all modern X terminal emulators do support it.

Note: I'm not entirely sure if LS0 and LS1 are defined in POSIX. I just guessumed that they were. :P

-dav7

Last edited by dav7 (2008-09-23 15:20:34)


#15 2008-09-23 15:24:44

andre.ramaciotti
Member
From: Brazil
Registered: 2007-04-06
Posts: 649

Re: Binary data in terminal

The box drawing characters do work on urxvt, but after 'cat /dev/random', the terminal comes back to normal. I thought it was the shell I use (zsh), but on vc it doesn't come back to normal.


(lambda ())


#16 2008-09-23 15:25:05

Arkane
Member
From: Switzerland
Registered: 2008-02-18
Posts: 263

Re: Binary data in terminal

True, I tried cat'ing binary files in urxvt and didn't see that behavior.
But I did get intermittent clears, and a few subtle alterations to my keymap, like I could only type slashes with numlock on.

Last edited by Arkane (2008-09-23 15:27:57)


What does not kill you will hurt a lot.


#17 2008-09-23 17:54:44

aardwolf
Member
From: Belgium
Registered: 2005-07-23
Posts: 304

Re: Binary data in terminal

It behaves differently on a different computer of mine! New screenshot - different looking character set, and the red prompt always looks the same here

[screenshot: 2exvehj.jpg]


#18 2008-09-23 18:42:57

alanhaggai
Member
From: World Wide Web
Registered: 2008-08-15
Posts: 25

Re: Binary data in terminal

This time, you are using UTF-8 encoding. With recent installations, I have never had trouble with displaying binary data in the terminal.


The difference makes the difference.
