
#1 2017-08-25 01:00:58

jernst
Member
From: Silicon Valley
Registered: 2014-03-04
Posts: 290
Website

[Solved] Character set in packager field

pacman -Si ruby-rack

when run in my default shell, emits the packager line as:

Packager :     Bart

When run as

LC_ALL=en_US.UTF-8 pacman -Si ruby-rack

it prints the full line, but with two unprintable characters.

The same info on the package's website shows the full name with what I assume is all its Polish glyph glory.

How do I get the name printed completely and correctly on my American terminal? Adding locale pl_PL.UTF-8 to the system did not help.

Actually, my real issue is that Anatol's gem2arch script barfs when it attempts to parse that very output, complaining about an invalid byte sequence (it assumes ASCII). If it isn't supposed to assume ASCII, what is it supposed to assume? od tells me the two bytes in question are 0xc5 and 0x82.

Last edited by jernst (2017-08-27 18:29:23)

Offline

#2 2017-08-25 01:04:02

jasonwryan
Anarchist
From: .nz
Registered: 2009-05-09
Posts: 30,424
Website

Re: [Solved] Character set in packager field

Works fine for me (en_NZ.UTF-8). I would say your locale is broken.


Arch + dwm   •   Mercurial repos  •   Surfraw

Registered Linux User #482438

Offline

#3 2017-08-25 01:10:17

jernst
Member
From: Silicon Valley
Registered: 2014-03-04
Posts: 290
Website

Re: [Solved] Character set in packager field

How could it be broken? locale -a reports C, POSIX, en_US.utf8 and now pl_PL.utf8. What else should I be checking?

Offline

#4 2017-08-25 01:49:41

circleface
Member
Registered: 2012-05-26
Posts: 639

Re: [Solved] Character set in packager field

It could also be your font.  Make sure you are using a font that supports those characters.

Edit:  It also works fine for me with default English.

Last edited by circleface (2017-08-25 01:51:07)

Offline

#5 2017-08-25 03:56:56

jernst
Member
From: Silicon Valley
Registered: 2014-03-04
Posts: 290
Website

Re: [Solved] Character set in packager field

If I change the font in konsole, there is no change in what appears to be printed. I tried Liberation Mono, Inconsolata, Noto Mono and others. So that doesn't seem to be it.

If I do

pacman -Si ruby-rack | cat

I get all characters printed, but two of them show up as a question mark inside a rhombus. Without the cat (see above), it truncates output.

But back to my original question. I gather that

pacman -Qi ruby-rack

gets what it prints from /var/lib/pacman/local/ruby-rack-2.0.1-2/desc. This file contains non-ASCII characters. How are they supposed to be interpreted? That cannot really depend on my system's locale, because the content of that file was downloaded from the repo, and the same file will show up locally regardless of how anybody has configured their system.
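To look at the field without pacman or the terminal getting in the way, something like the following might help. It is only a sketch: it assumes the desc file stores each field as a %NAME%-style header line followed by its value, and it runs against a temporary sample file with a made-up packager name rather than the real one. The helper names packager_field and is_valid_utf8 are my own.

```shell
# Print the value line that follows the %PACKAGER% header in a desc file.
packager_field() {
  awk 'found { print; exit } $0 == "%PACKAGER%" { found = 1 }' "$1"
}

# Succeed only if stdin is valid UTF-8; iconv exits non-zero on a bad sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Hypothetical sample file; the name is invented, only the bytes \305\202
# (0xc5 0x82) are taken from the od output mentioned in the thread.
sample=$(mktemp)
printf '%%NAME%%\nruby-rack\n\n%%PACKAGER%%\nPawe\305\202 Example <pawel@example.org>\n\n' > "$sample"

if packager_field "$sample" | is_valid_utf8; then
  echo "packager field is valid UTF-8"
else
  echo "packager field is not valid UTF-8"
fi

rm -f "$sample"
```

Pointing packager_field at the real desc path instead of the sample would show whether the stored bytes themselves are fine, independent of any locale setting.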

Offline

#6 2017-08-25 04:32:43

Scimmia
Fellow
Registered: 2012-09-01
Posts: 11,539

Re: [Solved] Character set in packager field

jernst wrote:

gets what it prints from /var/lib/pacman/local/ruby-rack-2.0.1-2/desc. This file contains non-ASCII characters.

What non-ASCII characters does it contain, exactly?

Offline

#7 2017-08-25 04:34:23

jernst
Member
From: Silicon Valley
Registered: 2014-03-04
Posts: 290
Website

Re: [Solved] Character set in packager field

See above: od tells me the two bytes in question are 0xc5 and 0x82, apparently somehow representing the lower-case L with a stroke through it (ł, U+0142).

Offline

#8 2017-08-25 04:40:17

Scimmia
Fellow
Registered: 2012-09-01
Posts: 11,539

Re: [Solved] Character set in packager field

Ah, I missed that part. Anyway, yeah, that is perfectly valid UTF-8.

http://www.fileformat.info/info/unicode … /index.htm

Are you using a terminal that is not Unicode-aware?
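For anyone who wants to check the byte pair themselves, here is a small sketch using only printf, od, tr and iconv. The variable names are arbitrary. It decodes the two bytes as UTF-8 and re-encodes them as UTF-16BE, which for a character in the Basic Multilingual Plane is simply the code point.

```shell
# The two bytes reported by od, written as octal escapes (0xc5 = \305,
# 0x82 = \202) because POSIX printf does not guarantee \x hex escapes.
bytes='\305\202'

# Dump the raw bytes back as hex to confirm the input.
raw_hex=$(printf "$bytes" | od -An -tx1 | tr -d ' \n')

# Decode as UTF-8, re-encode as UTF-16BE, and dump that as hex.
codepoint_hex=$(printf "$bytes" | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n')

echo "raw bytes:  $raw_hex"
echo "code point: U+$codepoint_hex"
```

If iconv accepts the input, the sequence is well-formed UTF-8; an invalid sequence would make it exit with an error instead.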

Offline

#9 2017-08-27 18:28:34

jernst
Member
From: Silicon Valley
Registered: 2014-03-04
Posts: 290
Website

Re: [Solved] Character set in packager field

I found it. When using KDE, all of the following need to be set to the correct locale:

1. /etc/locale.conf
2. ~/.config/plasma-localerc
3. LANG or LC_* in the startup script of the shell, if given (e.g. ~/.bashrc)
4. Encoding set by the terminal in the terminal program's preferences (e.g. "Set Encoding" / "Unicode" / "UTF-8" in the konsole context menu)
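The shell side of that list (items 1 and 3) can be checked with a short sketch like the one below. It relies on the usual precedence, where a non-empty LC_ALL overrides LC_CTYPE, which in turn overrides LANG. The function names effective_ctype and names_utf8 are made up for this example, and the check only looks at the locale name; it does not verify that the locale is actually generated on the system.

```shell
# Resolve which setting governs character handling:
# LC_ALL beats LC_CTYPE, which beats LANG; fall back to C if none is set.
effective_ctype() {
  echo "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Succeed if a locale name carries a UTF-8 codeset suffix.
# Both spellings occur: en_US.UTF-8 in config files, en_US.utf8 in locale -a.
names_utf8() {
  case "$1" in
    *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
    *) return 1 ;;
  esac
}

ctype=$(effective_ctype)
if names_utf8 "$ctype"; then
  echo "effective locale $ctype is UTF-8"
else
  echo "effective locale $ctype is not UTF-8; check /etc/locale.conf and ~/.bashrc"
fi
```

Running this in the same shell where pacman misbehaves shows which of the variables actually wins there.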

It isn't clear to me why setting just the encoding in the terminal preferences and the environment variable in the shell doesn't do the trick.

On the original question, the answer is apparently UTF-8.

Last edited by jernst (2017-08-27 18:29:03)

Offline
