#1 2006-09-30 15:50:07

gradgrind
Member
From: Germany
Registered: 2005-10-06
Posts: 921

UTF-8 in console apps

I know that this has been discussed already, and that there are solutions to some of the problems, but as the default locale is now a UTF-8 one, it seems important to me that console apps also work with UTF-8. Could we maybe put together a clear list of the apps that currently don't work (and solutions, if possible), so that the devs are encouraged to do the necessary updates?

Those I know of so far are:

'dialog' - which is quite easily fixed: it needs a change of configure options (--with-ncursesw). I am not sure about the nls bit, though; I haven't tried that.

'mc' - for which there are patches which are not quite perfect, but at least mc is then usable in a unicode console. I have been using a version with the gentoo patch for a while and it has been good enough for my purposes.
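
One quick way to see whether a console program was built against the wide-character ncurses library is to look at what it links to. This is only a rough sketch: the helper name links_wide_curses is made up for illustration, and it only inspects the text that ldd prints, so statically linked programs, or programs that use slang instead of ncurses (as mc may), will not be detected.

```shell
#!/bin/sh
# Hypothetical helper: reads `ldd` output on stdin and succeeds if a
# wide-character curses library (libncursesw or libcursesw) is listed.
links_wide_curses() {
    grep -Eq 'lib(n)?cursesw\.so'
}

# Example: report the status of a few programs, if they are installed.
for prog in dialog nano lynx; do
    path=$(command -v "$prog") || continue
    if ldd "$path" 2>/dev/null | links_wide_curses; then
        echo "$prog: linked against wide-character curses"
    else
        echo "$prog: no wide-character curses found"
    fi
done
```

Programs that are not installed are simply skipped, so the loop is safe to run on any box.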

Offline

#2 2006-09-30 21:38:33

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: UTF-8 in console apps

At least mc, nano, coreutils, ncurses, id3lib, taglib etc. should be fixed.

Some links:
http://bbs.archlinux.org/viewtopic.php?p=194235#194235 (edit: oh, that is not for UTF-8, now I posted it as http://bugs.archlinux.org/task/5487)
http://bugs.archlinux.org/task/4652
http://bugs.archlinux.org/task/4418
http://bugs.archlinux.org/task/4756 (see also taglib-rcc and id3lib-rcc in AUR)


to live is to die

Offline

#3 2006-10-01 05:51:23

gradgrind
Member
From: Germany
Registered: 2005-10-06
Posts: 921

Re: UTF-8 in console apps

The development version of nano (nano-1.9.99pre1) seems to work, though I only tested it briefly. It needs one extra configure option: --enable-utf8

Offline

#4 2006-10-01 07:17:54

dtw
Forum Fellow
From: UK
Registered: 2004-08-03
Posts: 4,439
Website

Re: UTF-8 in console apps

Go for it, guys!  You deserve better support!

Offline

#5 2006-10-01 09:51:05

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: UTF-8 in console apps

gradgrind wrote:

The development version of nano (nano-1.9.99pre1) seems to work, though I only tested it briefly. It needs one extra configure option: --enable-utf8

Yes, it works. There are also a bunch of patches for MC. Ncurses can also be patched, though I haven't tried this. And I'm sure I've seen patches for coreutils as well.

I use uk_UA.KOI8-U, but want to switch to UTF-8.
I'll try to manage my time and test UTF-8 extensively. I will add one or two VMware machines to my testing collection.  :D

It would be very nice to have full support for both UTF-8 and non-UTF-8 systems in 0.8.  ;)
Currently even non-UTF-8 locale support has some bugs (http://bugs.archlinux.org/task/5487, for example).


to live is to die

Offline

#6 2006-10-02 10:30:06

gradgrind
Member
From: Germany
Registered: 2005-10-06
Posts: 921

Re: UTF-8 in console apps

Here's a modified PKGBUILD for lynx:

pkgname=lynx
pkgver=2.8.5
pkgrel=5
pkgdesc="A text browser for the World Wide Web"
arch=(i686 x86_64)
depends=('ncurses' 'openssl')
source=(http://lynx.isc.org/release/${pkgname}${pkgver}.tar.gz)
url="http://lynx.isc.org"
md5sums=('5f516a10596bd52c677f9bfd9579bc28')

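build() {
  cd $startdir/src/${pkgname}2-8-5
  # ncursesw is the wide-character ncurses, needed for UTF-8 display;
  # --enable-locale-charset lets lynx take its charset from the locale
  ./configure --prefix=/usr --with-ssl --with-screen=ncursesw --enable-locale-charset
  make || return 1
  make DESTDIR=$startdir/pkg install
  # uncomment and set the charset defaults in the installed lynx.cfg
  sed -i "s|^#LOCALE_CHARSET.*|LOCALE_CHARSET:TRUE|" $startdir/pkg/usr/lib/lynx.cfg
  sed -i "s|^#ASSUME_CHARSET.*|ASSUME_CHARSET:utf-8|" $startdir/pkg/usr/lib/lynx.cfg
}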
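
To see what the two sed lines in the PKGBUILD above do without building anything, the same substitutions can be run as a filter over a few sample lines. This is just an illustration: the sample values (iso-8859-1, FALSE, the STARTFILE line) are assumptions standing in for whatever the shipped lynx.cfg contains, and the function names are made up.

```shell
#!/bin/sh
# Sample lines standing in for the commented-out defaults in lynx.cfg.
# The exact default values shown here are assumptions for illustration.
sample_cfg() {
    printf '%s\n' \
        '#ASSUME_CHARSET:iso-8859-1' \
        '#LOCALE_CHARSET:FALSE' \
        'STARTFILE:http://lynx.isc.org/'
}

# Apply the same two substitutions as the PKGBUILD, but as a filter
# (no -i), so nothing on disk is modified.
enable_utf8_cfg() {
    sed -e 's|^#LOCALE_CHARSET.*|LOCALE_CHARSET:TRUE|' \
        -e 's|^#ASSUME_CHARSET.*|ASSUME_CHARSET:utf-8|'
}

sample_cfg | enable_utf8_cfg
```

The two commented-out settings come out as ASSUME_CHARSET:utf-8 and LOCALE_CHARSET:TRUE, and the unrelated STARTFILE line passes through untouched.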

Offline

#7 2006-10-02 11:01:26

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: UTF-8 in console apps

Nice. I'll try it. Post it to bugtracker too.


to live is to die

Offline

#8 2006-10-02 16:21:13

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: UTF-8 in console apps

Ok, few things: with the dialog compile switch, does this break non-utf8 setups at all?

Please post here the packages that have problems, and a way to reproduce each problem (for silly Americans like me who don't speak moon-languages :) ). I will get to them on a case-by-case basis.
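
A possible starting point for reproductions that needs no foreign keyboard layout: generate a short non-ASCII string from octal escapes and compare its byte count with its character count. This is only a sketch; the sample word and the function name are arbitrary, and it exercises wc rather than any of the packages named in this thread.

```shell
#!/bin/sh
# "Grüße" written as octal escapes so the script itself stays pure ASCII:
# in UTF-8, u-umlaut is the two bytes \303\274 and sharp s is \303\237.
sample() {
    printf 'Gr\303\274\303\237e'
}

# Byte count: 7 in any locale (5 characters, two of which take 2 bytes).
bytes=$(sample | wc -c | tr -d ' ')
# Character count: 5 where multibyte handling works in a UTF-8 locale,
# 7 if the tool or the locale only understands single-byte encodings.
chars=$(sample | wc -m | tr -d ' ')

echo "bytes=$bytes chars=$chars"
if [ "$bytes" = "$chars" ]; then
    echo "characters are being counted as bytes (no multibyte handling here)"
fi
```

The same sample output can be pasted into a dialog box, an mc file name or a nano buffer to check whether the display and cursor movement stay aligned.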

Offline

#9 2006-10-02 16:45:10

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: UTF-8 in console apps

Good to know that you are with us, phrakture! ;-)
I'll get to my Linux box tomorrow. VMware, patch & makepkg will be my friends.  :D


to live is to die

Offline

#10 2006-10-02 20:28:21

gradgrind
Member
From: Germany
Registered: 2005-10-06
Posts: 921

Re: UTF-8 in console apps

phrakture wrote:

Ok, few things: with the dialog compile switch, does this break non-utf8 setups at all?

I'm not absolutely sure, but I'm pretty sure it does, and so does the mc patch. These console apps don't seem to be very flexible. I think the lynx mod can cope with various encodings via its options menu.

But even if there are such breakages, shouldn't the utf8 versions be the standard ones and the non-utf8 ones be the ones hanging out in AUR?

Offline

#11 2006-10-02 20:35:22

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: UTF-8 in console apps

gradgrind wrote:

But even if there are such breakages, shouldn't the utf8 versions be the standard ones and the non-utf8 ones be the ones hanging out in AUR?

I'd agree with that, but then some people might not.  In the case of breakages, we may have to figure out if it's worth providing two versions or something goofy.  I'd say switch to UTF8 for now and wait for complaints.... /shrug

Offline

#12 2006-10-03 01:25:29

codemac
Member
From: Cliche Tech Place
Registered: 2005-05-13
Posts: 794
Website

Re: UTF-8 in console apps

The real issue is that utf-8 is not the default locale on someone's machine.  You have to switch to it.  So I don't see where breaking everyone's terminal on install is a good idea.

That being said, all these packages should be fixed, and hopefully none break :P

Offline

#13 2006-10-03 04:39:23

gradgrind
Member
From: Germany
Registered: 2005-10-06
Posts: 921

Re: UTF-8 in console apps

codemac wrote:

The real issue is that utf-8 is not the default locale on someone's machine.  You have to switch to it.  So I don't see where breaking everyone's terminal on install is a good idea.

What do you mean? If I do a fresh install, I get LOCALE="en_US.UTF-8" in rc.conf until I change it. The result is that I can't use (standard) mc at all in a console and as soon as I use non-ASCII characters some of the other apps make a mess. Do you mean 'default' in some other way?
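
For anyone unsure which case applies to their box, here is a rough sketch for checking whether the locale actually in effect is a UTF-8 one. The helper name is_utf8_locale is made up, and it only looks at the locale name; it does not verify that the locale has been generated on the system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a locale name such as "en_US.UTF-8"
# or "en_US.utf8" names a UTF-8 codeset (case-insensitive, hyphen optional,
# with or without an @modifier).
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8|*.[Uu][Tt][Ff]8|*.[Uu][Tt][Ff]-8@*|*.[Uu][Tt][Ff]8@*) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective locale for character handling: LC_ALL wins over LC_CTYPE,
# which wins over LANG.
current=${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}
if is_utf8_locale "$current"; then
    echo "$current: UTF-8 locale"
else
    echo "$current: not a UTF-8 locale"
fi
```

Both spellings seen in this thread, en_US.UTF-8 and en_US.utf8, count as UTF-8 here, while something like uk_UA.KOI8-U does not.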

Offline

#14 2006-10-03 05:57:30

Purch
Member
From: Finland
Registered: 2006-02-23
Posts: 229

Re: UTF-8 in console apps

Very nice discussion about UTF, guys. I should be using UTF already, but I have been too lazy to find out how. It would be sweet to have UTF wiki pages, like Gentoo has (I just googled).

My vote for wiki :)

Offline

#15 2006-10-03 08:27:36

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: UTF-8 in console apps

phrakture wrote:
gradgrind wrote:

But even if there are such breakages, shouldn't the utf8 versions be the standard ones and the non-utf8 ones be the ones hanging out in AUR?

I'd agree with that, but then some people might not.  In the case of breakages, we may have to figure out if it's worth providing two versions or something goofy.  I'd say switch to UTF8 for now and wait for complaints.... /shrug

There shouldn't even be a discussion about moving non-UTF-8 packages to AUR. This would break systems for the many users who use a non-Latin alphabet.
Applications that don't have UTF-8 support (or have it, but broken) should be patched. And there should always be a choice of which locale and character encoding to use.

I don't see UTF-8 becoming a well-established standard on most users' systems in the near future (not for a few years at least).
Yes, UTF-8 solves many problems, but for that to be true (and not cause other problems) all applications have to support it! Until that happens, it is wise to keep support for non-UTF-8 encodings too.

The good news is that applications based on Qt or GTK+ should already support UTF-8. See http://bugs.archlinux.org/task/5487, however. BTW, can any of the devs reading this thread fix this bug?
There are also problems with GTK1 (http://bugs.archlinux.org/task/4652), but I don't think they will be easy to fix, and I'm not sure it's worth fixing them, since we have had GTK2 for a long time.
The bad news is that UTF-8 is hard to support in console apps due to their nature. For example, while it's easy to support UTF-8 font rendering in an X terminal, it's hard to do that in the text console. That's why many applications have broken display of non-Latin chars in text mode (especially Cyrillic chars). That's why there are patches for ncurses and mc (patches for slang; slang2 already supports Unicode).
mc-utf8 in Community still has some display glitches.
See http://bugs.archlinux.org/task/4418 for patches for coreutils and a better way of patching mc.
BTW, coreutils 6.3 is out, but I don't know if it supports Unicode any better; I haven't seen anything about this in the changelog.


to live is to die

Offline

#16 2006-10-03 16:02:59

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: UTF-8 in console apps

codemac wrote:

The real issue is that utf-8 is not the default locale on someone's machine.  You have to switch to it.  So I don't see where breaking everyone's terminal on install is a good idea.

Unless we force the default, which is, IMO, a good idea.

Offline

#17 2006-10-03 16:40:36

Romashka
Forum Fellow
Registered: 2005-12-07
Posts: 1,054

Re: UTF-8 in console apps

phrakture wrote:
codemac wrote:

The real issue is that utf-8 is not the default locale on someone's machine.  You have to switch to it.  So I don't see where breaking everyone's terminal on install is a good idea.

Unless we force the default, which is, IMO, a good idea.

IMO forcing the default to UTF-8 is a bad idea, at least in the current situation.
There is LOCALE="en_US.utf8" already in the default rc.conf. Isn't that enough?
BTW, does an empty CONSOLEFONT= work fine with UTF-8? I remember older Arch versions used LatArCyrHeb16 (not sure if I named it correctly).


to live is to die

Offline

#18 2006-10-03 19:04:12

gradgrind
Member
From: Germany
Registered: 2005-10-06
Posts: 921

Re: UTF-8 in console apps

Romashka wrote:

BTW, does empty CONSOLEFONT= work fine with UTF-8? I remember older Arch versions used LatArCyrHeb16 (not sure if I named it correctly).

I think it's missing quite a lot of glyphs (I mean the 'default' font), but it's ok for some of us West/Central Europeans (the few non-ASCII German characters are ok).

Offline

#19 2006-10-03 19:55:16

damjan
Member
Registered: 2006-05-30
Posts: 452

Re: UTF-8 in console apps

I must say UTF-8 works great for me.

The only console application I use is vim and it supports UTF-8 locales just fine.

The few other console applications I use that are dialog-based, like "make menuconfig", usually don't need to work with anything but ASCII anyway.

And the only other problem is GTK+1 ... but I don't use applications based on it either... It's unfortunate that I need to have it installed because of a stupid dependency in kdeutils for xmms.

Offline

#20 2006-10-09 19:10:51

mmccaskill
Member
From: NC
Registered: 2005-02-21
Posts: 163

Re: UTF-8 in console apps

I get a few funky characters with UTF-8 with xcalc. I take it xcalc can't handle UTF-8?

Offline

#21 2006-10-11 12:59:50

Eliatamby
Member
Registered: 2005-05-06
Posts: 80

Re: UTF-8 in console apps

I get issues with lynx (I'm guessing this is ncurses?), mc, and id3lib.  I also find that many central European (French) symbols don't appear at all.

edit: I do recall id3lib has a utf8 patch available.  That's what the error dialog says at least

Offline

#22 2007-01-03 20:49:28

Lone_Wolf
Forum Moderator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,925

Re: UTF-8 in console apps

It appears a recent update has fixed the UTF-8 support for MC.

On a system that hasn't been updated since late November, I need to use mc -a.

On my up-to-date desktop and laptop I can start mc without the -a and get everything in place as it should be.


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)

Offline

#23 2007-01-04 08:41:04

gradgrind
Member
From: Germany
Registered: 2005-10-06
Posts: 921

Re: UTF-8 in console apps

Lone_Wolf wrote:

It appears a recent update has fixed the UTF-8 support for MC.

On a system that hasn't been updated since late November, I need to use mc -a.

On my up-to-date desktop and laptop I can start mc without the -a and get everything in place as it should be.

Indeed, that does seem to be the case - I wonder what did it!

Of course, if you want to actually deal with utf8 files, you'll still need to use mc-utf8 from [community], which seems to work pretty well.

Offline

#24 2007-03-23 00:03:24

colinzhengj
Member
From: Cambridge, MA
Registered: 2007-03-20
Posts: 23
Website

Re: UTF-8 in console apps

I was fiddling with mc-mp (an mc spin-off), since it boasts cleaner code and a smaller memory footprint.

mc-mp does not seem to handle utf-8 (I don't think it's a slang2 problem; can anyone verify this?). By contrast, a properly patched mc supports utf-8 well.

Sadly I had to move away from mc-mp...

Offline

#25 2008-03-18 00:00:59

turtle
Member
From: Czestochowa, Poland
Registered: 2006-02-05
Posts: 20
Website

Re: UTF-8 in console apps

In order to make aspell work correctly with UTF-8, the PKGBUILD for aspell should be improved. Namely, the line

./configure --prefix=/usr

should at least be changed to

Code:

./configure --prefix=/usr --enable-curses=/usr/lib/libcursesw.so

Offline
