
#1 2017-08-03 17:48:27

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Messed up UTF-8 characters

The last time I went to update Arch Linux, I had a power surge and couldn't use the system, so I used Windows until I could solve the issue.
To fix my system, the only thing I did was a full system upgrade ("pacman -Syu") from arch-chroot.

The thing is that, after rebooting my laptop, some filenames in the terminal appear as "Im<>genes" or "M<>sica" instead of "Imágenes" and "Música" (where <> is U+FFFD).

Now I can't open any of the files in these folders. How can I solve this?

Note: when I updated my system, I also set labels on my partitions, like this:

[davo@Arch-Laptop ~]$ lsblk -f
NAME   FSTYPE LABEL                     UUID                                 MOUNTPOINT
sda                                                                          
sda1 ntfs   Reservado para el sistema 5E4C04074C03D8A3                     
sda2 ntfs                             842606A4260696F8                     /mnt/Disk
sda3 ntfs                             88926C75926C6A20                     
sda4                                                                       
sda5 ext4   /boot                     72a3fb1b-10e0-4683-a6e4-47038d16e4b1 /boot
sda6 ext4   Arch /                    a1a1e16d-f494-4b98-979d-207d31528270 /
sda7 ext4   /home                     0106a379-9868-44da-ad52-22c05c645342 /home
sda8 swap                             c0a31eb3-f80c-47e6-a226-7d5623dbf776 [SWAP]
sr0

Screenshot: https://i.imgur.com/vIyCDDE.png


#2 2017-08-03 17:53:45

ugjka
Member
From: Latvia
Registered: 2014-04-01
Posts: 1,815

Re: Messed up UTF-8 characters

Try running fc-cache; that should update the font cache.

Last edited by ugjka (2017-08-03 17:53:57)


https://ugjka.net
paru > yay | webcord > discord
pacman -S spotify-launcher
mount /dev/disk/by-...


#3 2017-08-03 17:57:27

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters

ugjka wrote:

Try running fc-cache that should update the font cache

I don't think it's a font-related problem. I can't open them with any program.


#4 2017-08-03 18:17:11

seth
Member
Registered: 2012-09-03
Posts: 51,956

Re: Messed up UTF-8 characters

Please provide the output of

locale
locale -a
localectl
mount

You can easily rule out fontconfig issues by trying a Linux console (Ctrl+Alt+F1).


#5 2017-08-03 18:43:39

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters


[davo@Arch-Laptop ~]$ locale
LANG=es_ES.UTF-8
LC_CTYPE=
LC_NUMERIC=
LC_TIME=
LC_COLLATE=
LC_MONETARY=
LC_MESSAGES=
LC_PAPER=
LC_NAME=
LC_ADDRESS=
LC_TELEPHONE=
LC_MEASUREMENT=
LC_IDENTIFICATION=
LC_ALL=
[davo@Arch-Laptop ~]$ locale -a
C
es_ES.utf8
POSIX
[davo@Arch-Laptop ~]$ localectl
   System Locale: LANG=es_ES.UTF-8
       VC Keymap: es
      X11 Layout: n/a
[davo@Arch-Laptop ~]$ mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sys on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
dev on /dev type devtmpfs (rw,nosuid,relatime,size=1914924k,nr_inodes=478731,mode=755)
run on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
/dev/sda6 on / type ext4 (rw,relatime,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
/dev/sda2 on /mnt/Disk type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)
/dev/sda7 on /home type ext4 (rw,relatime,data=ordered)
/dev/sda5 on /boot type ext4 (rw,relatime,data=ordered)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=383968k,mode=700,uid=1000,gid=100)


#6 2017-08-03 18:51:15

seth
Member
Registered: 2012-09-03
Posts: 51,956

Re: Messed up UTF-8 characters

The fishy part is all the unset LC variables.

cat /etc/locale.conf
printenv | grep LC_


#7 2017-08-03 19:07:29

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters

seth wrote:

The fishy part is all the unset LC variables.

cat /etc/locale.conf
printenv | grep LC_

I think they have always been like that, and I never had any problems.


#8 2017-08-03 19:09:24

jasonwryan
Anarchist
From: .nz
Registered: 2009-05-09
Posts: 30,424

Re: Messed up UTF-8 characters

That isn't the output you were asked for. The fact that you "think" they were like that is not the same as knowing what you are doing. If you want help, provide the output asked for.


Arch + dwm   •   Mercurial repos  •   Surfraw

Registered Linux User #482438


#9 2017-08-03 19:14:57

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters

jasonwryan wrote:

That isn't the output you were asked for. The fact that you "think" they were like that is not the same as knowing what you are doing. If you want help, provide the output asked for.

Sorry, here it is:

[davo@Arch-Laptop ~]$ cat /etc/locale.conf
LANG="es_ES.UTF-8"
[davo@Arch-Laptop ~]$ printenv | grep LC_
LC_MEASUREMENT=
LC_PAPER=
LC_MONETARY=
LC_NAME=
LC_COLLATE=
LC_CTYPE=
LC_ADDRESS=
LC_NUMERIC=
LC_MESSAGES=
LC_TELEPHONE=
LC_IDENTIFICATION=
LC_TIME=
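Note that this printenv output shows the LC_* variables as set to an empty string, which is not the same as unset: an unset variable does not appear in env/printenv output at all. A minimal POSIX sh sketch to list exactly the empty ones; the function name empty_lc_vars is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read "NAME=value" lines (as printed by env) on stdin
# and print the names of LC_* variables whose value is the empty string.
empty_lc_vars() {
  sed -n 's/^\(LC_[A-Z_]*\)=$/\1/p'
}

# Usage: check the current environment.
env | empty_lc_vars
```

Any name it prints was exported with an empty value by something earlier in the login chain.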


#10 2017-08-03 19:50:49

seth
Member
Registered: 2012-09-03
Posts: 51,956

Re: Messed up UTF-8 characters

Try

ls | iconv -t iso8859-1
ls | iconv -t UTF-8

and see whether either prints the accurate filenames.

The typical cause is that the process that wrote the file used a non-Unicode encoding, while the process reading it now uses Unicode.
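To test that hypothesis directly, here is a sketch that flags directory entries whose names are not valid UTF-8. It assumes glibc's iconv, which exits non-zero on an illegal input sequence, so a UTF-8 to UTF-8 "conversion" works as a validator; find_non_utf8_names is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print every entry of a directory (default: .) whose
# name is not a valid UTF-8 byte sequence. Relies on glibc iconv failing on
# illegal input; output is discarded, only the exit status matters.
find_non_utf8_names() {
  dir=${1:-.}
  for path in "$dir"/* "$dir"/.[!.]*; do
    # Skip unmatched glob patterns (but keep dangling symlinks).
    [ -e "$path" ] || [ -L "$path" ] || continue
    name=${path##*/}
    if ! printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      printf '%s\n' "$path"
    fi
  done
}
```

For example, `find_non_utf8_names ~` would list only names in the home directory that were written in some other encoding; if it prints nothing, the names on disk are fine and the problem is in how they are being decoded.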

But the screenshot also looks as if the system prints UTF-8 while the shell interprets it as some ISO 8859 encoding or C, for whatever reason.
(lsblk falls back to ASCII art here when I set the locale to a non-Unicode one.)

But we know that, somewhere, your shell unsets most of the locale variables.
Even if this does not provide an immediate answer, it might lead to the cause.

Typical offenders would be /etc/profile, /etc/environment and shell-specific includes, as well as startup scripts for your desktop session.
=> Try the behavior on a Linux console (to rule out that the DE session messes things up) and, if it happens there as well, look through the mentioned files and (if they're not the cause) tell us which shell you use.

If it's not a problem on the Linux console, we need to know about your desktop session and how you log in to it (GDM, startx, ...).


#11 2017-08-03 20:23:25

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters


The filename encoding is UTF-8 (as far as I know):

[davo@Arch-Laptop ~]$ ls | iconv -t iso8859-1
Backup
BackupNew
Bak
Cara A.png
Cara B.png
Carne Estudiante.png
CERTIFICADO.pdf
cevesa-venta-246329.pdf
David Davoiconv: secuencia de entrada ilegal en la posicin 114
[davo@Arch-Laptop ~]$ ls | iconv -t UTF-8
Backup
BackupNew
Bak
Cara A.png
Cara B.png
Carne Estudiante.png
CERTIFICADO.pdf
cevesa-venta-246329.pdf
David Davo 5.odt
Descargas
Desktop
DNI david.pdf
Documents
Im<>genes
InvProy (1).pdf
InvProy.pdf
libinput-list-devices.txt
list-props.txt
M<>sica
Plantillas
P<>blico
Scripts
simkl-kodi2
Sync
test.png
unison.log
V<>deos

Using Konsole (the KDE terminal), when I try to type á, é, í or ú, it just writes "?", e.g. "ls M?sica" (which does list the folder).
Using my Linux terminal, it displays the oct UTF-8 encoding (octal byte escapes), so á is shown as $'\303\241' and Imágenes gets displayed as

'Im'$'\303\241''genes'

And when I type it there, the character is displayed fine, e.g. "ls Música" (which also lists the folder).
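The $'\303\241' form is just the octal spelling of the two bytes UTF-8 uses for á (hex c3 a1). A small sketch to confirm that, and to dump what the names in a directory really contain byte by byte, independent of how any terminal renders them:

```shell
#!/bin/sh
# "á" encoded as UTF-8 is two bytes; od -b prints them in octal: 303 241.
printf '%s' 'á' | od -An -b

# The reverse direction: printf turns the octal escapes back into those
# bytes, which a terminal running in a UTF-8 locale renders as "á".
printf '\303\241\n'

# Dump the names in the current directory as raw bytes/characters.
ls | od -c | head -n 20
```

If the byte dump shows 303 241 where á should be, the names on disk are valid UTF-8 and only their display is wrong.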

I log in using startx, and my .xinitrc only has "exec startkde" on it. My shell is bash.

By the way, I started running 'grep -rnw ~/ -e "LC_.*\="', but so far I've only found git hooks.

Last edited by daviddavo (2017-08-03 20:29:23)


#12 2017-08-03 20:36:11

loqs
Member
Registered: 2014-03-06
Posts: 17,502

Re: Messed up UTF-8 characters

daviddavo wrote:

my .xinitrc only has "exec startkde" on it.

See Xinit#xinitrc on the ArchWiki; please read the second note in that section.
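For reference, and assuming the note meant here is the usual advice to keep the xinitrc.d block from the default /etc/X11/xinit/xinitrc, a minimal ~/.xinitrc would look roughly like this. A bare "exec startkde" skips those system scripts, and with them whatever they set up for the session.

```shell
# ~/.xinitrc (sketch): source the system xinitrc.d scripts first, as the
# default /etc/X11/xinit/xinitrc does, then start the KDE session.
if [ -d /etc/X11/xinit/xinitrc.d ] ; then
  for f in /etc/X11/xinit/xinitrc.d/?*.sh ; do
    [ -x "$f" ] && . "$f"
  done
  unset f
fi

exec startkde
```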


#13 2017-08-03 20:49:58

seth
Member
Registered: 2012-09-03
Posts: 51,956

Re: Messed up UTF-8 characters

Using my Linux terminal, it displays the oct UTF-8 encoding,

LC_CTYPE=es_ES.UTF-8 ls
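Prefixing a command with a variable assignment, as in this one-liner, changes the locale for that single command only. `locale charmap` prints the character map of the resolved LC_CTYPE category, which makes the effect visible. This sketch assumes the es_ES.UTF-8 locale is generated, as the `locale -a` output in post #5 shows.

```shell
#!/bin/sh
# Character map seen by commands started from the current shell.
locale charmap

# Override LC_CTYPE for this single command; the shell itself is unchanged.
LC_CTYPE=es_ES.UTF-8 locale charmap

# LC_ALL takes precedence over every LC_* variable and over LANG.
LC_ALL=C locale charmap
```

If the first line prints something other than UTF-8 while the second prints UTF-8, the session's locale environment is at fault rather than the filenames themselves.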


#14 2017-08-03 21:01:15

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Messed up UTF-8 characters

What do you get from 'LC_ALL=es_ES.UTF-8 locale' and 'LC_ALL=es_ES.UTF-8 ls'? I have to agree with seth that it looks fishy that your LC variables are not set.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K


#15 2017-08-03 21:19:11

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters

I edited /etc/locale.conf, added LC_CTYPE=es_ES.UTF-8, and now it works:

[davo@Arch-Laptop ~]$ printenv | grep LC_
LC_MEASUREMENT=
LC_PAPER=
LC_MONETARY=
LC_NAME=
LC_COLLATE=
LC_CTYPE=es_ES.UTF-8
LC_ADDRESS=
LC_NUMERIC=
LC_MESSAGES=
LC_TELEPHONE=
LC_IDENTIFICATION=
LC_TIME=
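Put together from the file shown in post #9 and the line added here, /etc/locale.conf would then read as follows (assuming it contains nothing else):

```shell
# /etc/locale.conf
LANG="es_ES.UTF-8"
LC_CTYPE=es_ES.UTF-8
```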

Thank you to everybody!


#16 2017-08-03 21:21:24

seth
Member
Registered: 2012-09-03
Posts: 51,956

Re: Messed up UTF-8 characters

Something's wrong with either your /etc/profile.d/locale.sh or /etc/profile


#17 2017-08-03 21:42:57

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters

seth wrote:

Something's wrong with either your /etc/profile.d/locale.sh or /etc/profile

This is run by "startkde":

[davo@Arch-Laptop ~]$ cat .config/plasma-locale-settings.sh 
# Generated script, do not edit
# Exports language-format specific env vars from startkde.
# This script has been generated from kcmshell5 formats.
# It will automatically be overwritten from there.
export LANG=es_ES.UTF-8
export LANGUAGE=es:en_GB

I deleted the file, and my LC_* variables are still empty.

Last edited by daviddavo (2017-08-03 21:48:47)


#18 2017-08-03 21:50:34

seth
Member
Registered: 2012-09-03
Posts: 51,956

Re: Messed up UTF-8 characters

Because:

seth wrote:

Something's wrong with either your /etc/profile.d/locale.sh or /etc/profile

Either the former is broken/nonexistent or the latter doesn't source it.


#19 2017-08-03 22:04:17

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters


Is there something wrong with them?

[davo@Arch-Laptop ~]$ cat /etc/profile.d/locale.sh 
#!/bin/sh

if [ -z "$LANG" ]; then
  if [ -n "$XDG_CONFIG_HOME" ] && [ -r "$XDG_CONFIG_HOME/locale.conf" ]; then
    . "$XDG_CONFIG_HOME/locale.conf"
  elif [ -n "$HOME" ] && [ -r "$HOME/.config/locale.conf" ]; then
    . "$HOME/.config/locale.conf"
  elif [ -r /etc/locale.conf ]; then
    . /etc/locale.conf
  fi
fi

LANG=${LANG:-C}
export LANG
[ -n "$LC_CTYPE" ]          && export LC_CTYPE
[ -n "$LC_NUMERIC" ]        && export LC_NUMERIC
[ -n "$LC_TIME" ]           && export LC_TIME
[ -n "$LC_COLLATE" ]        && export LC_COLLATE
[ -n "$LC_MONETARY" ]       && export LC_MONETARY
[ -n "$LC_MESSAGES" ]       && export LC_MESSAGES
[ -n "$LC_PAPER" ]          && export LC_PAPER
[ -n "$LC_NAME" ]           && export LC_NAME
[ -n "$LC_ADDRESS" ]        && export LC_ADDRESS
[ -n "$LC_TELEPHONE" ]      && export LC_TELEPHONE
[ -n "$LC_MEASUREMENT" ]    && export LC_MEASUREMENT
[ -n "$LC_IDENTIFICATION" ] && export LC_IDENTIFICATION
[davo@Arch-Laptop ~]$ cat /etc/profile
# /etc/profile

#Set our umask
umask 022

# Set our default path
PATH="/usr/local/sbin:/usr/local/bin:/usr/bin"
export PATH

# Load profiles from /etc/profile.d
if test -d /etc/profile.d/; then
        for profile in /etc/profile.d/*.sh; do
                test -r "$profile" && . "$profile"
        done
        unset profile
fi

# Source global bash config
if test "$PS1" && test "$BASH" && test -z ${POSIXLY_CORRECT+x} && test -r /etc/bash.bashrc; then
        . /etc/bash.bashrc
fi

# Termcap is outdated, old, and crusty, kill it.
unset TERMCAP

# Man is much better than us at figuring this out
unset MANPATH
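Read closely, the locale.sh above only ever exports LC_* variables that are non-empty; it never unsets one that is already in the environment with an empty value, so such a value passes straight through. A sketch that simulates this in a child shell; simulate_locale_sh is a hypothetical name and the values mirror the printenv output in post #9:

```shell
#!/bin/sh
# Start a child sh whose environment already contains LC_CTYPE="" and run
# the same guard that /etc/profile.d/locale.sh uses for each LC_* variable.
simulate_locale_sh() {
  env LANG=es_ES.UTF-8 LC_CTYPE= sh -c '
    [ -n "$LC_CTYPE" ] && export LC_CTYPE
    printenv | grep "^LC_CTYPE="
  '
}

# Prints "LC_CTYPE=": the guard skips the export, but the empty value was
# already exported by the parent environment, so it survives.
simulate_locale_sh
```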


#20 2017-08-03 22:24:58

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Messed up UTF-8 characters

What's in your /etc/locale.conf ?



#21 2017-08-03 23:33:29

loqs
Member
Registered: 2014-03-06
Posts: 17,502

Re: Messed up UTF-8 characters

@R00KIE /etc/locale.conf is in post #9.
Please post the output of the following (it may well be nothing; this assumes your shell is bash or compatible):

$ [ -n "$XDG_CONFIG_HOME" ] && [ -r "$XDG_CONFIG_HOME/locale.conf" ] && echo "XDG_CONFIG_HOME/locale.conf has contents" && cat "$XDG_CONFIG_HOME/locale.conf"
$ [ -n "$HOME" ] && [ -r "$HOME/.config/locale.conf" ] && echo "$HOME/.config/locale.conf has contents" && cat "$HOME/.config/locale.conf"

Edit:
For the LC_* vars set to an empty string https://bugs.archlinux.org/task/54988
Edit2:
Replaced

$ [ -n "$XDG_CONFIG_HOME" ] && [ -r "$XDG_CONFIG_HOME/locale.conf" ] && echo "$HOME/.config/locale.conf has contents" && cat "$XDG_CONFIG_HOME/locale.conf"

with

$ [ -n "$XDG_CONFIG_HOME" ] && [ -r "$XDG_CONFIG_HOME/locale.conf" ] && echo "XDG_CONFIG_HOME/locale.conf has contents" && cat "$XDG_CONFIG_HOME/locale.conf"

Thanks, seth.

Last edited by loqs (2017-08-04 11:13:30)


#22 2017-08-04 06:15:24

seth
Member
Registered: 2012-09-03
Posts: 51,956

Re: Messed up UTF-8 characters

Should be "$XDG_CONFIG_HOME" instead of "$HOME" in the first line.

@daviddavo, there's nothing wrong with those files. It's just that I'm still on systemd 233 (time for an update), and we're witnessing the ongoing and interesting study of exactly how many different ways systemd can break your system ... m(

Debian meanwhile just reverted the breaking patch.

As a workaround for this particular issue, one could probably explicitly unset empty values in /etc/profile.d/locale.sh.
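A minimal POSIX sh sketch of that workaround, assuming it lives in a separate file in /etc/profile.d/ that sorts before locale.sh (the file name is up to you) rather than in locale.sh itself. It unsets each LC_* variable that is set but empty, so the POSIX fallback to LANG applies; the variable list is the one locale.sh handles, and unset_empty_lc_vars is a hypothetical name.

```shell
#!/bin/sh
# Unset every LC_* variable (from the list locale.sh handles) that is set
# to an empty string; variables that are unset or non-empty are left alone.
unset_empty_lc_vars() {
  for _lc in LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES \
             LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT \
             LC_IDENTIFICATION; do
    # ${VAR+x} expands to "x" if VAR is set, even to an empty string.
    eval "_set=\${$_lc+x} _val=\${$_lc}"
    if [ -n "$_set" ] && [ -z "$_val" ]; then
      unset "$_lc"
    fi
  done
  unset _lc _set _val
}

unset_empty_lc_vars
```

The `eval` is only there because POSIX sh has no indirect expansion; it only ever sees the fixed names from the list, never user data.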


#23 2017-08-04 08:23:11

daviddavo
Member
Registered: 2015-12-30
Posts: 64

Re: Messed up UTF-8 characters

loqs wrote:

@ROOKIE /etc/locale.conf is in post #9
The output of the following please (may well be none and assumes shell is bash or compatible)

$ [ -n "$XDG_CONFIG_HOME" ] && [ -r "$XDG_CONFIG_HOME/locale.conf" ] && echo "$HOME/.config/locale.conf has contents" && cat "$XDG_CONFIG_HOME/locale.conf"
$ [ -n "$HOME" ] && [ -r "$HOME/.config/locale.conf" ] && echo "$HOME/.config/locale.conf has contents" && cat "$HOME/.config/locale.conf"

Edit:
For the LC_* vars set to an empty string https://bugs.archlinux.org/task/54988

Nothing

[davo@Arch-Laptop ~]$ [ -n "$XDG_CONFIG_HOME" ] && [ -r "$XDG_CONFIG_HOME/locale.conf" ] && echo "$HOME/.config/locale.conf has contents" && cat "$XDG_CONFIG_HOME/locale.conf"
[davo@Arch-Laptop ~]$ [ -n "$XDG_CONFIG_HOME" ] && [ -r "$XDG_CONFIG_HOME/locale.conf" ] && echo "$HOME/.config/locale.conf has contents" && cat "$XDG_CONFIG_HOME/locale.conf"


#24 2017-08-04 10:11:31

ugjka
Member
From: Latvia
Registered: 2014-04-01
Posts: 1,815

Re: Messed up UTF-8 characters


So it is systemd's fault? I just had a similar problem with some LC_* variables being unset: https://bbs.archlinux.org/viewtopic.php?id=228781



#25 2017-08-04 12:25:48

loqs
Member
Registered: 2014-03-06
Posts: 17,502

Re: Messed up UTF-8 characters

seth wrote:

As a workaround for the particular issue one could probably explicitly unset empty values in /etc/profile.d/locale.sh

Possibly something like the following (placed in /etc/profile.d/0_fix.sh to leave locale.sh alone):

$ for lcv in $(env | grep '^LC_' | cut -d= -f1); do [ -z "$(printenv "$lcv")" ] && unset "$lcv"; done
