#1 2008-11-11 03:52:35

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

[SOLVED] Unreadable id3 tags, unknown encoding

I have some Japanese songs whose id3 tags are unreadable in easytag, id3info and mpd. They all show the tags as a bunch of question marks, but English text (including & and such) is displayed correctly.

Of all the apps I found in the repos, mutagen looked like the most likely candidate: its mid3iconv tool can convert tags. With the -p and -d switches it reports what it would do without altering the file (many apps will happily overwrite wrong data), and the -e switch lets me try out various encodings.

I tried the following, which will show the tags decoded from all the encodings that exist with iconv. It was the most thorough method I could think of.

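# iconv --list prints each name with a trailing "//"; sed strips it
for ENCODING in $(iconv --list | sed 's/..$//'); do
    echo -----
    echo "$ENCODING"
    # -p: dry run (file is not modified), -d: print the decoded tags
    mid3iconv -p -d -e "$ENCODING" "track 03.mp3"
    # wait for Enter before trying the next encoding
    read
done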

After pressing enter 1167 times, none were correct!
About 95% of the output was question marks.
Sometimes the tags were blank, and text that was English became garbled.
With some UTF-16 encodings a bunch of the same (wrong) Kanji appeared in only 1 field.

Does anyone have a good idea what I can try now?

I have uploaded an example file, so you can try out converting it, but I'm not sure if it's allowed to be posted on the forums.

I'll try foobar2000 in wine tomorrow.

Last resort is finding all the album info online and updating all the tags manually (not really looking forward to that though, since it's 900 mp3s).

Last edited by Procyon (2009-02-23 17:31:21)

Offline

#2 2008-11-11 11:42:54

EnvoyRising
Member
Registered: 2008-08-08
Posts: 118

Re: [SOLVED] Unreadable id3 tags, unknown encoding

Picard will automatically tag files according to the entries in the musicbrainz database. There aren't a lot of Korean albums in that database, but perhaps you will have better luck with your Japanese albums?

Also, instead of easytag, consider exfalso. It read all my utf-8 encoded tags flawlessly.

Lastly, which client are you using for mpd? I've used gmpc (still using it, actually), mpc, and ncmpc, and they all read my utf-8 tags just fine. (For the latter two I imagine this depends on having a terminal that supports unicode.)

Offline

#3 2008-11-11 12:14:47

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [SOLVED] Unreadable id3 tags, unknown encoding

They aren't utf-8, because lots of other unicode stuff works great. Looking into ~/.mpd/mpd.db with firefox in different Japanese encodings also didn't help.

I only found 2/26 albums on musicbrainz.

I will give exfalso a try.

I also tried out foobar2000 in wine, and it can read everything but does not seem to be able to change the tags to unicode. So the information is there, and it really is just an encoding problem.

Offline

#4 2008-11-11 12:44:07

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [SOLVED] Unreadable id3 tags, unknown encoding

Exfalso wasn't able to read them. It could read my other unicode mp3s fine though.

I will try out various Windows retaggers. EDIT: Can't find one. If you know a good one let me know.

Last edited by Procyon (2008-11-11 13:09:36)

Offline

#5 2008-11-11 20:00:23

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [SOLVED] Unreadable id3 tags, unknown encoding

I could not fix the problem. It's hard to find information on this.

I resorted to manually editing the tags. I found album info for all of them in the same format, and I already made an ex script (sourceable in vim for fine-tuning!) that will make an id3v2 script out of it. (Plus it turned out to be only 300 mp3s.)

If anyone still wants to take a crack at it, if you post here I'll e-mail you the link to the example mp3 file.

Offline

#6 2008-11-11 20:20:21

Nezmer
Member
Registered: 2008-10-24
Posts: 559
Website

Re: [SOLVED] Unreadable id3 tags, unknown encoding

I used to 'fix' some non-Unicode tags using EasyTag.

'Settings>Preferences>ID3 Tag Settings'

You can specify the input (reading) encoding and the output (writing) encoding.

Reading: the Japanese encoding the tags are using.
Writing: UTF-8.

Just select all the files in a folder and save.


English is not my native language.

Offline

#7 2008-11-11 20:36:42

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [SOLVED] Unreadable id3 tags, unknown encoding

In easytag I tried out the Japanese encodings and the Windows encodings. None work.

Offline

#8 2009-02-22 00:24:03

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [SOLVED] Unreadable id3 tags, unknown encoding

I need to bump this.

I have some new files (over 200). The filenames are UTF-8, but the tags are in an unknown encoding. It's not sjis, eucjp or iso-2022-jp.

id3info gives stuff like:
|fMr DJCD Ĥ@
hexdump of that string from the file:
C7 DA C7 E6 C7 7C 93 DB C6 F2 C7 66 C7 4D C7 72 20 44 4A 43 44 20 B2 C4 A4 40 A8 F7

Easytag thinks the tags are empty.

I tried to convert it with iconv, but none of the encodings work. I also tried to open the output in firefox but nothing there works either.
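
For anyone who wants to reproduce this without the mp3, here is a minimal sketch that writes the bytes from the hexdump into a scratch file and runs them through iconv. The variable names and the short list of encodings are only illustrative assumptions; swap in whatever `iconv --list` offers on your system.

```shell
# Bytes copied from the hexdump above.
hex='C7 DA C7 E6 C7 7C 93 DB C6 F2 C7 66 C7 4D C7 72 20 44 4A 43 44 20 B2 C4 A4 40 A8 F7'

# Scratch file for the raw tag bytes (name chosen by mktemp).
tagfile=$(mktemp)

# Convert each hex pair to a three-digit octal escape, which printf
# understands in any POSIX shell, and emit the raw byte.
for b in $hex; do
    printf "\\$(printf '%03o' "0x$b")"
done > "$tagfile"

# Try a handful of candidate encodings (an illustrative selection).
for enc in SHIFT_JIS EUC-JP ISO-2022-JP BIG5 BIG5-HKSCS; do
    printf '%s: ' "$enc"
    # iconv exits non-zero when it hits a sequence it cannot decode
    iconv -f "$enc" -t UTF-8 "$tagfile" 2>/dev/null || printf ' (conversion failed)'
    printf '\n'
done
```

Feeding iconv the raw bytes this way avoids whatever id3info does to the string before printing it.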

EDIT:

I noticed iconv behaves differently when I cat a file that contains just the raw bytes above (from the hexdump) instead of piping the output of id3info.
I even got kana with: BIG5HKSCS
The only problem is that this encoding doesn't display the kanji, but quite different Chinese characters.

Last edited by Procyon (2009-02-22 01:25:36)

Offline

#9 2009-02-22 11:23:37

GogglesGuy
Member
From: Rocket City
Registered: 2005-03-29
Posts: 610
Website

Re: [SOLVED] Unreadable id3 tags, unknown encoding

There's always the possibility that those tags were written out incorrectly. It would be handy to know what the text is supposed to be. The hex dump put through iconv gives these back (in UTF-8):

SHIFT-JIS
ヌレヌ貮|呑ニiconv: illegal input sequence at position 9
-----
SHIFT_JIS
ヌレヌ貮|呑ニiconv: illegal input sequence at position 9
-----
SHIFT_JISX0213
ヌレヌ貮|呑ニ怤fヌMヌr DJCD イト、@ィiconv: illegal input sequence at position 27
-----
SJIS-OPEN
ヌレヌ貮|呑ニfヌMヌr DJCD イト、@ィiconv: illegal input sequence at position 27
-----
SJIS-WIN
ヌレヌ貮|呑ニfヌMヌr DJCD イト、@ィiconv: illegal input sequence at position 27
-----
SJIS
ヌレヌ貮|呑ニiconv: illegal input sequence at position 9

If the first part of these strings makes sense, then perhaps there's some invalid input in the tags.

Offline

#10 2009-02-22 11:45:47

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [SOLVED] Unreadable id3 tags, unknown encoding

I almost figured it out: it's definitely the Traditional Chinese encoding, BIG5.

Can you try the output of BIG5 and BIG5HKSCS?

I got the correct string by opening the file in Firefox and choosing the encoding there:

BIG5:
マリア様がみてる DJCD 第一卷
BIG5HKSCS:
マリア昞がみてる DJCD 第一卷

iconv gives the same wrong string for BIG5HKSCS, but it gives 'illegal input sequence' for BIG5.

Easytag seems to use iconv and gives the same result (any time iconv reports an illegal input sequence, the tag becomes completely empty).

So why does iconv fail with BIG5?

Offline

#11 2009-02-22 17:34:42

GogglesGuy
Member
From: Rocket City
Registered: 2005-03-29
Posts: 610
Website

Re: [SOLVED] Unreadable id3 tags, unknown encoding

BIG-5
iconv: illegal input sequence at position 6
------
BIG5-HKSCS
マリア昞がみてる DJCD 第一卷
-----

It fails because the input is only partially Big-5. The first three characters are Big-5:

0xC7DA 0xC7E6 0xC77C

But then comes:

0x93DB

This is not part of the original Big-5, but rather part of the Hong Kong Supplementary Character Set. Some software will automatically use HKSCS even if only BIG-5 is selected (see the article regarding codepage 950/951 on Windows). Iconv won't, which is the correct thing to do.
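
A rough way to spot such bytes without iconv is to walk the string and flag every double-byte pair whose lead byte lies below 0xA1. This is only a heuristic sketch: it assumes that standard Big-5 lead bytes start at 0xA1 and that the range below it is where extensions such as HKSCS place their characters. The variable names are made up for illustration.

```shell
# Bytes copied from the hexdump earlier in the thread.
hex='C7 DA C7 E6 C7 7C 93 DB C6 F2 C7 66 C7 4D C7 72 20 44 4A 43 44 20 B2 C4 A4 40 A8 F7'

offset=0      # byte position of the current byte
pending=''    # lead byte still waiting for its trail byte
suspect=''    # collected "offset:pair" entries that look like extension characters

for b in $hex; do
    if [ -n "$pending" ]; then
        # b is the trail byte of a double-byte character
        if [ "$((0x$pending))" -lt 161 ]; then    # 161 = 0xA1 (assumed lower bound)
            echo "offset $((offset - 1)): 0x$pending$b has a lead byte below 0xA1"
            suspect="$suspect $((offset - 1)):$pending$b"
        fi
        pending=''
    elif [ "$((0x$b))" -ge 128 ]; then
        # a byte with the high bit set starts a double-byte character
        pending=$b
    fi
    # bytes below 0x80 outside a pair are plain ASCII and are skipped
    offset=$((offset + 1))
done
```

For this string the only pair flagged is 0x93DB at offset 6, the same position where iconv stops with BIG-5.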

Google Translate gives me:

I have one昞Maria DJCD卷

So it seems alright!

Last edited by GogglesGuy (2009-02-22 18:15:12)

Offline

#12 2009-02-22 18:19:14

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [SOLVED] Unreadable id3 tags, unknown encoding

Exactly the same problem it seems.

I found out that BIG5 has extensions.

HKSCS (Hong Kong Cantonese) is the wrong extension, because 昞 is the wrong character.

What I need is an extension that includes Japanese-only characters (like 様), i.e. the one Firefox uses.

Well, in the meantime I solved the problem by dumping all the id3 tags to one text file, using Firefox to convert it, and making one large id3tag script (went pretty fast in vim).

But for the future it would be great if easytag supported more BIG5 extensions.

EDIT:

It seems we did some similar research in the meantime.

The problem I mentioned above was the iconv 'illegal input sequence' error.

EDIT:

I am marking the thread as solved. A workaround seems to be the only way. It is impossible to get easytag to handle this: you can't manually add encodings, and the glibc devs seem to see this as very low priority.

Last edited by Procyon (2009-02-23 17:31:10)

Offline
