
#1 2010-09-25 12:16:53

jiehong
Member
Registered: 2009-03-19
Posts: 63

[Solved] Bash scripting and sed substitution

Hello!

I am writing a script to replace strings in a text, using one array for the strings to find and another for their replacements.

For a single case, it works like this:

sed '/ā/s/\(.*\)ā\(.*\)/\1a\21/g' temp.txt > temp2.txt

This converts a word containing ā into the same word with a plain "a" and the number 1 at the end of the word (māng > mang1).
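A minimal sketch of that single case, run on one word instead of temp.txt. The variable names line and converted are illustrative only. In the replacement, \21 is read as backreference \2 followed by a literal 1, since sed backreferences are a single digit.

```bash
#!/usr/bin/env bash
# Single-case check: replace ā with a and append the tone number 1.
line='māng'

# \1 = text before ā, \2 = text after ā, then the literal digit 1.
converted=$(printf '%s\n' "$line" | sed '/ā/s/\(.*\)ā\(.*\)/\1a\21/g')

echo "$converted"   # mang1
```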

For the other cases, I made some arrays and put the rules in a script file:

# These are the 4 databases containing the strings that are supposed to be replaced

data1[1]=ā
data1[2]=ē
data1[3]=ī
data1[4]=ō
data1[5]=ū
data1[6]=ǖ

data2[7]=á
data2[8]=é
data2[9]=í
data2[10]=ó
data2[11]=ú
data2[12]=ǘ

data3[13]=ǎ
data3[14]=ě
data3[15]=ǐ
data3[16]=ǒ
data3[17]=ǔ
data3[18]=ǚ

data4[19]=à
data4[20]=è
data4[21]=ì
data4[22]=ò
data4[23]=ù
data4[24]=ǜ

# This is the database of output correspondences
data[1]=a
data[2]=e
data[3]=i
data[4]=o
data[5]=u
data[6]=ü

count=1
for base in {1..4} # For each database
do
    for case in {1..6} # For each case
    do
        sed "/${data${base}[$count]}/s/\(.*\)${data${base}[$count]}\(.*\)/\1${data[$case]}\2$base/g" temp.txt > temp2.txt
        let "count+=1" #go to the next case in the database
        cat temp2.txt > temp.txt
    done
done 

I have a substitution issue in the sed line: I am trying to nest one substitution inside another, and it doesn't work.
The first expression, ${data${base}[$count]}, needs 3 substitutions at once, but I can't make it work.
In that case it should give me, for instance, the string contained in data2[3].

I hope you understand what I mean. I'd like to know how to deal with this substitution issue, if you have an idea…

Last edited by jiehong (2010-09-26 07:49:25)


#2 2010-09-25 18:41:16

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [Solved] Bash scripting and sed substitution

1. There is indirect expansion:
# var=foo
# foo=bar
# echo ${!var}
bar

I don't know how to use it with two variables, though.
# vara=f
# varb=oo
# foo=bar
# echo ${!$vara$varb}
Bad substitution!

2. eval can do it in simple cases: protect the special characters you need ($, { and }) with escapes or single quotes so that they survive until eval runs.
# data1[1]=A
# data[1]=a
# set=1
# vowel=1
# echo BAR | sed "s/$(eval echo \$\{data$set[$vowel]\})/$(eval echo \$\{data[$vowel]\})/g"
BaR
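Regarding point 1, here is a minimal sketch of one way around the "Bad substitution", assuming bash: build the full reference as a string first, then expand that string indirectly. Indirect expansion also accepts an array subscript in the name. The variables ref, lookup and result are illustrative and not part of the original script.

```bash
#!/usr/bin/env bash
# Sketch: indirect expansion through a name assembled from two variables.
data1[1]=A
data[1]=a
set=1
vowel=1

ref="data${set}[${vowel}]"   # the plain string "data1[1]"
lookup=${!ref}               # indirect expansion: the value of data1[1]

result=$(echo BAR | sed "s/${lookup}/${data[$vowel]}/g")
echo "$result"   # BaR
```

Unlike eval, this does not re-parse the array contents as shell code, so unusual characters in the data are less likely to cause surprises.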


#3 2010-09-25 19:47:44

skanky
Member
From: WAIS
Registered: 2009-10-23
Posts: 1,847

Re: [Solved] Bash scripting and sed substitution

Try using ' for the sed call and breaking out of the ' quotes for the variables. E.g.:

sed '/'data${base}[$count]'/s/\(.*\)'data${base}[$count]'\(.*\)/\1'${data[$case]}'\2'$base'/g'  etc.

Note, I may have missed/messed a quote or other character in that, so check it first.


"...one cannot be angry when one looks at a penguin."  - John Ruskin
"Life in general is a bit shit, and so too is the internet. And that's all there is." - scepticisle


#4 2010-09-25 20:24:48

jiehong
Member
Registered: 2009-03-19
Posts: 63

Re: [Solved] Bash scripting and sed substitution

I've implemented what Procyon suggested in part 2, and it works with a small adaptation, which is great!!

I just have one issue now: the number does not go right after the word but at the end of the line, even when the words are separated by spaces. For example:

hǎo
hào
wō wó wǒ wò wo

becomes:

hao3
hao4
wo wo wo wo wo1234

My sed line is now:

sed "/$(eval echo \$\{data$base[$count]\})/s/\(.*\)$(eval echo \$\{data$base[$count]\})\(.*\)/\1${data[$case]}\2$base/g" temp.txt > temp2.txt

Last edited by jiehong (2010-09-25 20:27:57)


#5 2010-09-25 22:45:50

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [Solved] Bash scripting and sed substitution

@jiehong: You have to replace .* (match any characters) with [^ ]* (match non-space characters).

You don't need the initial address search either, because a failed substitution command leaves the line intact.

sed "s/\([^ ]*\)$(eval echo \$\{data$base[$count]\})\([^ ]*\)/\1${data[$case]}\2$base/g" temp.txt > temp2.txt

This still won't work for words with two syllables though. I.e. pīnyīn becomes pinyin11 and not pin1yin1.
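To make the difference between the two patterns concrete, here is a small sketch that runs both on the same two-word line. It handles only the ō case, and the names line, greedy and bounded are illustrative.

```bash
#!/usr/bin/env bash
# Compare the greedy ".*" groups with the "[^ ]*" groups on one input line.
line='wō wó'

# ".*" may cross spaces, so \2 swallows the rest of the line.
greedy=$(printf '%s\n' "$line" | sed 's/\(.*\)ō\(.*\)/\1o\21/g')

# "[^ ]*" stops at the first space, so the match stays inside one word.
bounded=$(printf '%s\n' "$line" | sed 's/\([^ ]*\)ō\([^ ]*\)/\1o\21/g')

echo "$greedy"    # wo wó1
echo "$bounded"   # wo1 wó
```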


#6 2010-09-26 07:48:58

jiehong
Member
Registered: 2009-03-19
Posts: 63
Website

Re: [Solved] Bash scripting and sed substitution

I see. I actually tried \(.\+\) instead of \(.*\), but it did not work either, and I couldn't find out how to stop at a space.

For words with 2 syllables, I don't think it's really possible… well, it would need a more complicated analysis of the words…
So thanks, Procyon, because it works.

I decided to write this script to get used to regular expressions (and I need the script too). They are not that easy, but very powerful.

Thanks for your time :)
