Hello!
I am writing a script to replace the strings listed in one array with the strings from another array in text files.
For a single case, the following works:
sed '/ā/s/\(.*\)ā\(.*\)/\1a\21/g' temp.txt > temp2.txt
It replaces ā in a word with a plain "a" and appends the number 1 to the end of the word (māng > mang1).
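A quick way to check the single-case command without temporary files is to feed it one word on standard input. This is only a sketch; the variable name `result` is made up for the example.

```bash
# Pipe one sample word through the same sed command instead of reading temp.txt.
# \21 in the replacement is backreference \2 followed by a literal "1".
result=$(printf '%s\n' 'māng' | sed '/ā/s/\(.*\)ā\(.*\)/\1a\21/g')
echo "$result"   # prints mang1
```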
To handle many cases, I've made some arrays and put the rules in a script file:
# These are the 4 databases containing the strings that are supposed to be replaced
data1[1]=ā
data1[2]=ē
data1[3]=ī
data1[4]=ō
data1[5]=ū
data1[6]=ǖ
data2[7]=á
data2[8]=é
data2[9]=í
data2[10]=ó
data2[11]=ú
data2[12]=ǘ
data3[13]=ǎ
data3[14]=ě
data3[15]=ǐ
data3[16]=ǒ
data3[17]=ǔ
data3[18]=ǚ
data4[19]=à
data4[20]=è
data4[21]=ì
data4[22]=ò
data4[23]=ù
data4[24]=ǜ
# This is the database of output correspondences
data[1]=a
data[2]=e
data[3]=i
data[4]=o
data[5]=u
data[6]=ü
count=1
for base in {1..4} # For each database
do
for case in {1..6} # For each case
do
sed "/${data${base}[$count]}/s/\(.*\)${data${base}[$count]}\(.*\)/\1${data[$case]}\2$base/g" temp.txt > temp2.txt
let "count+=1" #go to the next case in the database
cat temp2.txt > temp.txt
done
done
I have a substitution issue in the sed line: I am trying to nest one substitution inside another, and it doesn't work.
The first substitution, ${data${base}[$count]}, needs three expansions at once (base, count, and then the array element), but I can't make it work.
It should give me, for instance, the string contained in data2[3].
I hope you understand what I mean. I'd like to know how to deal with that substitution issue, if you have an idea…
Last edited by jiehong (2010-09-26 07:49:25)
1. There is this
# var=foo
# foo=bar
# echo ${!var}
bar
I don't know how to use two variables though.
# vara=f
# varb=oo
# foo=bar
# echo ${!$vara$varb}
Bad substitution!
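One possible way around the "Bad substitution" error, assuming bash, is to join the two pieces into an ordinary variable first and dereference that. The helper variable `name` is made up for this sketch.

```bash
#!/usr/bin/env bash
vara=f
varb=oo
foo=bar

# ${!...} accepts only a plain variable name, so build the name first.
name=$vara$varb        # name now holds the string "foo"
value=${!name}         # indirect expansion: the value of the variable called "foo"
echo "$value"          # prints bar
```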
2. eval can do it in simple cases: protect the special characters you need ($ and {}) with single quotes or backslash escapes so that they survive until eval runs.
# data1[1]=A
# data[1]=a
# set=1
# vowel=1
# echo BAR | sed "s/$(eval echo \$\{data$set[$vowel]\})/$(eval echo \$\{data[$vowel]\})/g"
BaR
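The same lookup can probably be done without eval: in bash, indirect expansion also accepts a name that includes an array subscript. A minimal sketch reusing the variables above; `ref` and `result` are names invented for the example.

```bash
#!/usr/bin/env bash
data1[1]=A
data[1]=a
set=1
vowel=1

# Put the array name and subscript together as text, then dereference with ${!ref}.
ref="data${set}[${vowel}]"     # the string "data1[1]"
result=$(echo BAR | sed "s/${!ref}/${data[$vowel]}/g")
echo "$result"                 # prints BaR
```

This avoids re-parsing the line with eval, which matters if the array contents ever contain characters the shell treats specially.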
Try using single quotes (') for the sed call and breaking out of the quotes around each variable. E.g.:
sed '/'data${base}[$count]'/s/\(.*\)'data${base}[$count]'\(.*\)/\1'${data[$case]}'\2'$base'/g' etc.
Note, I may have missed/messed a quote or other character in that, so check it first.
I've implemented what Procyon suggested in part 2 and it works with a small adaptation, which is great!
I have just one issue now: the number doesn't go right after the word but at the end of the line, even when the words are separated by spaces. For example:
hǎo
hào
wō wó wǒ wò wo
becomes:
hao3
hao4
wo wo wo wo wo1234
My sed line is now:
sed "/$(eval echo \$\{data$base[$count]\})/s/\(.*\)$(eval echo \$\{data$base[$count]\})\(.*\)/\1${data[$case]}\2$base/g" temp.txt > temp2.txt
Last edited by jiehong (2010-09-25 20:27:57)
@jiehong: You have to replace .* (which matches any characters, spaces included) with [^ ]* (which matches only non-space characters).
You don't need the initial address search either, because a failed substitution command leaves the line intact.
sed "s/\([^ ]*\)$(eval echo \$\{data$base[$count]\})\([^ ]*\)/\1${data[$case]}\2$base/g" temp.txt > temp2.txt
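This still won't work for words with two syllables though: pīnyīn does not become pin1yin1. The tone number always lands at the end of the word, and because the first [^ ]* is greedy, each pass only replaces the last matching vowel of a word, so pīnyīn comes out as pīnyin1.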
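To see the per-word pattern at work, here is a rough sketch of the whole loop as a function that filters a string instead of rewriting temp.txt. The array names `tone1`..`tone4` and `plain`, the function name `convert`, and the zero-based indexing are assumptions made for this example, not the layout of the original script.

```bash
#!/usr/bin/env bash
# One array per tone; index 0..5 selects the vowel (hypothetical layout).
tone1=(ā ē ī ō ū ǖ)
tone2=(á é í ó ú ǘ)
tone3=(ǎ ě ǐ ǒ ǔ ǚ)
tone4=(à è ì ò ù ǜ)
plain=(a e i o u ü)

convert() {
    local text=$1 base v ref
    for base in 1 2 3 4; do              # for each tone
        for v in 0 1 2 3 4 5; do         # for each vowel
            ref="tone${base}[${v}]"      # name of the accented vowel to look up
            # [^ ]* keeps the match inside one space-delimited word,
            # so the tone number is appended to that word only.
            text=$(printf '%s\n' "$text" |
                sed "s/\([^ ]*\)${!ref}\([^ ]*\)/\1${plain[$v]}\2${base}/g")
        done
    done
    printf '%s\n' "$text"
}

convert 'wō wó wǒ wò wo'    # prints wo1 wo2 wo3 wo4 wo
```

Running sed once per vowel is slow on large files; collecting the 24 expressions into a single sed call would be a natural next step.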
I see. I had actually tried \(.\+\) instead of \(.*\), but that did not work either, and I couldn't find out how to stop at a space.
For words with 2 syllables I don't think it's really possible… well, it would need a more complicated analysis of the words…
So thanks Procyon, because it works.
I decided to write this script to get used to regular expressions (and I need the script too). They are not that easy, but very powerful.
Thanks for your time