#1 2009-09-12 20:27:04

dmlawrence
Member
Registered: 2008-10-01
Posts: 21

Text processing question

I have a vCard file that I need to modify by removing all non-digit characters from telephone numbers.

Here's the exact process that needs to occur:
If a line begins with "TEL", then remove all non-digit characters from the portion of the line following ":".  (Every line contains only one colon.)

I have no experience with sed or awk, but I was able to hack a solution together using awk:

awk -F: '/^TEL/ { gsub(/[^[:digit:]]/, "", $2); print $1 ":" $2 } !/^TEL/ { print }'

Is there a simpler way to accomplish my goal?  The code above feels somewhat redundant.

I initially attempted to use sed, but I couldn't figure out how to remove non-digits from only part of a line.

-David

#2 2009-09-13 01:25:36

tlvb
Member
From: Sweden
Registered: 2008-10-06
Posts: 297

Re: Text processing question

I think this works:

sed ":x;s/^\(TEL.*:[^0-9]*\)[0-9]\+/\1/;tx"

Last edited by tlvb (2009-09-13 02:17:51)


I need a sorted list of all random numbers, so that I can retrieve a suitable one later with a binary search instead of having to iterate through the generation process every time.

#3 2009-09-13 12:32:40

dmlawrence
Member
Registered: 2008-10-01
Posts: 21

Re: Text processing question

That actually does the opposite of what I need, but this works:

sed ":x;s/^\(TEL.*:[0-9]*\)[^0-9]\+/\1/;tx"

What does the "tx" do?  That is the piece that I was missing before.

Thanks.

#4 2009-09-13 13:13:15

tlvb
Member
From: Sweden
Registered: 2008-10-06
Posts: 297

Re: Text processing question

Haha, I blame the late hour when I wrote it, but the fix is just moving a ^, as you've already done.
The reason for tx is that a single s/// only removes the first run of digits, e.g. TEL11:22x33 -> TEL11:x33 instead of -> TEL11:x
So at the beginning I define a label x with ":x", and at the end "tx" jumps back to label x if an s/// has successfully substituted something since the last line was read.
I.e., with my original (digit-removing) version:
"TEL11:22x33" --[s///]-> "TEL11:x33"
s/// was successful, jump to x
"TEL11:x33" --[s///]-> "TEL11:x"
s/// was successful, jump to x
"TEL11:x" --[s///]-> "TEL11:x"
s/// was unsuccessful, move on to the next line of input
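
To make the label-and-branch loop easier to try out, here is a minimal sketch of the corrected (non-digit-removing) variant, split across separate -e expressions so the label sits on its own. The function name strip_tel and the sample vCard lines are made up for illustration and are not from the thread. It uses [^0-9][^0-9]* rather than \+, on the assumption that \+ is a GNU extension that other sed implementations may not accept.

```shell
# strip_tel: hypothetical helper name, used only for this illustration.
# On lines starting with TEL, repeatedly delete the first run of
# non-digits that follows the colon; all other lines pass through.
strip_tel() {
    sed -e ':x' \
        -e 's/^\(TEL[^:]*:[0-9]*\)[^0-9][^0-9]*/\1/' \
        -e 'tx'
}

# Sample input (made up): one TEL line and one unrelated line.
printf 'TEL;WORK:(555) 123-4567\nFN:John Q. Public\n' | strip_tel
```

Each pass removes one run of non-digits, and "tx" keeps looping until the s/// no longer matches, at which point sed prints the line and reads the next one.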

Last edited by tlvb (2009-09-13 13:13:58)


#5 2009-09-13 15:18:53

dmlawrence
Member
Registered: 2008-10-01
Posts: 21

Re: Text processing question

That makes sense.  Thanks.

I was able to simplify the script a bit:

sed ":x;s/\(^TEL.*:.*\)[^0-9]/\1/;tx"

works as well.
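
For comparison, the awk command from the first post can probably be shortened as well. This is only a sketch, assuming every TEL line has exactly one colon as stated in the original question; the function name strip_tel_awk and the sample input are made up for illustration. Setting OFS to ":" lets awk rebuild the line itself after $2 is modified, and the trailing pattern 1 prints every line, so the separate !/^TEL/ rule is no longer needed.

```shell
# strip_tel_awk: hypothetical helper name, used only for this illustration.
# Split on ":", strip non-digits from the second field of TEL lines,
# then print every line (the bare pattern 1 is always true).
strip_tel_awk() {
    awk -F: -v OFS=: '/^TEL/ { gsub(/[^0-9]/, "", $2) } 1'
}

# Sample input (made up): one TEL line and one unrelated line.
printf 'TEL;WORK:(555) 123-4567\nFN:John Q. Public\n' | strip_tel_awk
```

Lines that do not start with TEL are never modified, so awk prints them exactly as read.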

Last edited by dmlawrence (2009-09-13 15:34:07)
