#1 2010-10-24 09:57:58

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,671
Website

sed to substitute out "bad" words in an array

I need to have sed read in a file (single column of words) and remove any that are on a blacklist as defined in an array.  I'm kinda there but I don't know how to process the entire array and THEN write out the resulting "censored" file.

Example input list (let's call the file 'foo'):

corn
romaine
arugula
keyboard
carrot
yogurt
peas

Here is the bash script that only works when there is exactly one word in the ignore array:

#!/bin/bash
ignore=(keyboard yogurt)
for i in ${ignore}; do
 sed -e "s/$i//g" -e '/^$/d' foo>bar
done

The flaw in my design here I think is that I keep overwriting 'bar' rather than doing all the processing, then writing out 'bar'.  Suggestions are welcomed - I'm always glad to learn and thanks in advance.

Last edited by graysky (2010-10-24 10:03:37)


CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs

#2 2010-10-24 10:17:36

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,487
Website

Re: sed to substitute out "bad" words in an array

Are you sure this is not homework?  :P

#3 2010-10-24 10:23:03

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,671
Website

Re: sed to substitute out "bad" words in an array

@Allan - Homework... I haven't done homework in more years than I care to recall!  I want to implement this feature in my AUR package for modprobed_db.  The script within keeps track of kernel modules probed so that users can build a kernel with 'make localmodconfig' and only get the modules they need.  The blacklist array will contain modules that get built by packages like virtualbox and nvidia, for example: (nvidia vboxdrv vboxnetflt vboxnetadp) that we don't want probed for the compilation.  In the real case, the input file will be /var/log/modprobe.long that my script in that package generates - I only presented the problem in simple "salad" terms to make it easier for the community to see what I want to do :P

Last edited by graysky (2010-10-24 10:29:47)


#4 2010-10-24 10:35:24

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,487
Website

Re: sed to substitute out "bad" words in an array

OK then...  had to check :P

allan@mugen ~ 
> cat foo.txt 
one
two
three

allan@mugen ~ 
> sed '/^two$/d' foo.txt 
one
three

#5 2010-10-24 10:38:56

portix
Member
Registered: 2009-01-13
Posts: 757

Re: sed to substitute out "bad" words in an array

If you use an array it has to be

for i in ${ignore[*]}; do ...

to iterate over all elements.
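
For anyone following along, here is a minimal sketch of the difference, assuming bash. An unsubscripted ${ignore} is the same as ${ignore[0]}, so the original loop only ever saw the first word. The counter names are made up for this illustration; quoting "${ignore[@]}" is the usual way to get exactly one loop word per element.

```shell
#!/bin/bash
ignore=(keyboard yogurt)

# Unsubscripted ${ignore} means ${ignore[0]}: the loop body runs once.
first_only=0
for i in ${ignore}; do
  first_only=$((first_only + 1))
done

# "${ignore[@]}" expands to one word per element: the loop body runs twice.
all_elements=0
for i in "${ignore[@]}"; do
  all_elements=$((all_elements + 1))
done

printf 'plain: %s, [@]: %s\n' "$first_only" "$all_elements"
# prints "plain: 1, [@]: 2"
```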

#6 2010-10-24 10:44:27

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,671
Website

Re: sed to substitute out "bad" words in an array

@allan - I can do that part... I just can't parlay the code into working with my array.

@portix - I tried your suggestion but got the same result.

#!/bin/bash
ignore=(keyboard yogurt)
for i in ${ignore[*]}; do
 sed -e "s/$i//g" -e '/^$/d' foo>bar
done

So 'bar' still contains the blacklisted word 'keyboard':

corn
romaine
arugula
keyboard
carrot
peas

Last edited by graysky (2010-10-24 10:45:45)


#7 2010-10-24 11:07:12

olvar
Member
Registered: 2009-11-13
Posts: 97

Re: sed to substitute out "bad" words in an array

after your sed command you may want to move the new file to the old:

for i in.... ; do
   sed .... foo > bar
   mv bar foo
done

so you are actually filtering the stuff out
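
A runnable sketch of this idea, with two deliberate variations that are assumptions of mine rather than part of the suggestion above: it works on a copy (bar) so the original foo is left alone, and it deletes whole matching lines instead of substituting and then removing empty lines. The temporary directory and the bar.tmp name are only there for the illustration.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' corn romaine arugula keyboard carrot yogurt peas > foo

ignore=(keyboard yogurt)
cp foo bar                       # filter a copy; foo stays untouched
for i in "${ignore[@]}"; do
  # Each pass reads the previous pass's output rather than the original foo,
  # so removals accumulate instead of being overwritten.
  sed "/^$i\$/d" bar > bar.tmp && mv bar.tmp bar
done
cat bar
```

With the sample list, bar ends up holding corn, romaine, arugula, carrot and peas. The original script lost its earlier work because every iteration started again from foo and truncated bar.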

Last edited by olvar (2010-10-24 11:08:30)

#8 2010-10-24 14:06:49

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,111
Website

Re: sed to substitute out "bad" words in an array

build a filter and call sed once. backup optional.

modfilter=""
for ign in "${ignore[@]}"; do
  modfilter+='/\<'"$ign"'\>/d;'
done

sed -i.orig "$modfilter" inputlist
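
To see what the loop actually builds, here is the same snippet run against the sample list from the first post. This is only a sketch: it assumes GNU sed (the \< and \> word anchors are a GNU extension), and the temporary directory is just for the demonstration.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' corn romaine arugula keyboard carrot yogurt peas > inputlist

ignore=(keyboard yogurt)
modfilter=""
for ign in "${ignore[@]}"; do
  modfilter+='/\<'"$ign"'\>/d;'   # one delete command per ignored word
done

printf '%s\n' "$modfilter"        # prints /\<keyboard\>/d;/\<yogurt\>/d;
sed -i.orig "$modfilter" inputlist  # edit in place, backup in inputlist.orig
cat inputlist
```

Because sed runs once with every delete command, there is no intermediate file to overwrite.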

#9 2010-10-24 23:39:20

barto
Member
From: Budapest, Hungary
Registered: 2009-10-22
Posts: 88

Re: sed to substitute out "bad" words in an array

I would do the filtering with grep -v, using extended regexp.

#!/bin/bash
ignore=(keyboard yogurt)
grep -Ev "`echo ${ignore[*]} | sed 's/ /|/g'`" foo > bar

“First principle, Clarice. Simplicity” – Dr. Hannibal Lecter

#10 2010-10-24 23:52:54

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,111
Website

Re: sed to substitute out "bad" words in an array

Word boundaries are important here. You probably don't want to ignore snd-sgalaxy, sg, videobuf-dma-sg, and ipmi_msghandler if you only put 'sg' in the ignore array.
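
A quick sketch comparing the three matching modes on those module names, assuming GNU grep. The variable names are invented for the illustration. One caveat worth knowing: a hyphen is not a word character, so a word-boundary match on 'sg' still hits videobuf-dma-sg; only a whole-line match leaves it alone.

```shell
#!/bin/bash
modules=$(printf '%s\n' snd-sgalaxy sg videobuf-dma-sg ipmi_msghandler)

# Substring match: every name contains "sg", so nothing survives.
substring=$(grep -v 'sg' <<< "$modules" || true)

# Word match (same idea as \<sg\>): drops sg and videobuf-dma-sg.
wordwise=$(grep -vw 'sg' <<< "$modules")

# Whole-line match: drops only the line that is exactly "sg".
wholeline=$(grep -vx 'sg' <<< "$modules")

printf 'substring:\n%s\nword:\n%s\nwhole line:\n%s\n' \
  "$substring" "$wordwise" "$wholeline"
```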

#11 2010-10-25 06:56:30

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,671
Website

Re: sed to substitute out "bad" words in an array

@barto - thanks for the code!  I updated my package accordingly.

Last edited by graysky (2010-10-25 06:56:49)


#12 2010-10-25 23:17:44

barto
Member
From: Budapest, Hungary
Registered: 2009-10-22
Posts: 88

Re: sed to substitute out "bad" words in an array

@falconindy - Thanks for pointing that out, you are right.
Adding the -x option to grep, so that it only matches whole lines, should solve the issue.
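
For completeness, a sketch of what that could look like with the sample list. It assumes bash and GNU grep, and it joins the array with IFS instead of the echo/sed pipeline; the pattern variable and the temporary directory are just for this illustration.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' corn romaine arugula keyboard carrot yogurt peas > foo

ignore=(keyboard yogurt)

# Join the array elements with '|'; IFS is changed only inside the subshell.
pattern=$(IFS='|'; printf '%s' "${ignore[*]}")

# -E extended regexp, -v invert the match, -x match whole lines only
grep -Evx "$pattern" foo > bar
cat bar
```

With -x, an entry such as sg would only remove a line that is exactly "sg". Note that an empty ignore array would yield an empty pattern, so a real script would probably want to skip the grep in that case.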

