#1 2009-06-08 14:45:38

ArchArael
Member
Registered: 2005-06-14
Posts: 504

Suggestion about string substitution

Hi guys,

I have been asked to modify a bunch of HTML files exported by a program named... MindMeister, I guess.

It is a sort of FAQ, structured so that the user can jump easily from one FAQ entry to another.

I have something like 100 files or more, and the navigation is repeated in every file.

So every page contains the same 100 links to the other pages, in order to make navigation possible.

I have been asked to insert another page. This is the second time I have been asked to modify this FAQ list.

This is not a problem for me. The first time I just used Dreamweaver and easily inserted another file. I'm going to use it again, because I cannot see another way to do the same thing as easily and simply.

I was just wondering how I could have done this without Dreamweaver.

I could use sed, sure, but imagine putting this in sed syntax:

This is the part of the navigation I have to modify in every file.

<div class="maintopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionedirelazioni.html"> 4. Estinzione di relazioni </a></span>
  </div>
  <div class="subtopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Dapartediprocuratore.html"> 4.1 Da parte di procuratore </a></span>
  </div>
  <div class="subtopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionefineanno.html"> 4.2 Estinzione fine anno </a></span>
  </div>
  <div class="maintopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="(NUOVO)FormularioA,ReT.html"> 5. (NUOVO) Formulario A, R e T </a></span>
  </div>

Now suppose I wanted to add FAQ entry 4.3, "Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript".

My new navigation should be this one:

<div class="maintopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionedirelazioni.html"> 4. Estinzione di relazioni </a></span>
  </div>
  <div class="subtopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Dapartediprocuratore.html"> 4.1 Da parte di procuratore </a></span>
  </div>
  <div class="subtopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionefineanno.html"> 4.2 Estinzione fine anno </a></span>
  </div>
  <div class="subtopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Crappy proprietary program.html"> 4.3 Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript </a></span>
  </div>
  <div class="maintopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="(NUOVO)FormularioA,ReT.html"> 5. (NUOVO) Formulario A, R e T </a></span>
  </div>

How would you do this? Doing this in sed would be uncomfortable and time-consuming.

I can do this easily with Dreamweaver by using code substitution across all files and specifying that this should be replaced:

  <div class="subtopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionefineanno.html"> 4.2 Estinzione fine anno </a></span>
  </div>

with this:

  <div class="subtopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionefineanno.html"> 4.2 Estinzione fine anno </a></span>
  </div>
  <div class="subtopic">
    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Crappy proprietary program.html"> 4.3 Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript </a></span>
  </div>

I hope this is just my own limit and not a limit of Unix filters.

This is just curiosity. As I wrote before, I can accomplish this task with Dreamweaver, but I don't like that approach much.

I would like something more Unix-like.

Edited because of errors pointed out by Procyon.

Last edited by ArchArael (2009-06-09 21:42:08)

#2 2009-06-08 15:26:34

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: Suggestion about string substitution

That last one isn't right, right? You're substituting a maintopic with two subtopics.

Anyway, it looks like you can just append something after the text that matches that last subtopic. This sed command looks for a subtopic div and then for "4.2 Estinzione" on the next line; if both match, it appends the new entry after the closing </div> that follows. Note that the T command is a GNU sed extension.

for file in *.html; do
sed -i '/<div class="subtopic">/{n;s/4\.2 Estinzione/&/;T;n;
a \  <div class="subtopic">
a \    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Crappy proprietary program.html"> 4.3 Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript </a></span>
a \  </div>
}' "$file"
done

#3 2009-06-09 08:25:55

ArchArael
Member
Registered: 2005-06-14
Posts: 504

Re: Suggestion about string substitution

Thank you, Procyon. You were right, and I have corrected my opening post.

I know I could use sed, as I wrote in the opening post, but basically I was looking for a simpler solution.

In any case, some characters must be escaped, although in sed it is possible to disable globbing. Maybe this step is just unavoidable.

There could also be extra newlines or spaces, so the initial pattern might not be the same in all pages. That's another problem.

Anyway I will study and try your solution. Thank you.

Last edited by ArchArael (2009-06-09 08:28:34)

#4 2009-06-09 13:27:42

Cerebral
Forum Fellow
From: Waterloo, ON, CA
Registered: 2005-04-08
Posts: 3,108
Website

Re: Suggestion about string substitution

I'm not sure how much simpler you expect to get than Procyon's example, to be honest. It does exactly what you used to do (a basic search and insert-after). What, in your opinion, would be simpler?

I can even expand it out and comment it for you to make it clearer:

for file in *.html; do       # Process every html file in the current dir
    sed -i '
/<div class="subtopic">/ {   # If we see a line that looks like <div class="subtopic">, then perform the commands between { }
    n;                       # Read in the following line from the file (ie <img src="blah blah...)
    s/4\.2 Estinzione/&/;    # Attempt to replace "4.2 Estinzione" with itself (ie. no end effect, performed for the sake of the next command)
    T;                       # If the previous replacement failed, skip the remainder of the commands (ie. if "4.2 Estinzione" exists on this line, continue, otherwise stop)
    n;                       # Read in the following line from the file (ie. </div>)
                             # The following 3 lines say "after the current line, insert this line"
a \  <div class="subtopic"> 
a \    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Crappy proprietary program.html"> 4.3 Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript </a></span>
a \  </div>
}                            # end of block of actions to execute when you see <div class="subtopic"> -- note, leave this quote-backslash here, it is necessary -->' \
    "$file"                  # do the previous actions on the current HTML file
done

(Note that this is still a valid sed script even with the comments; you could copy and paste this and it would run.) :)
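
Before running an in-place edit over a hundred files, it may be worth previewing the change first. A minimal sketch, assuming GNU sed and a diff that accepts -u; preview_nav_edit is a made-up helper name, and the sed script is the same one as above, only without -i:

```shell
# Hypothetical helper: show what the edit would change in one file
# without modifying it. Assumes GNU sed (T and the one-line form of the
# a command are GNU extensions) and a diff that supports -u.
preview_nav_edit() {
    # Same script as in the thread, minus -i, so the result goes to stdout
    # and is compared against the untouched original.
    sed '/<div class="subtopic">/{n;s/4\.2 Estinzione/&/;T;n;
a \  <div class="subtopic">
a \    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Crappy proprietary program.html"> 4.3 Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript </a></span>
a \  </div>
}' "$1" | diff -u "$1" - || true   # diff exits 1 when the files differ; expected here
}
```

Something like preview_nav_edit Estinzionefineanno.html | less would then show the three added lines as a unified diff, and the file on disk stays as it was.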

#5 2009-06-09 20:43:04

ArchArael
Member
Registered: 2005-06-14
Posts: 504

Re: Suggestion about string substitution

Thank you for the comments, Cerebral. Probably there are no simpler solutions with GNU filters. Comparing Dreamweaver and GNU filters is like comparing apples and oranges, so sorry about that. The script is nice but, as far as I can see, it doesn't cover all cases. I'm not sure that every div is formatted the same way. I might have this:

<div class="subtopic"><img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionefineanno.html"> 4.2 Estinzione fine anno </a></span></div>

Or this:

<div class="subtopic">

<img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionefineanno.html"> 4.2 Estinzione fine anno </a></span>


</div>

Or some other case similar to the previous two, mixing tabs and spaces. In Dreamweaver I can substitute code without worrying about spaces or tabs.

How do I handle these cases in sed? Would this be easier to do using tr or some other filter?

#6 2009-06-09 21:19:29

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: Suggestion about string substitution

You can make it pretty complex, but it's not exactly the right tool.

This will just search for <a href="Estinzionefineanno.html">; if you want to match the subtopic div too, it shouldn't be too hard to change.
It looks for a </div> after that link, then inserts some line breaks for the second sed command (I'm not sure whether it can all go in one): one before the link, so that there are no <div>s before it on the same line, and one after every </div>.
That way the </div> we want to append to is really easy to find.

for file in *.html; do
sed '/<a href="Estinzionefineanno.html">/{s/<a href="Estinzionefineanno.html">.*<\/div>/&/;tdontsearch;:finddiv;N;s/<a href="Estinzionefineanno.html">.*<\/div>/&/;Tfinddiv;:dontsearch;s/<a href="Estinzionefineanno.html">/\n&/;s/<\/div>/&\n/g}' "$file" | sed '/<a href="Estinzionefineanno.html">/{s/<\/div>/&/;tdontsearch;:finddiv;N;s/<\/div>/&/;Tfinddiv;:dontsearch;
a \  <div class="subtopic"> 
a \    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Crappy proprietary program.html"> 4.3 Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript </a></span>
a \  </div>
}' > "$file".new
mv "$file".new "$file"
done

edit: extra check
edit2: You will get newlines directly after <span>, which will cause problems, but you can prepend that information to the substitution/search pattern if it always appears that way anyway.
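
For the whitespace problem, another option is to let sed see the whole file at once, so that line breaks between the tags stop mattering. A rough sketch, not tried against the real export: it assumes GNU sed (for -z, -E, -i and \n in the replacement), files without NUL bytes, and that the 4.2 link text contains no nested tags. insert_entry is a made-up helper name.

```shell
# Hypothetical helper: insert the 4.3 entry right after the 4.2 entry,
# regardless of spaces, tabs or line breaks between the closing tags.
insert_entry() {
    # Skip files that already contain the new link, so reruns are harmless.
    if grep -qF 'href="Crappy proprietary program.html"' "$1"; then
        return 0
    fi
    # ERE matching the 4.2 link up to the </div> that closes its entry.
    anchor='<a href="Estinzionefineanno\.html">[^<]*</a>[[:space:]]*</span>[[:space:]]*</div>'
    # Replacement text; GNU sed turns \n into a newline.
    entry='  <div class="subtopic">\n    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Crappy proprietary program.html"> 4.3 Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript </a></span>\n  </div>'
    # -z splits input on NUL instead of newline, so the whole file is one
    # record and the pattern can span lines. The delimiter is # because
    # the HTML contains slashes.
    sed -z -E -i "s#(${anchor})#\\1\\n${entry}#" "$1"
}
```

It would be called the same way as the loops above: for file in *.html; do insert_entry "$file"; done. Matching on the href alone keeps the pattern short, but it also means the new entry is inserted wherever that link's entry closes, so it is worth checking one page by hand first.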

Last edited by Procyon (2009-06-09 21:32:31)

#7 2009-06-09 21:25:51

Gen2ly
Member
From: Sevierville, TN
Registered: 2009-03-06
Posts: 1,529
Website

Re: Suggestion about string substitution

Procyon wrote:
for file in *.html; do
sed -i '/<div class="subtopic">/{n;s/4\.2 Estinzione/&/;T;n;
a \  <div class="subtopic">
a \    <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Crappy proprietary program.html"> 4.3 Crappy hardcoded navigation due to crappy proprietary program unable to export the navigation as javascript </a></span>
a \  </div>
}' $file
done

Ohohooho :D


#8 2009-06-09 21:47:08

ArchArael
Member
Registered: 2005-06-14
Posts: 504

Re: Suggestion about string substitution

Procyon wrote:

You can make it pretty complex, but it's not exactly the right tool.

Thank you. I will try to improve my sed knowledge. By the way, you are quite an expert with sed.

@Gen2ly :D Well, I was a little bit pissed off because of this ugly task.

Thank you all for your suggestions and your time. ;)

#9 2009-06-10 11:26:22

Cerebral
Forum Fellow
From: Waterloo, ON, CA
Registered: 2005-04-08
Posts: 3,108
Website

Re: Suggestion about string substitution

Oh, also, if the navigation is the exact same on every page, you might want to take a look at Server-Side Includes, which would allow you to write the navigation in one file, and include it in all others.
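
For reference, a minimal sketch of what that could look like with Apache's mod_include. The file names nav.html and faq.shtml and the helper make_ssi_demo are made up for illustration, and the server has to be configured to parse the page (for example Options +Includes together with AddOutputFilter INCLUDES .shtml); whether a given host allows that is an assumption.

```shell
# Hypothetical demo: write a shared navigation fragment and one page that
# pulls it in through a Server-Side Include directive.
make_ssi_demo() {
    dir=$1                      # directory to write the two demo files into
    mkdir -p "$dir"

    # The navigation lives in exactly one file.
    cat > "$dir/nav.html" <<'EOF'
<div class="subtopic">
  <img src="Res/images/arrow.gif" alt=""><span class="unselect"><a href="Estinzionefineanno.html"> 4.2 Estinzione fine anno </a></span>
</div>
EOF

    # Each page only carries the include directive; Apache replaces it with
    # the contents of nav.html when it serves the page.
    cat > "$dir/faq.shtml" <<'EOF'
<html>
<body>
<!--#include virtual="nav.html" -->
<p>FAQ text goes here.</p>
</body>
</html>
EOF
}
```

With this layout, adding an entry means editing nav.html once. Opened straight from disk, without a web server, the directive is just an HTML comment and no navigation appears.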

#10 2009-06-10 20:19:38

ArchArael
Member
Registered: 2005-06-14
Posts: 504

Re: Suggestion about string substitution

Cerebral wrote:

Oh, also, if the navigation is the exact same on every page, you might want to take a look at Server-Side Includes, which would allow you to write the navigation in one file, and include it in all others.

Wow... that's really nice. I didn't know about that.

I cannot use it in this case because these pages are going to be published on a Moodle installation. Still, it's a really interesting option.

Last edited by ArchArael (2009-06-10 20:23:54)

#11 2009-06-10 20:35:47

Cerebral
Forum Fellow
From: Waterloo, ON, CA
Registered: 2005-04-08
Posts: 3,108
Website

Re: Suggestion about string substitution

Moodle seems to be just a bunch of PHP that needs a webserver, like Apache, to run.

http://docs.moodle.org/en/Installing_AMP

Apache supports SSI - http://www.ssi-developer.net/ssi/

#12 2009-06-12 15:44:20

ArchArael
Member
Registered: 2005-06-14
Posts: 504

Re: Suggestion about string substitution

I see... so this should be available by default and should work on any Apache installation. The only problem is that the user can also download all the files as a zip archive, where server-side includes would not be processed.

I was thinking of writing some JavaScript that re-creates the menu. This should work both offline and online. It will not work if the user blocks JavaScript.
