
#1 2013-09-01 17:34:52

darkbeanies
Member
Registered: 2009-01-14
Posts: 142

Text file - replace a pattern with part of the previous line? -SOLVED

Hello, I'm stuck with sed/awk/grep...

So I have a file with lines like this:

Nice bunch of words <STUFF> <STUFF> <IMPORTANT_DELIMITER_TYPE_1>  <STUFF> <IMPORTANT_DELIMITER_TYPE_2> <STUFF>
Even Nicer bunch of Words <STUFF> <IMPORTANT_DELIMITER_TYPE_1> <STUFF> <STUFF> <STUFF>
Wonderful bunch of Words <STUFF> <STUFF> <STUFF><IMPORTANT_DELIMITER_TYPE_1><STUFF>

Then, I want to move the "important delimiters" to new lines (might be better not to do this in fact...)

Nice bunch of words <STUFF> <STUFF> 
<IMPORTANT_DELIMITER_TYPE_1>  <STUFF> 
<IMPORTANT_DELIMITER_TYPE_2> <STUFF>
Even Nicer bunch of Words <STUFF> 
<IMPORTANT_DELIMITER_TYPE_1> <STUFF> <STUFF> <STUFF>
Wonderful bunch of Words <STUFF> <STUFF> <STUFF>
<IMPORTANT_DELIMITER_TYPE_1><STUFF>
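A minimal Python sketch of this first step. It assumes the delimiters literally match the placeholder names `<IMPORTANT_DELIMITER_TYPE_n>` used above, and `break_at_delimiters` is a hypothetical helper name. A single regular-expression substitution puts a newline in front of every delimiter and drops the whitespace that preceded it.

```python
import re

# Placeholder pattern (assumption): the delimiters are literally named
# <IMPORTANT_DELIMITER_TYPE_1>, <IMPORTANT_DELIMITER_TYPE_2>, ...
# Any whitespace in front of a delimiter is swallowed by the \s*.
DELIMITER = re.compile(r'\s*(<IMPORTANT_DELIMITER_TYPE_\d+>)')


def break_at_delimiters(line):
    """Return the line with every delimiter moved to the start of a new line."""
    return DELIMITER.sub(r'\n\1', line)


if __name__ == '__main__':
    sample = ('Nice bunch of words <STUFF> <STUFF> <IMPORTANT_DELIMITER_TYPE_1>  '
              '<STUFF> <IMPORTANT_DELIMITER_TYPE_2> <STUFF>')
    print(break_at_delimiters(sample))
```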

And finally, I want to replace the important delimiters with the content of the line they came from originally, up to the first angle bracket:

Nice bunch of words <STUFF> <STUFF> 
Nice bunch of words  <STUFF> 
Nice bunch of words <STUFF>
Even Nicer bunch of Words <STUFF> 
Even Nicer bunch of Words <STUFF> <STUFF> <STUFF>
Wonderful bunch of Words <STUFF> <STUFF> <STUFF>
Wonderful bunch of Words<STUFF>
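This second step is the "part of the previous line" from the title. The idea is to walk the lines in order, remember the words before the first `<` on each line that does not start with a delimiter, and put those words in place of the delimiter on the lines that do. The following is a sketch under the same placeholder-name assumption, and `fill_in_prefixes` is a made-up helper name.

```python
import re

# Placeholder pattern (assumption): a delimiter at the very start of a line.
LEADING_DELIMITER = re.compile(r'<IMPORTANT_DELIMITER_TYPE_\d+>')


def fill_in_prefixes(lines):
    """Yield each line, replacing a leading delimiter with the words that came
    before the first '<' on the most recent line without a leading delimiter."""
    prefix = ''
    for line in lines:
        m = LEADING_DELIMITER.match(line)  # match() only looks at the start
        if m:
            # Continuation line: swap the delimiter for the remembered words.
            yield prefix + line[m.end():]
        else:
            # Original line: remember everything before its first '<'.
            prefix = line.split('<', 1)[0]
            yield line


if __name__ == '__main__':
    broken = [
        'Nice bunch of words <STUFF> <STUFF>',
        '<IMPORTANT_DELIMITER_TYPE_1>  <STUFF>',
        '<IMPORTANT_DELIMITER_TYPE_2> <STUFF>',
        'Wonderful bunch of Words <STUFF> <STUFF> <STUFF>',
        '<IMPORTANT_DELIMITER_TYPE_1><STUFF>',
    ]
    for out in fill_in_prefixes(broken):
        print(out)
```

Any whitespace that followed a delimiter is kept as-is, so the spacing may differ slightly from the hand-written output above.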

How can I accomplish this using absolutely anything at all that doesn't involve too much manual effort? (The file is about 30,000 lines of this stuff.)

Thanks !

Last edited by darkbeanies (2013-09-01 19:18:29)

Offline

#2 2013-09-01 17:49:35

Xyne
Forum Fellow
Registered: 2008-08-03
Posts: 6,965
Website

Re: Text file - replace a pattern with part of the previous line? -SOLVED

Without knowing what <STUFF>, <IMPORTANT_DELIMITER_TYPE_*>, etc. are, or where the first angle bracket is (or whether it is left or right), there isn't enough information to propose a solution. Please post a concrete example.

That said, I suspect it would be trivial to load the entire file in a script (e.g. Python, Perl) and use some simple replacements to achieve what you want (unless those 30k lines are ridiculously long). I don't doubt that this can be done with awk and/or sed, but I prefer scripts for anything beyond simple one-liners. Of course, a wild awk/sed wizard may appear and reveal to us the hidden beauty of his arcane knowledge.


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone

#3 2013-09-01 18:03:26

darkbeanies
Member
Registered: 2009-01-14
Posts: 142

Re: Text file - replace a pattern with part of the previous line? -SOLVED

Okay, basically important delimiter types 1, 2, etc. are KNOWN. They are called <I>, <II>, <III>, <IV>, etc., so I can just search for them...

<STUFF> is irrelevant for this purpose I think.

All lines start with some bunch of words, which are ALWAYS followed by a < angle bracket.  So I can just grab ALL words, before the FIRST < angle bracket.

So, I basically want to replace every <I>, <II>, <III>, etc. with a new line, followed by any and all words that occur on the original line before that first angle bracket.

Does that make sense yet?


Also, the first code block pretty much IS a concrete example, if you just replace the <important delimiters> with roman numerals.  I thought that would help clarify things...
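With that description, the whole job can be done in one pass without the intermediate step. This is a minimal sketch that assumes every delimiter is an upper-case Roman numeral in angle brackets and that each line's leading words end at its first `<`. The helper names are made up for illustration. Any `<STUFF>` tag made only of the letters I, V, X, L, C, D and M (say `<MID>`) would be mistaken for a delimiter.

```python
#!/usr/bin/env python3
import re
import sys

# Assumption: delimiters are upper-case Roman numerals in angle brackets
# (<I>, <II>, <IV>, ...). A tag made only of those letters, such as <MID>,
# would also be treated as a delimiter.
DELIMITER = re.compile(r'<[IVXLCDM]+>')


def expand_line(line):
    """Split a line at each delimiter and prefix every new piece with the
    words that precede the first '<' on the original line."""
    first = line.find('<')
    if first == -1:
        return [line]  # no angle bracket at all: leave the line alone
    prefix = line[:first]
    pieces = DELIMITER.split(line)
    return [pieces[0].rstrip()] + [prefix + p.strip() for p in pieces[1:]]


def main():
    for line in sys.stdin:
        for out in expand_line(line.rstrip('\n')):
            print(out)


if __name__ == '__main__':
    main()
```

If you invoke it explicitly with `python3`, for example `python3 expand.py < input.txt > output.txt` (the file names are hypothetical), it does not depend on the shebang line being honoured.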

Last edited by darkbeanies (2013-09-01 18:12:30)

#4 2013-09-01 18:26:14

Xyne
Forum Fellow
Registered: 2008-08-03
Posts: 6,965
Website

Re: Text file - replace a pattern with part of the previous line? -SOLVED

"<STUFF>" is not irrelevant. In your original example

Nice bunch of words <STUFF> <STUFF> <IMPORTANT_DELIMITER_TYPE_1>  <STUFF> <IMPORTANT_DELIMITER_TYPE_2> <STUFF>
Even Nicer bunch of Words <STUFF> <IMPORTANT_DELIMITER_TYPE_1> <STUFF> <STUFF> <STUFF>
Wonderful bunch of Words <STUFF> <STUFF> <STUFF><IMPORTANT_DELIMITER_TYPE_1><STUFF>

you want to break the lines along the delimiters:

Nice bunch of words <STUFF> <STUFF> 
<IMPORTANT_DELIMITER_TYPE_1>  <STUFF> 
<IMPORTANT_DELIMITER_TYPE_2> <STUFF>
Even Nicer bunch of Words <STUFF> 
<IMPORTANT_DELIMITER_TYPE_1> <STUFF> <STUFF> <STUFF>
Wonderful bunch of Words <STUFF> <STUFF> <STUFF>
<IMPORTANT_DELIMITER_TYPE_1><STUFF>

and then you say that you want to replace the delimiters with the contents before the first angle bracket:

Nice bunch of words <STUFF> <STUFF> 
Nice bunch of words  <STUFF> 
Nice bunch of words <STUFF>
Even Nicer bunch of Words <STUFF> 
Even Nicer bunch of Words <STUFF> <STUFF> <STUFF>
Wonderful bunch of Words <STUFF> <STUFF> <STUFF>
Wonderful bunch of Words<STUFF>

but you have clearly made a distinction between the "nice bunch of words" and "<STUFF>", otherwise the output would have been

Nice bunch of words <STUFF> <STUFF> 
Nice bunch of words <STUFF> <STUFF>  <STUFF> 
...

So, do you want everything up to the first delimiter, or do you want everything up to <STUFF> in the replacement? If you only want the "nicer words" then you need some way to distinguish between them and "stuff" programmatically.


edit
Here's a trivial script that will split the lines along the delimiters and replace them with the contents of the line before the first delimiter:

#!/usr/bin/env python3

import re
import sys

def main(args=None):
  for line in sys.stdin:
    # Trim trailing newline.
    line = line.rstrip('\n')
    # Split by delimiters.
    parts = re.split(r'<[^>]+>', line)
    print(parts[0])
    for p in parts[1:]:
      print(parts[0] + p)

if __name__ == '__main__':
  try:
    main()
  except (KeyboardInterrupt, BrokenPipeError):
    pass

Usage

path/to/script < /path/to/input_file

Last edited by Xyne (2013-09-01 18:34:55)


#5 2013-09-01 18:32:39

2ManyDogs
Forum Fellow
Registered: 2012-01-15
Posts: 4,648

Re: Text file - replace a pattern with part of the previous line? -SOLVED

Do you have a basic understanding of any scripting language (bash, python, perl, etc.) or of awk or sed? Are you asking us to write a script for you, or just to point you in the right direction? If you'd like pointers, it would help if we knew what tools you already understand, even if only at a basic level.

#6 2013-09-01 18:39:19

darkbeanies
Member
Registered: 2009-01-14
Posts: 142

Re: Text file - replace a pattern with part of the previous line? -SOLVED

Let's try again with a simpler example...

WORDS <CRAP> <DELIMITER> <MORE CRAP>

desired output:

WORDS <CRAP> 
WORDS <MORE CRAP>

Any good?

#7 2013-09-01 18:49:44

Xyne
Forum Fellow
Registered: 2008-08-03
Posts: 6,965
Website

Re: Text file - replace a pattern with part of the previous line? -SOLVED

darkbeanies wrote:

Let's try again with a simpler example...

WORDS <CRAP> <DELIMITER> <MORE CRAP>

desired output:

WORDS <CRAP> 
WORDS <MORE CRAP>

Any good?

No. The script can't magically tell the difference between WORDS and <CRAP>. What defines a sequence of bytes as "<CRAP>"?
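Here is a quick illustration of that point. It puts the catch-all `<[^>]+>` pattern from the script in post #4 next to a split on the literal `<DELIMITER>`:

```python
import re

line = 'WORDS <CRAP> <DELIMITER> <MORE CRAP>'

# A pattern that matches every <...> tag treats <CRAP> exactly like
# <DELIMITER>, so all of the tags disappear from the result.
generic = re.split(r'<[^>]+>', line)
print(generic)   # ['WORDS ', ' ', ' ', '']

# Splitting on the one literal delimiter leaves the other tags intact.
specific = line.split('<DELIMITER>')
print(specific)  # ['WORDS <CRAP> ', ' <MORE CRAP>']
```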


#8 2013-09-01 18:52:03

darkbeanies
Member
Registered: 2009-01-14
Posts: 142

Re: Text file - replace a pattern with part of the previous line? -SOLVED

@Xyne, I tried your script, but it turned my mouse cursor into crosshairs and bash said:

./script.sh: line 7: syntax error near unexpected token `('
./script.sh: line 7: `def main(args=None):'

#9 2013-09-01 18:53:44

darkbeanies
Member
Registered: 2009-01-14
Posts: 142

Re: Text file - replace a pattern with part of the previous line? -SOLVED

Xyne wrote:

What defines a sequence of bytes as "<CRAP>"?

It is not before the first < angle bracket, and it is not called <DELIMITER>

???

#10 2013-09-01 19:03:06

zorro
Member
Registered: 2011-11-18
Posts: 47

Re: Text file - replace a pattern with part of the previous line? -SOLVED

If you can replace your Roman numeral delimiters with a single-character delimiter (maybe using search/replace in an editor), your input could look like this:

Nice bunch of words <STUFF1>!<STUFF2>!<STUFF3>!<STUFF4>
Even Nicer bunch of Words <STUFF5>!<STUFF6>!<STUFF7>!<STUFF8>
Wonderful bunch of Words <STUFF9>!<STUFF10>!<STUFF11>!<STUFF12>

I have used the '!' character; you will need to choose one that doesn't occur in the input file.

Running the following sed script

sed -r ':loop; s/([^<]*)<([^!]*)!/\1<\2\n\1/;t loop' < input.txt

Generates

Nice bunch of words <STUFF1>
Nice bunch of words <STUFF2>
Nice bunch of words <STUFF3>
Nice bunch of words <STUFF4>
Even Nicer bunch of Words <STUFF5>
Even Nicer bunch of Words <STUFF6>
Even Nicer bunch of Words <STUFF7>
Even Nicer bunch of Words <STUFF8>
Wonderful bunch of Words <STUFF9>
Wonderful bunch of Words <STUFF10>
Wonderful bunch of Words <STUFF11>
Wonderful bunch of Words <STUFF12>
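The loop works because `\1` always captures the words before the first `<` of the original line. Each pass turns the next `!` into a newline followed by those words, and `t loop` jumps back for as long as a substitution succeeds. For comparison, here is a rough Python equivalent that assumes the same single-character separator `!`. `split_on_separator` is a hypothetical name.

```python
def split_on_separator(line, sep='!'):
    """Roughly mirror the sed loop: each sep starts a new line that begins
    with the words preceding the first '<' of the original line."""
    first = line.find('<')
    if first == -1:
        return [line]  # the sed pattern needs a '<', so the line is left alone
    prefix = line[:first]
    head, *rest = line.split(sep)
    return [head] + [prefix + piece for piece in rest]


if __name__ == '__main__':
    for line in ['Nice bunch of words <STUFF1>!<STUFF2>!<STUFF3>!<STUFF4>',
                 'Even Nicer bunch of Words <STUFF5>!<STUFF6>!<STUFF7>!<STUFF8>']:
        print('\n'.join(split_on_separator(line)))
```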

Last edited by zorro (2013-09-01 19:03:58)

#11 2013-09-01 19:07:54

darkbeanies
Member
Registered: 2009-01-14
Posts: 142

Re: Text file - replace a pattern with part of the previous line? -SOLVED

You the man ZORRO!!!!   YOU the  ****ING MAAAAAAAAAAN!!!!!!!!!!!!!!!!!!!!!!!!!!

#12 2013-09-01 19:13:38

Xyne
Forum Fellow
Registered: 2008-08-03
Posts: 6,965
Website

Re: Text file - replace a pattern with part of the previous line? -SOLVED

darkbeanies wrote:

@Xyne, I tried your script, but it turned my mouse cursor into crosshairs and bash said:

./script.sh: line 7: syntax error near unexpected token `('
./script.sh: line 7: `def main(args=None):'

Did you omit "#!/usr/bin/env python3" from the top of the file?
Do you have the python package installed?


#13 2013-09-01 19:17:50

darkbeanies
Member
Registered: 2009-01-14
Posts: 142

Re: Text file - replace a pattern with part of the previous line? -SOLVED

Yeah, I have python, and I didn't omit anything.  Possibly python could be out of date/out of sync I suppose.  Anyway, that sed one-liner works great.  Thanks a lot, I've been puzzling/googling for hours now...


Powered by FluxBB