
#1 2011-10-30 15:34:08

Jankosevic
Member
Registered: 2008-07-06
Posts: 82

Edit text files

Hi,

I am looking for a solution to a problem that is perhaps not so difficult.
I have hundreds of structured text files containing data. I want to look for a specific string and shift all lines below it down to a fixed line. For instance, if the keyword "#key" is on line 40, I want to move lines 41-49 to lines 80-88.

Do you have an idea how I can achieve that?
Thx!

Offline

#2 2011-10-30 16:03:54

lagagnon
Member
From: an Island in the Pacific...
Registered: 2009-12-10
Posts: 1,087
Website

Re: Edit text files

Personally I would write a bash script to do that using "sed", the stream editor. If the structured text files are all column based it might be easier to use "awk" rather than "sed", but you would need to make that decision yourself, based on the structure of your text files. Either way you should be able to do that quite quickly once you've become even just a little bit familiar with either sed or awk.

For example, to search recursively through directories, looking in all the files for a particular string, and to replace that string with something else, this command should work:

find ./ -type f -exec sed -i 's/string1/string2/' {} \;

So combining "find" with "sed" or "awk" should do what you want.
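
A cautious variant of the command above, as a minimal sketch. It assumes GNU sed (for the -i.bak suffix form) and uses a hypothetical scratch directory and file name. It lists the matching files first, then edits in place while keeping a backup copy of each edited file:

```shell
# Hypothetical scratch directory and sample file, for illustration only
workdir=$(mktemp -d)
printf 'string1 here\n' > "$workdir/sample.txt"

# List the files that contain the string before touching anything
grep -rl 'string1' "$workdir"

# Replace in place, keeping a .bak copy of each edited file (GNU sed).
# The -name filter keeps find from picking up the freshly written .bak files.
find "$workdir" -type f -name '*.txt' -exec sed -i.bak 's/string1/string2/' {} \;

cat "$workdir/sample.txt"
```

If the result looks wrong, the .bak files can simply be renamed back.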


Philosophy is looking for a black cat in a dark room. Metaphysics is looking for a black cat in a dark room that isn't there. Religion is looking for a black cat in a dark room that isn't there and shouting "I found it!". Science is looking for a black cat in a dark room with a flashlight.

Offline

#3 2011-10-30 16:06:53

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Edit text files

If you want to do the edits by hand, use vim.

Offline

#4 2011-10-30 16:50:18

Jankosevic
Member
Registered: 2008-07-06
Posts: 82

Re: Edit text files

Thank you!
Since it should work automatically, the sed method sounds promising. However, after reading a bit about sed, I am still not sure how to do it.
I want to move parts of my text files to a specific line so that the same data starts at the same row in all files in order to make them comparable.
But I cannot find a function for that.

Last edited by Jankosevic (2011-10-30 16:50:35)

Offline

#5 2011-10-30 17:25:54

quigybo
Member
Registered: 2009-01-15
Posts: 223

Re: Edit text files

printf '%s\n' {a..k} | sed '
    # search for something
    /c/ {
        # move to the next line, writing out the current line
        n
        # read in the next 2 lines, appending to the pattern space
        N; N;
        # copy these 3 lines to the hold space
        h
        # clear the current pattern space
        d
    }
    # append the lines in hold space after a particular line
    9G
'

This should be enough to get you started, for more info on the commands used see here. I'm not sure if it is possible to tell sed to run a command n times, so you may just need to write the N command 8 times (for your situation).
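
If repeating N by hand gets unwieldy, the same move can be sketched in awk with a counter instead. This is only an illustration: the names n, target, hold and buf are made up here, and it assumes the target line comes after the block being moved.

```shell
moved=$(printf '%s\n' a b c d e f g h i j k | awk -v n=3 -v target=9 '
    # while the counter is running, stash the line instead of printing it
    hold > 0 { buf = buf $0 "\n"; hold--; next }
    # print every other line unaltered
    { print }
    # search for something, then start stashing the next n lines
    /c/ { hold = n }
    # emit the stashed lines after input line number "target"
    NR == target { printf "%s", buf; buf = "" }
')
printf '%s\n' "$moved"
```

For your situation you would set n=9 and pick the target line to suit.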

Offline

#6 2011-10-30 19:37:08

rockin turtle
Member
From: Montana, USA
Registered: 2009-10-22
Posts: 227

Re: Edit text files

I believe the following script will do what you want. It creates a subdirectory named 'tmp' in your current directory and writes modified copies of the chosen files into that subdirectory. The original files are left unmodified. If 'tmp' already exists in the current directory, the script will exit without doing anything.

#!/bin/bash

# this script assumes you want to insert 39 blank lines after each occurrence of a line matching the key regex.

nl=$'\\\n'
unset insert
for i in $(seq 1 39)
do
	insert="$insert$nl"
done

regex="$1"
shift

# Make a subdirectory named 'tmp' in the current directory. Exit if directory already exists.
mkdir ./tmp 2>/dev/null || { echo "unable to create temporary directory './tmp'" && exit 1; }

# create a copy of each file with inserted lines into the 'tmp' subdirectory
for f in "$@"; do
	sed -e "/$regex/ a$insert" "$f" > "./tmp/$f"
done

If your text files all end in .xyz, for example, you could invoke it like this:

$ ./scriptname '#key' *.xyz

If you want the key to be a word by itself, you could use:

$ ./scriptname '\B#key\b' *.xyz

(This wouldn't match strings like 'x#key' or '#keyx' or 'x#keyx')

Regular expressions have a lot of gotchas that you need to be aware of.  The regex '\B#key\b' will not match the string 'x#key' but will match '##key'.
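
A quick way to check such a regex before running it over hundreds of files is to feed it a few test strings. This is a small sketch that assumes GNU sed, since \B and \b are GNU extensions; the variable name is arbitrary:

```shell
# Print only the test strings that the regex matches
matches=$(printf '%s\n' 'x#key' '#keyx' 'x#keyx' '##key' | sed -n '/\B#key\b/p')
printf '%s\n' "$matches"
```

Of these four strings only '##key' should come back.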

If you like the results contained in ./tmp/*, you can move the files back (and overwrite the original files in the process) by doing:

$ mv tmp/* .
$ rmdir tmp

Offline

#7 2011-10-30 19:48:43

/dev/zero
Member
From: Melbourne, Australia
Registered: 2011-10-20
Posts: 1,247

Re: Edit text files

If the intent is just to compare files, wouldn't "diff" (with appropriate options enabled) do what you want?

Another possibility, if you really insist on inserting the whitespace, would involve formulating a patch file that inserts the extra lines into any file you specify. Now apply it to a list of files matching your criterion. You could generate the list using, say, a grep/awk pipeline like this:

grep --recursive -n '#key' ./ | awk -F':' '$2 == 40 {print $1}'

Offline

#8 2011-10-30 19:58:01

/dev/zero
Member
From: Melbourne, Australia
Registered: 2011-10-20
Posts: 1,247

Re: Edit text files

Another possibility:

Ed: line and word substitutions.

Offline

#9 2011-10-30 21:26:27

quigybo
Member
Registered: 2009-01-15
Posts: 223

Re: Edit text files

In my previous reply I thought you wanted to move the text relative to the rest of the file. If you just want to insert whitespace, something like this will suffice:

printf '%s\n' {a..k} | awk '
    # print every line unaltered
    { print }
    # search for something
    /c/ {
        # insert enough blank lines such that the next line becomes l.10
        for (i = NR; i < 9; ++i)
            printf "\n"
    }
'

To avoid any X-Y problems, could you tell us why you need stuff on a specific line number?

Offline

#10 2011-10-30 21:51:33

Jankosevic
Member
Registered: 2008-07-06
Posts: 82

Re: Edit text files

Thanks a lot for all your effort!
Apparently there are a lot of things possible and I will try to understand and alter your code in order to fit my needs.

In fact, I want to read those data files (txt) in another program. The structure of one file looks like this, for instance:

#Section1
Field1,Field2,Field3
Value1,Value1,Value1
Value2,Value2,Value2
Value3,Value3,Value3

#Section2
Field1,Field2,Field3
Value1,Value1,Value1
Value2,Value2,Value2
Value3,Value3,Value3

#Section3
Field1,Field2,Field3
Value1,Value1,Value1
Value2,Value2,Value2
Value3,Value3,Value3

The number of rows in Section2 varies from file to file. So, in order to read all of the hundreds of files at once in my app, Section3 always needs to be at a specific line. Adding the same number of blank lines to each file is therefore not useful.

Last edited by Jankosevic (2011-10-30 21:52:31)

Offline

#11 2011-10-30 22:02:27

/dev/zero
Member
From: Melbourne, Australia
Registered: 2011-10-20
Posts: 1,247

Re: Edit text files

If you're trying to parse text data, maybe flex+bison could help. Just come up with a basic grammar that describes the format of your data files, and voila.

Offline

#12 2011-10-30 22:26:17

/dev/zero
Member
From: Melbourne, Australia
Registered: 2011-10-20
Posts: 1,247

Re: Edit text files

Jankosevic wrote:

In fact, I want to read those data files (txt) in another program.
The number of rows in Section2 varies from file to file. So, in order to read all of the hundreds of files at once in my app, Section3 always needs to be at a specific line. Adding the same number of blank lines to each file is therefore not useful.

Also, I think quigybo is on the right track, by pointing out the possibility you have an X-Y problem.

This stuff about specific line numbers is just an intermediate step that you think will accomplish the goal you want. First you talk about wanting to compare files, now you say you want to read these files from another program. What are you actually doing?

Even if this is work related and you can't go into specifics due to sensitivity issues, you could still be doing a lot more to make your problem and goals clearer. We can't help you if you keep giving us crap.

Offline

#13 2011-10-30 22:54:35

Jankosevic
Member
Registered: 2008-07-06
Posts: 82

Re: Edit text files

/dev/zero wrote:

First you talk about wanting to compare files, now you say you want to read these files from another program. What are you actually doing?

I said the files should be comparable which is necessary if I want to read all those text files in a program which reads data from a specific line number downwards. I thought I made it clear, but apparently not. Sorry for that.
Moreover, it was not my intention to waste your time or make my work unclear by "hiding information". I just asked for an idea of how it would be possible, so that I can do it myself.

To sum it up again:
- I got text files with the structure described above
- my program reads the data (sections) based on a given line number
- the sections do not always have the same amount of rows
-> thus, section3 is sometimes at line number 40, sometimes on 43 or 45.
- to make my program read the text files correctly I think the best approach is to set Section3 in all files to the same line number

In case I was still too vague in my explanation, please ask or just ignore my request. I am glad for any help, but don't expect any.
Thx! smile

Offline

#14 2011-10-30 23:23:15

rockin turtle
Member
From: Montana, USA
Registered: 2009-10-22
Posts: 227

Re: Edit text files

If you just want to get at the data between section 2 and 3 (for instance), you can do:

$ sed -ne '/^#Section2/,/^#Section/ p' file

This will also output the '#Section2' and '#Section3' lines. If you don't want that, you could do:

$ sed -ne '/^#Section2/,/^#Section/ p' file | sed -e '/^#Section/ d'

Then use your program to analyze the output data instead of reading from specific line numbers.

Offline

#15 2011-10-30 23:24:24

/dev/zero
Member
From: Melbourne, Australia
Registered: 2011-10-20
Posts: 1,247

Re: Edit text files

Jankosevic wrote:

Moreover, it was not my intention to waste your time or make my work unclear by "hiding information". I just asked for an idea of how it would be possible, so that I can do it myself.

You're not wasting my time if I get to tackle an interesting problem. Everyone learns, everyone wins.

Part of tackling interesting problems, though, is ensuring that you choose the most efficient and stable solution for the actual underlying problem.

I just have this feeling that you're making things more complicated than they need to be, but it's hard to be sure because I don't really have a firm grasp of what your underlying goal is.

Offline

#16 2011-10-31 00:04:04

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 20,324

Re: Edit text files

Moderator comment:

Jankosevic wrote:

I am looking for a solution to a problem that is perhaps not so difficult.
I have hundreds of structured text files containing data. I want to look for a specific string and shift all lines below it down to a fixed line. For instance, if the keyword "#key" is on line 40, I want to move lines 41-49 to lines 80-88. ...Do you have an idea how I can achieve that?

I hate to pour water on this, but I shall.  Is there any chance this is a homework problem?  The problem statement strikes me as such.  Tell us a bit more about the nature of these files to convince me otherwise.


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way

Online

#17 2011-10-31 00:49:00

denisfalqueto
Member
From: ES, Brazil
Registered: 2006-03-24
Posts: 197

Re: Edit text files

Jankosevic wrote:

- my program reads the data (sections) based on a given line number
- the sections do not always have the same amount of rows
-> thus, section3 is sometimes at line number 40, sometimes on 43 or 45.
- to make my program read the text files correctly I think the best approach is to set Section3 in all files to the same line number

If the program was written by you, there is your problem. The program shouldn't read data from fixed line numbers. Instead, it should search the text for specific markers and start processing from there. I would say that AWK is perfect for that. It is even fun to write AWK scripts.
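
A minimal sketch of what searching for a marker could look like in awk, using the section layout from your sample file. The data values, the temporary file and the variable names (name, inside) are made up for illustration:

```shell
# Hypothetical sample file following the #SectionN layout shown earlier
sample=$(mktemp)
printf '%s\n' \
    '#Section1' 'Field1,Field2,Field3' '1,2,3' '' \
    '#Section2' 'Field1,Field2,Field3' '4,5,6' '7,8,9' '' \
    '#Section3' 'Field1,Field2,Field3' '10,11,12' > "$sample"

# Print the rows of one section, wherever it happens to start
section2=$(awk -v name='#Section2' '
    # header of the wanted section: start collecting, skip the header itself
    $0 == name   { inside = 1; next }
    # any other section header: stop collecting
    /^#Section/  { inside = 0 }
    # non-blank lines inside the wanted section
    inside && NF { print }
' "$sample")
printf '%s\n' "$section2"
```

The section can then grow or shrink without anything depending on its line number.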


Satisfied users don't rant, so you'll never know how many of us there are.

Offline

#18 2011-10-31 20:05:24

Jankosevic
Member
Registered: 2008-07-06
Posts: 82

Re: Edit text files

The program is written in IDL and uses the READ_ASCII and ASCII_TEMPLATE functions. The option DATA_START=lines_to_skip is set to some specific lines for the three sections.
My intention is not to justify myself over and over again. I just asked for a little help on a (maybe) simple problem.

If it is not possible to align all Sections in all text files to the same line number then I just need to see what else is possible.
But, in peace, thx for all the effort by all of you thinking about this problem!

Offline

#19 2011-10-31 20:08:59

/dev/zero
Member
From: Melbourne, Australia
Registered: 2011-10-20
Posts: 1,247

Re: Edit text files

Jankosevic wrote:

The program is written in IDL and uses the READ_ASCII and ASCII_TEMPLATE functions. The option DATA_START=lines_to_skip is set to some specific lines for the three sections.

See, that's what I'm talking about. This information should have gone in the first post, not the last.

I hope you found some of the tips useful anyway.

Offline

#20 2011-11-01 00:12:18

denisfalqueto
Member
From: ES, Brazil
Registered: 2006-03-24
Posts: 197

Re: Edit text files

Jankosevic wrote:

The program is written in IDL and uses the READ_ASCII and ASCII_TEMPLATE functions. The option DATA_START=lines_to_skip is set to some specific lines for the three sections.
My intention is not to justify myself over and over again. I just asked for a little help on a (maybe) simple problem.

If it is not possible to align all Sections in all text files to the same line number then I just need to see what else is possible.
But, in peace, thx for all the effort by all of you thinking about this problem!

AWK is perfect for that. See below an example. It may not solve completely your problem, but can set you in the right direction:

#!/bin/awk -f

# If the line starts with '#Section3' (the marker used in the sample file)
/^#Section3/ {
  # If the number of records read until now is less than your desired number
  if (FNR < 80) {
    # Lets add empty lines to the output
    i = 80 - FNR; 
    while (i > 0) {
      print "\n";
       i--;
    }
  };
  # Print the current line after all
  print $0
}
# The common case: print the line
{print $0}

Here 80 is the line number on which Section 3 should start. Replace it with your desired starting line number.



Offline

#21 2011-11-01 10:52:28

quigybo
Member
Registered: 2009-01-15
Posts: 223

Re: Edit text files

@denisfalqueto: you should use printf instead of print, as print already appends a newline (check your output with `nl -ba`). Actually the awk I posted above does the same thing, but you are right that FNR should be used instead of NR, since the OP probably wants to process multiple files at a time.

@OP: if you would rather go the route of one file per section, similar to rockin turtle's suggestion, then you may want to look into csplit.
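
A rough sketch of the csplit route, assuming GNU coreutils (the '{*}' repeat count and the -z option are GNU features). The scratch directory, the data values and the section_ prefix are made up for illustration:

```shell
# Hypothetical sample file in a scratch directory
dir=$(mktemp -d)
printf '%s\n' \
    '#Section1' 'Field1,Field2,Field3' '1,2,3' '' \
    '#Section2' 'Field1,Field2,Field3' '4,5,6' '7,8,9' '' \
    '#Section3' 'Field1,Field2,Field3' '10,11,12' > "$dir/data.txt"

# -s: no byte counts, -z: drop the empty piece before the first header,
# -f: prefix for the output files; split at every line starting with #Section
csplit -s -z -f "$dir/section_" "$dir/data.txt" '/^#Section/' '{*}'

ls "$dir"
first_line=$(head -n 1 "$dir/section_02")
printf '%s\n' "$first_line"
```

Each section then sits in its own file, and in every file the data starts at the same line.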

Offline

#22 2011-11-01 12:41:07

denisfalqueto
Member
From: ES, Brazil
Registered: 2006-03-24
Posts: 197

Re: Edit text files

@quigybo: good catch. The final print $0 in the first block is not needed, because AWK will execute both blocks anyway.
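
Putting both corrections together, a sketch of the padding step could look like the following. It assumes the marker is '#Section3' as in the OP's sample; the scratch directory, the data values and the target line of 20 are made up for illustration. Files in which the section already starts past the target line are copied through unchanged.

```shell
# Hypothetical sample file in a scratch directory, mirroring the OP's layout
dir=$(mktemp -d)
printf '%s\n' \
    '#Section1' 'Field1,Field2,Field3' 'Value1,Value1,Value1' '' \
    '#Section2' 'Field1,Field2,Field3' 'Value1,Value1,Value1' '' \
    '#Section3' 'Field1,Field2,Field3' 'Value1,Value1,Value1' > "$dir/a.txt"

target=20   # assumed line number on which #Section3 should start
mkdir "$dir/padded"
for f in "$dir"/*.txt; do
    awk -v target="$target" '
        # before the section header, emit blank lines up to the target line
        /^#Section3/ { for (i = FNR; i < target; i++) print "" }
        # every line, including the header, is printed exactly once
        { print }
    ' "$f" > "$dir/padded/$(basename "$f")"
done

# Report where #Section3 ended up in the padded copy
section3_line=$(grep -n '^#Section3' "$dir/padded/a.txt" | cut -d: -f1)
echo "$section3_line"
```

The originals stay untouched; the padded copies land in a separate directory, much like the tmp approach earlier in the thread.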



Offline
