Reading part of a line and zero padding a file

jnwebb · 2008-08-06 16:40:44

So I have hundreds of files similar to the following text file:

PEAK ACCEL. = 307.6032 in/sec2 @ t = 3.07 sec
3006 0.005000 Tstart = 0.00 sec , Tstop = 15.03 sec
-0.9709 -0.9582 -0.9497 -0.9798 -1.0209 -1.0377 -1.0487 1
-1.1092 -1.0958 -0.7594 -0.5974 -1.2448 -1.9784 -1.1627 2
-0.3853 -1.5358 -3.5072 -4.0851 -2.4065 -0.0700 3.1149 3
4.1113 0.3742 0.6078 3.8566 5.8099 3.8851 0.0263 4
-1.9638 0.9546 4.1485 -1.0938 -10.0133 -10.8820 0.1033 5
6.1557 4.8601 2.5948 -2.1577 -5.6472 -13.2543 -11.1966 6
0.7748 8.7211 11.5924 9.5104 9.4419 10.7948 6.0172 7
-5.5127 -4.7920 7.8851 9.5068 2.1157 -14.7572 -13.0219 8
0.0000 0.0000
......... continues for several thousand lines. The files differ at the end: some have two sets of zeros, some three, some four, some none, etc.

I would like to read each file and:

1. If the number after 3006 is 0.01, I would like to add sets of zeros to the end of the file so that it has 4096 lines, and change 3006 to 4096 at the top.

2. If the number after 3006 is .005, I would like to run it through a program that will change it to 0.01 and then do as in #1.

3. If the number after 3006 is any other number, then remove the file.

Any help would be appreciated.

Last edited by jnwebb (2008-08-06 16:41:17)

Procyon · 2008-08-06 17:37:10

Is 3006 always line nr 2?

Is that a tab after 3006?

When you say you want it to have 4096 lines, do you mean only the data? That is, the part that has a ruler at the right?

How do you know where the data starts? Is it always line nr 3? And do they all start with a tab? Do they need the ruler at the right too?

jnwebb · 2008-08-06 18:57:40

3006 is always line number 2 and there are 2 spaces after it.

I actually want 4096 records...sorry...so I want 586 lines of data total, with a ruler at the right. The data always starts on the 3rd line. Each data point is 10 characters wide.
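
For reference, the 586 figure comes from rounding 4096 values up to whole lines of 7 values each. A minimal check in shell arithmetic (the variable names are only for illustration):

```shell
#!/bin/sh
# 4096 records at 7 values per line, rounded up to whole lines.
records=4096
per_line=7
lines=$(( (records + per_line - 1) / per_line ))
slots=$(( lines * per_line ))
echo "$lines lines, $slots value slots"   # prints: 586 lines, 4102 value slots
```

Note that 586 full lines hold 4102 values, so a file padded to complete lines carries 6 zeros beyond the 4096th record.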

jnwebb · 2008-08-06 19:23:20

oh another thing the number 3006 is different for each file

Procyon · 2008-08-06 19:48:27

Some things I don't understand. Maybe if I put up a file you can tell me what's wrong. I think it'll be more clear for both of us.

awkscript.awk:

#!awk -f
# Line 2 holds the record count ($1) and the time step ($2).
(NR==2) {
    if ($2==0.01) { fillfile="yes"; $1=4096 }
    else if ($2==0.005) { fillfile="yes"; $1=4096; $2=0.01; system("echo run it through a program here") }
    else { system("echo why not rm " FILENAME) }
    print $0
}

(NR!=2) { print $0 }

# Pad with placeholder lines; 15 is a small limit for testing.
END { if (fillfile=="yes") {
    for (i=NR-1; i<=15; i++) {
        print "  0.00000  0.00000 etc ", i } } }

awk -f awkscript.awk data1.txt

NB because you change records in line 2, $0 is recalculated and it will lose the formatting of the blanks.
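
The effect described in this note can be seen on the header line alone. The sketch below assumes the header has one leading blank and two blanks between the numbers; it contrasts assigning to `$1` with editing the text through `sub()`, which is one possible way to keep the spacing and not something the script above does:

```shell
#!/bin/sh
# Sample header line with an assumed spacing (one leading blank, two blanks
# between the numbers).
header=' 3006  0.005000     Tstart =  0.00 sec , Tstop = 15.03 sec'

# Assigning to a field makes awk rebuild $0 with single-space separators.
rebuilt=$(printf '%s\n' "$header" | awk '{ $1 = 4096; print }')

# sub() edits the text of $0 in place, so the surrounding blanks survive.
kept=$(printf '%s\n' "$header" | awk '{ sub(/[0-9]+/, "4096"); print }')

printf '[%s]\n[%s]\n' "$rebuilt" "$kept"
```

The first bracketed line comes out with every run of blanks collapsed to one space and the leading blank gone; the second keeps the original layout with only the number changed.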

EDIT: off by one in filling in the for loop

Last edited by Procyon (2008-08-06 19:56:47)

piotroxp · 2008-08-07 08:43:28

Use function pointers (If I've understood the problem correctly).

Set up a function pointer, do the checks and assign the valid function, execute the pointed function.

http://www.newty.de/fpt/fpt.html#defi

Last edited by piotroxp (2008-08-07 08:50:18)

jnwebb · 2008-08-07 14:07:34

First off, thank you for your help. Second, just to reiterate, I am a complete newbie @ this. When I try the above script it only changes the 3006 to 4096 and the .005 to .01. No zeros are added at the end of file.

I also need to run it through the program (which changes the time step so that it automatically comes out w/ .01) before I change it from 3006 to 4096 and then add zeros at the end. It also must have the line numbers at the end.

Procyon · 2008-08-07 14:31:58

Ok then this approach of awk -f script.awk data.txt > newdata.txt is not what you want.

I think it should use sed -i for live editing called from the awk script.

Are the zeroes omitted because it only goes to 15? (I did that for comfortable testing)

And you're saying that if "$2" is 0.005 you don't want this script to change it, right? Because your program does it?

I'll post a new script later.

jnwebb · 2008-08-07 15:33:05

Let me restate the total goal of this...for my own good!! If I start with a file such as this:

PEAK ACCEL. = 307.6032 in/sec2 @ t = 3.07 sec
3006 0.005000 Tstart = 0.00 sec , Tstop = 15.03 sec
-0.9709 -0.9582 -0.9497 -0.9798 -1.0209 -1.0377 -1.0487 1
-1.1092 -1.0958 -0.7594 -0.5974 -1.2448 -1.9784 -1.1627 2
-0.3853 -1.5358 -3.5072 -4.0851 -2.4065 -0.0700 3.1149 3
4.1113 0.3742 0.6078 3.8566 5.8099 3.8851 0.0263 4
-1.9638 0.9546 4.1485 -1.0938 -10.0133 -10.8820 0.1033 5
8.5943 -2.4925

(Each file is different in where they end, what column, row, etc., and in the # @ 3006, and 0.00500)

Step 1: Read the number where 0.00500 is.
IF not 0.00500 or 0.01000
THEN remove file (I already have them saved somewhere else...just remove them)
and print the filename in dumpstep.txt (to keep track of which files are not used)

IF 0.00500
THEN: look at # where 3006 is:
IF > 8192 THEN remove file and print filename in dumprec1.txt
ELSE
I need to run the following from the command line for each file (this will convert the file to .01 timestep):
crsdos4
1
y
filename
filename
y
5
.005
3006 (or whatever # is in that spot in the original file)
7F10.4
2
N
1
NEW
n

IF 0.01000 and # @3006 is > 4096 THEN remove and print filename in dumprec2.txt
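
One way to read the Step 1 rules above is as a small decision function. This is only a sketch: the action names are made up for illustration, and it assumes the crsdos4 conversion applies only to 0.005 files with at most 8192 records.

```shell
#!/bin/sh
# classify NR1 NR2: print the action for a header with NR1 records and
# time step NR2. The action names are hypothetical labels for this sketch.
classify() {
    awk -v n="$1" -v dt="$2" 'BEGIN {
        n += 0; dt += 0                       # force numeric comparison
        if (dt != 0.005 && dt != 0.01) print "dumpstep"   # wrong time step
        else if (dt == 0.005 && n > 8192) print "dumprec1"
        else if (dt == 0.005)             print "convert"  # run crsdos4
        else if (n > 4096)                print "dumprec2"
        else                              print "pad"      # go to Step 2
    }'
}

classify 3006 0.005000   # convert
classify 9000 0.005      # dumprec1
classify 5000 0.01       # dumprec2
classify 3006 0.01       # pad
classify 3006 0.02       # dumpstep
```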

Step 2: Now I should only have files that have 0.01000 in the 0.00500 spot and numbers <= 4096 in the 3006 spot.

IF the counter at the rightmost column reads 586 or greater...do nothing

ELSE
I need to put zeros into each column so that all records have a total of 586 lines of data and they must
have the counter column. I also need the # @3006 to be changed to 4096... but only after all other steps have been done.

PEAK ACCEL. = 307.6032 in/sec2 @ t = 3.07 sec
4096 0.01000 Tstart = 0.00 sec , Tstop = 15.03 sec
-0.9709 -0.9582 -0.9497 -0.9798 -1.0209 -1.0377 -1.0487 1
-1.1092 -1.0958 -0.7594 -0.5974 -1.2448 -1.9784 -1.1627 2
-0.3853 -1.5358 -3.5072 -4.0851 -2.4065 -0.0700 3.1149 3
4.1113 0.3742 0.6078 3.8566 5.8099 3.8851 0.0263 4
-1.9638 0.9546 4.1485 -1.0938 -10.0133 -10.8820 0.1033 5
8.5943 -2.4925 0.0000 0.0000 0.0000 0.0000 0.0000 6
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 7
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 8
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 9
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 10
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 11
.
.
.
.
.
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 585
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 586

WOW I feel like a jerk even asking this ??? Thanks to anyone who is nice enough to help

Procyon · 2008-08-07 17:41:57

Well, here is what I came up with, using bash instead of awk.

NB
1. bash can't do decimal comparison. So I used bc. If you don't have bc, then it needs to become a bit more complex with awk perhaps?
2. I suck with logic like || and &&, so check that most thoroughly.
3. When first number is >4096 and second number is 0.01 your program is run, but the file is also deleted. is this right?
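
Regarding note 1, if bc is not available, a decimal comparison could be done with awk along these lines. This is a sketch, and `num_eq` is a made-up helper name:

```shell
#!/bin/sh
# num_eq A B: succeed (exit status 0) when A and B are numerically equal.
num_eq() {
    # Adding 0 forces awk to compare the values as numbers, not strings.
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 == b + 0) }'
}

if num_eq 0.005000 0.005; then echo "same time step"; fi
if ! num_eq 0.01 0.005; then echo "different time step"; fi
```

Since the result is returned as an exit status, the helper can stand in for a `[[ $(echo "... == ..." | bc) -eq 1 ]]` test directly inside an `if`.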

Since it might be easy to miss, I did this for easy testing:
1. echo rm $file instead of rm
2. echo write $file to dumpfile1.txt instead of a proper name and echo $file >> propername.txt
3. twice, once in if [[ $nr3 -ge 20 ]] and again in while [[ $nr3 -le 20 ]], it goes to 20 instead of 586.
4. echo debug stuff
5. hexdump instead of crsdos4

#!/bin/bash
for file in data*; do

    echo
    echo processing $file

    #get number 1
    nr1=$(sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' $file)

    #get number 2
    nr2=$(sed -ne '2s#^[[:blank:]]*[0-9]\+[[:blank:]]*\([0-9]\+\.*[0-9]\+\).*#\1#p' $file)

    echo 1 = $nr1 2 = $nr2

    #error check
    if [[ -z $nr1 ]] || [[ -z $nr2 ]]; then echo error processing $file; continue; fi

    #nr2 must be 0.01 or 0.005
    if [[ $(echo "$nr2 != 0.01" | bc) -eq 1 ]] && [[ $(echo "$nr2 != 0.005" | bc) -eq 1 ]]; then echo rm $file; echo write $file to dumpfile1.txt; continue; fi

    #nr2 0.005 can't have nr1 over 8192
    if [[ $(echo "$nr2 == 0.005" | bc) -eq 1 ]]; then
        if [[ $nr1 -gt 8192 ]]; then echo rm $file; echo write $file to dumpfile2.txt; continue; fi
    else echo "1
y
$file
$file
y
5
.005
$nr1
7F10.4
2
N
1
NEW
n" | hexdump -C  #replace with crsdos4
    fi

    if [[ $(echo "$nr2 == 0.01" | bc) -eq 1 ]] && [[ $nr1 -gt 4096 ]]; then echo rm $file; echo write $file to dumpfile3.txt; continue; fi

    #get last number
    nr3=$(sed -ne '$s#^.*\b\([0-9]\+\)$#\1#p' $file)

    echo last number = $nr3

    #error check
    if [[ -z $nr3 ]]; then echo error in getting last number; continue; fi

    if [[ $nr3 -ge 20 ]]; then echo file had more than 586 records, nothing to do; continue
    else
        #get ready for next number
        nr3=$(($nr3+1))
        while [[ $nr3 -le 20 ]]; do
            echo "    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000        $nr3"
            nr3=$(($nr3+1))
        done >> $file

        #replace old first number with 4096
        sed -i '2s#'"$nr1"'#4096#' $file

    fi

done

jnwebb · 2008-08-07 19:13:02

Thank you...can you walk me through this line

nr1=$(sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' $file)

Procyon · 2008-08-07 19:41:36

Sure:

sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' $file

On line 2 do substitution that looks like:
- ^ (I think it's redundant, but it's good practice)
- 0-infinite blanks (not necessary, because the number is at the start, but I thought whatever)
- 1-infinite amount of digits that are remembered
- 0-infinite other characters
- (You could put an excessive $ there too, just like ^. I'd have done it but I forgot)

And change it into
- the first part that was remembered

And print it (because it was called with -n, (because we don't want to print everything) it has to be explicit).

jnwebb · 2008-08-07 20:02:32

if i put this in the cmd line:

nr1=$(sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' "test.cth")

for a sample file the same as posted:

PEAK ACCEL. =   307.6032 in/sec2  @  t =   3.07 sec
3006  0.005000     Tstart =  0.00 sec , Tstop = 15.03 sec
   -0.9709   -0.9582   -0.9497   -0.9798    -1.0209   -1.0377    -1.0487         1
   -1.1092   -1.0958   -0.7594   -0.5974    -1.2448   -1.9784    -1.1627         2
   -0.3853   -1.5358   -3.5072   -4.0851    -2.4065   -0.0700     3.1149         3
    4.1113    0.3742     0.6078    3.8566     5.8099    3.8851      0.0263         4
   -1.9638    0.9546     4.1485   -1.0938  -10.0133  -10.8820     0.1033        5
    8.5943   -2.4925

echo $nr1

i get an empty line

Procyon · 2008-08-07 20:32:18

It must be an older sed version that doesn't have [[:___:]] in its regex.

Change this line
nr1=$(sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' $file)

to
nr1=$(sed -ne '2s#^\([0-9]\+\).*#\1#p' $file)

It's not even needed there, because it's always at the start right? otherwise append ' *' to ^

And
nr2=$(sed -ne '2s#^[[:blank:]]*[0-9]\+[[:blank:]]*\([0-9]\+\.*[0-9]\+\).*#\1#p' $file)

to
nr2=$(sed -ne '2s#^[0-9]\+ \+\([0-9]\+\.*[0-9]\+\).*#\1#p' $file)

Because there are only spaces between it right?

Unless it's \+ that gives trouble.
Here are some tests:
sed -e 's#e\+##' <<< "aee" #should give a
sed -e 's#[[:alpha:]]##g' <<< "1a2b3c" #should give 123

jnwebb · 2008-08-07 20:47:16

sed -e 's#e\+##' <<< "aee"
gives blank line

sed -e 's#[[:alpha:]]##g' <<< "1a2b3c" #should give 123
works and gives 123

I tried:
nr1=$(sed -ne '2s#^\([0-9]\+\).*#\1#p' $file)
and also got a blank

not sure if it matters but I am on a macbook

Procyon · 2008-08-07 21:12:11

So it's \+

It's easy to replace:

nr1=$(sed -ne '2s#^\([0-9][0-9]*\).*#\1#p' $file)
nr2=$(sed -ne '2s#^[0-9][0-9]* *\([0-9][0-9]*\.*[0-9][0-9]*\).*#\1#p' $file)

But I just realized that for nr2, if it's just 0, it won't match, so I think
nr2=$(sed -ne '2s#^[0-9][0-9]* *\([0-9][0-9]*\.*[0-9]*\).*#\1#p' $file)
is better.

Now let's hope bc works...

jnwebb · 2008-08-07 21:31:13

nr1 works good...nr2 returns "6"

jnwebb · 2008-08-07 21:41:26

also in my text file....there is a space before the 3006 and two spaces before 0.00500

If you could explain what each character in nr2 means, that would save me from asking this question over and over.

Procyon · 2008-08-07 22:14:53

As I said, append ' *' to ^ to make it match spaces at the start.

nr2=$(sed -ne '2s#^ *[0-9][0-9]* *\([0-9][0-9]*\.*[0-9]*\).*#\1#p' $file)

Matches
- ^
- 0-inf spaces
- 1-inf digits
- 0-inf spaces
- 1-inf digits & 0-inf dots & 0-inf digits, remembered
- 0-inf other chars

And replace with first remembered

edit:
and likewise for nr1

nr1=$(sed -ne '2s#^ *\([0-9][0-9]*\).*#\1#p' $file)
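
To see both expressions at work without touching a real file, the sketch below feeds them a two-line stand-in for the sample header. The leading blank and the two blanks between the numbers are assumptions about the spacing:

```shell
#!/bin/sh
# Two-line stand-in for a data file; only line 2 matters here.
sample=$(printf '%s\n' \
    'PEAK ACCEL. =   307.6032 in/sec2  @  t =   3.07 sec' \
    ' 3006  0.005000     Tstart =  0.00 sec , Tstop = 15.03 sec')

# First number on line 2: optional blanks, then digits (remembered).
nr1=$(printf '%s\n' "$sample" | sed -n '2s#^ *\([0-9][0-9]*\).*#\1#p')

# Second number: skip blanks, the first number and more blanks, then
# remember digits, optional dots and optional digits.
nr2=$(printf '%s\n' "$sample" | sed -n '2s#^ *[0-9][0-9]* *\([0-9][0-9]*\.*[0-9]*\).*#\1#p')

echo "nr1=$nr1 nr2=$nr2"   # prints: nr1=3006 nr2=0.005000
```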

Last edited by Procyon (2008-08-07 22:16:43)

jnwebb · 2008-08-08 13:28:42

thanks...that works....what about for

nr3=$(sed -ne '$s#^.*\b\([0-9]\+\)$#\1#p' $fname)

to get the last number

Procyon · 2008-08-08 14:11:51

nr3=$(sed -ne '$s#^.*\b\([0-9]\+\)$#\1#p' $fname)

Matches:
- ^
- 0-inf chars
- word-boundary anchor (\b), you can replace this with ' ' if you want
- 1-inf digits, remembered (let's replace \+ below)
- $ (end of line anchor)

nr3=$(sed -ne '$s#^.*\b\([0-9][0-9]*\)$#\1#p' $fname)
or if \b doesn't work
nr3=$(sed -ne '$s#^.* \([0-9][0-9]*\)$#\1#p' $fname)
or if you have tabs
nr3=$(sed -ne '$s#^.*[[:blank:]]\([0-9][0-9]*\)$#\1#p' $fname)
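
A quick way to check what the space variant returns for a complete and an incomplete last line (a sketch; `last_counter` is a made-up wrapper name and the data lines are taken from the sample file):

```shell
#!/bin/sh
# last_counter: read lines on stdin and print the trailing record counter of
# the last line, or nothing when that line does not end in a bare integer.
last_counter() {
    sed -n '$s#^.* \([0-9][0-9]*\)$#\1#p'
}

full=$(printf '%s\n' \
    '    4.1113    0.3742    0.6078    3.8566    5.8099    3.8851    0.0263         4' \
    '   -1.9638    0.9546    4.1485   -1.0938  -10.0133  -10.8820    0.1033         5' \
    | last_counter)

partial=$(printf '%s\n' \
    '   -1.9638    0.9546    4.1485   -1.0938  -10.0133  -10.8820    0.1033         5' \
    '    8.5943   -2.4925' \
    | last_counter)

echo "full=[$full] partial=[$partial]"   # prints: full=[5] partial=[]
```

An empty result therefore means the last line has no counter. Trailing blanks or carriage returns after the counter would also give an empty result, because the pattern requires the digits to sit at the very end of the line.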

However I just realized, in your post #9 the last line doesn't end with such a number.

Do those files occur?
So you need that row padded with 0.0000 to the end and give it a record number?

jnwebb · 2008-08-08 14:38:05

yes, the last line doesn't end w/ a number in all files

and...each file ends at different columns

Procyon · 2008-08-08 15:34:44

Add this before you check what the last number is.

The awk script returns the last line, padded and renumbered if necessary.
It presumes a proper record has 8 columns and that the rows before the data do not have 8 columns (unless they adjoin).
If there are blank lines at the end of the file, it will cause problems, but that's also true for the plain number checker. (A script to fix that shouldn't be too hard.)

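#fix odd ending files
#data values are right-aligned in 10 columns to match the 7F10.4 layout
getlast=$(awk '(NF==8) {indatasection="yes"; lastrecord=$NF}
# a short line inside the data section: pad it to 7 values and number it
(indatasection=="yes" && NF!=8) {
    lastlinefixed="yes"
    for (i=1; i<8; i++) {
        if (i<=NF) { printf "%10s", $i }
        else { printf "%10s", "0.0000" }
    }
    print "       " lastrecord+1 }
# the last line was already complete: print it back with the same layout
END {
    if (indatasection=="yes" && NF==8) {
        for (i=1; i<8; i++) { printf "%10s", $i }
        print "       " $8 } }' $file)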

echo "replacing last line of $file with $getlast"

#delete last line
sed -i '$d' $file

#adding new line
echo "$getlast" >> $file


#get last number
etc. etc.
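
For the blank lines at the end of a file mentioned above, a filter along these lines might do. It is a sketch, and `strip_trailing_blanks` is a made-up name:

```shell
#!/bin/sh
# strip_trailing_blanks: copy stdin to stdout, dropping empty or
# whitespace-only lines at the end while keeping blank lines in the middle.
strip_trailing_blanks() {
    awk '
        NF == 0 { held = held $0 "\n"; next }    # hold blank lines back
        { printf "%s", held; held = ""; print }  # flush them before real data
    '
}

# Two data lines with a blank between them, followed by two blank lines.
count=$(printf 'a\n\nb\n\n\n' | strip_trailing_blanks | awk 'END { print NR }')
echo "$count lines kept"   # prints: 3 lines kept
```

To apply it to a file, the output can go to a temporary file that then replaces the original, for example `strip_trailing_blanks < "$file" > "$file.tmp" && mv "$file.tmp" "$file"`. That also sidesteps `sed -i`, whose argument syntax differs between GNU sed and the BSD sed shipped with macOS.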
