So I have hundreds of files similar to the following text file:
PEAK ACCEL. = 307.6032 in/sec2 @ t = 3.07 sec
3006 0.005000 Tstart = 0.00 sec , Tstop = 15.03 sec
-0.9709 -0.9582 -0.9497 -0.9798 -1.0209 -1.0377 -1.0487 1
-1.1092 -1.0958 -0.7594 -0.5974 -1.2448 -1.9784 -1.1627 2
-0.3853 -1.5358 -3.5072 -4.0851 -2.4065 -0.0700 3.1149 3
4.1113 0.3742 0.6078 3.8566 5.8099 3.8851 0.0263 4
-1.9638 0.9546 4.1485 -1.0938 -10.0133 -10.8820 0.1033 5
6.1557 4.8601 2.5948 -2.1577 -5.6472 -13.2543 -11.1966 6
0.7748 8.7211 11.5924 9.5104 9.4419 10.7948 6.0172 7
-5.5127 -4.7920 7.8851 9.5068 2.1157 -14.7572 -13.0219 8
0.0000 0.0000
......... continues for several thousand lines. The files differ at the end: some have two sets of zeros, some three, some four, some none, etc.
I would like to read each file and:
1. If the number after 3006 is 0.01, then I would like to add sets of zeros to the end of the file so that it has 4096 lines, and change 3006 to 4096 at the top.
2. If the number after 3006 is .005, I would like to run it through a program that will change it to 0.01 and then do as in #1.
3. If the number after 3006 is any other number, then remove the file.
Any help would be appreciated.
Last edited by jnwebb (2008-08-06 16:41:17)
Offline
Is 3006 always line nr 2?
Is that a tab after 3006?
When you say you want it to have 4096 lines, do you mean only the data, i.e. the part that has a ruler at the right?
How do you know where the data starts? Is it always line nr 3? Do the data lines all start with a tab? Do they need the ruler at the right too?
Offline
3006 is always line number 2 and there are 2 spaces after it.
I actually want 4096 records, sorry, so I want 586 lines of data in total, with a ruler at the right. The data always starts on the 3rd line. Each data point is 10 characters wide.
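The 586 figure follows from the record count if each data line holds 7 values, as in the sample above. A minimal sketch of that arithmetic (the variable names are only illustrative):

```shell
records=4096
per_line=7   # assumption: 7 values per data line, as in the sample file

# Integer ceiling division: (a + b - 1) / b
lines=$(( (records + per_line - 1) / per_line ))
echo "$lines"   # prints 586
```

585 full lines hold only 4095 values, so a 586th line is needed for the last record.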
Offline
Oh, another thing: the number 3006 is different for each file.
Offline
Some things I don't understand. Maybe if I put up a file you can tell me what's wrong. I think it'll be more clear for both of us.
awkscript.awk:
#!/usr/bin/awk -f
# Line 2 holds the record count ($1) and the time step ($2).
NR == 2 {
    if ($2 == 0.01)       { fillfile = "yes"; $1 = 4096 }
    else if ($2 == 0.005) { fillfile = "yes"; $1 = 4096; $2 = 0.01; system("echo run it through a program here") }
    else                  { system("echo why not rm " FILENAME) }
}
# Print every line (line 2 with any changes made above).
{ print $0 }
END {
    if (fillfile == "yes") {
        # Pad with zero rows; 15 is a small limit for testing, not the real 586.
        for (i = NR - 1; i <= 15; i++) {
            print " 0.00000 0.00000 etc ", i
        }
    }
}
Run it with:
awk -f awkscript.awk data1.txt
NB: because fields on line 2 are assigned, $0 is rebuilt and the original spacing on that line is lost.
EDIT: off by one in filling in the for loop
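The NB about lost formatting can be seen on a single line. This is only a sketch: the header text below is an assumed example in the style of line 2, and the sed alternative is one possible way to keep the spacing, not part of the script above.

```shell
line=' 3006  0.005000  Tstart =   0.00 sec'

# Assigning to a field makes awk rebuild $0 with single spaces between fields.
rebuilt=$(printf '%s\n' "$line" | awk '{ $1 = 4096; print }')
echo "$rebuilt"   # prints: 4096 0.005000 Tstart = 0.00 sec

# Replacing only the digits with sed leaves the surrounding spacing alone.
kept=$(printf '%s\n' "$line" | sed 's/^\( *\)[0-9][0-9]*/\14096/')
echo "$kept"      # prints:  4096  0.005000  Tstart =   0.00 sec
```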
Last edited by Procyon (2008-08-06 19:56:47)
Offline
Use function pointers (If I've understood the problem correctly).
Set up a function pointer, do the checks and assign the valid function, execute the pointed function.
http://www.newty.de/fpt/fpt.html#defi
Last edited by piotroxp (2008-08-07 08:50:18)
Offline
First off, thank you for your help. Second, just to reiterate, I am a complete newbie at this. When I try the above script it only changes the 3006 to 4096 and the .005 to .01. No zeros are added at the end of the file.
I also need to run it through the program (which changes the time step so that it automatically comes out with .01) before I change 3006 to 4096 and then add zeros at the end. It also must have the line numbers at the end of each line.
Offline
OK, then this approach of awk -f script.awk data.txt > newdata.txt is not what you want.
I think it should use sed -i, called from the awk script, to edit the file in place.
Are the zeroes omitted because it only goes to 15? (I did that for comfortable testing.)
And you're saying that if $2 is 0.005 you don't want this script to change it, because your program does that, right?
I'll post a new script later.
Offline
Let me restate the total goal of this, for my own good! If I start with a file such as this:
PEAK ACCEL. = 307.6032 in/sec2 @ t = 3.07 sec
3006 0.005000 Tstart = 0.00 sec , Tstop = 15.03 sec
-0.9709 -0.9582 -0.9497 -0.9798 -1.0209 -1.0377 -1.0487 1
-1.1092 -1.0958 -0.7594 -0.5974 -1.2448 -1.9784 -1.1627 2
-0.3853 -1.5358 -3.5072 -4.0851 -2.4065 -0.0700 3.1149 3
4.1113 0.3742 0.6078 3.8566 5.8099 3.8851 0.0263 4
-1.9638 0.9546 4.1485 -1.0938 -10.0133 -10.8820 0.1033 5
8.5943 -2.4925
(Each file differs in where it ends (which row and column) and in the numbers in the 3006 and 0.00500 positions.)
Step 1: Read the number where 0.00500 is.
  IF it is not 0.00500 or 0.01000
    THEN remove the file (I already have them saved somewhere else, so just remove them)
    and print the filename in dumpstep.txt (to keep track of which files are not used)
  IF it is 0.00500
    THEN look at the number where 3006 is:
      IF it is > 8192 THEN remove the file and print the filename in dumprec1.txt
      ELSE I need to run the following from the command line for each file (this will convert the file to a .01 time step):
crsdos4
1
y
filename
filename
y
5
.005
3006 (or whatever # is in that spot in the original file)
7F10.4
2
N
1
NEW
n
  IF it is 0.01000 and the number where 3006 is > 4096 THEN remove the file and print the filename in dumprec2.txt
Step 2: Now I should only have files that have 0.01000 in the 0.00500 spot and numbers <= 4096 in the 3006 spot.
  IF the counter in the rightmost column reads 586 or greater, do nothing
  ELSE I need to put zeros into each column so that all files have a total of 586 lines of data, and they must have the counter column. I also need the number where 3006 is to be changed to 4096, but only after all other steps have been done.
The result should look like this:
PEAK ACCEL. = 307.6032 in/sec2 @ t = 3.07 sec
4096 0.01000 Tstart = 0.00 sec , Tstop = 15.03 sec
-0.9709 -0.9582 -0.9497 -0.9798 -1.0209 -1.0377 -1.0487 1
-1.1092 -1.0958 -0.7594 -0.5974 -1.2448 -1.9784 -1.1627 2
-0.3853 -1.5358 -3.5072 -4.0851 -2.4065 -0.0700 3.1149 3
4.1113 0.3742 0.6078 3.8566 5.8099 3.8851 0.0263 4
-1.9638 0.9546 4.1485 -1.0938 -10.0133 -10.8820 0.1033 5
8.5943 -2.4925 0.0000 0.0000 0.0000 0.0000 0.0000 6
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 7
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 8
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 9
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 10
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 11
.
.
.
.
.
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 585
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 586
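The padded layout above can be sketched with awk's printf. This is only an illustration under stated assumptions: pad_rows is a hypothetical helper, the input is assumed to contain only the data values (no header lines and no counter column), each value is written 10 wide with 4 decimals to mirror 7F10.4, and the 5-wide counter is a guess at the real column width.

```shell
# Hypothetical helper: reflow values into rows of 7 and pad with zeros.
# usage: pad_rows ROWS < values
pad_rows() {
    awk -v rows="$1" '
        # collect every value from the input, in order
        { for (i = 1; i <= NF; i++) v[++n] = $i }
        END {
            for (r = 1; r <= rows; r++) {
                line = ""
                for (c = 1; c <= 7; c++) {
                    k = (r - 1) * 7 + c
                    # use the real value while there is one, then zeros
                    line = line sprintf("%10.4f", (k <= n) ? v[k] : 0)
                }
                # assumed counter width of 5 characters
                printf "%s%5d\n", line, r
            }
        }'
}

out=$(printf '8.5943 -2.4925\n' | pad_rows 2)
echo "$out"
```

For the real files the row count would be 586, and the header lines and existing counters would have to be stripped before the values are fed in.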
Wow, I feel like a jerk even asking this. Thanks to anyone who is nice enough to help.
Offline
Well, here is what I came up with, using bash instead of awk.
NB
1. bash can't do decimal comparison, so I used bc. If you don't have bc, it needs to become a bit more complex, perhaps with awk.
2. I'm not good with logic like || and &&, so check that most thoroughly.
3. Your program is run only when the second number is 0.005 and the first number is at most 8192. The script does not re-read the two numbers after your program has run. Is this right?
Since it might be easy to miss, I did this for easy testing:
1. echo rm $file instead of rm
2. echo write $file to dumpfile1.txt instead of a proper name and echo $file >> propername.txt
3. twice, once in if [[ $nr3 -ge 20 ]] and again in while [[ $nr3 -le 20 ]], it goes to 20 instead of 586.
4. echo debug stuff
5. hexdump instead of crsdos4
#!/bin/bash
for file in data*; do
    echo
    echo processing $file
    #get number 1
    nr1=$(sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' $file)
    #get number 2
    nr2=$(sed -ne '2s#^[[:blank:]]*[0-9]\+[[:blank:]]*\([0-9]\+\.*[0-9]\+\).*#\1#p' $file)
    echo 1 = $nr1 2 = $nr2
    #error check
    if [[ -z $nr1 ]] || [[ -z $nr2 ]]; then echo error processing $file; continue; fi
    #nr2 must be 0.01 or 0.005
    if [[ $(echo "$nr2 != 0.01" | bc) -eq 1 ]] && [[ $(echo "$nr2 != 0.005" | bc) -eq 1 ]]; then
        echo rm $file; echo write $file to dumpfile1.txt; continue
    fi
    #nr2 0.005 can't have nr1 over 8192; otherwise feed the answers to the converter
    if [[ $(echo "$nr2 == 0.005" | bc) -eq 1 ]]; then
        if [[ $nr1 -gt 8192 ]]; then
            echo rm $file; echo write $file to dumpfile2.txt; continue
        else
            echo "1
y
$file
$file
y
5
.005
$nr1
7F10.4
2
N
1
NEW
n" | hexdump -C #replace with crsdos4
        fi
    fi
    #nr2 0.01 can't have nr1 over 4096
    if [[ $(echo "$nr2 == 0.01" | bc) -eq 1 ]] && [[ $nr1 -gt 4096 ]]; then
        echo rm $file; echo write $file to dumpfile3.txt; continue
    fi
    #get last number
    nr3=$(sed -ne '$s#^.*\b\([0-9]\+\)$#\1#p' $file)
    echo last number = $nr3
    #error check
    if [[ -z $nr3 ]]; then echo error in getting last number; continue; fi
    if [[ $nr3 -ge 20 ]]; then
        echo file already has 586 or more lines, nothing to do; continue
    else
        #get ready for next number
        nr3=$(($nr3+1))
        while [[ $nr3 -le 20 ]]; do
            echo " 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 $nr3"
            nr3=$(($nr3+1))
        done >> $file
        #replace old first number with 4096
        sed -i '2s#'"$nr1"'#4096#' $file
    fi
done
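If bc is not available, the decimal comparisons in the script above could be done with awk instead. A minimal sketch, where float_eq is a hypothetical helper name and not something the script defines:

```shell
# Hypothetical helper: succeeds (exit status 0) when both arguments
# are numerically equal, so 0.005000 and 0.005 compare as the same.
float_eq() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 == b + 0) }'
}

if float_eq 0.005000 0.005; then
    echo "same time step"
fi
```

A test such as [[ $(echo "$nr2 == 0.005" | bc) -eq 1 ]] would then become: if float_eq "$nr2" 0.005; then ...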
Offline
Thank you...can you walk me through this line
nr1=$(sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' $file)
Offline
Sure:
sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' $file
On line 2, do a substitution whose pattern matches:
- ^ (start of line; probably redundant here, but it's good practice)
- 0-infinite blanks (only needed if there is whitespace before the number)
- 1-infinite digits, which are remembered
- 0-infinite other characters
- (You could put a $ at the end too, just like the ^ at the start. I'd have done it but I forgot.)
And change it into
- the first part that was remembered
And print it. Because sed was called with -n (we don't want to print everything), the print has to be explicit.
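The effect of -n together with the p flag can be checked on a two-line sample. This is a sketch on made-up input; it writes the pattern with ' *' and [0-9][0-9]* rather than [[:blank:]] and \+, purely so that it does not depend on one particular sed.

```shell
sample='PEAK line
 3006  0.005000 rest'

# Without -n, sed prints every line, substituted or not.
all=$(printf '%s\n' "$sample" | sed '2s#^ *\([0-9][0-9]*\).*#\1#')

# With -n and the p flag, only the line where the substitution
# succeeded is printed.
only=$(printf '%s\n' "$sample" | sed -n '2s#^ *\([0-9][0-9]*\).*#\1#p')

echo "$only"   # prints 3006
```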
Offline
if i put this in the cmd line:
nr1=$(sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' "test.cth")
for a sample file the same as posted:
PEAK ACCEL. = 307.6032 in/sec2 @ t = 3.07 sec
3006 0.005000 Tstart = 0.00 sec , Tstop = 15.03 sec
-0.9709 -0.9582 -0.9497 -0.9798 -1.0209 -1.0377 -1.0487 1
-1.1092 -1.0958 -0.7594 -0.5974 -1.2448 -1.9784 -1.1627 2
-0.3853 -1.5358 -3.5072 -4.0851 -2.4065 -0.0700 3.1149 3
4.1113 0.3742 0.6078 3.8566 5.8099 3.8851 0.0263 4
-1.9638 0.9546 4.1485 -1.0938 -10.0133 -10.8820 0.1033 5
8.5943 -2.4925
echo $nr1
i get an empty line
Offline
It must be an older sed version that doesn't have [[:___:]] in its regex.
Change this line
nr1=$(sed -ne '2s#^[[:blank:]]*\([0-9]\+\).*#\1#p' $file)
to
nr1=$(sed -ne '2s#^\([0-9]\+\).*#\1#p' $file)
The blank match isn't even needed there, because the number is always at the start of the line, right? Otherwise append ' *' to ^.
And
nr2=$(sed -ne '2s#^[[:blank:]]*[0-9]\+[[:blank:]]*\([0-9]\+\.*[0-9]\+\).*#\1#p' $file)
to
nr2=$(sed -ne '2s#^[0-9]\+ \+\([0-9]\+\.*[0-9]\+\).*#\1#p' $file)
Because there are only spaces between the two numbers, right?
Unless it is \+ that gives trouble.
Here are some tests:
sed -e 's#e\+##' <<< "aee" #should give a
sed -e 's#[[:alpha:]]##g' <<< "1a2b3c" #should give 123
Offline
sed -e 's#e\+##' <<< "aee"
gives blank line
sed -e 's#[[:alpha:]]##g' <<< "1a2b3c" #should give 123
works and give 123
I tried:
nr1=$(sed -ne '2s#^\([0-9]\+\).*#\1#p' $file)
and also got a blank
not sure if it matters but I am on a macbook
Offline
So it's \+, which is a GNU extension that the sed on your Mac apparently doesn't support.
It's easy to replace:
nr1=$(sed -ne '2s#^\([0-9][0-9]*\).*#\1#p' $file)
nr2=$(sed -ne '2s#^[0-9][0-9]* *\([0-9][0-9]*\.*[0-9][0-9]*\).*#\1#p' $file)
But I just realized that for nr2, if it's just 0, it won't match, so I think
nr2=$(sed -ne '2s#^[0-9][0-9]* *\([0-9][0-9]*\.*[0-9]*\).*#\1#p' $file)
is better.
Now let's hope bc works...
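For reference, here are three ways to write "one or more" that avoid \+. This is only a sketch on the test string from earlier in the thread; the -E form assumes a sed that accepts that option, which current GNU and BSD/macOS versions do.

```shell
# Interval expression: part of POSIX basic regular expressions.
r1=$(printf 'aee\n' | sed 's#e\{1,\}##')

# Spelled out as "one e followed by zero or more e".
r2=$(printf 'aee\n' | sed 's#ee*##')

# Extended regular expressions, where + needs no backslash.
r3=$(printf 'aee\n' | sed -E 's#e+##')

echo "$r1 $r2 $r3"   # prints: a a a
```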
Offline
nr1 works good...nr2 returns "6"
Offline
Also, in my text file there is a space before the 3006 and two spaces before 0.00500.
If you could explain what each character in nr2 means, that would save me from asking this question over and over.
Offline
As I said, append ' *' to ^ to make it match spaces at the start.
nr2=$(sed -ne '2s#^ *[0-9][0-9]* *\([0-9][0-9]*\.*[0-9]*\).*#\1#p' $file)
Matches
- ^
- 0-inf spaces
- 1-inf digits
- 0-inf spaces
- 1-inf digits & 0-inf dots & 0-inf digits, remembered
- 0-inf other chars
And replaces it all with the first remembered part
edit:
and likewise for nr1
nr1=$(sed -ne '2s#^ *\([0-9][0-9]*\).*#\1#p' $file)
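Both patterns can be tried on a single header line without touching a real file. A small sketch, assuming the header has one leading space and two spaces between the numbers as described above (the 2 address is dropped because only one line is fed in):

```shell
header=' 3006  0.005000  Tstart =   0.00 sec , Tstop =  15.03 sec'

nr1=$(printf '%s\n' "$header" | sed -n 's#^ *\([0-9][0-9]*\).*#\1#p')
nr2=$(printf '%s\n' "$header" | sed -n 's#^ *[0-9][0-9]* *\([0-9][0-9]*\.*[0-9]*\).*#\1#p')

echo "$nr1"   # prints 3006
echo "$nr2"   # prints 0.005000
```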
Last edited by Procyon (2008-08-07 22:16:43)
Offline
thanks...that works....what about for
nr3=$(sed -ne '$s#^.*\b\([0-9]\+\)$#\1#p' $fname)
to get the last number
Offline
nr3=$(sed -ne '$s#^.*\b\([0-9]\+\)$#\1#p' $fname)
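The $ in front of s is the address: only the last line of the file is processed.
Matches:
- ^
- 0-inf chars
- \b, a word-boundary anchor; you can replace this with ' ' if you want
- 1-inf digits, remembered (let's replace \+ below)
- $ (end of line anchor)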
nr3=$(sed -ne '$s#^.*\b\([0-9][0-9]*\)$#\1#p' $fname)
or if \b doesn't work
nr3=$(sed -ne '$s#^.* \([0-9][0-9]*\)$#\1#p' $fname)
or if you have tabs
nr3=$(sed -ne '$s#^.*[[:blank:]]\([0-9][0-9]*\)$#\1#p' $fname)
However I just realized, in your post #9 the last line doesn't end with such a number.
Do those files occur?
So you need that row padded with 0.0000 to the end and give it a record number?
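The difference between a last line that ends with a counter and one that does not can be checked with the space variant of the pattern. A sketch with a hypothetical helper, last_number, run on two sample lines taken from the posts above:

```shell
# Hypothetical helper: print the trailing integer of a line, if any.
last_number() {
    printf '%s\n' "$1" | sed -n '$s#^.* \([0-9][0-9]*\)$#\1#p'
}

full='   -1.9638    0.9546    4.1485   -1.0938  -10.0133  -10.8820    0.1033     5'
short='    8.5943   -2.4925'

a=$(last_number "$full")    # 5
b=$(last_number "$short")   # empty: -2.4925 is not a bare integer after a space
echo "full: '$a' short: '$b'"
```

An empty result is what the script's "error in getting last number" check catches.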
Offline
yes, the last line doesn't end w/ a number in all files
and...each file ends at different columns
Offline
Add this before you check what the last number is.
The awk script returns the last line, padded and renumbered if necessary.
It presumes that a proper record has 8 columns and that the rows before the data do not have 8 columns (unless they adjoin the data).
Blank lines at the end of the file will cause problems, but that is also true for the plain number checker. (A script to fix that shouldn't be too hard.)
#fix odd ending files
getlast=$(awk '
    # a full data row has 7 values plus the counter
    (NF==8) {indatasection="yes"; lastrecord=$NF}
    # a short row after the data has started: pad it to 7 values and number it
    (indatasection=="yes" && NF!=8) {
        lastlinefixed="yes"
        for (i=1; i<8; i++) {
            if (i<=NF) { printf " %s", $i }
            else { printf " 0.0000" }
        }
        print " " lastrecord+1
    }
    # the last row was already complete: return it unchanged
    END {
        if (indatasection=="yes" && NF==8) {
            for (i=1; i<=8; i++) { printf " %s", $i }
            printf "\n"
        }
    }' $file)
echo "replacing last line of $file with $getlast"
#delete last line (BSD/macOS sed needs: sed -i '' '$d' $file)
sed -i '$d' $file
#adding new line
echo "$getlast" >> $file
#get last number
etc. etc.
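The trailing blank lines mentioned above could be removed before any of this runs. A minimal sketch, assuming a hypothetical helper named strip_trailing_blank that writes the cleaned text to standard output:

```shell
# Hypothetical helper: drop blank (or whitespace-only) lines at the end
# of a file. Blank lines are held back and only printed once a later
# non-blank line shows they were not trailing.
strip_trailing_blank() {
    awk 'NF { for (; pending > 0; pending--) print ""; print; next }
         { pending++ }' "$1"
}

tmp=$(mktemp)
printf 'a\n\nb\n\n  \n' > "$tmp"
strip_trailing_blank "$tmp" > "$tmp.clean"
cat "$tmp.clean"
```

Interior whitespace-only lines come out as empty lines, which should not matter for these data files.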
Offline