
#1 2007-06-10 11:30:56

STUDENT
Member
Registered: 2007-06-10
Posts: 7

AWK help - beginner

Hi, I'm in dire need of some help with AWK. I'm a college student, and my statistics professor decided to teach AWK in the last two days of our class, even though it's not a programming class and we don't even have computers in class to experiment with. I tried looking at AWK tutorials, but they don't cover the material in the sequence our professor taught it. This is the code our professor wrote on the board for printing lines. I can't figure it out. If anyone can interpret it or correct me (if I miscopied the code), thank you.

Abe M 70
Bea F 65
Cathy F 67
Dave M 69

{if ($2 == "M"){
s=s+$3
n=n+1
}END{
print s, n, s/n, "avg height"
}

I have trouble understanding the "s=s+$3" and "n=n+1" lines. What do those mean? I also have trouble understanding the "print" line. Obviously it means to print, but what would it print? "s", "n" and "s/n"? What is "s/n"?

Here's another one.

Abe M 70
Bea F 65
Cathy F 67
Dave M 69

{
if ($N / [fF] /) {
fsum = fsum + 3
fnum = fsum + 1
}
if ($2N/[mM]){
msum = msum + $3
mnum = mnum + 1
}
}

What does "if ($N / [fF] /)" or "($2N/[mM])" mean?

THX


I've studied HTML, Javascript and CSS in the past, and I like to know what each line of code does. I'm very OCD with programming, and it frustrates me that the professor is teaching it this way WITHOUT (I repeat) a computer. This has got to be, by far, the most knuckleheaded professor I've had in this college. Thx for the help.

Last edited by STUDENT (2007-06-10 11:37:24)


#2 2007-06-10 11:40:15

drakosha
Member
Registered: 2006-01-03
Posts: 253

Re: AWK help - beginner

n = n + 1: n is a variable holding an integer value; it's initialized with 0. On each line where the 2nd column equals "M" (look at the if statement), n is increased by one.
s = s + $3: s is a variable holding an integer value; it's initialized with 0. On each line where the 2nd column equals "M" (look at the if statement), s is increased by the value of the 3rd column ($3).
I hope that explains the print too :) (s/n is the result of dividing s by n; / is division)


#3 2007-06-10 11:43:49

STUDENT
Member
Registered: 2007-06-10
Posts: 7

Re: AWK help - beginner

drakosha wrote:

n = n + 1: n is a variable holding an integer value; it's initialized with 0. On each line where the 2nd column equals "M" (look at the if statement), n is increased by one.
s = s + $3: s is a variable holding an integer value; it's initialized with 0. On each line where the 2nd column equals "M" (look at the if statement), s is increased by the value of the 3rd column ($3).
I hope that explains the print too :) (s/n is the result of dividing s by n; / is division)

I'm really sorry, but I'm having a hard time understanding the AWK lingo you're using. Again, I'm very new to this so I don't know what terms like "integer" and "variable" mean. Simply put, I wish you could tell me what these codes do, as opposed to defining them in AWK lingo. BIG Thx.

Last edited by STUDENT (2007-06-10 11:44:26)


#4 2007-06-10 13:29:57

klixon
Member
From: Nederland
Registered: 2007-01-17
Posts: 525

Re: AWK help - beginner

That's not AWK-lingo, but general programming lingo...

Basically a variable is (very general...) a name you give to a piece of the computer's memory, where you can store and retrieve values in/from... (sorta like the 'M' button on your calculator...).

So, 's' in the examples is a piece of memory where you store a value.

An integer is a number without the fraction part ( 1 is an int as opposed to 1.5)

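To say that 's' is an integer variable means the computer will only store integers in that piece of memory you called s. (awk itself is not that strict: an awk variable can hold any number, or text, without you declaring anything.)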

Now, normally awk reads its input line by line. Every line (called a record in awk-lingo) gets chopped up into pieces called 'fields'. If you don't specify options, awk uses 'whitespace' (tabs, spaces and such) to delimit those fields (in your example, the record gets broken into fields at every space character...)
These fields get a label, so you can examine or change them. Field 1 is called $1, field 2 is called $2... etc.
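
To see the field splitting for yourself, here is a minimal sketch, assuming a POSIX shell with awk installed. The labels name=, sex= and height= and the shell variable `fields` are made up for this illustration; NF is awk's built-in count of fields in the current record.

```shell
# Feed a single record to awk and print each field with a label.
# NF is the number of fields awk found in the record.
fields=$(printf 'Abe M 70\n' |
  awk '{ print "name=" $1, "sex=" $2, "height=" $3, "count=" NF }')

echo "$fields"
# name=Abe sex=M height=70 count=3
```
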

You write an awk program by defining rules for the records you feed it:

test { action }

The "test" is optional. If you don't supply it, the action will be applied to every record. If you supply it, awk checks if it's true and if so, performs the action. If not, it will go to the next rule, or, if there are no more rules, the next record
The "action" is also optional (if you leave it out, awk simply prints the record), but that's not really interesting right now.
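
A small sketch of both kinds of rule, again assuming a POSIX shell with awk. The cut-off of 66 and the shell variable names `data`, `tall` and `women` are only there for the illustration.

```shell
data='Abe M 70
Bea F 65
Cathy F 67
Dave M 69'

# Rule with a test and an action: print the name when the height is over 66.
tall=$(printf '%s\n' "$data" | awk '$3 > 66 { print $1 }')

# Rule with only a test: awk falls back to printing the whole record.
women=$(printf '%s\n' "$data" | awk '$2 == "F"')

printf '%s\n' "$tall" "$women"
```

The first command lists Abe, Cathy and Dave; the second prints the two records whose second field is F.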

So, what does example 1 do:
First, I think it has a typo and should be

{
  if ($2 == "M") {
    s=s+$3
    n=n+1
  }
}

END {
  print s, n, s/n, "avg height"
}

You feed it the first record ("Abe M 70").
Awk splits this record into 3 fields:
$1 = "Abe"
$2 = "M"
$3 = 70

There is no "test" (the rule starts with a '{', and tests go before the '{'), so the action inside the outer braces will be applied to the record.

Next you get an "if". This one checks if $2 (field 2) is equal to "M" (when you check for equality, you use a double ==; when you assign a value to a variable, you use a single =). If that is the case, it will execute the statements between the inner braces.
Of course, in this record, it's true, so this will happen:
s=s+$3 :: s is a variable. This statement adds the value of $3 (field 3) to s and assigns the resulting value to s. After this, s has the value 70, because s did not exist yet, and awk is kind enough to create it for you and give it the value 0 (don't count on this with other programming languages ;) ).
n=n+1 :: Here we add 1 to the value of the variable n and assign the new value to n (or, we increment n by 1), so n now has the value 1.
I guess you can follow along with the next records now...

Now END is a special thing. END is essentially one of those "test"s I mentioned earlier. See: it comes before the '{', so the action won't be taken for every line, only when END matches.
END is "the end of all input", so when you're done feeding it records, this action will be done, which goes like this:
print on screen the value of 's', the value of 'n', the value of s divided by the value of n and the text "avg height".
The commas tell awk to put a space between the values on the output line (without them the values would be glued together).
In the case of your example input it would print

139 2 69.5 avg height

Hope it's clear and hope this gives you a starting point to figure out what happens in example 2
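
If you do get near a computer, the corrected first example can be tried like this. It is a sketch that assumes a POSIX shell with awk; the data is piped in with printf instead of read from a file, and the shell variable `result` is only there so the output can be inspected.

```shell
# Sum the heights of the records whose 2nd field is "M", count them,
# and print the sum, the count and the average at the end of input.
result=$(printf 'Abe M 70\nBea F 65\nCathy F 67\nDave M 69\n' | awk '
{
  if ($2 == "M") {
    s = s + $3
    n = n + 1
  }
}
END {
  print s, n, s/n, "avg height"
}')

echo "$result"
# 139 2 69.5 avg height
```

The two M records add up to 70 + 69 = 139, so the average is 139 / 2 = 69.5.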


Stand back, intruder, or i'll blast you out of space! I am Klixon and I don't want any dealings with you human lifeforms. I'm a cyborg!


#5 2007-06-10 20:16:59

STUDENT
Member
Registered: 2007-06-10
Posts: 7

Re: AWK help - beginner

Wow, BIG thx to klixon. You truly broke it down for me to understand. Our professor did not teach it like that for sure. Again, much thx.

However, I was still wondering about the second example. Of course, I understand now, everything you've taught me, but what does "if ($N / [fF] /)" or "($2N/[mM])" mean? I don't understand the small "f" and capital "F" together like that, nor the capital "N." THX!

Abe M 70
Bea F 65
Cathy F 67
Dave M 69

{
if ($N / [fF] /) {
fsum = fsum + 3
fnum = fsum + 1
}
if ($2N/[mM]){
msum = msum + $3
mnum = mnum + 1
}
}


#6 2007-06-10 20:51:33

drakosha
Member
Registered: 2006-01-03
Posts: 253

Re: AWK help - beginner

from 'man awk':
For /regular expression/ patterns, the associated statement is executed for each input record that matches the regular expression. Regular expressions are the same as those in egrep(1), and are summarized below.

Those are regexps; I really do not know what $N is.


#7 2007-06-10 22:14:14

klixon
Member
From: Nederland
Registered: 2007-01-17
Posts: 525

Re: AWK help - beginner

Are you sure you copied it accurately?
This probably won't work on its own... Did you copy everything?

Last edited by klixon (2007-06-10 22:16:22)



#8 2007-06-10 23:05:46

STUDENT
Member
Registered: 2007-06-10
Posts: 7

Re: AWK help - beginner

klixon wrote:

Are you sure you copied it accurately?
This probably won't work on its own... Did you copy everything?

Yes, that's all he had. I guess he didn't write the print directions. Still, I'd like to know what "if ($N / [fF] /)" or "($2N/[mM])" means. BIG THX


#9 2007-06-10 23:13:58

STUDENT
Member
Registered: 2007-06-10
Posts: 7

Re: AWK help - beginner

Here's another one that really confused me...

Name of Person - 1-30
Sex - 31
Race - 32-33

{
name = substr ($0, 1, 30)
sex = substr ($0, 31, 1)
race = substr ($0, 32, 2)
if (sex == 1) && (race == 1) wm=wm+1
if ((sex != 1) || (race != 1) other=other+1
}

1. What does "substr" or substring mean?
2. What's the "wm=wm+1" and "other=other+1" about? Is this actually a code or did he just add that on his own? What does it do? I understand from the previous post that the "+1" means you're specifying "wm" as "1," but what is the point of it?
3. What is the object of this code?


#10 2007-06-11 07:23:29

klixon
Member
From: Nederland
Registered: 2007-01-17
Posts: 525

Re: AWK help - beginner

Well, I didn't mean that he left out the print-statements...
I'd say it has to look like this to make it work:

{
  if ($0 ~ / [fF] /) {
    fsum = fsum + $3
    fnum = fnum + 1
  }
  if ($0 ~ / [mM] /){
    msum = msum + $3
    mnum = mnum + 1
  }
}

That's 7 or 8 corrections... hmm

This way it will do at least something logical with the given data... I think you can create the print statement yourself now if you follow along with my first post. If you don't add one, you won't get any output.

This is another rule without a test, so it will apply to every record.
It tests whether the record ($0 means the whole record, not one particular field) matches the regular expression / [fF] / or / [mM] / and increments the values accordingly.

The ~ is an operator that tells awk to test whether the regular expression on the right of it is found in the string to the left of it...

Basically, it first checks if the record contains a space, followed by either a lowercase f or an uppercase F, followed by a space. The [...] is called a character class and is used to group the characters the regexp will match at that place. It also tests for the spaces, so it won't, for instance, match 'ferry' (which has a lowercase f in it, but the next character isn't a space...).
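
Here is a quick sketch of that match, assuming a POSIX shell with awk. The three input records, including the 'ferry' one, are invented test data, and `matched` is just a shell variable for the illustration.

```shell
# Print the first field of every record that contains
# space, f or F, space. The "ferry" record has an f, but
# not one surrounded by spaces, so it is skipped.
matched=$(printf 'Bea F 65\nferry M 70\nAl f 66\n' |
  awk '$0 ~ / [fF] / { print $1 }')

echo "$matched"
```

Only Bea and Al are printed.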

A more awk-ish way to write it would be like this:

/ [fF] / {
  fsum += $3
  fnum++
}
/ [mM] / {
  msum += $3
  mnum++
}

This finally has a test-part in the rule...

or to be more precise:

tolower($2) ~ /f/ {
  ...
}
tolower($2) ~ /m/ {
  ...
}
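
Put together with an END rule, the second example could look like the sketch below. It assumes a POSIX shell with awk; the output layout (sex, sum, count, average) is my own choice and was not on the professor's board, and it compares the lowercased second field with == instead of a regexp.

```shell
# One rule per sex adds up the heights and counts the records;
# the END rule prints sum, count and average for each group.
report=$(printf 'Abe M 70\nBea F 65\nCathy F 67\nDave M 69\n' | awk '
tolower($2) == "f" { fsum += $3; fnum++ }
tolower($2) == "m" { msum += $3; mnum++ }
END {
  # Guard against dividing by zero when a group is empty.
  if (fnum > 0) print "F", fsum, fnum, fsum / fnum
  if (mnum > 0) print "M", msum, mnum, msum / mnum
}')

echo "$report"
# F 132 2 66
# M 139 2 69.5
```
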

Last edited by klixon (2007-06-11 07:27:40)



#11 2007-06-11 07:35:31

klixon
Member
From: Nederland
Registered: 2007-01-17
Posts: 525

Re: AWK help - beginner

STUDENT wrote:

Name of Person - 1-30
Sex - 31
Race - 32-33

This is not the data itself, but a description of what the datafile should look like...
It is a file where every field has a fixed width

chars 1 to 30 contain the name
char 31 the sex
char 32 and 33 the race.

substr is a function... You feed it a piece of text (which is called a string), a starting position and a length, and it returns the sub-string of that text that starts at the position you gave and has the given length (positions are counted from 1)...
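
A tiny sketch of substr on its own, assuming a POSIX shell with awk. The string "Abe M 70" and the variable names `s` and `pieces` are only for the illustration.

```shell
# substr(string, start, length): positions are counted from 1.
pieces=$(awk 'BEGIN {
  s = "Abe M 70"
  print substr(s, 1, 3)   # characters 1 to 3
  print substr(s, 5, 1)   # character 5
  print substr(s, 7, 2)   # characters 7 and 8
}')

echo "$pieces"
# Abe
# M
# 70
```
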

wm=wm+1 means you add 1 to the current value of wm and assign this new value to wm (in other words, you add 1 to wm). See my first post and the rambling about variables...

Basically, you're counting white males and others...

Alternative which works in gnu awk (but probably not in other awk implementations):

BEGIN { 
  # BEGIN is like END, but gets executed before any lines are read...
  # this is a comment by the way and will be completely ignored by awk
  # With gnu awk you can define field widths, so you won't have to 
  # use substr 
  # Breaking records into fields according to whitespace is not a good 
  # idea with a fixed width file, because if one of the fields is left blank, 
  # it won't be seen as a field and the number of fields in that record as 
  # seen by awk will be off (by one)
  FIELDWIDTHS="30 1 2" 
}

{
  # the whole condition of an if must sit inside one pair of parentheses
  if (($2 == 1) && ($3 == 1)) wm=wm+1
  if (($2 != 1) || ($3 != 1)) other=other+1
  # && and || should be read as "and" and "or"
  # if $2 (sex) is equal to 1 AND $3 (race) is equal to 1, increment 
  # wm (which presumably stands for "white male"?)
  # if $2 is not (!) equal to 1 OR $3 is not equal to 1, increment 
  # other (those who are either not white or not male...). Had you 
  # used && on this line, only not-white females, females with unknown 
  # race or not-whites with unknown sex would have been counted
}
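
For awks without FIELDWIDTHS, here is a portable sketch that sticks to substr. It assumes a POSIX shell with awk. The four records are invented test data, padded to the fixed widths with printf, and the meaning of the codes (sex 1 = male, race 1 = white) is a guess, as above. Adding 0 to the substr result forces a numeric comparison, so a race field stored as " 1" still equals 1.

```shell
counts=$(
  {
    # name padded to 30 characters, sex in 1, race right-aligned in 2
    printf '%-30s%s%2s\n' 'Abe Adams' 1 1
    printf '%-30s%s%2s\n' 'Bea Brown' 2 1
    printf '%-30s%s%2s\n' 'Carl Cole' 1 2
    printf '%-30s%s%2s\n' 'Dave Dunn' 1 1
  } | awk '
  {
    sex  = substr($0, 31, 1) + 0   # + 0 turns the text into a number
    race = substr($0, 32, 2) + 0
    if (sex == 1 && race == 1) wm++
    else other++                   # same as: sex != 1 || race != 1
  }
  END { print "wm=" (wm + 0), "other=" (other + 0) }'
)

echo "$counts"
# wm=2 other=2
```

The else branch counts exactly the records the second if in the original would count, because "not (A and B)" is the same as "(not A) or (not B)".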

Last edited by klixon (2007-06-11 10:33:50)



#12 2007-06-13 08:47:38

STUDENT
Member
Registered: 2007-06-10
Posts: 7

Re: AWK help - beginner

First off, I want to seriously thank you for taking your precious time and attempting to teach me AWK.

Second, I'm just having a lot of trouble understanding this language. I come from CSS/Javascript, so I think I have this preconceived approach that's affecting my ability to fully comprehend this code. My test is tomorrow, so it's a bit late for me to ask any more questions. I'm just gonna have to make do with what I can make sense of.

Again, thanks a lot for trying to help me out. Extremely generous of you.


#13 2007-06-13 10:04:08

klixon
Member
From: Nederland
Registered: 2007-01-17
Posts: 525

Re: AWK help - beginner

No problem...
awk's more like a stripped-down, script-like C dialect with built-in regular expressions... It took me quite some time to figure out how to get things done...

I work a lot with plain-text data files, so now that I get the basics, awk is really a blessing.

Good luck with the test!
:D

