awk regular expressions in RS and FS

carlocci · 2008-07-12 17:04:33

Consider this file:

%DEPENDS%
readline
glibc

%REQUIREDBY%
glibc
coreutils
findutils
gawk
perl
gzip
hwdetect
util-linux-ng
initscripts
mkinitcpio
man
pacman
tar
tcp_wrappers
abs
m4
autoconf
automake
flex
libtool
make
capi4k-utils

%PROVIDES%
sh

I want to list only the lines under %DEPENDS%.
With awk I would do:

$ awk 'BEGIN { RS="^%DEPENDS%$" ; FS="^$" } {print $1}' filename
%DEPENDS%
readline
glibc

%REQUIREDBY%
glibc
coreutils
findutils
gawk
perl
gzip
hwdetect
util-linux-ng
initscripts
mkinitcpio
man
pacman
tar
tcp_wrappers
abs
m4
autoconf
automake
flex
libtool
make
capi4k-utils

%PROVIDES%
sh

Why does this print the whole file? Notice the extra newline awk adds at the end of the output.
As far as I understand, the problem lies with $ matching the end of the line: if I remove it from RS, %DEPENDS% is matched. FS never matches the empty line.
Why is that?

Procyon · 2008-07-12 18:15:56

[false]
Watch out with large Records:

man awk wrote:

When RS is set to the null string, the newline character always acts as a
field separator, in addition to whatever value FS may have.

This seems to be incomplete, for with RS="DEPENDS" the same thing happens.

Therefore the approach is impossible. You'll have to find another way to do it.
[/false]

EDIT: I then saw that what I just said was inconsistent with getting one large field. I thought it happened in:

--> awk 'BEGIN { RS="DEPENDS" ; FS="i" } {print $1}' filename
%
%
readl
--> awk 'BEGIN { RS="DEPENDS" ; FS="i" } {print $2}' filename

ne
gl

But I can't really make heads or tails of this, so please ignore what I said at first.

Anyway, try to get rid of the regexes.

Give this a try: make FS="\n\n".
What to set RS to is tricky. If RS never matches anything in the input (like "^%DEPENDS%$"), the behavior is odd:

--> awk 'BEGIN { RS="NONOCCURING" ; FS="\n\n" } {print $1}' filename
%DEPENDS%
readline
glibc

So I think this is as far as we can go:

--> awk 'BEGIN { RS="%DEPENDS%\n" ; FS="\n\n" } {print $1}' filename

readline
glibc

This prints one empty string for record #1, which would require some extra work to get rid of.
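
A minimal sketch of that extra work, assuming an awk that accepts a multi-character (regex) RS, such as gawk or mawk; POSIX does not guarantee this. The NF pattern skips any record with no fields, which covers the empty record #1. The temporary file and its shortened contents are made up for illustration.

```shell
# Hypothetical sample: a shortened copy of the file from the first post.
sample=$(mktemp)
printf '%s\n' '%DEPENDS%' readline glibc '' '%REQUIREDBY%' glibc coreutils '' '%PROVIDES%' sh > "$sample"

# Assumes regex RS support (gawk, mawk); this is an extension beyond POSIX.
# NF is 0 for the empty record before the first %DEPENDS%, so that record is skipped.
depends=$(awk 'BEGIN { RS = "%DEPENDS%\n"; FS = "\n\n" } NF { print $1 }' "$sample")
printf '%s\n' "$depends"
rm -f "$sample"
```

Because the empty record is filtered out rather than printed, no ORS tweaking is needed.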

Last edited by Procyon (2008-07-12 19:07:10)

carlocci · 2008-07-12 20:25:43

Thank you, it hadn't occurred to me that I could use \n\n!

I worked around it with:

awk 'BEGIN { RS="%DEPENDS%\n" ; FS="\n\n" ; ORS=""} {print $1} END {print "\n"}' depends

Still, it's quite awkward, as regexps should work in RS and FS. Is this a bug?
I thought command-line utilities were essentially bug-free after 30 years of development.

Procyon · 2008-07-12 22:15:10

I think regexes in RS and FS are just a bit limited because they aren't expected to be used as much.

In RS, $ will only match at EOF, not at the end of each line. So:

--> awk 'BEGIN { RS="1.*eol\n$"; FS="\n"} {print "REC:" $0}' <<< "line 1 eol
line 2 eol"
REC:line

It's only 1 record, and it only has "line" in it. (EDIT: so substituting '.*' with ' ' doesn't work)

^ will match beginning of line
Let's take this for example:

^last item\n
^\n
^%something%\n

FS="\n^\n^" is valid. And FS="^\n^\n^" is not.
With a dot it works fine too: FS=".^.^", which proves ^ isn't meaningless.
But FS="^\n^" doesn't work, which is odd.

EDIT:

Code to try yourself if you don't want to make the file:

--> awk 'BEGIN { RS="DEP\n"; FS="\n^\n^"} {print "REC:" $1}' <<< "DEP
last item

don't display"
REC:
REC:last item
--> awk 'BEGIN { RS="DEP\n"; FS="^\n^"} {print "REC:" $1}' <<< "DEP
last item

don't display"
REC:
REC:last item

don't display
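
For comparison, here is a sketch that sidesteps RS and FS entirely. With the default RS every line is its own record, so ^ and $ in an ordinary pattern anchor to each line. This is a different technique from the ones tried above, not a fix for them. The flag name show and the shortened sample file are made up for illustration.

```shell
# Hypothetical sample: a shortened copy of the file from the first post.
sample=$(mktemp)
printf '%s\n' '%DEPENDS%' readline glibc '' '%REQUIREDBY%' glibc coreutils '' '%PROVIDES%' sh > "$sample"

lines=$(awk '
    /^%DEPENDS%$/ { show = 1; next }   # header line: start printing after it
    /^$/          { show = 0 }         # blank line: the section has ended
    show                               # print lines while the flag is set
' "$sample")
printf '%s\n' "$lines"
rm -f "$sample"
```

This uses only features that any POSIX awk should provide.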

Last edited by Procyon (2008-07-12 22:32:47)

briest · 2008-07-14 19:12:14

Hm, what about

BEGIN{RS=""}($1=="%DEPENDS%"){print; exit}

?
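
Setting RS to the empty string puts awk in paragraph mode: records are separated by blank lines, and newline always acts as a field separator. The one-liner above prints the %DEPENDS% header together with its section. A small variation, sketched here on the assumption that only the lines below the header are wanted, sets FS to newline so each line is one field and prints fields 2 onward. The sample file is a shortened, made-up copy.

```shell
# Hypothetical sample: a shortened copy of the file from the first post.
sample=$(mktemp)
printf '%s\n' '%DEPENDS%' readline glibc '' '%REQUIREDBY%' glibc coreutils '' '%PROVIDES%' sh > "$sample"

# RS="" selects paragraph mode; FS="\n" makes every line of a paragraph one field.
section=$(awk 'BEGIN { RS = ""; FS = "\n" }
    $1 == "%DEPENDS%" { for (i = 2; i <= NF; i++) print $i; exit }' "$sample")
printf '%s\n' "$section"
rm -f "$sample"
```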

carlocci · 2008-07-14 22:20:42

That is so good it's embarrassing on my side!
