
#1 2016-01-23 05:57:14

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,656

[JWR'D]audit my awk

I have a .csv which is an export of a simple voxel model:

8,4,1
#4F4530FF,#4F442FFF,#4E442FFF,#4E442FFF,#4B412DFF,#4E442FFF,#4F442FFF,#514631FF

#645639FF,#615337FF,#695A3CFF,#68593BFF,#615337FF,#67593BFF,#605337FF,#68593BFF

#4E442FFF,#4F442FFF,#514631FF,#4C422EFF,#4C422DFF,#4C422DFF,#4F4530FF,#4F4530FF

#6E5F3EFF,#6D5E3EFF,#6C5D3DFF,#6B5C3DFF,#706140FF,#695A3BFF,#6E5F3EFF,#695A3BFF

I needed to parse it down to just the sorted unique hex values. So obviously,

grep -o '#......' model.csv |sort -u

But the final output was intended for a .json list, so this would be even better:

"#000000",
...
"#ffffff"

Sure, I could add a pipe to sed or such, but I'm a masochist so I decided to do it all in awk. I'd never done anything more complicated than a one-liner before, so criticism is appreciated:

BEGIN{
    FS = ","
    hex_regexp = "^#[0-9a-fA-F]{8}$"
    delete all_vals[0]
}

/#/{
    for (i=1; i<=NF; i++)
    {
        if ( $i ~ hex_regexp )
        {
            len = length(all_vals) + 1
            # awk strings are 1-indexed; a start of 0 is not portable and can drop a character
            all_vals[len] = substr($i, 1, 7)
        }
    }
}

END{
    PROCINFO["sorted_in"] = "@val_str_asc"

    for (i in all_vals)
    {
        a = all_vals[i]
        matched = 0
        for (j in unique_vals)
        {
            b = unique_vals[j]
            if (a == b)
            {
                matched = 1
                break
            }
        }
        if (matched == 0)
        {
            len = length(unique_vals) + 1
            unique_vals[len] = a
        }
    }

    for (i in unique_vals)
    {
        if ( i != length(unique_vals) )
            print "\"" unique_vals[i] "\","
        else
            print "\"" unique_vals[i] "\""
    }
}

One thing that's quirky is the need to delete all_vals[0] , which AFAICT is necessary to "initialize" it as an array. Otherwise awk throws an error:

$ awk -f voxel_colors.awk voxel.csv
awk: voxel_colors.awk:12: (FILENAME=voxel.csv FNR=2) fatal: attempt to use scalar `all_vals' as an array

I only knew to do that thanks to SO. It seems for (i in var) also initializes var as an array (only if first reference?), which is presumably why delete unique_vals[0] is not necessary.
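
A note on the quirk above: as far as I can tell, the error comes from length(all_vals) being the first thing that touches the variable, which makes awk treat it as a scalar before it is ever subscripted. A minimal sketch that sidesteps this by keeping an explicit counter, so the first use of all_vals is a subscript. The wrapper name collect_colors and the counter n are made up for illustration, and the length()/regex pair stands in for the {8} interval so it should also run on awks without interval support:

```shell
# Hypothetical helper: read the CSV on stdin, print each 7-character colour found.
collect_colors() {
    awk -F, '
        /#/ {
            for (i = 1; i <= NF; i++)
                # 9 characters: "#" plus 8 hex digits (RGBA)
                if (length($i) == 9 && $i ~ /^#[0-9a-fA-F]+$/)
                    # first use of all_vals is a subscript, so it is an array from the start
                    all_vals[++n] = substr($i, 1, 7)
        }
        END { for (k = 1; k <= n; k++) print all_vals[k] }
    '
}
```

Usage would be something like collect_colors < model.csv; no delete in BEGIN is needed.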

Edit: although the solution jason led me to is best, I realized my original END block can be cut down to

END{
    PROCINFO["sorted_in"] = "@val_str_asc"

    b = ""
    for (i in all_vals)
    {
        a = all_vals[i]
        if ( a != b ) {
            b = a
            printf "\"%s\",\n", a
        }
    }
}
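
The shortened END block works because the scan is sorted, so duplicates are always adjacent. It does print a comma after the last value, though, which the target list at the top does not have. A rough sketch of one way around that: emit the separator before every item except the first. The helper name json_items is hypothetical, and it assumes the values arrive on stdin already sorted, one per line:

```shell
# Hypothetical helper: sorted values on stdin -> quoted items, no trailing comma.
json_items() {
    awk '
        $0 != prev {
            if (seen++) printf ",\n"   # finish the previous item before starting a new one
            printf "\"%s\"", $0
            prev = $0
        }
        END { if (seen) printf "\n" }
    '
}
```

Printing the separator first avoids having to know in advance which item is the last one.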

Last edited by alphaniner (2016-01-23 18:44:42)


But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist.
-Lysander Spooner


#2 2016-01-23 06:27:38

jasonwryan
Forum & Wiki Admin
From: .nz
Registered: 2009-05-09
Posts: 18,830

Re: [JWR'D]audit my awk

Would this not be simpler?

awk 'BEGIN { RS=","} /^#/ {printf "%s,\n", $1}' file 

Arch + dwm   •   Mercurial repos  •   Github

Registered Linux User #482438


#3 2016-01-23 07:31:03

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,656

Re: [JWR'D]audit my awk

It misses the first entry on each line. RS = "[\n,]" seems to solve that. Doing something like that was my first thought, but I didn't know if it was possible so I went with the for loop. Thanks for pointing it out.

However, I wasn't aiming for simplicity anyway; I wanted to fully replicate e.g.

grep -o '#......' model.csv |sort -u |awk '{print "\""$0"\","}'

with a single awk program & no pipes.



#4 2016-01-23 07:48:54

jasonwryan
Forum & Wiki Admin
From: .nz
Registered: 2009-05-09
Posts: 18,830

Re: [JWR'D]audit my awk

Ugh: completely missed the uniq bit, sorry.

Give this a shot:

awk 'BEGIN { RS="," } /^#/ { !a[$1]++ } END { for (b in a) { printf "\"%s\",\n", b } }' file


#5 2016-01-23 15:13:17

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,656

Re: [JWR'D]audit my awk

Wow. Assigning to indices to avoid the need for manual uniq-ing is a great trick. Thanks! It seems that referencing the key is all that's necessary to have it added. Is there a reason you use !a[$1]++ rather than just a[$1] ?
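
For reference, a small sketch of the two forms side by side. As I understand it, !seen[$0]++ earns its keep as a pattern, where it is true only the first time a key shows up; when all you want is the set of keys, a bare reference is enough. The helper names first_seen and count_distinct are made up for illustration:

```shell
# Hypothetical helper: print each distinct line once, in first-seen order.
# The pattern is true only while the counter for this line is still 0.
first_seen() {
    awk '!seen[$0]++'
}

# Hypothetical helper: count distinct lines; referencing seen[$0] alone creates the key.
count_distinct() {
    awk '{ seen[$0] } END { n = 0; for (k in seen) n++; print n }'
}
```

So inside an action whose result is thrown away, the negation and the increment make no difference to which keys end up in the array.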

I still need to modify the RS to include "\n" and specify array scan order, and verify only records of the correct form are included just to be safe. Putting it all together:

BEGIN {
	RS="[\n,]"
	regex = "^#[0-9a-fA-F]{8}$"
}

$0 ~ regex { a[substr($1,1,7)] }

END {
	PROCINFO["sorted_in"] = "@ind_str_asc"
	for (b in a) {
		printf "\"%s\",\n", b
	}
}
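
The program above leans on gawk: PROCINFO["sorted_in"] is a gawk feature, and a regex RS is an extension that POSIX awk does not guarantee. A rough sketch of how the same result might be had without either, assuming a small palette so a hand-rolled insertion sort is acceptable. The wrapper name voxel_colors and the array names seen and vals are hypothetical, and the sort order follows awk's string comparison, which can depend on locale:

```shell
# Hypothetical helper: CSV on stdin -> sorted, unique, quoted colours, no trailing comma.
voxel_colors() {
    awk -F, '
        {
            for (i = 1; i <= NF; i++)
                if (length($i) == 9 && $i ~ /^#[0-9a-fA-F]+$/) {
                    c = substr($i, 1, 7)
                    # keep only the first occurrence of each colour
                    if (!(c in seen)) { seen[c]; vals[++n] = c }
                }
        }
        END {
            # insertion sort on string values
            for (i = 2; i <= n; i++) {
                v = vals[i]
                for (j = i - 1; j >= 1 && vals[j] > v; j--)
                    vals[j + 1] = vals[j]
                vals[j + 1] = v
            }
            for (i = 1; i <= n; i++)
                printf "\"%s\"%s\n", vals[i], (i < n ? "," : "")
        }
    '
}
```

Usage would be something like voxel_colors < model.csv.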


#6 2016-01-23 18:33:48

jasonwryan
Forum & Wiki Admin
From: .nz
Registered: 2009-05-09
Posts: 18,830

Re: [JWR'D]audit my awk

No, it was late and I was tired, so wasn't thinking all that clearly (missing the uniq requirement was a bit of a tell :P )...

Glad you got it sorted (no pun intended).


