[JWR'D]audit my awk

alphaniner · 2016-01-23 05:57:14

I have a .csv which is an export of a simple voxel model:

8,4,1
#4F4530FF,#4F442FFF,#4E442FFF,#4E442FFF,#4B412DFF,#4E442FFF,#4F442FFF,#514631FF

#645639FF,#615337FF,#695A3CFF,#68593BFF,#615337FF,#67593BFF,#605337FF,#68593BFF

#4E442FFF,#4F442FFF,#514631FF,#4C422EFF,#4C422DFF,#4C422DFF,#4F4530FF,#4F4530FF

#6E5F3EFF,#6D5E3EFF,#6C5D3DFF,#6B5C3DFF,#706140FF,#695A3BFF,#6E5F3EFF,#695A3BFF

I needed to parse it down to just the sorted unique hex values. So obviously,

grep -o '#......' model.csv |sort -u

But the final output was intended for a .json list, so this would be even better:

"#000000",
...
"#ffffff"
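For reference, a pipeline that gets close to this output might look like the sketch below. The grep and sort stages are the ones from the post; the sed stage and the to_json_list name are made up for illustration. Note that every line, including the last, still ends in a comma, so the final one would need trimming for strict JSON.

```shell
# Hypothetical helper: read the voxel CSV on stdin and print the sorted,
# unique "#RRGGBB" values, each wrapped in double quotes and followed by a comma.
to_json_list() {
    grep -o '#......' | sort -u | sed 's/.*/"&",/'
}

# Usage: to_json_list < model.csv
```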

Sure, I could add a pipe to sed or such, but I'm a masochist so I decided to do it all in awk. I'd never done anything more complicated than a one-liner before, so criticism is appreciated:

BEGIN{
    FS = ","
    hex_regexp = "^#[0-9a-fA-F]{8}$"
    delete all_vals[0]
}

/#/{
    for (i=1; i<=NF; i++)
    {
        if ( $i ~ hex_regexp )
        {
            len = length(all_vals) + 1
            all_vals[len] = substr($i, 1, 7)    # awk strings are 1-indexed
        }
    }
}

END{
    PROCINFO["sorted_in"] = "@val_str_asc"

    for (i in all_vals)
    {
        a = all_vals[i]
        matched = 0
        for (j in unique_vals)
        {
            b = unique_vals[j]
            if (a == b)
            {
                matched = 1
                break
            }
        }
        if (matched == 0)
        {
            len = length(unique_vals) + 1
            unique_vals[len] = a
        }
    }

    for (i in unique_vals)
    {
        if ( i != length(unique_vals) )
            print "\"" unique_vals[i] "\","
        else
            print "\"" unique_vals[i] "\""
    }
}

One thing that's quirky is the need to delete all_vals[0], which AFAICT is necessary to "initialize" it as an array. Otherwise awk throws an error:

$ awk -f voxel_colors.awk voxel.csv
awk: voxel_colors.awk:12: (FILENAME=voxel.csv FNR=2) fatal: attempt to use scalar `all_vals' as an array

I only knew to do that thanks to SO. It seems for (i in var) also initializes var as an array (only if first reference?), which is presumably why delete unique_vals[0] is not necessary.
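One way to sidestep the delete trick is sketched below. It assumes the error comes from length(all_vals) being the first reference to the variable, which would make awk treat it as a scalar. The sketch keeps an explicit counter instead of calling length(). The names n and collect_hex are made up for this example, and the length() check stands in for the stricter regex in the script.

```shell
# Hypothetical helper: collect every field that looks like "#RRGGBBAA"
# and print its "#RRGGBB" prefix in input order (no sorting, no uniq-ing).
collect_hex() {
    awk -F, '
    /#/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^#/ && length($i) == 9)
                all_vals[++n] = substr($i, 1, 7)   # n starts at 0; first use of all_vals is as an array
    }
    END {
        for (i = 1; i <= n; i++)
            print all_vals[i]
    }'
}

# Usage: collect_hex < voxel.csv
```

Because all_vals is only ever subscripted, no BEGIN-time initialization should be needed.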

Edit: although the solution jason led me to is best, I realized my original END block can be cut down to

END{
    PROCINFO["sorted_in"] = "@val_str_asc"

    b = ""
    for (i in all_vals)
    {
        a = all_vals[i]
        if ( a != b ) {
            b = a
            printf "\"%s\",\n", a
        }
    }
}

Last edited by alphaniner (2016-01-23 18:44:42)

jasonwryan · 2016-01-23 06:27:38

Would this not be simpler?

awk 'BEGIN { RS=","} /^#/ {printf "%s,\n", $1}' file

alphaniner · 2016-01-23 07:31:03

It misses the first entry on each line. RS = "[\n,]" seems to solve that. Doing something like that was my first thought, but I didn't know if it was possible so I went with the for loop. Thanks for pointing it out.

However, I wasn't aiming for simplicity anyway; I wanted to fully replicate e.g.

grep -o '#......' model.csv |sort -u |awk '{print "\"" $0 "\","}'

with a single awk program & no pipes.
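A quick way to see the effect of the RS change: with RS="," the newline stays inside a record, so the first value of each row arrives glued to the tail of the previous row and /^#/ does not match it. A regex RS is not guaranteed by POSIX (only the first character of RS is), though gawk and mawk accept it. The function names below are made up for the sketch.

```shell
# Both helpers print the first field of every record that starts with "#".
split_on_comma() {              # RS="," as in the first suggestion
    awk 'BEGIN { RS = "," } /^#/ { print $1 }'
}

split_on_comma_or_newline() {   # RS="[\n,]" as in the fix
    awk 'BEGIN { RS = "[\n,]" } /^#/ { print $1 }'
}

# Usage: split_on_comma < model.csv
#        split_on_comma_or_newline < model.csv
```

Running both on the same file and diffing the output should show exactly the row-leading values that the first form drops.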

jasonwryan · 2016-01-23 07:48:54

Ugh: completely missed the uniq bit, sorry.

Give this a shot:

awk 'BEGIN { RS="," } /^#/ { !a[$1]++ } END { for (b in a) { printf "\"%s\",\n", b } }' file

alphaniner · 2016-01-23 15:13:17

Wow. Assigning to indices to avoid the need for manual uniq-ing is a great trick. Thanks! It seems that referencing the key is all that's necessary to have it added. Is there a reason you use !a[$1]++ rather than just a[$1] ?
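A small sketch of both spellings, for comparison. A bare reference such as a[$0] is enough to create the key. The !seen[$0]++ form is more commonly used as a pattern, where it is true only the first time a value appears, so it can print unique values in first-seen order without an END block. The function names and the seen array are made up for illustration.

```shell
# Print each distinct input line once, in first-seen order.
# seen[$0]++ returns the old count (0 the first time), so !seen[$0]++
# is true only on the first occurrence; the default action prints the line.
first_seen() {
    awk '!seen[$0]++'
}

# Count distinct input lines: merely referencing a[$0] creates the key.
count_keys() {
    awk '{ a[$0] } END { n = 0; for (k in a) n++; print n }'
}
```

When the increment is used as a plain statement inside an action, as in the one-liner above, the negation has no effect; only the side effect of creating the key matters.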

I still need to modify the RS to include "\n" and specify array scan order, and verify only records of the correct form are included just to be safe. Putting it all together:

BEGIN {
	RS="[\n,]"
	regex = "^#[0-9a-fA-F]{8}$"
}

$0 ~ regex { a[substr($1, 1, 7)] }

END {
	PROCINFO["sorted_in"] = "@ind_str_asc"
	for (b in a) {
		printf "\"%s\",\n", b
	}
}
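If the trailing comma after the last entry matters (the target output in the first post has none), one option is to print the separator before each item rather than after it. This is a minimal sketch, assuming gawk for the PROCINFO["sorted_in"] ordering; other awks treat PROCINFO as an ordinary array and scan in arbitrary order. The names sep and hex_list are made up, and the length() check stands in for the {8} regex.

```shell
# Hypothetical helper: print unique "#RRGGBB" values as quoted list items,
# comma-separated, with no comma after the last one.
hex_list() {
    awk '
    BEGIN { RS = "[\n,]" }
    /^#/ && length($0) == 9 { a[substr($0, 1, 7)] }   # key = "#RRGGBB"
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"        # gawk-only scan order
        sep = ""
        for (b in a) {
            printf "%s\"%s\"", sep, b                 # separator goes before each item
            sep = ",\n"
        }
        if (sep != "") print ""                       # final newline, no comma
    }'
}

# Usage: hex_list < model.csv
```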

jasonwryan · 2016-01-23 18:33:48

No, it was late and I was tired, so wasn't thinking all that clearly (missing the uniq requirement was a bit of a tell)...

Glad you got it sorted (no pun intended).

Arch Linux
