#1 2012-03-26 07:15:08

/dev/zero
Member
From: Melbourne, Australia
Registered: 2011-10-20
Posts: 1,247

Possible to use process substitution a variable number of times?

Hello Archers,

I have a script like so:

#! /bin/bash

( readlink /proc/$$/fd/0 | grep -q "^pipe:" ) || ( file $(readlink /proc/$$/fd/0) | grep -q "character special" )

cat | tee >(awk 'NR % 2 == 0') >(awk '(NR+1) % 2 == 0')

Well, it's pretty trivial at the moment, but the underlying goal is to split the input so it can be processed in parallel on multiple cores. If I understand correctly, the above code splits the input across exactly two awk processes (and hence two cores). Is there some way to put the process substitution (i.e. ">(awk ' ... ')") in a loop?

Before trying to increase the number of cores, I first tried to see whether I could loop just two times with,

...
cat | tee $(for i in 1 2; do >(awk -vn="$i" '(NR + $n) % 2 == 0'); done) 

but that didn't work.

It also occurred to me that maybe I'm presenting an X-Y problem. I did also consider using the split command, but as far as I can tell, it will only dump its output to files; it's no good for piping or process substitution. Any other ideas that would avoid needing to tee across a variable number of process substitutions?

Naturally, I tried a little Googlemancy, but only came up with answers to newbie questions, or tutorials aimed at newbies.

Thank you for your consideration.

Last edited by /dev/zero (2012-03-26 07:16:25)


#2 2012-03-26 07:50:20

foppe
Member
Registered: 2011-04-02
Posts: 47

Re: Possible to use process substitution a variable number of times?

Try xargs with the -P flag.
I'm giving you a general lead only. Go `man` or Google for specifics 'cause I haven't got a clue what you're trying to do :)
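
Something like this, maybe (untested sketch; the input/chunk file names, the -P count and the awk body are all placeholders):

# split the input into chunk files, then let xargs run one awk per chunk,
# at most 4 at a time
split -l 100000 input.txt chunk_
printf '%s\n' chunk_* | xargs -n 1 -P 4 awk 'NR % 2 == 0'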

Have fun


#3 2012-03-26 09:08:32

Blµb
Member
Registered: 2008-02-10
Posts: 224

Re: Possible to use process substitution a variable number of times?

xargs -P won't fix the $i though :/
Or build the >(awk) lines into a string and use something like: eval "cat | $awklist"

Also, there's a bug in the for loop (for the other code it's obviously correct, since it's hard-coded for 2 processes): (NR + $n) % 2 == 0
should read (NR + $n) % $number_of_processes == 0
(because, for n=2 and n=4: (4 + 2) % 2 = 0 and (4 + 4) % 2 = 0 as well).
All your even 'n's produce the same output :P

So you want to make sure you know the # of processes, and use for i in `seq 1 $max_procs`;...

EDIT:
Yes, since xargs cannot (I think) do the >() behaviour (unless it uses bash to execute the commands).
You can build the awk list like this:

AWKLIST=$(echo "$number_of_threads" | awk '{ for(i = 0; i < $1; ++i) { print ">(awk '\''(NR + " i ") % " $1 " == 0'\'')" } }')

then use eval "cat | tee $AWKLIST"
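
The same list can also be built with a plain shell loop instead of awk, if that reads easier (untested sketch; $nproc and the part$i output files are made up):

nproc=4
AWKLIST=""
for ((i = 0; i < nproc; i++)); do
    AWKLIST+=" >(awk -v i=$i -v n=$nproc '(NR + i) % n == 0' > part$i)"
done
# the > /dev/null just drops tee's own copy of the stream
eval "cat | tee $AWKLIST > /dev/null"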

Though I'm curious how you're gonna use the /dev/fd/* produced by >()?

Last edited by Blµb (2012-03-26 09:19:12)


You know you're paranoid when you start thinking random letters while typing a password.
A good post about vim
Python has no multithreading.


#4 2012-03-26 10:25:08

/dev/zero
Member
From: Melbourne, Australia
Registered: 2011-10-20
Posts: 1,247

Re: Possible to use process substitution a variable number of times?

Thanks for the ideas. It's getting late here so I won't try them out yet, but you've both given me some leads to consider. :)


#5 2012-03-26 15:02:43

tavianator
Member
From: Waterloo, ON, Canada
Registered: 2007-08-21
Posts: 859

Re: Possible to use process substitution a variable number of times?

You could use mkfifo to make a named pipe, create a bunch of awk processes reading from that pipe, and then write to the pipe.
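
Roughly like this, say (untested sketch with made-up names; it uses one fifo per worker so every awk still sees whole lines, whereas readers sharing a single fifo would each get arbitrary chunks):

#!/bin/bash
# fan stdin out to NPROC awk workers through named pipes
NPROC=4
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

fifos=()
for ((i = 0; i < NPROC; i++)); do
    mkfifo "$dir/fifo$i"
    # each worker keeps every NPROC-th line, offset by its index
    awk -v i="$i" -v n="$NPROC" 'NR % n == i' "$dir/fifo$i" > "$dir/out$i" &
    fifos+=("$dir/fifo$i")
done

tee "${fifos[@]}" > /dev/null   # copy stdin to every fifo
wait
cat "$dir"/out*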


#6 2012-03-26 17:07:30

Blµb
Member
Registered: 2008-02-10
Posts: 224

Re: Possible to use process substitution a variable number of times?

Or, if you want to avoid named pipes (during development they tend to create lots of files, and the script has to clean them up afterwards), take a look at the 'coproc' feature bash has had since version 4.
Though remember that 'man bash' says under BUGS that there may only be one active coproc at a time.
According to this, it's not really such a big deal though.
And it won't kill the already-started coprocesses or anything if you start another one...

Here's a sample session (the spaces are important, btw :P):

wry:~/ $ bash    [19:02:43]
wry@blubmb:~$ coproc c1 ( cat )
[1] 4129
wry@blubmb:~$ coproc c2 ( cat )
bash: warning: execute_coproc: coproc [4129:c1] still exists
[2] 4130
wry@blubmb:~$ echo $c1_PID $c2_PID -- ${c1[@]} ${c2[@]}
4129 4130 -- 63 60 62 58
wry@blubmb:~$ echo aaac1 >&${c1[1]}
wry@blubmb:~$ echo bbbc2 >&${c2[1]}
wry@blubmb:~$ read -u ${c1[0]} line1
wry@blubmb:~$ read -u ${c2[0]} line2
wry@blubmb:~$ echo $line1 and $line2
aaac1 and bbbc2
wry@blubmb:~$ kill $c1_PID $c2_PID
wry@blubmb:~$ 
[1]-  Terminated              coproc c1 ( cat )
[2]+  Terminated              coproc c2 ( cat )
wry@blubmb:~$ exit
exit

Just keep in mind that the file descriptors you get for a coproc won't be available in subshells you execute with &.
E.g. you cannot do

coproc X
( use X ) &

Links: http://wiki.bash-hackers.org/syntax/keywords/coproc


You know you're paranoid when you start thinking random letters while typing a password.
A good post about vim
Python has no multithreading.

