Hello everybody
When I try to run my Python code, I always get this error:
Traceback (most recent call last):
  File "gcd_process.py", line 43, in <module>
    p[counter].start()
  File "/usr/lib/python3.3/multiprocessing/process.py", line 111, in start
    self._popen = Popen(self)
  File "/usr/lib/python3.3/multiprocessing/forking.py", line 94, in __init__
    r, w = os.pipe()
OSError: [Errno 24] Too many open files
I only get this since I tried to implement parallelism for a speed improvement. The error always pops up once I've opened about 1020 child processes. (There are always only about 300 processes active at a time, but the moment I open the 1020th process the error appears.) Does anyone have an idea how to solve this?
Greetings
Blubbb
Oh and here's my code:
p = []
for counter, n in enumerate(huge_list):
    #threading.Thread(target=coolstuff, args=(n, huge_list, counter)).start()
    p += [Process(target=coolstuff, args=(n, huge_list, counter))]
    p[counter].start()
    print(str(counter + 1) + "/" + str(len(huge_list)))
#for x in p:
#    p.join()
print("finished")

def coolstuff(n, huge_list, counter):
    for thingy in huge_list[counter:]:
        if n == thingy:
            continue
        gcdresult = gcd(thingy, n)
        if gcdresult != 1:
            with open("primes.txt1", 'a') as foo:
                foo.write("n1: " + str(thingy) + " n2: " + str(n) + " common prime: " + str(gcdresult) + " counter: " + str(counter) + "\n")
                foo.write("q1: " + str(thingy/gcdresult) + " q2: " + str(n/gcdresult))
            print(gcdresult, n, thingy)
            #print('Process ended['+str(counter)+"]")
Please do not use variable names like "foo", "thingy" and "coolstuff", especially for code you intend for others to read. You must be able to think of more descriptive labels. (It would also help to have a few sentences, either in comments within the code or in your post, describing the code's purpose.)
So, in response to your question: there may be ways to raise the limit on open file descriptors, but you should not be creating so many processes anyway. They do not make your program faster.
Your CPU probably has only 2-4 cores (some CPUs have 6 or more; many have only one). One core can only do one thing at a time. Multithreading or multiprocessing may therefore let you do CPU-bound tasks, like arithmetic, two to four times as fast as simply doing them sequentially.[1] If you create 100 processes on a machine with four cores, the process scheduler may put 25 processes on each core, but since each core can still only work on one process at a time, they won't get work done any faster than if you only had one on each. (In fact, it will be much slower, because there is a lot of overhead associated with switching between processes.)
Furthermore, even if you had an effectively infinite number of cores, you'd have them all trying to write to one file at once. That's not workable because (a) I/O tasks are so slow that each process will spend most of its time waiting for foo.write(), not doing actual work; and (b) writing to one file from different processes may not work the way you expect (see e.g. here).
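If you really do end up with several processes that must append to the same file, one common way to keep the writes from interleaving is to guard them with a shared lock. Here is a minimal sketch of that idea -- the log_result() helper, the file name and the example lines are made up for illustration, not taken from your code:

from multiprocessing import Lock, Process

def log_result(line, lock, path="primes.txt"):
    # Only one process at a time may hold the lock, so lines written
    # by different processes do not get mixed together.
    with lock:
        with open(path, "a") as out:
            out.write(line + "\n")

if __name__ == "__main__":
    lock = Lock()
    procs = [Process(target=log_result, args=("result %d" % i, lock))
             for i in range(4)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()

Note that this only keeps the lines intact; the processes still spend time waiting on each other, which is why collecting the results in the parent process (as in the next sketch) is usually nicer.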
There are several different ways to parallelize algorithms. For a task like this, you might create a queue containing sub-tasks, then create a number of processes each of which works on one task from the queue at a time until the entire queue is empty. Then just fill the queue in your master process. You should look into doing something like that. Your code has some other issues -- notably, every call to coolstuff() copies a (large) slice of huge_list unnecessarily -- but that is how I would start trying to make it work.
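As a rough sketch of that approach, assuming huge_list and a gcd() function like the ones in your code (fractions.gcd exists on Python 3.3; math.gcd on 3.5+) -- the worker() function, the sentinel handling, the result queue and the placeholder data are all choices made for this example, not the only way to do it:

from multiprocessing import Process, Queue, cpu_count
from fractions import gcd   # use math.gcd on Python 3.5+

def worker(task_queue, result_queue, numbers):
    # Each worker pulls one index at a time until it sees the None sentinel.
    while True:
        counter = task_queue.get()
        if counter is None:
            result_queue.put(None)                   # tell the parent this worker is done
            break
        n = numbers[counter]
        for j in range(counter + 1, len(numbers)):   # index instead of a copied slice
            g = gcd(n, numbers[j])
            if g != 1:
                result_queue.put((n, numbers[j], g))

if __name__ == "__main__":
    huge_list = [15485863 * 32452843, 15485863 * 49979687, 104729 * 1299709]  # placeholder data
    task_queue = Queue()
    result_queue = Queue()
    num_workers = cpu_count()                        # one worker per core is plenty

    workers = [Process(target=worker, args=(task_queue, result_queue, huge_list))
               for _ in range(num_workers)]
    for w in workers:
        w.start()

    for counter in range(len(huge_list)):            # fill the queue in the master process
        task_queue.put(counter)
    for _ in workers:                                 # one sentinel per worker
        task_queue.put(None)

    # Drain the results here, in the parent, which is also the only process
    # that touches the output file.  Draining before join() matters: a worker
    # with unread results still in the queue could otherwise block on exit.
    finished = 0
    with open("primes.txt", "a") as out:
        while finished < num_workers:
            item = result_queue.get()
            if item is None:
                finished += 1
                continue
            n, other, g = item
            out.write("n1: %d n2: %d common factor: %d\n" % (n, other, g))

    for w in workers:
        w.join()

Because the parent is the only process that writes to the file, this also sidesteps the shared-file problem mentioned above, and the number of worker processes stays fixed at one per core no matter how long huge_list is.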
[1] This statement is a gross oversimplification. Many tasks can't easily be split into multiple concurrent processes; those that can may scale in different ways. As Brooks says, "Nine women can't make a baby in one month."