You are not logged in.

#1 2010-10-14 15:47:52

ogronom
Member
From: Toronto, Canada
Registered: 2008-05-06
Posts: 123

[SOLVED] c, pthread: parallel map, reduce

Hi guys,

Problem: I have an array of arguments, I should apply some function (which calculation is quite time-consuming) and sum the results. Since all calculation are independent I thought it would be great try to parallel my code with pthreads.
I would expect to get the faster program by approximately factor of 1/NUM_THREADS (if NUM_THREADS <= num_of_cpus) compared to the serial varial.

However I get the same time of execution.

Since this is my first experience with threads I'm out of ideas why there is no advantage. Here is the sample code that mimics the problem.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 2
#define NUM_ELEMENTS 36
#define FIB 37

long fib(long n);

long fib(long n)
{
  if (n == 0) return 0;
  if (n == 1) return 1;
  return fib(n - 1) + fib(n - 2);
}

double work(long start, long num)
{
  long i, ie = start + num, a;
  double res = 0.;
  for ( i = start; i < ie; ++i) {
    a = fib(FIB);
    res += (i + 1.) / a;
  }
  return res;
}

typedef struct
{
  long tid;
  long start, num;
  double res;
} my_thread_data_t;

void *BusyWork(void *t)
{
  int i;
  my_thread_data_t * data = (my_thread_data_t *) t;
  long tid = data->tid;
  printf("Thread %ld starting...\n",tid);
  data->res = work(data->start, data->num);
  printf("Thread %ld res = %e\n", tid, data->res);
  printf("Thread %ld done. \n",tid);
  pthread_exit(NULL);
}

int main (int argc, char *argv[])
{
  pthread_t thread[NUM_THREADS];
  pthread_attr_t attr;
  my_thread_data_t data[NUM_THREADS];
  int rc;
  long num = NUM_ELEMENTS / NUM_THREADS;
  long t;
  double res;

  /* Initialize and set thread detached attribute */
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

  for(t=0; t<NUM_THREADS; t++) {
    printf("Main: creating thread %ld\n", t);
    (data + t)->tid = t;
    (data + t)->start = t * num;
    (data + t)->num = num;
    rc = pthread_create(&thread[t], &attr, BusyWork, data + t);
    if (rc) {
      printf("ERROR; return code from pthread_create() is %d\n", rc);
      exit(-1);
    }
  }

  /* Free attribute and wait for the other threads */
  pthread_attr_destroy(&attr);
  for(t=0; t<NUM_THREADS; t++) {
    rc = pthread_join(thread[t], NULL);
    if (rc) {
      printf("ERROR; return code from pthread_join() is %d\n", rc);
      exit(-1);
    }
    printf("Main: completed join with thread %ld\n",t);
  }

  res = 0.;
  for(t = 0; t < NUM_THREADS; t++) {
    res += (data + t)->res;
  }
  printf("Main: res = %e\n", res);

  printf("Main: program completed. Exiting.\n");
  return 0;
}

Changing NUM_THREADS from 2 to 1. Makes no difference in time of calculation.

Last edited by ogronom (2010-10-17 02:51:45)

Offline

#2 2010-10-14 20:11:36

tavianator
Member
From: Waterloo, ON, Canada
Registered: 2007-08-21
Posts: 859
Website

Re: [SOLVED] c, pthread: parallel map, reduce

The reason in this case seems to be that your fib() function takes GIGANTICALLY more time for larger inputs than for smaller ones (in fact, it has complexity O(2^n) to find the nth Fibonacci number), so the second thread does way more work than the first thread.  For this algorithm, the fair parallelization is to have the first thread calculate the first FIB-1 numbers, and the second thread only calculate the last number.

Look here for a more reasonable way to calculate the nth Fibonacci number.  For the linear (O(n)) algorithm, the fair split is at FIB/sqrt(2) for two threads, but FIB/2 will at least show some speed improvement.

Also, the variable `i' is unused in BusyWork().

Edit: formula fix, clarity, typo.

Last edited by tavianator (2010-10-14 20:30:59)

Offline

#3 2010-10-14 20:28:39

ber_t
Member
From: Berlin, Germany
Registered: 2010-03-10
Posts: 214
Website

Re: [SOLVED] c, pthread: parallel map, reduce

@tavianator: Aren't both threads always calculating fib(FIB) == fib(37) for every input value?

I've replaced the fib function with a non-recursive one:

long fib(long n)
{
  long prev = -1;
  long sum, result = 1;
  long i;

  for (i = 0; i <= n; ++i) {
    sum = result + prev;
    prev = result;
    result = sum;
  }
  return result;
}

Doing this results in almost no runtime at all. Is it possible that the recursion is the problem? Does your original code also use recursion?

Offline

#4 2010-10-14 20:37:23

tavianator
Member
From: Waterloo, ON, Canada
Registered: 2007-08-21
Posts: 859
Website

Re: [SOLVED] c, pthread: parallel map, reduce

ber_t wrote:

@tavianator: Aren't both threads always calculating fib(FIB) == fib(37) for every input value?

Um no, I think the intent of the OP is that thread 1 calculates the sub of fib(i) from i=1 to FIB/NUM_THREADS, and thread 2 from i=FIB/NUM_THREADS + 1 to 2*FIB/NUM_THREADS, etc.

I've replaced the fib function with a non-recursive one ... Doing this results in almost no runtime at all. Is it possible that the recursion is the problem? Does your original code also use recursion?

Yeah, with a non-recursive fib() I needed FIB to be 3700000 before I could get any reasonably accurate timings.

Last edited by tavianator (2010-10-14 20:39:57)

Offline

#5 2010-10-15 02:58:11

ogronom
Member
From: Toronto, Canada
Registered: 2008-05-06
Posts: 123

Re: [SOLVED] c, pthread: parallel map, reduce

Maybe I was not quite clear (or my English is that bad), but the code posted above is just an example to illustrate the problem. The original code too large to post here.

Fib was taken in the recursive form on purpose to simulate the complex time-consuming function. There is no more possible (or sufficient) optimization for the original function.

Both threads calculate Fib(37), then do some minor fast modification (in the example (i+1.)/Fib(37)). This is done to get approximately the same (quite large) time of calculation for any input parameter i.

Offline

#6 2010-10-15 04:13:33

tavianator
Member
From: Waterloo, ON, Canada
Registered: 2007-08-21
Posts: 859
Website

Re: [SOLVED] c, pthread: parallel map, reduce

Sorry, I misinterpreted the code, I confused FIB with NUM_ELEMENTS, and thought you were calling fib(i) rather than fib(FIB).  Ignore everything I said above tongue

In this case the lack of increased performance is caused by loop invariant code motion -- meaning that gcc moved the a = fib(FIB); calculation out of the loop because it doesn't depend on anything in the loop and it has no side effects, so fib() is only called once.  Compiling without optimization shows giant speed improvements as NUM_THREADS is increased, because fib(FIB) is computed inside the loop (13.202s to 6.539).

Edit: I doubt that this is the problem with your real code, so to help diagnose the real problem, it'd help to have access to the real code or a more thorough code example.  I'd be happy to look at it even if it's huge; I happen to like implementing concurrent algorithms.

Last edited by tavianator (2010-10-15 04:20:43)

Offline

#7 2010-10-15 09:45:01

ber_t
Member
From: Berlin, Germany
Registered: 2010-03-10
Posts: 214
Website

Re: [SOLVED] c, pthread: parallel map, reduce

ogronom wrote:

Fib was taken in the recursive form on purpose to simulate the complex time-consuming function. There is no more possible (or sufficient) optimization for the original function.

And my question was, if the original code also uses recursion? So now I guess it does not...

Have you already tried a profiling tool like gprof?

Offline

#8 2010-10-16 23:24:12

ogronom
Member
From: Toronto, Canada
Registered: 2008-05-06
Posts: 123

Re: [SOLVED] c, pthread: parallel map, reduce

Hi
@ber_t: no I don't have recursion. Yep I profiled my program with gprof. But the question is about threads.

@tavianator: I'll think about posting my code. But right now I grew interest in this simple one. You said that you receive speed up

Here is some modification of the current code.  http://pastebin.org/245755
And the results what I have.

% gcc -O0 -DNUM_THREADS=1 pthreads.c -lpthread
% time ./a.out
Thread 0 starting...
Thread 0 res = 2.756872e-05
Thread 0 done. 
./a.out  38.99s user 0.28s system 99% cpu 39.460 total
% gcc -O0 -DNUM_THREADS=2 pthreads.c -lpthread
% time ./a.out
Main: creating thread 0
Main: creating thread 1
Thread 1 starting...
Thread 0 starting...
Thread 1 res = 2.049026e-05
Thread 1 done. 
Thread 0 res = 7.078454e-06
Thread 0 done. 
Main: completed join with thread 0
Main: completed join with thread 1
Main: res = 2.756872e-05
Main: program completed. Exiting.
./a.out  35.42s user 0.21s system 187% cpu 18.956 total

I have

% cat /proc/cpuinfo | grep name
model name    : Intel(R) Pentium(R) Dual  CPU  T2370  @ 1.73GHz
model name    : Intel(R) Pentium(R) Dual  CPU  T2370  @ 1.73GHz

Could you post you results of time calculation?

Offline

#9 2010-10-16 23:30:59

tavianator
Member
From: Waterloo, ON, Canada
Registered: 2007-08-21
Posts: 859
Website

Re: [SOLVED] c, pthread: parallel map, reduce

Sure:

tavianator@superluminal> gcc -O0 -DNUM_THREADS=1 pthreads.c -pthread && time ./a.out >/dev/null                                                                    
./a.out > /dev/null  12.68s user 0.00s system 99% cpu 12.702 total
tavianator@superluminal> gcc -O0 -DNUM_THREADS=2 pthreads.c -pthread && time ./a.out >/dev/null
./a.out > /dev/null  13.19s user 0.00s system 199% cpu 6.618 total
tavianator@superluminal> gcc -O0 -DNUM_THREADS=12 pthreads.c -pthread && time ./a.out >/dev/null
./a.out > /dev/null  13.87s user 0.00s system 935% cpu 1.483 total
tavianator@superluminal> gcc -O0 -DNUM_THREADS=18 pthreads.c -pthread && time ./a.out >/dev/null
./a.out > /dev/null  16.40s user 0.00s system 1594% cpu 1.029 total
tavianator@superluminal> gcc -O0 -DNUM_THREADS=36 pthreads.c -pthread && time ./a.out >/dev/null
./a.out > /dev/null  19.47s user 0.00s system 2076% cpu 0.938 total

I have:

tavianator@superluminal> grep name /proc/cpuinfo
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz
model name      : Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz

Offline

#10 2010-10-17 02:51:19

ogronom
Member
From: Toronto, Canada
Registered: 2008-05-06
Posts: 123

Re: [SOLVED] c, pthread: parallel map, reduce

Thanks a lot for the help.

I just looked at wrong place

Offline

Board footer

Powered by FluxBB