Project Euler

Abelian · 2008-05-15 21:52:05

Anyone on here working their way through project euler (www.projecteuler.net)?

Project Euler is series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.

I'm finding it a great way of learning how to program. It is satisfying though when you see you don't need a computer to solve one of them though!

Anyway this is problem 9 and my solution in python:

#Find the greatest product of five consecutive digits in the 1000 digit number:
N = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"

a = 0
b = 5
finalist=[]
while b < 1000:
    r = 1
    p = N[a:b]
    s = list(p)
    for p in s:
        p = int(p)
        r = r*p
        
    finalist.append(r)
    a = a + 1
    b = b + 1
p_old = 0
for p in finalist:
    if p > p_old:
        p_old = p
print p_old

Executing this and timing it gives this:

40824

real    0m0.032s
user    0m0.023s
sys    0m0.003s

The question is, how fast can we complete this problem? Any language goes. How fast can you make it

kakTuZ · 2008-05-15 23:35:08

Hi,
I am playing with Project Euler, too (have the first 30 Problems solved). Really great site for learning to handle new languages and practicing implementation of algorithms.

You are talking about problem No. 8, don't you?

jeebusroxors · 2008-05-16 00:46:33

Check out:

http://www.pythonchallenge.com/

That's a lot of fun. I should get back to it.

elide · 2008-05-16 00:49:45

#include <stdio.h>

char *N = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450";
#define I(x) (*(x)-'0')

int main(){
    int max = 0, prod;
    char *n = N+3;
    while(*++n)
        if( (prod = I(n-4) * I(n-3) * I(n-2) * I(n-1) * I(n)) > max )
            max = prod;
    printf("%d\n", max);
    return 0;
}

1000 digit number is a very small task. Time needed to complete computation itself is negligibly small compared with the time needed to process creation, program loading, linking, etc.
So, if you want to compare execution times, you should extend number to 10^6 digits at least, or wrap you code in a loop.
e & e.py are short versions:

>> export TIMEFORMAT='real: %3R | user: %3U | sys: %3S'
>> time ./e
40824
real: 0.002 | user: 0.000 | sys: 0.003
>> time python e.py
40824
real: 0.052 | user: 0.040 | sys: 0.013

elong & e_long.py are versions with 10000 iterations:

>> time ./elong
40824
real: 0.113 | user: 0.107 | sys: 0.003
>> time python e_long.py 
40824
real: 2.642 | user: 2.586 | sys: 0.017

>> bcc
c=0.002
p=0.052
cl=0.113/10^4
pl=2.642/10^4
c/cl
176.99115044247787610619
p/pl
196.82059046177138531415
 
p/c
26.00000000000000000000
pl/cl
23.38053097345132743362

So, C is about 24 times faster, than python. Just as planned (;

Allan · 2008-05-16 03:30:48

Once you submit a solution, do you get more info on good ways to solve the problem? I was thinking of doing something like this while learning python (or something else that fits between bash and c++). It is difficult learning a new language without having some sort of feedback about your approach to solving the problem and I don't want to get into really bad habits early on.

Last edited by Allan (2008-05-16 03:31:44)

sunn · 2008-05-16 05:47:52

@Allan: You get access to the forum thread for that solution, in which people post their solutions to the problem in different languages.

Abelian · 2008-05-16 09:27:34

Sorry I did indeed mean problem 8.

elide: How did you extend the number to 10^6, did you just literally double the number?

Changed my code to this for just the 1000 digits:

 N = "73167176531330624919225119674426574742355349194934969835203127745063262395\
7831801698480186947885184385861560789112949495459501737958331952853208805511125\
4069874715852386305071569329096329522744304355766896648950445244523161731856403\
0987111217223831136222989342338030813533627661428280644448664523874930358907296\
2904915604407723907138105158593079608667017242712188399879790879227492190169972\
0888093776657273330010533678812202354218097512545405947522435258490771167055601\
3604839586446706324415722155397536978179778461740649551492908625693219784686224\
8283972241375657056057490261407972968652414535100474821663704844031998900088952\
4345065854122758866688116427171479924442928230863465674813919123162824586178664\
5835912456652947654568284891288314260769004224219022671055626321111109370544217\
5069416589604080719840385096245544436298123098787992724428490918884580156166097\
9191338754992005240636899125607176060588611646710940507754100225698315520005593\
572972571636269561882670428252483600823257530420752963450"

a = 0
b = 5
biggest_r = 0
while b < 1000:
    r = 1
    p = N[a:b]
    s = list(p)
    for p in s:
        p = int(p)
    r = r*p
    if r > biggest_r:
        biggest_r = r
    a = a + 1
    b = b + 1

print biggest_r

Gives me:

40824
real: 0.022 | user: 0.023 | sys: 0.000

So a slight improvement.

Last edited by Abelian (2008-05-16 09:28:02)

elide · 2008-05-16 12:22:34

did you just literally double the number

"literally doubling" gives you 2*10^3 != 10^6 (;

#Find the greatest product of five consecutive digits in the 1000 digit number:
N = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"

a = 0
b = 5
finalist=[]
p_old = 0
RC=10000       # !!!!!!
while RC > 0:  # !!!!!!
    RC -= 1    # !!!!!!
    while b < 1000:
        r = 1
        p = N[a:b]
        s = list(p)
        for p in s:
            p = int(p)
            r = r*p
            
        finalist.append(r)
        a = a + 1
        b = b + 1
    p_old = 0
    for p in finalist:
        if p > p_old:
            p_old = p
print p_old

Last edited by elide (2008-05-16 12:23:44)

Abelian · 2008-05-16 12:50:39

ah sorry i meant just repeating it again ie the 1000 digits then the same 1000 again which is 10^6.

Allan · 2008-05-16 13:45:08

Going to give my approach in psuedo code. You are re-multiplying the same numbers repeatedly.

val = prod(N[1:5])
max = val

for(i in 6:1000)
val = val / N[i-5] * N[i]
if(max < val)
    max = val

That reduces four * operations to one / and one *. From memory * and / are about equally expensive in integer arithmetic for most languages. It also works out the maximum as you go so there is no need to store stuff.

Last edited by Allan (2008-05-16 13:51:02)

elide · 2008-05-16 14:17:16

Allan, your code will fail with "/0" error (: just think about corner cases...
and if you make some workaround for that cases, then broken memory prefetch and cache misses will lead to a significant performance drop...

Allan · 2008-05-16 14:50:21

Yeah, I missed that. For some reason I thought that they were all 1-9's and scanning the first 58 numbers I was on the right track....

I was initially going to comment on the storage of each results which I saw as the most wasteful part but then that "moment of brilliance" hit me!

shining · 2008-05-16 19:06:37

Allan wrote:

Going to give my approach in psuedo code. You are re-multiplying the same numbers repeatedly.
val = prod(N[1:5])
max = val

for(i in 6:1000)
val = val / N[i-5] * N[i]
if(max < val)
    max = val
That reduces four * operations to one / and one *. From memory * and / are about equally expensive in integer arithmetic for most languages. It also works out the maximum as you go so there is no need to store stuff.

Ah, I am glad someone mentioned that
This is what I was trying to do, but the results are disappointing.
As elide mentioned, handling 0 require some additional work which seems to make my C prog even slower than elide's one.. Go figure.
Since it is much more uglier, more complicated, and slower, I can't even dare posting it

To elide : your code for iterating 10000 times (python version) is completely broken I hope you can see why.
You'll see that you will want to greatly reduce that number after fixing the problem
You should have been suspicious after seeing the very small time increase between 1 loop and 10 000.

tesjo · 2008-05-17 07:09:50

Well I had an idea but cant get it to run faster than the brute force method.But....
The idea is when you move from one five set number to the next you drop one on the left and pick one up on the right so if the one you pick up is larger than the one you drop, then calculate the product, else move on. If the long number is random you could expect half the product calculations, but you have to compare the right and left int whether you need the product or not. I am not great with programming theory as to whether this should be faster or not, but in my c++ code it wasn't. Calculating each product was just slightly faster, and I had to loop everything 10000 times to get an appreciable total time of about 0.6s.

elide · 2008-05-17 11:20:06

your code for iterating 10000 times (python version) is completely broken. I hope you can see why.

No, I can't. Can you explain?

You should have been suspicious after seeing the very small time increase between 1 loop and 10 000.

>> time python /dev/null
real: 0.037 | user: 0.024 | sys: 0.003

37ms with empty prog. "Very small time increase"? This is exactly what should be (;

Last edited by elide (2008-05-17 11:22:13)

shining · 2008-05-17 18:10:00

elide wrote:

your code for iterating 10000 times (python version) is completely broken. I hope you can see why.
No, I can't. Can you explain?

No, I can't.

nj · 2008-05-17 20:00:53

#include <stdio.h>

int N[] = { 7,3,1,6,7,1,7,6,5,3,1,3,3,0,6,2,4,9,1,9,2,2,5,1,1,9,6,7,4,4,2,6,5,7,4,7,4,2,3,5,5,3,4,9,1,9,4,9,3,4,9,6,9,8,3,5,2,0,3,1,2,7,7,4,5,0,6,3,2,6,2,3,9,5,7,8,3,1,8,0,1,6,9,8,4,8,0,1,8,6,9,4,7,8,8,5,1,8,4,3,8,5,8,6,1,5,6,0,7,8,9,1,1,2,9,4,9,4,9,5,4,5,9,5,0,1,7,3,7,9,5,8,3,3,1,9,5,2,8,5,3,2,0,8,8,0,5,5,1,1,1,2,5,4,0,6,9,8,7,4,7,1,5,8,5,2,3,8,6,3,0,5,0,7,1,5,6,9,3,2,9,0,9,6,3,2,9,5,2,2,7,4,4,3,0,4,3,5,5,7,6,6,8,9,6,6,4,8,9,5,0,4,4,5,2,4,4,5,2,3,1,6,1,7,3,1,8,5,6,4,0,3,0,9,8,7,1,1,1,2,1,7,2,2,3,8,3,1,1,3,6,2,2,2,9,8,9,3,4,2,3,3,8,0,3,0,8,1,3,5,3,3,6,2,7,6,6,1,4,2,8,2,8,0,6,4,4,4,4,8,6,6,4,5,2,3,8,7,4,9,3,0,3,5,8,9,0,7,2,9,6,2,9,0,4,9,1,5,6,0,4,4,0,7,7,2,3,9,0,7,1,3,8,1,0,5,1,5,8,5,9,3,0,7,9,6,0,8,6,6,7,0,1,7,2,4,2,7,1,2,1,8,8,3,9,9,8,7,9,7,9,0,8,7,9,2,2,7,4,9,2,1,9,0,1,6,9,9,7,2,0,8,8,8,0,9,3,7,7,6,6,5,7,2,7,3,3,3,0,0,1,0,5,3,3,6,7,8,8,1,2,2,0,2,3,5,4,2,1,8,0,9,7,5,1,2,5,4,5,4,0,5,9,4,7,5,2,2,4,3,5,2,5,8,4,9,0,7,7,1,1,6,7,0,5,5,6,0,1,3,6,0,4,8,3,9,5,8,6,4,4,6,7,0,6,3,2,4,4,1,5,7,2,2,1,5,5,3,9,7,5,3,6,9,7,8,1,7,9,7,7,8,4,6,1,7,4,0,6,4,9,5,5,1,4,9,2,9,0,8,6,2,5,6,9,3,2,1,9,7,8,4,6,8,6,2,2,4,8,2,8,3,9,7,2,2,4,1,3,7,5,6,5,7,0,5,6,0,5,7,4,9,0,2,6,1,4,0,7,9,7,2,9,6,8,6,5,2,4,1,4,5,3,5,1,0,0,4,7,4,8,2,1,6,6,3,7,0,4,8,4,4,0,3,1,9,9,8,9,0,0,0,8,8,9,5,2,4,3,4,5,0,6,5,8,5,4,1,2,2,7,5,8,8,6,6,6,8,8,1,1,6,4,2,7,1,7,1,4,7,9,9,2,4,4,4,2,9,2,8,2,3,0,8,6,3,4,6,5,6,7,4,8,1,3,9,1,9,1,2,3,1,6,2,8,2,4,5,8,6,1,7,8,6,6,4,5,8,3,5,9,1,2,4,5,6,6,5,2,9,4,7,6,5,4,5,6,8,2,8,4,8,9,1,2,8,8,3,1,4,2,6,0,7,6,9,0,0,4,2,2,4,2,1,9,0,2,2,6,7,1,0,5,5,6,2,6,3,2,1,1,1,1,1,0,9,3,7,0,5,4,4,2,1,7,5,0,6,9,4,1,6,5,8,9,6,0,4,0,8,0,7,1,9,8,4,0,3,8,5,0,9,6,2,4,5,5,4,4,4,3,6,2,9,8,1,2,3,0,9,8,7,8,7,9,9,2,7,2,4,4,2,8,4,9,0,9,1,8,8,8,4,5,8,0,1,5,6,1,6,6,0,9,7,9,1,9,1,3,3,8,7,5,4,9,9,2,0,0,5,2,4,0,6,3,6,8,9,9,1,2,5,6,0,7,1,7,6,0,6,0,5,8,8,6,1,1,6,4,6,7,1,0,9,4,0,5,0,7,7,5,4,1,0,0,2,2,5,6,9,8,3,1,5,5,2,0,0,0,5,5,9,3,5,7,2,9,7,2,5,7,1,6,3,6,2,6,9,5,6,1,8,8,2,6,7,0,4,2,8,2,5,2,4,8,3,6,0,0,8,2,3,2,5,7,5,3,0,4,2,0,7,5,2,9,6,3,4,5,0 };
#define LEN 1000

int main()
{
    int i = 0, j = 0;
    int r = 0, p = 0;
#ifdef OPTIMIZE
    int a;
    int n1;
    int n2;
    int n3;
    int n4;
    int n5;
    int n6;
    int n7;
    int n8;
    int n9;
    int n10;
#endif

    while (j < 10000)
    {
#ifdef OPTIMIZE
        for (i = 0; i < LEN-10; i+=6)
        {
            if (N[i+5] == 0)
            {
                r = N[i] * N[i+1] * N[i+2] * N[i+3] * N[i+4];
                if (p < r)
                    p = r;
                continue;
            }
            if (N[i+4] == 0)
            {
                --i;
                continue;
            }
            n1 = N[i+4];
            n2 = n1*N[i+3];
            n3 = n2*N[i+2];
            n4 = n3*N[i+1];
            n5 = n4*N[i];

            n6 = N[i+5];
            n7 = n6*N[i+6];
            n8 = n7*N[i+7];
            n9 = n8*N[i+8];
            n10 = n9*N[i+9];

            if (p < n5)
                p = n5;
            if (p < n10)
                p = n10;
            a = n6*n4;
            if (p < a)
                p = a;
            a = n7*n3;
            if (p < a)
                p = a;
            a = n8*n2;
            if (p < a)
                p = a;
            a = n9*n1;
            if (p < a)
                p = a;
        }
#endif
        while (i < LEN-5)
        {
#ifdef OPTIMIZE
            if (N[i+4] == 0)
            {
                i+=5;
                continue;
            }
#endif
            r = N[i] * N[i+1] * N[i+2] * N[i+3] * N[i+4];
            if (p < r)
                p = r;
            ++i;
        }
        i = 0;
        ++j;
    }

    printf("%d\n", p);

    return 0;
}

$ make
gcc -o pe2 -O2 pe2.c
gcc -DOPTIMIZE -o pe2o -O2 pe2.c

$ time ./pe2
40824
./pe2  0.11s user 0.00s system 99% cpu 0.107 total

$ time ./pe2o
40824
./pe2o  0.07s user 0.00s system 96% cpu 0.069 total

A decent speedup, at the expense of readability. Readability is overrated anyway.

scj · 2008-05-24 02:47:39

Haskell FTW!

import Char
n = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"
pr :: String -> [Int]
pr (a:b:c:d:e:xs) = foldl (\x y -> x * (ord y - 48)) 1 (a:b:c:d:e:[]) : pr (b:c:d:e:xs)
pr _ = []

main = print $ maximum $ pr n

$ time ./p09
40824
./p09  0.01s user 0.00s system 112% cpu 0.009 total

Abelian · 2008-05-24 07:38:51

I'm pretty certain using 112% of your CPU is cheating!

scj · 2008-05-24 08:07:55

Well, it's a PIII 1GHz, so I need it
But really, I have no idea what that is all about. Running it a couple of times it varies between 8% and 112%

joe · 2008-05-26 06:43:26

I thought I'd throw my hat into the ring. This will only work if you have a newer SSE4.1-enabled Intel (Penryn) processor:

#include <stdio.h>
#include <smmintrin.h>


int main()
{
  unsigned short int n[1000] __attribute__((aligned(16))) = { 7,3,1,6,7,1,7,6,5,3,1,3,3,0,6,2,4,9,1,9,2,2,5,1,1,9,6,7,4,4,2,6,5,7,4,7,4,2,3,5,5,3,4,9,1,9,4,9,3,4,9,6,9,8,3,5,2,0,3,1,2,7,7,4,5,0,6,3,2,6,2,3,9,5,7,8,3,1,8,0,1,6,9,8,4,8,0,1,8,6,9,4,7,8,8,5,1,8,4,3,8,5,8,6,1,5,6,0,7,8,9,1,1,2,9,4,9,4,9,5,4,5,9,5,0,1,7,3,7,9,5,8,3,3,1,9,5,2,8,5,3,2,0,8,8,0,5,5,1,1,1,2,5,4,0,6,9,8,7,4,7,1,5,8,5,2,3,8,6,3,0,5,0,7,1,5,6,9,3,2,9,0,9,6,3,2,9,5,2,2,7,4,4,3,0,4,3,5,5,7,6,6,8,9,6,6,4,8,9,5,0,4,4,5,2,4,4,5,2,3,1,6,1,7,3,1,8,5,6,4,0,3,0,9,8,7,1,1,1,2,1,7,2,2,3,8,3,1,1,3,6,2,2,2,9,8,9,3,4,2,3,3,8,0,3,0,8,1,3,5,3,3,6,2,7,6,6,1,4,2,8,2,8,0,6,4,4,4,4,8,6,6,4,5,2,3,8,7,4,9,3,0,3,5,8,9,0,7,2,9,6,2,9,0,4,9,1,5,6,0,4,4,0,7,7,2,3,9,0,7,1,3,8,1,0,5,1,5,8,5,9,3,0,7,9,6,0,8,6,6,7,0,1,7,2,4,2,7,1,2,1,8,8,3,9,9,8,7,9,7,9,0,8,7,9,2,2,7,4,9,2,1,9,0,1,6,9,9,7,2,0,8,8,8,0,9,3,7,7,6,6,5,7,2,7,3,3,3,0,0,1,0,5,3,3,6,7,8,8,1,2,2,0,2,3,5,4,2,1,8,0,9,7,5,1,2,5,4,5,4,0,5,9,4,7,5,2,2,4,3,5,2,5,8,4,9,0,7,7,1,1,6,7,0,5,5,6,0,1,3,6,0,4,8,3,9,5,8,6,4,4,6,7,0,6,3,2,4,4,1,5,7,2,2,1,5,5,3,9,7,5,3,6,9,7,8,1,7,9,7,7,8,4,6,1,7,4,0,6,4,9,5,5,1,4,9,2,9,0,8,6,2,5,6,9,3,2,1,9,7,8,4,6,8,6,2,2,4,8,2,8,3,9,7,2,2,4,1,3,7,5,6,5,7,0,5,6,0,5,7,4,9,0,2,6,1,4,0,7,9,7,2,9,6,8,6,5,2,4,1,4,5,3,5,1,0,0,4,7,4,8,2,1,6,6,3,7,0,4,8,4,4,0,3,1,9,9,8,9,0,0,0,8,8,9,5,2,4,3,4,5,0,6,5,8,5,4,1,2,2,7,5,8,8,6,6,6,8,8,1,1,6,4,2,7,1,7,1,4,7,9,9,2,4,4,4,2,9,2,8,2,3,0,8,6,3,4,6,5,6,7,4,8,1,3,9,1,9,1,2,3,1,6,2,8,2,4,5,8,6,1,7,8,6,6,4,5,8,3,5,9,1,2,4,5,6,6,5,2,9,4,7,6,5,4,5,6,8,2,8,4,8,9,1,2,8,8,3,1,4,2,6,0,7,6,9,0,0,4,2,2,4,2,1,9,0,2,2,6,7,1,0,5,5,6,2,6,3,2,1,1,1,1,1,0,9,3,7,0,5,4,4,2,1,7,5,0,6,9,4,1,6,5,8,9,6,0,4,0,8,0,7,1,9,8,4,0,3,8,5,0,9,6,2,4,5,5,4,4,4,3,6,2,9,8,1,2,3,0,9,8,7,8,7,9,9,2,7,2,4,4,2,8,4,9,0,9,1,8,8,8,4,5,8,0,1,5,6,1,6,6,0,9,7,9,1,9,1,3,3,8,7,5,4,9,9,2,0,0,5,2,4,0,6,3,6,8,9,9,1,2,5,6,0,7,1,7,6,0,6,0,5,8,8,6,1,1,6,4,6,7,1,0,9,4,0,5,0,7,7,5,4,1,0,0,2,2,5,6,9,8,3,1,5,5,2,0,0,0,5,5,9,3,5,7,2,9,7,2,5,7,1,6,3,6,2,6,9,5,6,1,8,8,2,6,7,0,4,2,8,2,5,2,4,8,3,6,0,0,8,2,3,2,5,7,5,3,0,4,2,0,7,5,2,9,6,3,4,5,0 };
  __m128i max = {0LL, 0LL};
  unsigned short int *N = n;

  /* while there is more data */
  while((N + 8) < (n + 1000))
  {
    /* Load eight values into 128-bit data */
    __m128i first  = _mm_loadu_si128((__m128i *) (N + 0));
    __m128i second = _mm_loadu_si128((__m128i *) (N + 1));
    __m128i third  = _mm_loadu_si128((__m128i *) (N + 2));
    __m128i fourth = _mm_loadu_si128((__m128i *) (N + 3));
    __m128i fifth  = _mm_loadu_si128((__m128i *) (N + 4));

    __m128i res1 = _mm_mullo_epi16(first, second);
    __m128i res2 = _mm_mullo_epi16(third, fourth);
    __m128i res3 = _mm_mullo_epi16(fifth, res1);
    __m128i res4 = _mm_mullo_epi16(res3, res2);

    /* Conditional word move into max */
    max = _mm_max_epu16(max, res4);

    first  = _mm_srli_epi32(first,  16);
    second = _mm_srli_epi32(second, 16);
    third  = _mm_srli_epi32(third,  16);
    fourth = _mm_srli_epi32(fourth, 16);
    fifth  = _mm_srli_epi32(fifth,  16);

    /* multiply again and store conditionally */

    res1 = _mm_mullo_epi16(first, second);
    res2 = _mm_mullo_epi16(third, fourth);
    res3 = _mm_mullo_epi16(fifth, res1);
    res4 = _mm_mullo_epi16(res3, res2);

    max = _mm_max_epu16(max, res4);

    N += 8;
  }

  /* find largest 16-bit value in max and print */
  unsigned short int *result   = (unsigned short int *)(&max);
  unsigned short int final_max = 0;
  unsigned short int i         = 0;
  for(i = 0; i < 8; ++i) final_max = (result[i] > final_max) ? result[i] : final_max;
  printf("%u\n", final_max);

  return 1;
}

And the results were fast:

time ./newp
40824

real    0m0.001s
user    0m0.000s
sys     0m0.000s

Last edited by joe (2008-05-26 06:44:59)

greenpenguin · 2008-06-02 20:07:21

Really basic solution I wrote ages ago (ruby):

data = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"


dhol = 0
pointer = 0
while data[pointer..-1].length > 4
    ndata = 1
    5.times do |n|
        ndata *= data[n+pointer].chr.to_i
    end
    dhol = ndata if ndata > dhol
    break if dhol >= 59049
    pointer+=1
end

puts dhol

Fairly fast:

time { ruby p8.rb ; }
40824

real    0m0.021s
user    0m0.017s
sys    0m0.003s

By the way, for anyone <19 in the UK, try the BIO if you haven't already, fun competition, rather similar to project Euler but as a proper competition. (plus having /two/ Arch users next time would be awesome...)

nj · 2008-06-03 02:00:23

joe wrote:

I thought I'd throw my hat into the ring. This will only work if you have a newer SSE4.1-enabled Intel (Penryn)

Hardcore If you get some time, could you loop that 10000 times and compare it to my program above? I would be interested in seeing what kind of speedup that gets.

joe · 2008-06-03 03:00:24

nj wrote:

joe wrote:
I thought I'd throw my hat into the ring. This will only work if you have a newer SSE4.1-enabled Intel (Penryn)
Hardcore If you get some time, could you loop that 10000 times and compare it to my program above? I would be interested in seeing what kind of speedup that gets.

Sure, I threw a while loop around the 'meat' of the program and had it run 10,000 times. Using

gcc -march=core2 -msse4 -o newp newp.c

I got :

 time ./newp
40824

real    0m0.076s
user    0m0.077s
sys     0m0.000s

Using basic gcc optimizations

gcc -O4 -march=core2 -msse4 -o newp newp.c

I get:

time ./newp
40824

real    0m0.014s
user    0m0.013s
sys     0m0.000s

Holy crap! That's pretty impressive. I was actually surprised at that speed up. I guess the biggest bottleneck is the original read from memory. The processor in my system has enough cache that subsequent loops are likely not causing any cache misses.

And to think, if this workload were big enough, it would also benefit from threading. It's small enough that right now spawning threads would take longer than the workload needs.

Lesson learned: SSE optimizations are worth learning.

Also, you could easily modify this to run on an older processor. I think the only instruction that requires SSE4 is _mm_max_epu16, which does a comparison of 16-bit values over a 128-bit integer type, saving the max value for each 16-bits. So it's really doing 8 comparisons at once.

Last edited by joe (2008-06-03 03:04:50)

joe · 2008-06-03 03:45:47

I guess I didn't read nj's request well enough. It would make sense to compare these on the same machine

Here we go all have been modified to run 10000 iterations:

time ruby greenpenguin.rb 
40824

real    1m32.702s
user    1m22.635s
sys     0m9.716s

I tweaked scj's from pr :: String -> [Int] to pr :: [Int] -> [Int]

 time ./scj
40824

real    0m0.128s
user    0m0.127s
sys     0m0.003s

 gcc -DOPTIMIZE -o nj -O4 nj.c
 time ./nj
40824

real    0m0.029s
user    0m0.030s
sys     0m0.000s

time python Abelian.py 
40824

real    0m1.832s
user    0m1.480s
sys     0m0.007s

Arch Linux

#1 2008-05-15 21:52:05

Project Euler

#2 2008-05-15 23:35:08

Re: Project Euler

#3 2008-05-16 00:46:33

Re: Project Euler

#4 2008-05-16 00:49:45

Re: Project Euler

#5 2008-05-16 03:30:48

Re: Project Euler

#6 2008-05-16 05:47:52

Re: Project Euler

#7 2008-05-16 09:27:34

Re: Project Euler

#8 2008-05-16 12:22:34

Re: Project Euler

#9 2008-05-16 12:50:39

Re: Project Euler

#10 2008-05-16 13:45:08

Re: Project Euler

#11 2008-05-16 14:17:16

Re: Project Euler

#12 2008-05-16 14:50:21

Re: Project Euler

#13 2008-05-16 19:06:37

Re: Project Euler

#14 2008-05-17 07:09:50

Re: Project Euler

#15 2008-05-17 11:20:06

Re: Project Euler

#16 2008-05-17 18:10:00

Re: Project Euler

#17 2008-05-17 20:00:53

Re: Project Euler

#18 2008-05-24 02:47:39

Re: Project Euler

#19 2008-05-24 07:38:51

Re: Project Euler

#20 2008-05-24 08:07:55

Re: Project Euler

#21 2008-05-26 06:43:26

Re: Project Euler

#22 2008-06-02 20:07:21

Re: Project Euler

#23 2008-06-03 02:00:23

Re: Project Euler

#24 2008-06-03 03:00:24

Re: Project Euler

#25 2008-06-03 03:45:47

Re: Project Euler

Board footer