You are not logged in.
Anyone on here working their way through project euler (www.projecteuler.net)?
Project Euler is series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.
I'm finding it a great way of learning how to program. It is satisfying though when you see you don't need a computer to solve one of them though!
Anyway this is problem 9 and my solution in python:
#Find the greatest product of five consecutive digits in the 1000 digit number:
N = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"
a = 0
b = 5
finalist=[]
while b < 1000:
r = 1
p = N[a:b]
s = list(p)
for p in s:
p = int(p)
r = r*p
finalist.append(r)
a = a + 1
b = b + 1
p_old = 0
for p in finalist:
if p > p_old:
p_old = p
print p_old
Executing this and timing it gives this:
40824
real 0m0.032s
user 0m0.023s
sys 0m0.003s
The question is, how fast can we complete this problem? Any language goes. How fast can you make it
Offline
Hi,
I am playing with Project Euler, too (have the first 30 Problems solved). Really great site for learning to handle new languages and practicing implementation of algorithms.
You are talking about problem No. 8, don't you?
Offline
#include <stdio.h>
char *N = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450";
#define I(x) (*(x)-'0')
int main(){
int max = 0, prod;
char *n = N+3;
while(*++n)
if( (prod = I(n-4) * I(n-3) * I(n-2) * I(n-1) * I(n)) > max )
max = prod;
printf("%d\n", max);
return 0;
}
1000 digit number is a very small task. Time needed to complete computation itself is negligibly small compared with the time needed to process creation, program loading, linking, etc.
So, if you want to compare execution times, you should extend number to 10^6 digits at least, or wrap you code in a loop.
e & e.py are short versions:
>> export TIMEFORMAT='real: %3R | user: %3U | sys: %3S'
>> time ./e
40824
real: 0.002 | user: 0.000 | sys: 0.003
>> time python e.py
40824
real: 0.052 | user: 0.040 | sys: 0.013
elong & e_long.py are versions with 10000 iterations:
>> time ./elong
40824
real: 0.113 | user: 0.107 | sys: 0.003
>> time python e_long.py
40824
real: 2.642 | user: 2.586 | sys: 0.017
>> bcc
c=0.002
p=0.052
cl=0.113/10^4
pl=2.642/10^4
c/cl
176.99115044247787610619
p/pl
196.82059046177138531415
p/c
26.00000000000000000000
pl/cl
23.38053097345132743362
So, C is about 24 times faster, than python. Just as planned (;
Offline
Once you submit a solution, do you get more info on good ways to solve the problem? I was thinking of doing something like this while learning python (or something else that fits between bash and c++). It is difficult learning a new language without having some sort of feedback about your approach to solving the problem and I don't want to get into really bad habits early on.
Last edited by Allan (2008-05-16 03:31:44)
Offline
@Allan: You get access to the forum thread for that solution, in which people post their solutions to the problem in different languages.
Offline
Sorry I did indeed mean problem 8.
elide: How did you extend the number to 10^6, did you just literally double the number?
Changed my code to this for just the 1000 digits:
N = "73167176531330624919225119674426574742355349194934969835203127745063262395\
7831801698480186947885184385861560789112949495459501737958331952853208805511125\
4069874715852386305071569329096329522744304355766896648950445244523161731856403\
0987111217223831136222989342338030813533627661428280644448664523874930358907296\
2904915604407723907138105158593079608667017242712188399879790879227492190169972\
0888093776657273330010533678812202354218097512545405947522435258490771167055601\
3604839586446706324415722155397536978179778461740649551492908625693219784686224\
8283972241375657056057490261407972968652414535100474821663704844031998900088952\
4345065854122758866688116427171479924442928230863465674813919123162824586178664\
5835912456652947654568284891288314260769004224219022671055626321111109370544217\
5069416589604080719840385096245544436298123098787992724428490918884580156166097\
9191338754992005240636899125607176060588611646710940507754100225698315520005593\
572972571636269561882670428252483600823257530420752963450"
a = 0
b = 5
biggest_r = 0
while b < 1000:
r = 1
p = N[a:b]
s = list(p)
for p in s:
p = int(p)
r = r*p
if r > biggest_r:
biggest_r = r
a = a + 1
b = b + 1
print biggest_r
Gives me:
40824
real: 0.022 | user: 0.023 | sys: 0.000
So a slight improvement.
Last edited by Abelian (2008-05-16 09:28:02)
Offline
did you just literally double the number
"literally doubling" gives you 2*10^3 != 10^6 (;
#Find the greatest product of five consecutive digits in the 1000 digit number:
N = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"
a = 0
b = 5
finalist=[]
p_old = 0
RC=10000 # !!!!!!
while RC > 0: # !!!!!!
RC -= 1 # !!!!!!
while b < 1000:
r = 1
p = N[a:b]
s = list(p)
for p in s:
p = int(p)
r = r*p
finalist.append(r)
a = a + 1
b = b + 1
p_old = 0
for p in finalist:
if p > p_old:
p_old = p
print p_old
Last edited by elide (2008-05-16 12:23:44)
Offline
ah sorry i meant just repeating it again ie the 1000 digits then the same 1000 again which is 10^6.
Offline
Going to give my approach in psuedo code. You are re-multiplying the same numbers repeatedly.
val = prod(N[1:5])
max = val
for(i in 6:1000)
val = val / N[i-5] * N[i]
if(max < val)
max = val
That reduces four * operations to one / and one *. From memory * and / are about equally expensive in integer arithmetic for most languages. It also works out the maximum as you go so there is no need to store stuff.
Last edited by Allan (2008-05-16 13:51:02)
Offline
Allan, your code will fail with "/0" error (: just think about corner cases...
and if you make some workaround for that cases, then broken memory prefetch and cache misses will lead to a significant performance drop...
Offline
Yeah, I missed that. For some reason I thought that they were all 1-9's and scanning the first 58 numbers I was on the right track....
I was initially going to comment on the storage of each results which I saw as the most wasteful part but then that "moment of brilliance" hit me!
Offline
Going to give my approach in psuedo code. You are re-multiplying the same numbers repeatedly.
val = prod(N[1:5]) max = val for(i in 6:1000) val = val / N[i-5] * N[i] if(max < val) max = val
That reduces four * operations to one / and one *. From memory * and / are about equally expensive in integer arithmetic for most languages. It also works out the maximum as you go so there is no need to store stuff.
Ah, I am glad someone mentioned that
This is what I was trying to do, but the results are disappointing.
As elide mentioned, handling 0 require some additional work which seems to make my C prog even slower than elide's one.. Go figure.
Since it is much more uglier, more complicated, and slower, I can't even dare posting it
To elide : your code for iterating 10000 times (python version) is completely broken I hope you can see why.
You'll see that you will want to greatly reduce that number after fixing the problem
You should have been suspicious after seeing the very small time increase between 1 loop and 10 000.
pacman roulette : pacman -S $(pacman -Slq | LANG=C sort -R | head -n $((RANDOM % 10)))
Offline
Well I had an idea but cant get it to run faster than the brute force method.But....
The idea is when you move from one five set number to the next you drop one on the left and pick one up on the right so if the one you pick up is larger than the one you drop, then calculate the product, else move on. If the long number is random you could expect half the product calculations, but you have to compare the right and left int whether you need the product or not. I am not great with programming theory as to whether this should be faster or not, but in my c++ code it wasn't. Calculating each product was just slightly faster, and I had to loop everything 10000 times to get an appreciable total time of about 0.6s.
Offline
your code for iterating 10000 times (python version) is completely broken. I hope you can see why.
No, I can't. Can you explain?
You should have been suspicious after seeing the very small time increase between 1 loop and 10 000.
>> time python /dev/null
real: 0.037 | user: 0.024 | sys: 0.003
37ms with empty prog. "Very small time increase"? This is exactly what should be (;
Last edited by elide (2008-05-17 11:22:13)
Offline
your code for iterating 10000 times (python version) is completely broken. I hope you can see why.
No, I can't. Can you explain?
No, I can't.
pacman roulette : pacman -S $(pacman -Slq | LANG=C sort -R | head -n $((RANDOM % 10)))
Offline
#include <stdio.h>
int N[] = { 7,3,1,6,7,1,7,6,5,3,1,3,3,0,6,2,4,9,1,9,2,2,5,1,1,9,6,7,4,4,2,6,5,7,4,7,4,2,3,5,5,3,4,9,1,9,4,9,3,4,9,6,9,8,3,5,2,0,3,1,2,7,7,4,5,0,6,3,2,6,2,3,9,5,7,8,3,1,8,0,1,6,9,8,4,8,0,1,8,6,9,4,7,8,8,5,1,8,4,3,8,5,8,6,1,5,6,0,7,8,9,1,1,2,9,4,9,4,9,5,4,5,9,5,0,1,7,3,7,9,5,8,3,3,1,9,5,2,8,5,3,2,0,8,8,0,5,5,1,1,1,2,5,4,0,6,9,8,7,4,7,1,5,8,5,2,3,8,6,3,0,5,0,7,1,5,6,9,3,2,9,0,9,6,3,2,9,5,2,2,7,4,4,3,0,4,3,5,5,7,6,6,8,9,6,6,4,8,9,5,0,4,4,5,2,4,4,5,2,3,1,6,1,7,3,1,8,5,6,4,0,3,0,9,8,7,1,1,1,2,1,7,2,2,3,8,3,1,1,3,6,2,2,2,9,8,9,3,4,2,3,3,8,0,3,0,8,1,3,5,3,3,6,2,7,6,6,1,4,2,8,2,8,0,6,4,4,4,4,8,6,6,4,5,2,3,8,7,4,9,3,0,3,5,8,9,0,7,2,9,6,2,9,0,4,9,1,5,6,0,4,4,0,7,7,2,3,9,0,7,1,3,8,1,0,5,1,5,8,5,9,3,0,7,9,6,0,8,6,6,7,0,1,7,2,4,2,7,1,2,1,8,8,3,9,9,8,7,9,7,9,0,8,7,9,2,2,7,4,9,2,1,9,0,1,6,9,9,7,2,0,8,8,8,0,9,3,7,7,6,6,5,7,2,7,3,3,3,0,0,1,0,5,3,3,6,7,8,8,1,2,2,0,2,3,5,4,2,1,8,0,9,7,5,1,2,5,4,5,4,0,5,9,4,7,5,2,2,4,3,5,2,5,8,4,9,0,7,7,1,1,6,7,0,5,5,6,0,1,3,6,0,4,8,3,9,5,8,6,4,4,6,7,0,6,3,2,4,4,1,5,7,2,2,1,5,5,3,9,7,5,3,6,9,7,8,1,7,9,7,7,8,4,6,1,7,4,0,6,4,9,5,5,1,4,9,2,9,0,8,6,2,5,6,9,3,2,1,9,7,8,4,6,8,6,2,2,4,8,2,8,3,9,7,2,2,4,1,3,7,5,6,5,7,0,5,6,0,5,7,4,9,0,2,6,1,4,0,7,9,7,2,9,6,8,6,5,2,4,1,4,5,3,5,1,0,0,4,7,4,8,2,1,6,6,3,7,0,4,8,4,4,0,3,1,9,9,8,9,0,0,0,8,8,9,5,2,4,3,4,5,0,6,5,8,5,4,1,2,2,7,5,8,8,6,6,6,8,8,1,1,6,4,2,7,1,7,1,4,7,9,9,2,4,4,4,2,9,2,8,2,3,0,8,6,3,4,6,5,6,7,4,8,1,3,9,1,9,1,2,3,1,6,2,8,2,4,5,8,6,1,7,8,6,6,4,5,8,3,5,9,1,2,4,5,6,6,5,2,9,4,7,6,5,4,5,6,8,2,8,4,8,9,1,2,8,8,3,1,4,2,6,0,7,6,9,0,0,4,2,2,4,2,1,9,0,2,2,6,7,1,0,5,5,6,2,6,3,2,1,1,1,1,1,0,9,3,7,0,5,4,4,2,1,7,5,0,6,9,4,1,6,5,8,9,6,0,4,0,8,0,7,1,9,8,4,0,3,8,5,0,9,6,2,4,5,5,4,4,4,3,6,2,9,8,1,2,3,0,9,8,7,8,7,9,9,2,7,2,4,4,2,8,4,9,0,9,1,8,8,8,4,5,8,0,1,5,6,1,6,6,0,9,7,9,1,9,1,3,3,8,7,5,4,9,9,2,0,0,5,2,4,0,6,3,6,8,9,9,1,2,5,6,0,7,1,7,6,0,6,0,5,8,8,6,1,1,6,4,6,7,1,0,9,4,0,5,0,7,7,5,4,1,0,0,2,2,5,6,9,8,3,1,5,5,2,0,0,0,5,5,9,3,5,7,2,9,7,2,5,7,1,6,3,6,2,6,9,5,6,1,8,8,2,6,7,0,4,2,8,2,5,2,4,8,3,6,0,0,8,2,3,2,5,7,5,3,0,4,2,0,7,5,2,9,6,3,4,5,0 };
#define LEN 1000
int main()
{
int i = 0, j = 0;
int r = 0, p = 0;
#ifdef OPTIMIZE
int a;
int n1;
int n2;
int n3;
int n4;
int n5;
int n6;
int n7;
int n8;
int n9;
int n10;
#endif
while (j < 10000)
{
#ifdef OPTIMIZE
for (i = 0; i < LEN-10; i+=6)
{
if (N[i+5] == 0)
{
r = N[i] * N[i+1] * N[i+2] * N[i+3] * N[i+4];
if (p < r)
p = r;
continue;
}
if (N[i+4] == 0)
{
--i;
continue;
}
n1 = N[i+4];
n2 = n1*N[i+3];
n3 = n2*N[i+2];
n4 = n3*N[i+1];
n5 = n4*N[i];
n6 = N[i+5];
n7 = n6*N[i+6];
n8 = n7*N[i+7];
n9 = n8*N[i+8];
n10 = n9*N[i+9];
if (p < n5)
p = n5;
if (p < n10)
p = n10;
a = n6*n4;
if (p < a)
p = a;
a = n7*n3;
if (p < a)
p = a;
a = n8*n2;
if (p < a)
p = a;
a = n9*n1;
if (p < a)
p = a;
}
#endif
while (i < LEN-5)
{
#ifdef OPTIMIZE
if (N[i+4] == 0)
{
i+=5;
continue;
}
#endif
r = N[i] * N[i+1] * N[i+2] * N[i+3] * N[i+4];
if (p < r)
p = r;
++i;
}
i = 0;
++j;
}
printf("%d\n", p);
return 0;
}
$ make
gcc -o pe2 -O2 pe2.c
gcc -DOPTIMIZE -o pe2o -O2 pe2.c
$ time ./pe2
40824
./pe2 0.11s user 0.00s system 99% cpu 0.107 total
$ time ./pe2o
40824
./pe2o 0.07s user 0.00s system 96% cpu 0.069 total
A decent speedup, at the expense of readability. Readability is overrated anyway.
Offline
Haskell FTW!
import Char
n = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"
pr :: String -> [Int]
pr (a:b:c:d:e:xs) = foldl (\x y -> x * (ord y - 48)) 1 (a:b:c:d:e:[]) : pr (b:c:d:e:xs)
pr _ = []
main = print $ maximum $ pr n
$ time ./p09
40824
./p09 0.01s user 0.00s system 112% cpu 0.009 total
Offline
I'm pretty certain using 112% of your CPU is cheating!
Offline
Well, it's a PIII 1GHz, so I need it
But really, I have no idea what that is all about. Running it a couple of times it varies between 8% and 112%
Offline
I thought I'd throw my hat into the ring. This will only work if you have a newer SSE4.1-enabled Intel (Penryn) processor:
#include <stdio.h>
#include <smmintrin.h>
int main()
{
unsigned short int n[1000] __attribute__((aligned(16))) = { 7,3,1,6,7,1,7,6,5,3,1,3,3,0,6,2,4,9,1,9,2,2,5,1,1,9,6,7,4,4,2,6,5,7,4,7,4,2,3,5,5,3,4,9,1,9,4,9,3,4,9,6,9,8,3,5,2,0,3,1,2,7,7,4,5,0,6,3,2,6,2,3,9,5,7,8,3,1,8,0,1,6,9,8,4,8,0,1,8,6,9,4,7,8,8,5,1,8,4,3,8,5,8,6,1,5,6,0,7,8,9,1,1,2,9,4,9,4,9,5,4,5,9,5,0,1,7,3,7,9,5,8,3,3,1,9,5,2,8,5,3,2,0,8,8,0,5,5,1,1,1,2,5,4,0,6,9,8,7,4,7,1,5,8,5,2,3,8,6,3,0,5,0,7,1,5,6,9,3,2,9,0,9,6,3,2,9,5,2,2,7,4,4,3,0,4,3,5,5,7,6,6,8,9,6,6,4,8,9,5,0,4,4,5,2,4,4,5,2,3,1,6,1,7,3,1,8,5,6,4,0,3,0,9,8,7,1,1,1,2,1,7,2,2,3,8,3,1,1,3,6,2,2,2,9,8,9,3,4,2,3,3,8,0,3,0,8,1,3,5,3,3,6,2,7,6,6,1,4,2,8,2,8,0,6,4,4,4,4,8,6,6,4,5,2,3,8,7,4,9,3,0,3,5,8,9,0,7,2,9,6,2,9,0,4,9,1,5,6,0,4,4,0,7,7,2,3,9,0,7,1,3,8,1,0,5,1,5,8,5,9,3,0,7,9,6,0,8,6,6,7,0,1,7,2,4,2,7,1,2,1,8,8,3,9,9,8,7,9,7,9,0,8,7,9,2,2,7,4,9,2,1,9,0,1,6,9,9,7,2,0,8,8,8,0,9,3,7,7,6,6,5,7,2,7,3,3,3,0,0,1,0,5,3,3,6,7,8,8,1,2,2,0,2,3,5,4,2,1,8,0,9,7,5,1,2,5,4,5,4,0,5,9,4,7,5,2,2,4,3,5,2,5,8,4,9,0,7,7,1,1,6,7,0,5,5,6,0,1,3,6,0,4,8,3,9,5,8,6,4,4,6,7,0,6,3,2,4,4,1,5,7,2,2,1,5,5,3,9,7,5,3,6,9,7,8,1,7,9,7,7,8,4,6,1,7,4,0,6,4,9,5,5,1,4,9,2,9,0,8,6,2,5,6,9,3,2,1,9,7,8,4,6,8,6,2,2,4,8,2,8,3,9,7,2,2,4,1,3,7,5,6,5,7,0,5,6,0,5,7,4,9,0,2,6,1,4,0,7,9,7,2,9,6,8,6,5,2,4,1,4,5,3,5,1,0,0,4,7,4,8,2,1,6,6,3,7,0,4,8,4,4,0,3,1,9,9,8,9,0,0,0,8,8,9,5,2,4,3,4,5,0,6,5,8,5,4,1,2,2,7,5,8,8,6,6,6,8,8,1,1,6,4,2,7,1,7,1,4,7,9,9,2,4,4,4,2,9,2,8,2,3,0,8,6,3,4,6,5,6,7,4,8,1,3,9,1,9,1,2,3,1,6,2,8,2,4,5,8,6,1,7,8,6,6,4,5,8,3,5,9,1,2,4,5,6,6,5,2,9,4,7,6,5,4,5,6,8,2,8,4,8,9,1,2,8,8,3,1,4,2,6,0,7,6,9,0,0,4,2,2,4,2,1,9,0,2,2,6,7,1,0,5,5,6,2,6,3,2,1,1,1,1,1,0,9,3,7,0,5,4,4,2,1,7,5,0,6,9,4,1,6,5,8,9,6,0,4,0,8,0,7,1,9,8,4,0,3,8,5,0,9,6,2,4,5,5,4,4,4,3,6,2,9,8,1,2,3,0,9,8,7,8,7,9,9,2,7,2,4,4,2,8,4,9,0,9,1,8,8,8,4,5,8,0,1,5,6,1,6,6,0,9,7,9,1,9,1,3,3,8,7,5,4,9,9,2,0,0,5,2,4,0,6,3,6,8,9,9,1,2,5,6,0,7,1,7,6,0,6,0,5,8,8,6,1,1,6,4,6,7,1,0,9,4,0,5,0,7,7,5,4,1,0,0,2,2,5,6,9,8,3,1,5,5,2,0,0,0,5,5,9,3,5,7,2,9,7,2,5,7,1,6,3,6,2,6,9,5,6,1,8,8,2,6,7,0,4,2,8,2,5,2,4,8,3,6,0,0,8,2,3,2,5,7,5,3,0,4,2,0,7,5,2,9,6,3,4,5,0 };
__m128i max = {0LL, 0LL};
unsigned short int *N = n;
/* while there is more data */
while((N + 8) < (n + 1000))
{
/* Load eight values into 128-bit data */
__m128i first = _mm_loadu_si128((__m128i *) (N + 0));
__m128i second = _mm_loadu_si128((__m128i *) (N + 1));
__m128i third = _mm_loadu_si128((__m128i *) (N + 2));
__m128i fourth = _mm_loadu_si128((__m128i *) (N + 3));
__m128i fifth = _mm_loadu_si128((__m128i *) (N + 4));
__m128i res1 = _mm_mullo_epi16(first, second);
__m128i res2 = _mm_mullo_epi16(third, fourth);
__m128i res3 = _mm_mullo_epi16(fifth, res1);
__m128i res4 = _mm_mullo_epi16(res3, res2);
/* Conditional word move into max */
max = _mm_max_epu16(max, res4);
first = _mm_srli_epi32(first, 16);
second = _mm_srli_epi32(second, 16);
third = _mm_srli_epi32(third, 16);
fourth = _mm_srli_epi32(fourth, 16);
fifth = _mm_srli_epi32(fifth, 16);
/* multiply again and store conditionally */
res1 = _mm_mullo_epi16(first, second);
res2 = _mm_mullo_epi16(third, fourth);
res3 = _mm_mullo_epi16(fifth, res1);
res4 = _mm_mullo_epi16(res3, res2);
max = _mm_max_epu16(max, res4);
N += 8;
}
/* find largest 16-bit value in max and print */
unsigned short int *result = (unsigned short int *)(&max);
unsigned short int final_max = 0;
unsigned short int i = 0;
for(i = 0; i < 8; ++i) final_max = (result[i] > final_max) ? result[i] : final_max;
printf("%u\n", final_max);
return 1;
}
And the results were fast:
time ./newp
40824
real 0m0.001s
user 0m0.000s
sys 0m0.000s
Last edited by joe (2008-05-26 06:44:59)
Offline
Really basic solution I wrote ages ago (ruby):
data = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"
dhol = 0
pointer = 0
while data[pointer..-1].length > 4
ndata = 1
5.times do |n|
ndata *= data[n+pointer].chr.to_i
end
dhol = ndata if ndata > dhol
break if dhol >= 59049
pointer+=1
end
puts dhol
Fairly fast:
time { ruby p8.rb ; }
40824
real 0m0.021s
user 0m0.017s
sys 0m0.003s
By the way, for anyone <19 in the UK, try the BIO if you haven't already, fun competition, rather similar to project Euler but as a proper competition. (plus having /two/ Arch users next time would be awesome...)
Offline
I thought I'd throw my hat into the ring. This will only work if you have a newer SSE4.1-enabled Intel (Penryn)
Hardcore If you get some time, could you loop that 10000 times and compare it to my program above? I would be interested in seeing what kind of speedup that gets.
Offline
joe wrote:I thought I'd throw my hat into the ring. This will only work if you have a newer SSE4.1-enabled Intel (Penryn)
Hardcore If you get some time, could you loop that 10000 times and compare it to my program above? I would be interested in seeing what kind of speedup that gets.
Sure, I threw a while loop around the 'meat' of the program and had it run 10,000 times. Using
gcc -march=core2 -msse4 -o newp newp.c
I got :
time ./newp
40824
real 0m0.076s
user 0m0.077s
sys 0m0.000s
Using basic gcc optimizations
gcc -O4 -march=core2 -msse4 -o newp newp.c
I get:
time ./newp
40824
real 0m0.014s
user 0m0.013s
sys 0m0.000s
Holy crap! That's pretty impressive. I was actually surprised at that speed up. I guess the biggest bottleneck is the original read from memory. The processor in my system has enough cache that subsequent loops are likely not causing any cache misses.
And to think, if this workload were big enough, it would also benefit from threading. It's small enough that right now spawning threads would take longer than the workload needs.
Lesson learned: SSE optimizations are worth learning.
Also, you could easily modify this to run on an older processor. I think the only instruction that requires SSE4 is _mm_max_epu16, which does a comparison of 16-bit values over a 128-bit integer type, saving the max value for each 16-bits. So it's really doing 8 comparisons at once.
Last edited by joe (2008-06-03 03:04:50)
Offline
I guess I didn't read nj's request well enough. It would make sense to compare these on the same machine
Here we go all have been modified to run 10000 iterations:
time ruby greenpenguin.rb
40824
real 1m32.702s
user 1m22.635s
sys 0m9.716s
I tweaked scj's from pr :: String -> [Int] to pr :: [Int] -> [Int]
time ./scj
40824
real 0m0.128s
user 0m0.127s
sys 0m0.003s
gcc -DOPTIMIZE -o nj -O4 nj.c
time ./nj
40824
real 0m0.029s
user 0m0.030s
sys 0m0.000s
time python Abelian.py
40824
real 0m1.832s
user 0m1.480s
sys 0m0.007s
Offline