You are not logged in.
Haha, yeah 668MHz
Offline
Before using the smp version, I just added 'x86_64' to the arch array and the regular F@H app worked.
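For anyone searching later, that tweak amounts to a one-line PKGBUILD edit before running makepkg — a hedged sketch; check your own PKGBUILD for the exact original line:

```shell
# PKGBUILD fragment: tell makepkg the package also builds on x86_64
arch=('i686' 'x86_64')
```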
Offline
I installed according to your instructions but it segfaults every time I try to load it. I get to "Requesting User ID from server", and then fah6 aborts with:
../sysdeps/unix/sysv/linux/getpagesize.c:32: __getpagesize: Assertion `_rtld_global_ro._dl_pagesize != 0' failed
Segmentation Fault
Solved by Running: /etc/rc.d/nscd start
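To make that survive a reboot on Arch of that era (pre-systemd), nscd can go in the DAEMONS array ahead of the folding client — a hedged /etc/rc.conf sketch; the other daemon names are placeholders for whatever your system already starts:

```shell
# /etc/rc.conf fragment: start nscd before the folding client at boot
DAEMONS=(syslog-ng network netfs crond nscd foldingathome)
```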
Last edited by duke11235 (2010-09-10 02:10:26)
Offline
I set up my computer as you instructed, but how long does a workunit take? My computer was assigned to Protein: p6316_sh3 and has no output after Completed 0 of 500000 steps, although it is taxing my dual cores.
Offline
==Forget it== I solved it myself following the above post's instructions.
Well, now I have a bigger problem...
...
[07:55:39] + Attempting to get work packet
[07:55:39] - Connecting to assignment server
[07:55:40] + No appropriate work server was available; will try again in a bit.
[07:55:40] + Couldn't get work instructions.
[07:55:40] - Attempt #3 to get work failed, and no other work to do.
Waiting before retry.
...
Last edited by gtklocker (2010-09-10 07:56:22)
Offline
Nice 3 million jimbok!
Diesel1.
Registered GNU/Linux user #140607.
Offline
==Forget it== I solved it myself following the above post's instructions.
Well, now I have a bigger problem...
...
[07:55:39] + Attempting to get work packet
[07:55:39] - Connecting to assignment server
[07:55:40] + No appropriate work server was available; will try again in a bit.
[07:55:40] + Couldn't get work instructions.
[07:55:40] - Attempt #3 to get work failed, and no other work to do.
Waiting before retry.
...
Hi gtklocker,
Sometimes it can be like this for a few hours or so; give it 12-24 hours of letting your client try for work units.
Diesel1.
Registered GNU/Linux user #140607.
Offline
I'm into the project too!
username: ahel
Team 45032
Do you want me to post the user ID too, or is it private?
religion is like a penis.
It's fine to have one.
It's fine to be proud of it.
But please don't whip it out in public and start waving it around, and please don't try to shove it down my children's throats.
Offline
It would be nice if someone could update the first post with up-to-date stats.
PS: Folding for 18 days
Offline
I see messages like these several times:
…
[22:42:00] Protein: proG_17 in water
[22:42:00]
[22:42:01] Writing local files
[22:42:01] Extra SSE boost OK.
[22:42:02] Writing local files
[22:42:02] Completed 0 out of 250000 steps (0%)
[22:57:02] Timered checkpoint triggered.
[23:11:50] Writing local files
[23:11:51] Completed 2500 out of 250000 steps (1%)
[23:26:52] Timered checkpoint triggered.
[23:35:12] - Autosending finished units...
[23:35:12] Trying to send all finished work units
[23:35:12] + No unsent completed units remaining.
[23:35:12] - Autosend completed
[23:36:57] Writing local files
[23:36:57] Completed 5000 out of 250000 steps (2%)
[23:44:17] CoreStatus = 0 (0)
[23:44:17] Client-core communications error: ERROR 0x0
[23:44:17] - Attempting to download new core...
[23:44:17] + Downloading new core: FahCore_78.exe
[23:44:17] Downloading core (/~pande/Linux/x86/Core_78.fah from www.stanford.edu)
[23:44:18] Initial: AFDE; + 10240 bytes downloaded
…
According to FAHlog.txt, FahCore_78.exe has now been downloaded three times, and F@H errors out every time on 'proG_17 in water'.
Is there something wrong or something I can do about this?
Offline
I see messages like these several times:
…
[22:42:00] Protein: proG_17 in water
[22:42:00]
[22:42:01] Writing local files
[22:42:01] Extra SSE boost OK.
[22:42:02] Writing local files
[22:42:02] Completed 0 out of 250000 steps (0%)
[22:57:02] Timered checkpoint triggered.
[23:11:50] Writing local files
[23:11:51] Completed 2500 out of 250000 steps (1%)
[23:26:52] Timered checkpoint triggered.
[23:35:12] - Autosending finished units...
[23:35:12] Trying to send all finished work units
[23:35:12] + No unsent completed units remaining.
[23:35:12] - Autosend completed
[23:36:57] Writing local files
[23:36:57] Completed 5000 out of 250000 steps (2%)
[23:44:17] CoreStatus = 0 (0)
[23:44:17] Client-core communications error: ERROR 0x0
[23:44:17] - Attempting to download new core...
[23:44:17] + Downloading new core: FahCore_78.exe
[23:44:17] Downloading core (/~pande/Linux/x86/Core_78.fah from www.stanford.edu)
[23:44:18] Initial: AFDE; + 10240 bytes downloaded
…
According to FAHlog.txt, FahCore_78.exe has now been downloaded three times, and F@H errors out every time on 'proG_17 in water'.
Is there something wrong or something I can do about this?
I too am having problems with this unit:
[04:03:24] Protein: proG_17 in water
[04:03:24]
[04:03:24] Writing local files
[04:03:24] Writing local files
[04:03:24] Completed 0 out of 250000 steps (0%)
[04:15:25] Writing local files
[04:15:25] Completed 2500 out of 250000 steps (1%)
[04:27:16] Writing local files
[04:27:16] Completed 5000 out of 250000 steps (2%)
[04:39:06] Writing local files
[04:39:06] Completed 7500 out of 250000 steps (3%)
[04:50:58] Writing local files
[04:50:58] Completed 10000 out of 250000 steps (4%)
[04:59:57] CoreStatus = 0 (0)
[04:59:57] Sending work to server
[04:59:57] Project: 6508 (Run 16, Clone 88, Gen 54)
[04:59:57] - Error: Could not get length of results file work/wuresults_02.dat
[04:59:57] - Error: Could not read unit 02 file. Removing from queue.
[04:59:57] Trying to send all finished work units
[04:59:57] + No unsent completed units remaining.
[04:59:57] - Preparing to get new work unit...
It does this every single time with proG_17 in water.
desktop: Xubuntu 12.04 LTS [3.2.0-37-generic x86_64]
netbook: eeepc 1005ha Arch [3.7.6-1-ck i686]
Offline
So it isn't an issue with my specific machine, but with this work unit (or a bug in the program).
Is there a way to bypass this unit (blacklist, for instance)?
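The thread doesn't answer this, but one common workaround with the v6 client was to stop the client and delete the stuck slot's files so a fresh unit gets downloaded. A hedged sketch, not an official procedure — the slot number 02 is taken from the log above, and -queueinfo is assumed to be available in your client build:

```shell
# Hedged sketch: drop the stuck work unit and let the client fetch a new one.
cd /opt/fah                                    # adjust to your install directory
./fah6 -queueinfo                              # assumption: lists queue slots; find the stuck one
rm -f work/wudata_02.* work/wuresults_02.dat   # slot 02, per the log above
./fah6                                         # restart; the client should request a fresh unit
```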
Offline
Higher CPU usage but 1300 more PPD?...
Last screenshot I posted to compare
Edit:
I realize they aren't the exact same project, but... I'll let it run for a while and post some benchmarks from FahMon. I still feel it's moving a bit faster.
cuda 3.1
Last edited by whaevr (2010-10-02 07:47:20)
Offline
I'm trying to run the F@H GPU client as built from the AUR but get the following error:
--- Opening Log file [October 10 20:13:48 UTC]
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.30r1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: Z:\opt\fah-gpu\alpha
Executable: Z:\opt\fah-gpu\Folding@home-Win32-GPU.exe
Arguments: -forcegpu nvidia_g80 -gpu 0 -verbosity 9
[20:13:48] - Ask before connecting: No
[20:13:48] - User name: foo
[20:13:48] - User ID: bar
[20:13:48] - Machine ID: 11
[20:13:48]
[20:13:48] Gpu species not recognized.
[20:13:48] Loaded queue successfully.
[20:13:48]
[20:13:48] + Processing work unit
[20:13:48] Core required: FahCore_11.exe
[20:13:48] Core found.
[20:13:48] Working on queue slot 01 [October 10 20:13:48 UTC]
[20:13:48] + Working ...
[20:13:48] - Calling '.\FahCore_11.exe -dir work/ -suffix 01 -nice 19 -checkpoint 15 -verbose -lifeline 8 -version 630'
[20:13:48] - Autosending finished units... [October 10 20:13:48 UTC]
[20:13:48] Trying to send all finished work units
[20:13:48] + No unsent completed units remaining.
[20:13:48] - Autosend completed
[20:13:52] CoreStatus = C0000135 (-1073741515)
[20:13:52] Client-core communications error: ERROR 0xc0000135
[20:13:52] This is a sign of more serious problems, shutting down.
The F@H wiki indicates that error 135 means it's missing expected DLLs, but it doesn't say which ones.
Offline
Look in the directory
/opt/fah-gpu/alpha
Is there a symlink in there to nvcuda.dll? If there is then check
/opt/fah-gpu/
There should also be a symlink to nvcuda.dll there, linked to /usr/lib32/wine/cudart.dll.so
You did install the lib32-nvcuda package, right?
Offline
Both symlinks are present.
And yes, I did install lib32-nvcuda before F@H-GPU
Offline
"Attempt #1708 to get work failed..." I should check my clients a little more often; a restart of the daemon fixed it >_>
Offline
Both symlinks are present.
And yes, I did install lib32-nvcuda before F@H-GPU
Interesting... then all the DLLs you need should be in /usr/lib32/wine/ if it all installed correctly...
For some reason each working/folding directory has to have the nvcuda.dll file in it or else it complains, which is why there are symlinks. That's usually how I got that error before, but if everything is linked correctly...
Try running
/usr/lib32/ld-2.12.1.so cudart32_30_14.dll
and post the output.
Tbh I'm not even sure if that's the command I used before to check; I'm away from my nvidia machine till this weekend.
I'm hoping I remember correctly lol
*reminds self to set up ssh on nvidia box*
Offline
For everyone's information, today I tried to reactivate F@H and it doesn't hang on 'Protein: proG_17 in water' anymore, at least not in an early stage (at the moment F@H reached 30% without problems).
Offline
Whaevr,
I ran "ldd nvcuda.dll" from /opt/fah-gpu/alpha and found that libcudart.so.3 was not found.
A quick hack: cd into /usr/lib32 and ln -s /opt/lib32/usr/lib/libcudart.so.3
There's probably a better way...
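Spelled out as commands (paths exactly as in the post; hedged, since the package layout was changed shortly afterwards):

```shell
# jimbok's workaround: link the 32-bit CUDA runtime where the loader looks.
# ln -s with no destination creates the link in the current directory.
cd /usr/lib32
ln -s /opt/lib32/usr/lib/libcudart.so.3
# verify: the previously missing library should now resolve
ldd /opt/fah-gpu/alpha/nvcuda.dll | grep cudart
```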
Registered Linux User #402088
Offline
Jimbok, that worked, thanks
I have a feeling that was something I probably should have been able to do on my own though :S
Offline
Whaevr,
I ran "ldd nvcuda.dll" from /opt/fah-gpu/alpha and found that libcudart.so.3 was not found.
A quick hack: cd into /usr/lib32 and ln -s /opt/lib32/usr/lib/libcudart.so.3
There's probably a better way...
Yeah, I'll switch the installation of lib32-cuda-toolkit over to use /usr/lib32 instead of /opt/lib32/usr/lib.
Ever since x86_64 switched over, I have yet to change that package to follow that format.
Weird how I never run into this stuff when I try it on my box :\
edit:
And it's done; I updated the cuda package, and it now installs using /usr/lib32 as the prefix.
Last edited by whaevr (2010-10-15 23:56:35)
Offline
Nice million Zetbo!
Diesel1.
Registered GNU/Linux user #140607.
Offline
I am having a problem getting F@H going on my file server and need some guidance. It is the standard x86_64 version that consistently fails when trying to install on my server. I chose the 6.02 version for the server because right now it's running on an old single-core AMD Athlon64 3500+ I had laying around, which I understand does not support the 6.29 version of F@H. I've since seen a few posts indicating I might have better luck with the other version of F@H, but I'm a little hesitant given the problems so far and the fact that the system is running a single-core CPU.
Following the wiki, everything seems to go well up to the install. When I install the pkg it creates the /opt/fah directory, and there's even an executable file (according to ls -F) called fah6 in the directory, but nothing else. The problems begin when I try to start the daemons. The first one, nscd, starts fine, but I get an odd error if I try to start the foldingathome daemon. What does that "nice:" prefix mean?
[root@Serverbox opt]# /etc/rc.d/nscd start
:: Starting nscd [DONE]
[root@Serverbox opt]# /etc/rc.d/foldingathome start
:: Starting Folding@Home [DONE]
[root@Serverbox opt]# nice: /opt/fah/fah6: No such file or directory
After that the cursor hangs until I press the Enter key. What I find odd is that this happens even though I've specified a user in /etc/conf.d/foldingathome.
Trying to stop the foldingathome daemon gives a failure message:
[root@Serverbox opt]# /etc/rc.d/nscd start
Output of ls -F:
[root@Serverbox fah]# ls -F
fah6*
On the off chance it might work, I've even tried running "./fah6 -configonly" from the /opt/fah directory, but I just get an error message stating there is no ./fah6 file.
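As a side note, "No such file or directory" for a file that ls clearly shows usually means the kernel can't find the binary's interpreter (its shebang target, or for a 32-bit ELF binary the 32-bit loader). A self-contained demo of that failure mode:

```shell
# Demo: an executable whose interpreter is missing fails with
# "No such file or directory" even though the file itself exists.
tmp=$(mktemp -d)
printf '#!/no/such/interpreter\n' > "$tmp/fah6"
chmod +x "$tmp/fah6"
"$tmp/fah6" 2>&1 || true   # shell reports the interpreter as missing
ls -l "$tmp/fah6"          # ...yet the file is plainly there
rm -r "$tmp"
```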
Thanks.
[Edited to include f@h version numbers]
Last edited by imatechguy (2010-11-04 04:21:13)
Offline
I also have the 135 error.
[18:45:58] CoreStatus = C0000135 (-1073741515)
[18:45:58] Client-core communications error: ERROR 0xc0000135
[18:45:58] This is a sign of more serious problems, shutting down.
No symlink issues:
# pwd
/opt/fah-gpu/alpha
# ldd nvcuda.dll
linux-gate.so.1 => (0xf7795000)
libdl.so.2 => /opt/lib32/lib/libdl.so.2 (0xf771e000)
libpthread.so.0 => /opt/lib32/lib/libpthread.so.0 (0xf7704000)
librt.so.1 => /opt/lib32/lib/librt.so.1 (0xf76fb000)
libstdc++.so.6 => /opt/lib32/usr/lib/libstdc++.so.6 (0xf760d000)
libm.so.6 => /opt/lib32/lib/libm.so.6 (0xf75e7000)
libgcc_s.so.1 => /opt/lib32/usr/lib/libgcc_s.so.1 (0xf75cb000)
libc.so.6 => /opt/lib32/lib/libc.so.6 (0xf7480000)
/lib/ld-linux.so.2 (0xf7796000)
What am I missing? [EDIT: It's been quite some time since I've updated ... I just hit multilib in fact ... so I'll -Syyu and see what happens]
Last edited by georgia_tech_swagger (2010-11-05 19:05:55)
Res Publica Non Dominetur
Laptop: Arch x86 | Thinkpad X220 | Core i5 2410-M | 8 GB DDR3 | Sandy Bridge
Desktop: Arch x86_64 | Custom | Core i7 920 | 6 GB DDR3 | GeForce 260 GTX
Offline