You are not logged in.
Haha, yeah 668MHz
Offline
Before using the smp version, I just added 'x86_64' to the arch array and the regular F@H app worked.
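For anyone searching later, that tweak amounts to a one-line PKGBUILD edit before running makepkg — a hedged sketch; check your own PKGBUILD for the exact original line:

```shell
# PKGBUILD fragment: tell makepkg the package also builds on x86_64
arch=('i686' 'x86_64')
```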
Offline
I installed according to your instructions but it segfaults every time I try to load it. I get to "Requesting User ID from server", and then fah6 aborts with:
../sysdeps/unix/sysv/linux/getpagesize.c:32: __getpagesize: Assertion `_rtld_global_ro._dl_pagesize != 0' failed
Segmentation Fault
Solved by Running: /etc/rc.d/nscd start
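To make that survive a reboot on Arch of that era (pre-systemd), nscd can go in the DAEMONS array ahead of the folding client — a hedged /etc/rc.conf sketch; the other daemon names are placeholders for whatever your system already starts:

```shell
# /etc/rc.conf fragment: start nscd before the folding client at boot
DAEMONS=(syslog-ng network netfs crond nscd foldingathome)
```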
Last edited by duke11235 (2010-09-10 02:10:26)
Offline
I set up my computer as you instructed, but how long does a workunit take? My computer was assigned to Protein: p6316_sh3 and has no output after Completed 0 of 500000 steps, although it is taxing my dual cores.
Offline
==Forget it== I solved it myself following the above post's instructions.
Well, now I have a bigger problem...
...
[07:55:39] + Attempting to get work packet
[07:55:39] - Connecting to assignment server
[07:55:40] + No appropriate work server was available; will try again in a bit.
[07:55:40] + Couldn't get work instructions.
[07:55:40] - Attempt #3 to get work failed, and no other work to do.
Waiting before retry.
...
Last edited by gtklocker (2010-09-10 07:56:22)
Offline
Nice 3 million jimbok!
Diesel1.
Registered GNU/Linux user #140607.
Offline
==Forget it== I solved it myself following the above post's instructions.
Well, now I have a bigger problem...
...
[07:55:39] + Attempting to get work packet
[07:55:39] - Connecting to assignment server
[07:55:40] + No appropriate work server was available; will try again in a bit.
[07:55:40] + Couldn't get work instructions.
[07:55:40] - Attempt #3 to get work failed, and no other work to do.
Waiting before retry.
...
Hi gtklocker,
Sometimes it can be like this for a few hours or so; give it 12-24 hours of letting your client try for work units.
Diesel1.
Registered GNU/Linux user #140607.
Offline
I'm into the project too!
username: ahel
Team 45032
Do you want me to post the user ID too, or is it private?
religion is like a penis.
It's fine to have one.
It's fine to be proud of it.
But please don't whip it out in public and start waving it around, and please don't try to shove it down my children's throats.
Offline
It would be nice if someone could update the first post with up-to-date stats.
PS: Folding for 18 days
Offline
I see messages like these several times:
…
[22:42:00] Protein: proG_17 in water
[22:42:00]
[22:42:01] Writing local files
[22:42:01] Extra SSE boost OK.
[22:42:02] Writing local files
[22:42:02] Completed 0 out of 250000 steps (0%)
[22:57:02] Timered checkpoint triggered.
[23:11:50] Writing local files
[23:11:51] Completed 2500 out of 250000 steps (1%)
[23:26:52] Timered checkpoint triggered.
[23:35:12] - Autosending finished units...
[23:35:12] Trying to send all finished work units
[23:35:12] + No unsent completed units remaining.
[23:35:12] - Autosend completed
[23:36:57] Writing local files
[23:36:57] Completed 5000 out of 250000 steps (2%)
[23:44:17] CoreStatus = 0 (0)
[23:44:17] Client-core communications error: ERROR 0x0
[23:44:17] - Attempting to download new core...
[23:44:17] + Downloading new core: FahCore_78.exe
[23:44:17] Downloading core (/~pande/Linux/x86/Core_78.fah from www.stanford.edu)
[23:44:18] Initial: AFDE; + 10240 bytes downloaded
…
According to FAHlog.txt, FahCore_78.exe has now been downloaded three times, and F@H errors out every time on 'proG_17 in water'.
Is there something wrong or something I can do about this?
Offline
I see messages like these several times:
…
[22:42:00] Protein: proG_17 in water
[22:42:00]
[22:42:01] Writing local files
[22:42:01] Extra SSE boost OK.
[22:42:02] Writing local files
[22:42:02] Completed 0 out of 250000 steps (0%)
[22:57:02] Timered checkpoint triggered.
[23:11:50] Writing local files
[23:11:51] Completed 2500 out of 250000 steps (1%)
[23:26:52] Timered checkpoint triggered.
[23:35:12] - Autosending finished units...
[23:35:12] Trying to send all finished work units
[23:35:12] + No unsent completed units remaining.
[23:35:12] - Autosend completed
[23:36:57] Writing local files
[23:36:57] Completed 5000 out of 250000 steps (2%)
[23:44:17] CoreStatus = 0 (0)
[23:44:17] Client-core communications error: ERROR 0x0
[23:44:17] - Attempting to download new core...
[23:44:17] + Downloading new core: FahCore_78.exe
[23:44:17] Downloading core (/~pande/Linux/x86/Core_78.fah from www.stanford.edu)
[23:44:18] Initial: AFDE; + 10240 bytes downloaded
…
According to FAHlog.txt, FahCore_78.exe has now been downloaded three times, and F@H errors out every time on 'proG_17 in water'.
Is there something wrong or something I can do about this?
I too am having problems with this unit:
[04:03:24] Protein: proG_17 in water
[04:03:24]
[04:03:24] Writing local files
[04:03:24] Writing local files
[04:03:24] Completed 0 out of 250000 steps (0%)
[04:15:25] Writing local files
[04:15:25] Completed 2500 out of 250000 steps (1%)
[04:27:16] Writing local files
[04:27:16] Completed 5000 out of 250000 steps (2%)
[04:39:06] Writing local files
[04:39:06] Completed 7500 out of 250000 steps (3%)
[04:50:58] Writing local files
[04:50:58] Completed 10000 out of 250000 steps (4%)
[04:59:57] CoreStatus = 0 (0)
[04:59:57] Sending work to server
[04:59:57] Project: 6508 (Run 16, Clone 88, Gen 54)
[04:59:57] - Error: Could not get length of results file work/wuresults_02.dat
[04:59:57] - Error: Could not read unit 02 file. Removing from queue.
[04:59:57] Trying to send all finished work units
[04:59:57] + No unsent completed units remaining.
[04:59:57] - Preparing to get new work unit...
It does this every single time with proG_17 in water.
desktop: Xubuntu 12.04 LTS [3.2.0-37-generic x86_64]
netbook: eeepc 1005ha Arch [3.7.6-1-ck i686]
Offline
So it isn't an issue with my specific machine, but with this work unit (or a bug in the program).
Is there a way to bypass this unit (blacklist, for instance)?
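The thread doesn't answer this, but one common workaround with the v6 client was to stop the client and delete the stuck slot's files so a fresh unit gets downloaded. A hedged sketch, not an official procedure — the slot number 02 is taken from the log above, and -queueinfo is assumed to be available in your client build:

```shell
# Hedged sketch: drop the stuck work unit and let the client fetch a new one.
cd /opt/fah                                    # adjust to your install directory
./fah6 -queueinfo                              # assumption: lists queue slots; find the stuck one
rm -f work/wudata_02.* work/wuresults_02.dat   # slot 02, per the log above
./fah6                                         # restart; the client should request a fresh unit
```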
Offline
Higher CPU usage but 1300 more PPD?...
Last screenshot I posted to compare
Edit:
I realize they aren't the exact same project, but... I'll let it run for a while and post some benchmarks from FahMon. I still feel it's moving a bit faster.
cuda 3.1
Last edited by whaevr (2010-10-02 07:47:20)
Offline
I'm trying to run the F@H GPU client as built from the AUR but get the following error:
--- Opening Log file [October 10 20:13:48 UTC]
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.30r1
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: Z:\opt\fah-gpu\alpha
Executable: Z:\opt\fah-gpu\Folding@home-Win32-GPU.exe
Arguments: -forcegpu nvidia_g80 -gpu 0 -verbosity 9
[20:13:48] - Ask before connecting: No
[20:13:48] - User name: foo
[20:13:48] - User ID: bar
[20:13:48] - Machine ID: 11
[20:13:48]
[20:13:48] Gpu species not recognized.
[20:13:48] Loaded queue successfully.
[20:13:48]
[20:13:48] + Processing work unit
[20:13:48] Core required: FahCore_11.exe
[20:13:48] Core found.
[20:13:48] Working on queue slot 01 [October 10 20:13:48 UTC]
[20:13:48] + Working ...
[20:13:48] - Calling '.\FahCore_11.exe -dir work/ -suffix 01 -nice 19 -checkpoint 15 -verbose -lifeline 8 -version 630'
[20:13:48] - Autosending finished units... [October 10 20:13:48 UTC]
[20:13:48] Trying to send all finished work units
[20:13:48] + No unsent completed units remaining.
[20:13:48] - Autosend completed
[20:13:52] CoreStatus = C0000135 (-1073741515)
[20:13:52] Client-core communications error: ERROR 0xc0000135
[20:13:52] This is a sign of more serious problems, shutting down.
The F@H wiki indicates that error 135 means it's missing expected DLLs, but it doesn't say which ones.
Offline
Look in the directory
/opt/fah-gpu/alpha
Is there a symlink in there to nvcuda.dll? If there is then check
/opt/fah-gpu/
There should also be a symlink to nvcuda.dll there, linked to /usr/lib32/wine/cudart.dll.so
You did install the lib32-nvcuda package, right?
Offline
Both symlinks are present.
And yes, I did install lib32-nvcuda before F@H-GPU
Offline
"Attempt #1708 to get work failed..." I should check my clients a little more often; a restart of the daemon fixed it >_>
Offline
Both symlinks are present.
And yes, I did install lib32-nvcuda before F@H-GPU
Interesting... then all the DLLs you need should be in /usr/lib32/wine/ if it all installed correctly...
For some reason each working/folding directory has to have the nvcuda.dll file in it or else it complains, which is why there are symlinks. That's usually how I got that error before, but if everything is linked correctly...
Try running
/usr/lib32/ld-2.12.1.so cudart32_30_14.dll
and post the output.
Tbh I'm not even sure if that's the command I used before to check; I'm away from my nvidia machine till this weekend.
I'm hoping I remember correctly lol
*reminds self to set up ssh on nvidia box*
Offline
For everyone's information, today I tried to reactivate F@H and it doesn't hang on 'Protein: proG_17 in water' anymore, at least not in an early stage (at the moment F@H reached 30% without problems).
Offline
Whaevr,
I ran "ldd nvcuda.dll" from /opt/fah-gpu/alpha and found that libcudart.so.3 was not found.
A quick hack: cd into /usr/lib32 and ln -s /opt/lib32/usr/lib/libcudart.so.3
There's probably a better way...
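Spelled out as commands (paths exactly as in the post; hedged, since the package layout was changed shortly afterwards):

```shell
# jimbok's workaround: link the 32-bit CUDA runtime where the loader looks.
# ln -s with no destination creates the link in the current directory.
cd /usr/lib32
ln -s /opt/lib32/usr/lib/libcudart.so.3
# verify: the previously missing library should now resolve
ldd /opt/fah-gpu/alpha/nvcuda.dll | grep cudart
```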
Registered Linux User #402088
Offline
Jimbok, that worked, thanks
I have a feeling that was something I probably should have been able to do on my own though :S
Offline
Whaevr,
I ran "ldd nvcuda.dll" from /opt/fah-gpu/alpha and found that libcudart.so.3 was not found.
A quick hack: cd into /usr/lib32 and ln -s /opt/lib32/usr/lib/libcudart.so.3
There's probably a better way...
Yeah, I'll switch the installation of lib32-cuda-toolkit over to use /usr/lib32 instead of /opt/lib32/usr/lib.
Ever since x86_64 switched over, I have yet to change that package to follow that format.
Weird how I never run into this stuff when I try it on my box :\
edit:
And it's done; I updated the cuda package, and it now installs using /usr/lib32 as the prefix.
Last edited by whaevr (2010-10-15 23:56:35)
Offline
Nice million Zetbo!
Diesel1.
Registered GNU/Linux user #140607.
Offline
I am having a problem getting F@H going on my file server and need some guidance. It is the standard x86_64 version that consistently fails when trying to install on my server. I chose the 6.02 version for the server because right now it's running on an old single-core AMD Athlon64 3500+ I had laying around, which I understand does not support the 6.29 version of F@H. I've since seen a few posts indicating I might have better luck with the other version of F@H, but I'm a little hesitant given the problems so far and the fact that the system is running a single-core CPU.
Following the wiki, everything seems to go well up to the install. When I install the pkg it creates the /opt/fah directory, and there's even an executable file (according to ls -F) called fah6 in the directory, but nothing else. The problems begin when I try to start the daemons. The first one, nscd, starts fine, but I get an odd error if I try to start the foldingathome daemon. What does that "nice:" prefix mean?
[root@Serverbox opt]# /etc/rc.d/nscd start
:: Starting nscd [DONE]
[root@Serverbox opt]# /etc/rc.d/foldingathome start
:: Starting Folding@Home [DONE]
[root@Serverbox opt]# nice: /opt/fah/fah6: No such file or directory
After that the cursor hangs until I press the Enter key. What I find odd is that this happens even though I've specified a user in /etc/conf.d/foldingathome.
Trying to stop the foldingathome daemon gives a failure message:
[root@Serverbox opt]# /etc/rc.d/nscd start
Output of ls -F:
[root@Serverbox fah]# ls -F
fah6*
On the off chance it might work, I've even tried running "./fah6 -configonly" from the /opt/fah directory, but I just get an error message stating there is no ./fah6 file.
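As a side note, "No such file or directory" for a file that ls clearly shows usually means the kernel can't find the binary's interpreter (its shebang target, or for a 32-bit ELF binary the 32-bit loader). A self-contained demo of that failure mode:

```shell
# Demo: an executable whose interpreter is missing fails with
# "No such file or directory" even though the file itself exists.
tmp=$(mktemp -d)
printf '#!/no/such/interpreter\n' > "$tmp/fah6"
chmod +x "$tmp/fah6"
"$tmp/fah6" 2>&1 || true   # shell reports the interpreter as missing
ls -l "$tmp/fah6"          # ...yet the file is plainly there
rm -r "$tmp"
```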
Thanks.
[Edited to include f@h version numbers]
Last edited by imatechguy (2010-11-04 04:21:13)
Offline
I also have the 135 error.
[18:45:58] CoreStatus = C0000135 (-1073741515)
[18:45:58] Client-core communications error: ERROR 0xc0000135
[18:45:58] This is a sign of more serious problems, shutting down.
No symlink issues:
# pwd
/opt/fah-gpu/alpha
# ldd nvcuda.dll
linux-gate.so.1 => (0xf7795000)
libdl.so.2 => /opt/lib32/lib/libdl.so.2 (0xf771e000)
libpthread.so.0 => /opt/lib32/lib/libpthread.so.0 (0xf7704000)
librt.so.1 => /opt/lib32/lib/librt.so.1 (0xf76fb000)
libstdc++.so.6 => /opt/lib32/usr/lib/libstdc++.so.6 (0xf760d000)
libm.so.6 => /opt/lib32/lib/libm.so.6 (0xf75e7000)
libgcc_s.so.1 => /opt/lib32/usr/lib/libgcc_s.so.1 (0xf75cb000)
libc.so.6 => /opt/lib32/lib/libc.so.6 (0xf7480000)
/lib/ld-linux.so.2 (0xf7796000)
What am I missing? [EDIT: It's been quite some time since I've updated ... I just hit multilib in fact ... so I'll -Syyu and see what happens]
Last edited by georgia_tech_swagger (2010-11-05 19:05:55)
Res Publica Non Dominetur
Laptop: Arch x86 | Thinkpad X220 | Core i5 2410-M | 8 GB DDR3 | Sandy Bridge
Desktop: Arch x86_64 | Custom | Core i7 920 | 6 GB DDR3 | GeForce 260 GTX
Offline