Hi,
This is my first post, so I'll try to do my best.
I successfully compiled a C++/CUDA program, but it fails at run time. This is the output of "./nbody":
> ./nbody
Start time :
Sun Jan 25 14:19:59 CET 2015
myReal = double
dt = 1.00e-04 us | p = 1.00 pa | T_BG = 297.0 K | harmo_lmax = 12
recordTimeAfterThermalization = 36000000000000.00 s | thermTime = 0.00e+00 us
nbody.cpp::showHelp() empty for now.
Error: only 0 Devices available, 1 requested. Exiting.
*** Break *** segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0 0x00007f2cedf5a21a in waitpid () from /usr/lib/libc.so.6
#1 0x00007f2cedee2bfb in do_system () from /usr/lib/libc.so.6
#2 0x00007f2cf68ee654 in TUnixSystem::StackTrace() () from /usr/lib/root/libCore.so.5.34
#3 0x00007f2cf68f074c in TUnixSystem::DispatchSignals(ESignals) () from /usr/lib/root/libCore.so.5.34
#4 <signal handler called>
#5 0x00007f2cedfca572 in __memcpy_avx_unaligned () from /usr/lib/libc.so.6
#6 0x00007f2cf68acbed in ROOT::TGenericClassInfo::CreateRuleSet(std::vector<ROOT::TSchemaHelper, std::allocator<ROOT::TSchemaHelper> >&, bool) () from /usr/lib/root/libCore.so.5.34
#7 0x00007f2cf68ad055 in ROOT::TGenericClassInfo::GetClass() () from /usr/lib/root/libCore.so.5.34
#8 0x00007f2cf3fe4f5a in TTree::Class() () from /usr/lib/root/libTree.so.5.34
#9 0x00007f2cf68694bd in TObject::InheritsFrom(TClass const*) const () from /usr/lib/root/libCore.so.5.34
#10 0x00007f2cf592de0a in TDirectoryFile::Save() () from /usr/lib/root/libRIO.so.5.34
#11 0x00007f2cf592c068 in TDirectoryFile::Close(char const*) () from /usr/lib/root/libRIO.so.5.34
#12 0x00007f2cf5922bf4 in TFile::Close(char const*) () from /usr/lib/root/libRIO.so.5.34
#13 0x00007f2cf6833108 in ?? () from /usr/lib/root/libCore.so.5.34
#14 0x00007f2cf683360a in TROOT::CloseFiles() () from /usr/lib/root/libCore.so.5.34
#15 0x00007f2ceded972f in __cxa_finalize () from /usr/lib/libc.so.6
#16 0x00007f2cf680eaa3 in ?? () from /usr/lib/root/libCore.so.5.34
#17 0x00007ffff9cadc10 in ?? ()
#18 0x00007f2cf6fd5847 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
===========================================================
The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5 0x00007f2cedfca572 in __memcpy_avx_unaligned () from /usr/lib/libc.so.6
#6 0x00007f2cf68acbed in ROOT::TGenericClassInfo::CreateRuleSet(std::vector<ROOT::TSchemaHelper, std::allocator<ROOT::TSchemaHelper> >&, bool) () from /usr/lib/root/libCore.so.5.34
#7 0x00007f2cf68ad055 in ROOT::TGenericClassInfo::GetClass() () from /usr/lib/root/libCore.so.5.34
#8 0x00007f2cf3fe4f5a in TTree::Class() () from /usr/lib/root/libTree.so.5.34
#9 0x00007f2cf68694bd in TObject::InheritsFrom(TClass const*) const () from /usr/lib/root/libCore.so.5.34
#10 0x00007f2cf592de0a in TDirectoryFile::Save() () from /usr/lib/root/libRIO.so.5.34
#11 0x00007f2cf592c068 in TDirectoryFile::Close(char const*) () from /usr/lib/root/libRIO.so.5.34
#12 0x00007f2cf5922bf4 in TFile::Close(char const*) () from /usr/lib/root/libRIO.so.5.34
#13 0x00007f2cf6833108 in ?? () from /usr/lib/root/libCore.so.5.34
#14 0x00007f2cf683360a in TROOT::CloseFiles() () from /usr/lib/root/libCore.so.5.34
#15 0x00007f2ceded972f in __cxa_finalize () from /usr/lib/libc.so.6
#16 0x00007f2cf680eaa3 in ?? () from /usr/lib/root/libCore.so.5.34
#17 0x00007ffff9cadc10 in ?? ()
#18 0x00007f2cf6fd5847 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
===========================================================
Segmentation fault (core dumped)
The output of "lspci | grep -e NVIDIA" is
> lspci | grep -e NVIDIA
01:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch (rev a3)
02:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch (rev a3)
02:02.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch (rev a3)
03:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
I am using:
- GCC 4.9.2 20141224
- CUDA 6.5.14-1
- root 5.34/24
- Graphics card is an NVIDIA GTX 760, driver 304.125
- Arch 3.18.2-2-ARCH X86_64
- KDE 4.12
I've searched for the error: "Error: only 0 Devices available, 1 requested. Exiting." but I only found one link:
http://www.tuicool.com/articles/jI3UNj
but the person who resolved the issue there doesn't have the same configuration as I do, so I can't figure it out. In the NVIDIA X Server Settings, the GPU is listed as 0 (there is a section "GPU 0 - (GK104)"), and in the CUDA code, device 0 is specified:
"int deviceID = 0;" and
"int devID = globals::deviceID;
cudaSetDevice(devID);"
I installed CUDA from the official repositories and root5 from the AUR, so I didn't install any of the packages manually.
Sorry if the format of this post isn't correct. If more information is needed, I'll post it.
Thanks for any help on the matter!
Madsub
Last edited by Madsub (2015-11-02 16:17:36)
Hi,
The nbody example from the SDK works for me, but since I have a GTX 970 I am using CUDA 6.5.19. As far as I remember, this example also works with version 6.5.14. One thing you could try is downgrading the kernel to version 3.17.6, if you still have it in the pacman cache. Even the latest NVIDIA driver, 346.35, has problems with the latest kernel. Unified memory does not work, for example. See devtalk and bbs post
Last edited by wwmm (2015-01-26 15:11:03)
Hey !
I finally managed to fix the issue. I was using the old NVIDIA legacy driver (nvidia-304xx). Simply updating to the most recent driver fixed it.
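For anyone who lands here with the same "0 Devices available" message, a driver that is too old for the installed CUDA runtime can be spotted by comparing two version numbers. The following is only a minimal sketch, not code from the nbody program: the helper names (`decodeCudaVersion`, `driverSupportsRuntime`, `describeDriverStatus`) are hypothetical, and the two integers are assumed to come from the CUDA runtime calls `cudaDriverGetVersion` and `cudaRuntimeGetVersion`, which report versions encoded as 1000 * major + 10 * minor. It is written as plain C++ so it compiles without the CUDA toolkit.

```cpp
#include <string>

// CUDA encodes versions as 1000 * major + 10 * minor (6050 means 6.5).
struct CudaVersion {
    int majorPart;
    int minorPart;
};

// Split an encoded version number into its major and minor parts.
CudaVersion decodeCudaVersion(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Format an encoded version as "major.minor".
std::string formatCudaVersion(int encoded) {
    const CudaVersion v = decodeCudaVersion(encoded);
    return std::to_string(v.majorPart) + "." + std::to_string(v.minorPart);
}

// Hypothetical helper: the driver must support at least the CUDA version
// the runtime was built for. In a real program, driverVersion would be
// filled in by cudaDriverGetVersion(&driverVersion) and runtimeVersion by
// cudaRuntimeGetVersion(&runtimeVersion).
bool driverSupportsRuntime(int driverVersion, int runtimeVersion) {
    return driverVersion >= runtimeVersion;
}

// Turn the two version numbers into a human-readable diagnosis.
std::string describeDriverStatus(int driverVersion, int runtimeVersion) {
    if (driverVersion == 0) {
        // cudaDriverGetVersion reports 0 when no driver is installed.
        return "no CUDA driver detected";
    }
    if (driverSupportsRuntime(driverVersion, runtimeVersion)) {
        return "driver is new enough for the runtime";
    }
    return "driver supports up to CUDA " + formatCudaVersion(driverVersion) +
           " but the runtime is CUDA " + formatCudaVersion(runtimeVersion) +
           "; update the driver";
}
```

As a usage example, `describeDriverStatus(5000, 6050)` reports a mismatch between a CUDA 5.0 capable driver and a CUDA 6.5 runtime. The value 5000 is only an illustrative assumption, not a number reported in this thread. Running such a check before `cudaSetDevice` would make this kind of failure easier to read than the bare device-count error.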