I need to rebuild tensorflow to support my old GPU (compute capability 5.0).
I have been trying to compile it for a few days now but keep getting a somewhat cryptic error message: "No space left on device".
I have done the following...
asp tensorflow
into ~/src/asp.
I then modified ~/src/asp/tensorflow/trunk/PKGBUILD to only include my GPU's compute capability.
I then went into the trunk folder and ran:
extra-x86_64-build -r ~/bigdata/arch
The first time I ran it without any switches, but after getting the "No space left on device" error I switched to the bigdata hard drive, with the same result.
This is the relevant output from extra-x86_64-build:
[14,030 / 14,797] Compiling llvm/lib/Support/TimeProfiler.cpp [for host]; 1s local ... (4 actions, 3 running)
INFO: Elapsed time: 14843.234s, Critical Path: 133.81s
INFO: 15818 processes: 1248 internal, 14570 local.
INFO: Build completed successfully, 15818 total actions
INFO: Build completed successfully, 15818 total actions
Mon May 10 01:30:02 PM CEST 2021 : === Preparing sources in dir: /tmp/tmp.FQUaQ3ZSCJ
~/tensorflow/src/tensorflow-2.4.1 ~/tensorflow/src/tensorflow-2.4.1
~/tensorflow/src/tensorflow-2.4.1
~/tensorflow/src/tensorflow-2.4.1/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow ~/tensorflow/src/tensorflow-2.4.1
~/tensorflow/src/tensorflow-2.4.1
/tmp/tmp.FQUaQ3ZSCJ/tensorflow/include ~/tensorflow/src/tensorflow-2.4.1
~/tensorflow/src/tensorflow-2.4.1
Mon May 10 01:30:11 PM CEST 2021 : === Building wheel
warning: no files found matching 'README'
warning: no files found matching '*.pyd' under directory '*'
warning: no files found matching '*.pyi' under directory '*'
warning: no files found matching '*.pd' under directory '*'
warning: no files found matching '*.dylib' under directory '*'
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.csv' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*.proto' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
error: could not write to 'build/bdist.linux-x86_64/wheel/tensorflow/python/_pywrap_tensorflow_internal.so': No space left on device
==> ERROR: A failure occurred in build().
Aborting...
==> ERROR: Build failed, check /home/andreas/bigdata/arch/extra-x86_64/andreas/build
And this is df -h:
Filesystem      Size  Used Avail Use% Mounted on
dev             7.8G     0  7.8G   0% /dev
run             7.8G  1.2M  7.8G   1% /run
/dev/sda2       125G   90G   30G  76% /
tmpfs           7.8G  241M  7.6G   4% /dev/shm
/dev/sda3       314G  152G  147G  51% /home
/dev/sda1       253M  142M  111M  57% /boot
/dev/md127      293G  149G  130G  54% /mnt/TheFridge
tmpfs           7.8G  412K  7.8G   1% /tmp
tmpfs           1.6G   11M  1.6G   1% /run/user/1000
overlaid        1.6G   11M  1.6G   1% /run/user/1000/andreas-chromium
I don't understand what the error is supposed to mean.
Also, extra-x86_64-build wants to rebuild everything even when I don't use the -c parameter. Is there a fix for that?
Mon May 10 01:30:02 PM CEST 2021 : === Preparing sources in dir: /tmp/tmp.FQUaQ3ZSCJ
/tmp is stored in memory, which by default is limited to half your physical memory.
Do you have a swap file or partition?
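A quick way to see how much room the temporary directory really has is to ask df about it directly. This is only a minimal sketch: avail_mib is a hypothetical helper name, and it reports on the host, so inside the build chroot the numbers may differ.

```shell
#!/bin/sh
# Print the space available (in MiB) on the filesystem that holds a directory.
# avail_mib is a hypothetical helper, not part of devtools or makepkg.
avail_mib() {
    # df -Pk prints one POSIX-format line per filesystem;
    # field 4 is the available space in 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# Use TMPDIR if it is set, otherwise fall back to /tmp.
tmp_dir="${TMPDIR:-/tmp}"
echo "$tmp_dir: $(avail_mib "$tmp_dir") MiB available"
```

If the number printed for /tmp is smaller than the size of the wheel being assembled, a "No space left on device" error at that step would be expected.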
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
I guessed it could be that, but I ignored it because I don't know how the ramdisk size can be changed when using extra-x86_64-build. I guess I could use the manual method instead.
I have 16 GB of physical RAM and a 64 GB swapfile.
EDIT: I tried removing the two tmp.mount files, hoping that would prevent systemd from mounting the ramdisk. It would have been nice if I didn't have to recompile everything all the time.
EDIT2: That did not work at all. Now I'm also trying to create a new container using the manual method, but the tmpfs is magically created without the help of systemd. I guess I have to buy more RAM instead.
EDIT3: I found https://bbs.archlinux.org/viewtopic.php?id=226438 and, as far as I understand it, it is simply impossible to use extra-x86_64-build to compile large packages like tensorflow without a huge amount of RAM.
Last edited by AlgoJerViA (2021-05-10 20:33:57)
extra-x86_64-build builds in /var/lib/archbuild/extra-x86_64/$USER/build ?
Edit:
Unless you are using something like
extra-x86_64-build -c -r /tmp/dir
Last edited by loqs (2021-05-10 21:11:11)
I then went into the trunk folder and ran:
extra-x86_64-build -r ~/bigdata/arch
The first time I ran it without any switches, but after getting the "No space left on device" error I switched to the bigdata hard drive, with the same result.
The problem, as I understand it, is that extra-x86_64-build uses arch-nspawn, which is a wrapper around systemd-nspawn, a tool for running Linux containers. systemd-nspawn hard-codes the use of half the memory for the tmp folder, ignoring the usual tmp.mount from systemd.
I'm now trying to build tensorflow outside of a chroot with tmpfs turned off for /tmp, but it is extremely slow. It has been running for nine hours and is not even halfway through the compilation.
I did some searching, and the problem may be caused by bazel*, the tensorflow build system, not by systemd-nspawn.
Check https://stackoverflow.com/questions/343 … mory-usage .
* Not the first build system that appears to be tailored to dedicated build farms with thousands of cores and terabytes of high-speed memory / storage.
Often those systems dramatically increase build time on less powerful systems.
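If bazel's resource appetite is the issue, one thing to try is capping it through the arguments the PKGBUILD already exports. The sketch below assumes BAZEL_ARGS is passed on to bazel build unchanged; the values (4 jobs, 8192 MB) are placeholders, not tuned recommendations.

```shell
#!/bin/sh
# Sketch: append resource caps to the Bazel arguments exported by the PKGBUILD.
# --jobs limits concurrent actions; --local_ram_resources is the RAM (in MB)
# Bazel assumes it may use for local actions. Both values are placeholders.
BAZEL_ARGS="${BAZEL_ARGS:-} --jobs=4 --local_ram_resources=8192"
export BAZEL_ARGS
echo "$BAZEL_ARGS"
```

Fewer parallel jobs lengthen the build but lower peak memory use, which may matter more on a machine that would otherwise start swapping.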
Last edited by Lone_Wolf (2021-05-11 10:44:53)
This worked for me:
@@ -118,7 +122,8 @@ prepare() {
export CXX=g++-10
export BAZEL_ARGS="--config=mkl -c opt --copt=-I/usr/include/openssl-1.0 --host_copt=-I/usr/include/openssl-1.0 --linkopt=-l:libssl.so.1.0.0 --linkopt=-l:libcrypto.so.1.0.0 --host_linkopt=-l:libssl.so.1.0.0 --host_linkopt=-l:libcrypto.so.1.0.0"
+ export TMPDIR="$srcdir"
}
build() {
Stops pip from using /tmp.
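The effect of the TMPDIR export can be checked outside the build. The log above shows a directory of the form /tmp/tmp.XXXXXXXXXX, which is what mktemp -d produces, and mktemp follows TMPDIR when it is set. A minimal sketch, where build_root is only a stand-in for $srcdir:

```shell
#!/bin/sh
# Show that mktemp creates its directory under TMPDIR when TMPDIR is set.
# build_root stands in for makepkg's $srcdir (an assumption for this demo).
build_root=$(mktemp -d)

# Set TMPDIR for this one command only, so the rest of the shell is unaffected.
scratch=$(TMPDIR="$build_root" mktemp -d)
echo "scratch dir: $scratch"

# Clean up the demo directories.
rm -rf "$build_root"
```

The printed path should sit under build_root rather than /tmp, so temporary files land on the disk that holds the sources instead of the tmpfs.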