#1 2021-05-10 11:51:02

AlgoJerViA
Member
Registered: 2014-08-06
Posts: 22

Rebuilding tensorflow, No space left on device error.

I need to rebuild tensorflow to support my old GPU (compute capability 5.0).
I have been trying to compile it for a few days now, but I keep getting a somewhat cryptic error message: "No space left on device".

I have done the following...

asp tensorflow

into ~/src/asp.
I then modified ~/src/asp/tensorflow/trunk/PKGBUILD to include only my GPU's compute capability.
After that I went into the trunk folder and ran:

extra-x86_64-build -r ~/bigdata/arch

The first time I ran it without any switches, but after getting the "no space left" error I pointed it at the bigdata hard drive with -r, with the same result.

This is the relevant output from extra-x86_64-build:

[14,030 / 14,797] Compiling llvm/lib/Support/TimeProfiler.cpp [for host]; 1s local ... (4 actions, 3 running)
INFO: Elapsed time: 14843.234s, Critical Path: 133.81s
INFO: 15818 processes: 1248 internal, 14570 local.
INFO: Build completed successfully, 15818 total actions
INFO: Build completed successfully, 15818 total actions
Mon May 10 01:30:02 PM CEST 2021 : === Preparing sources in dir: /tmp/tmp.FQUaQ3ZSCJ
~/tensorflow/src/tensorflow-2.4.1 ~/tensorflow/src/tensorflow-2.4.1
~/tensorflow/src/tensorflow-2.4.1
~/tensorflow/src/tensorflow-2.4.1/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow ~/tensorflow/src/tensorflow-2.4.1
~/tensorflow/src/tensorflow-2.4.1
/tmp/tmp.FQUaQ3ZSCJ/tensorflow/include ~/tensorflow/src/tensorflow-2.4.1
~/tensorflow/src/tensorflow-2.4.1
Mon May 10 01:30:11 PM CEST 2021 : === Building wheel
warning: no files found matching 'README'
warning: no files found matching '*.pyd' under directory '*'
warning: no files found matching '*.pyi' under directory '*'
warning: no files found matching '*.pd' under directory '*'
warning: no files found matching '*.dylib' under directory '*'
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.csv' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*.proto' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
error: could not write to 'build/bdist.linux-x86_64/wheel/tensorflow/python/_pywrap_tensorflow_internal.so': No space left on device
==> ERROR: A failure occurred in build().
    Aborting...
==> ERROR: Build failed, check /home/andreas/bigdata/arch/extra-x86_64/andreas/build

And this is df -h:

Filesystem      Size  Used Avail Use% Mounted on
dev             7.8G     0  7.8G   0% /dev
run             7.8G  1.2M  7.8G   1% /run
/dev/sda2       125G   90G   30G  76% /
tmpfs           7.8G  241M  7.6G   4% /dev/shm
/dev/sda3       314G  152G  147G  51% /home
/dev/sda1       253M  142M  111M  57% /boot
/dev/md127      293G  149G  130G  54% /mnt/TheFridge
tmpfs           7.8G  412K  7.8G   1% /tmp
tmpfs           1.6G   11M  1.6G   1% /run/user/1000
overlaid        1.6G   11M  1.6G   1% /run/user/1000/andreas-chromium

I don't understand what the error is supposed to mean.
Also, extra-x86_64-build wants to rebuild everything even when I don't use the -c parameter. Is there a fix for that?

#2 2021-05-10 13:16:28

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868

Re: Rebuilding tensorflow, No space left on device error.

Mon May 10 01:30:02 PM CEST 2021 : === Preparing sources in dir: /tmp/tmp.FQUaQ3ZSCJ

/tmp is a tmpfs stored in memory, which by default is limited to half your physical memory.
Do you have a swap file or swap partition?
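
A quick way to check what backs /tmp (run it inside the build chroot as well as on the host) is sketched below. It assumes GNU coreutils (`stat -f`, `df -P`); `fs_report` is a hypothetical helper name, not part of any Arch tool.

```shell
#!/bin/sh
# fs_report is a hypothetical helper: print the filesystem type and the
# free space (in KiB) backing a directory, /tmp by default.
fs_report() {
    dir=${1:-/tmp}
    # %T prints the filesystem type in human-readable form, e.g. "tmpfs".
    fstype=$(stat -f -c %T "$dir")
    # -P forces one line per filesystem; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    printf '%s type=%s avail_kb=%s\n' "$dir" "$fstype" "$avail_kb"
}

fs_report /tmp
```

If the type comes back as tmpfs, the available figure is the ceiling for anything the build writes there, regardless of how much disk space is free elsewhere.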


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)

#3 2021-05-10 15:38:18

AlgoJerViA
Member
Registered: 2014-08-06
Posts: 22

Re: Rebuilding tensorflow, No space left on device error.

I guessed it could be that, but I ignored it because I don't know how the ramdisk size can be changed when using extra-x86_64-build. I guess I could use the manual method instead.

I have 16 GB of physical RAM and a 64 GB swapfile.

EDIT: I tried removing the two tmp.mount files, hoping that would prevent systemd from mounting the ramdisk. It would be nice if I didn't have to recompile everything every time.

EDIT2: That did not work at all. Now I'm also trying to create a new container with the manual method, but the tmpfs is magically created without the help of systemd. I guess I have to buy more RAM instead.

EDIT3: Found this https://bbs.archlinux.org/viewtopic.php?id=226438 and as far as I understand it, it is simply impossible to use extra-x86_64-build to compile large packages like tensorflow without a huge amount of RAM.

Last edited by AlgoJerViA (2021-05-10 20:33:57)

#4 2021-05-10 21:07:32

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: Rebuilding tensorflow, No space left on device error.

extra-x86_64-build builds in /var/lib/archbuild/extra-x86_64/$USER/build, doesn't it?
Edit:
Unless you are using something like

extra-x86_64-build -c -r /tmp/dir

Last edited by loqs (2021-05-10 21:11:11)

#5 2021-05-11 06:01:53

AlgoJerViA
Member
Registered: 2014-08-06
Posts: 22

Re: Rebuilding tensorflow, No space left on device error.

AlgoJerViA wrote:

I then went into the trunk folder and ran:

extra-x86_64-build -r ~/bigdata/arch

The first time I ran it without any switches, but after getting the "no space left" error I pointed it at the bigdata hard drive with -r, with the same result.

:)

The problem, as I understand it, is that extra-x86_64-build uses arch-nspawn, which is a wrapper around systemd-nspawn, a tool for running Linux containers. systemd-nspawn hard-codes the use of half the memory for the tmp folder, ignoring the usual tmp.mount from systemd.

I'm now trying to build tensorflow outside of a chroot with tmpfs turned off for /tmp, but it is extremely slow. It has been running for nine hours and is not even halfway through the compilation.

#6 2021-05-11 10:44:16

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868

Re: Rebuilding tensorflow, No space left on device error.

I did some searching, and the problem may be caused by tensorflow's build system, bazel*, not by systemd-nspawn.

Check https://stackoverflow.com/questions/343 … mory-usage .


* Not the first build system that appears to be tailored to dedicated build farms with thousands of cores and terabytes of high-speed memory / storage.
Often those systems effectively increase build time dramatically on less powerful systems.
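
If bazel's resource use does turn out to be the issue, one thing to try is capping it through the flags handed to bazel. This is only a sketch: it assumes the PKGBUILD forwards `BAZEL_ARGS` to `bazel build`, and the values (4 jobs, 8192 MB of RAM) are placeholders picked for a 16 GB machine, not tested settings. `--jobs` and `--local_ram_resources` are standard bazel options.

```shell
# Append resource limits to whatever BAZEL_ARGS already holds.
# --jobs caps the number of concurrent build actions;
# --local_ram_resources tells bazel how many MB of RAM it may assume
# are available for local actions.
BAZEL_ARGS="${BAZEL_ARGS:+$BAZEL_ARGS }--jobs=4 --local_ram_resources=8192"
export BAZEL_ARGS

echo "$BAZEL_ARGS"
```

Fewer parallel actions means a longer build, so this trades build time for a smaller memory footprint.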

Last edited by Lone_Wolf (2021-05-11 10:44:53)


#7 2021-07-09 10:21:05

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: Rebuilding tensorflow, No space left on device error.

This worked for me:

@@ -118,7 +122,8 @@ prepare() {
   export CXX=g++-10
 
   export BAZEL_ARGS="--config=mkl -c opt --copt=-I/usr/include/openssl-1.0 --host_copt=-I/usr/include/openssl-1.0 --linkopt=-l:libssl.so.1.0.0 --linkopt=-l:libcrypto.so.1.0.0 --host_linkopt=-l:libssl.so.1.0.0 --host_linkopt=-l:libcrypto.so.1.0.0"
+  export TMPDIR="$srcdir"
 }
 
 build() {

Stops pip from using /tmp.
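
The effect can be reproduced in isolation. The scratch path in the log from the first post (/tmp/tmp.FQUaQ3ZSCJ) has the shape `mktemp -d` produces, so the sketch below assumes the packaging script creates its directory that way. `fake_srcdir` is a stand-in for makepkg's `$srcdir`, not a real makepkg variable.

```shell
# Stand-in for $srcdir; in a real build makepkg sets this.
fake_srcdir=$(mktemp -d)

# Without TMPDIR, mktemp -d falls back to the system default (/tmp with
# GNU coreutils).
default_scratch=$(unset TMPDIR; mktemp -d)

# With TMPDIR set, the scratch directory is created under it instead.
redirected_scratch=$(TMPDIR="$fake_srcdir" mktemp -d)

echo "default:    $default_scratch"
echo "redirected: $redirected_scratch"

# Clean up the throwaway directories.
rm -rf "$default_scratch" "$fake_srcdir"
```

Since `$srcdir` lives inside the build directory on disk, anything that honors TMPDIR ends up there rather than on the memory-backed /tmp.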
