TensorFlow 1.13.1 has been released, so this is a source build.
Note that there is an issue where the build fails unless XLA is disabled (see the issue below).
On Fedora 29 it should now also be possible to install with pip, without building from source.
If that is fine for you, installing with pip or setting it up on nvidia-docker is the better option.
Environment
- Fedora 29 x86_64
- Python 3.7 (virtualenv)
- CUDA 10.0 + cuDNN 7.5
- GPU: NVIDIA GTX 1080
Preparation
Bazel also has to be built from source (the version in the distribution repository does not match the version TensorFlow expects).
Any Bazel version from 0.19.2 through 0.22.0 can build it.
This time I built 0.19.2 from source.
See the following for how to do the source build.
Once the build finishes, add the output folder to PATH:
$ export PATH=$PATH:"bazel dir"/output
Building TensorFlow
As in the official instructions: first create a virtual Python environment for TensorFlow with virtualenv (virtualenvwrapper) and install the required modules.
$ mkvirtualenv tf -p python3
$ pip3 install pip six numpy wheel mock
$ pip3 install -U keras_applications==1.0.6 --no-deps
$ pip3 install -U keras_preprocessing==1.0.5 --no-deps
Fetch the source from GitHub and extract it.
$ wget https://github.com/tensorflow/tensorflow/archive/v1.13.1.tar.gz
$ tar xzf v1.13.1.tar.gz
$ cd tensorflow-1.13.1
In configure, the two things to watch are enabling CUDA support and pointing the host compiler at the GCC 7 gcc binary.
$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: 557cf704-5d53-4a73-8118-153c5d42f71e
You have bazel 0.21.0- (@non-git) installed.
Please specify the location of python. [Default is /home/xxxxx/.virtualenvs/tf/bin/python]:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'site' has no attribute 'getsitepackages'
Found possible Python library paths:
/home/xxxxx/.virtualenvs/tf/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/home/xxxxx/.virtualenvs/tf/lib/python3.7/site-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]:
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]:
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/lib64/ccache/gcc]: /home/xxxxx/gcc/7.3/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apache Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
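Incidentally, the same answers can be supplied non-interactively through environment variables that TensorFlow's configure.py reads. The paths below are placeholders for this machine's layout, so adjust them to your own; a sketch:

```shell
# Sketch of a non-interactive configure run. configure.py reads these
# variables and skips the corresponding prompts; paths are placeholders.
export PYTHON_BIN_PATH="$HOME/.virtualenvs/tf/bin/python"
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.0
export TF_CUDNN_VERSION=7
export TF_CUDA_COMPUTE_CAPABILITIES=6.1
export GCC_HOST_COMPILER_PATH="$HOME/gcc/7.3/bin/gcc"
./configure
```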
Build.
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Then wait about two hours.
Once the build completes, create the pip package and install it!
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip3 install /tmp/tensorflow_pkg/tensorflow-1.13.1-cp37-cp37m-linux_x86_64.whl
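As an aside, the cp37-cp37m-linux_x86_64 part of the wheel name is its PEP 427 compatibility tag, which is why this wheel only installs on CPython 3.7. A minimal stdlib-only sketch (wheel_python_tag is a name made up here) of pulling that tag out of a filename:

```python
import sys

def wheel_python_tag(wheel_name):
    # PEP 427 wheel filenames look like:
    #   name-version-python_tag-abi_tag-platform_tag.whl
    # so the python tag is the third field from the end.
    return wheel_name[:-len(".whl")].split("-")[-3]

tag = wheel_python_tag("tensorflow-1.13.1-cp37-cp37m-linux_x86_64.whl")
print(tag)  # cp37
print(tag == "cp{}{}".format(*sys.version_info[:2]))
```

If the second line prints False, pip will refuse the wheel with "not a supported wheel on this platform".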
Checking after installation
- tf.__version__ is 1.13.1.
- tf.Session() recognizes the GPU device.
$ python3
Python 3.7.2 (default, Jan 16 2019, 19:49:22)
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.13.1'
>>> tf.Session()
2019-03-06 22:14:48.432753: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-06 22:14:48.433565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:09:00.0
totalMemory: 7.93GiB freeMemory: 7.54GiB
2019-03-06 22:14:48.433588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-06 22:14:48.434683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-06 22:14:48.434699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-06 22:14:48.434708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-06 22:14:48.435178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7339 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
That's all.