However, there is a problem where the build fails unless XLA is disabled (see the issues below).
- https://github.com/tensorflow/tensorflow/issues/26155
- https://github.com/tensorflow/tensorflow/issues/24373
On Fedora 29, TensorFlow can now (reportedly) be installed with pip without building from source.
If that is acceptable, installing with pip or setting things up on nvidia-docker is the better option.
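As a sketch of that pip route (the version pin mirrors the release built below and is an assumption; any recent `tensorflow-gpu` release would do):

```shell
# Inside a virtualenv: install the prebuilt GPU wheel instead of building.
# The version pin is an assumption matching the source build below.
pip3 install tensorflow-gpu==1.13.1
```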
Environment
- Fedora 29 x86_64
- Python 3.7 (virtualenv)
- CUDA 10.0 + cuDNN 7.5
- GPU: NVIDIA GTX 1080
Preparation
- Install CUDA 10.0 + cuDNN 7.5 beforehand.
Install CUDA by following the RPM Fusion howto:
https://rpmfusion.org/Howto/CUDA?highlight=%28CategoryHowto%29
CUDA 10.1 cannot be used because its library install paths changed, which breaks the build
(https://github.com/tensorflow/tensorflow/issues/26150).
- Build GCC 7.3 from source beforehand.
CUDA does not support GCC 8, which Fedora 29 ships.
Build GCC 7.3 from source as described in the previous article.
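That GCC 7.3 build can be sketched roughly as follows; the mirror URL and the install prefix are assumptions, the prefix chosen to match the host-compiler path passed to configure later.

```shell
# Sketch: out-of-tree GCC 7.3 build (URL and prefix are assumptions).
wget https://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.gz
tar xf gcc-7.3.0.tar.gz && cd gcc-7.3.0
./contrib/download_prerequisites   # fetch gmp/mpfr/mpc/isl into the tree
mkdir build && cd build
../configure --prefix="$HOME/gcc/7.3" --disable-multilib --enable-languages=c,c++
make -j"$(nproc)"
make install
```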
Building Bazel
Build Bazel from source as well (the version in the distribution repository does not match the version TensorFlow expects).
Any Bazel version from 0.19.2 to 0.22.0 can build this TensorFlow release.
This time I built 0.19.2 from source.
For the source build procedure, see the following.
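A minimal sketch of bootstrapping Bazel from its dist archive, which needs no pre-existing Bazel (the release URL follows Bazel's usual naming; an installed JDK 8 is assumed):

```shell
# Sketch: bootstrap Bazel 0.19.2 from the dist zip.
wget https://github.com/bazelbuild/bazel/releases/download/0.19.2/bazel-0.19.2-dist.zip
mkdir bazel-0.19.2 && cd bazel-0.19.2
unzip ../bazel-0.19.2-dist.zip
bash ./compile.sh    # produces the binary at output/bazel
```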
Once the build finishes, add the output folder to PATH:
$ export PATH=$PATH:<bazel dir>/output
Building TensorFlow
$ mkvirtualenv tf -p python3
$ pip3 install pip six numpy wheel mock
$ pip3 install -U keras_applications==1.0.6 --no-deps
$ pip install -U keras_preprocessing==1.0.5 --no-deps
Fetch the source from GitHub:
$ wget https://github.com/tensorflow/tensorflow/archive/v1.13.1.tar.gz
In configure, enable CUDA support and specify the path to the GCC 7.3 gcc binary as the host compiler.
$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: 557cf704-5d53-4a73-8118-153c5d42f71e
You have bazel 0.21.0- (@non-git) installed.
Please specify the location of python. [Default is /home/xxxxx/.virtualenvs/tf/bin/python]:

Traceback (most recent call last):
  File "", line 1, in
AttributeError: module 'site' has no attribute 'getsitepackages'
Found possible Python library paths:
  /home/xxxxx/.virtualenvs/tf/lib/python3.7/site-packages
Please input the desired Python library path to use.  Default is [/home/xxxxx/.virtualenvs/tf/lib/python3.7/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]:

Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.

Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with. You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]:

Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/lib64/ccache/gcc]: /home/xxxxx/gcc/7.3/bin/gcc

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl             # Build with MKL support.
	--config=monolithic      # Config for mostly static monolithic build.
	--config=gdr             # Build with GDR support.
	--config=verbs           # Build with libverbs support.
	--config=ngraph          # Build with Intel nGraph support.
	--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=noaws           # Disable AWS S3 filesystem support.
	--config=nogcp           # Disable GCP support.
	--config=nohdfs          # Disable HDFS support.
	--config=noignite        # Disable Apacha Ignite support.
	--config=nokafka         # Disable Apache Kafka support.
	--config=nonccl          # Disable NVIDIA NCCL support.
Configuration finished
Build (the standard invocation for a CUDA-enabled configure):
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Then wait about two hours.
Once the build completes, create the pip package and install it!
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip3 install /tmp/tensorflow_pkg/tensorflow-1.13.1-cp37-cp37m-linux_x86_64.whl
Post-install checks
- tf.__version__ should be 1.13.1.
- tf.Session() should recognize the GPU device.
$ python3
Python 3.7.2 (default, Jan 16 2019, 19:49:22)
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.13.1'
>>> tf.Session()
2019-03-06 22:14:48.432753: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-06 22:14:48.433565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:09:00.0
totalMemory: 7.93GiB freeMemory: 7.54GiB
2019-03-06 22:14:48.433588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-06 22:14:48.434683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-06 22:14:48.434699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-06 22:14:48.434708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-06 22:14:48.435178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7339 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
That's all.