Sunday, September 15, 2019

Building TensorFlow 2.0.0rc1 (CUDA 10.1, cuDNN 7.6) on Fedora 30

2019/9/16: Corrected the GCC version to specify as the CUDA host compiler, which was wrong (pointed out by @PINTO03091).

Purpose


Build TensorFlow 2.0.0rc1 from source on Fedora 30.
This is preparation (and a memo to myself) ahead of the 2.0.0 final release.


Environment


  • Fedora 30 x86_64
  • Python 3.7.4 (virtualenv)
  • CUDA 10.1 + cuDNN 7.6.3
  • CPU AMD Ryzen 7 1700
  • GPU GeForce GTX 1070


Preparation


Build GCC 5.5 from source beforehand.

Why

CUDA does not support Fedora 30's GCC 9.2.1.
Last time, I built GCC 7.3 from source and specified it as CUDA's host compiler.
With 2.0.0rc1, however, the build failed with XLA-related errors; switching to GCC 5.5 made it build (I have not dug into the details of the host-compiler and XLA-related errors, though something similar happened last time as well).



Building GCC 5.5

As it turns out, Fedora 30's GCC 9.2.1 cannot build GCC 5.5 (the same is true of GCC 8 on Fedora 29).
So first prepare GCC 7.3, then use it to build GCC 5.5.

Download the GCC 5.5.0 source. Trying to build it with GCC 6 or later fails with errors, so obtain the patch and apply it before building.

(If I remember right, I first ran into this around Fedora 28... and I have kept that environment around ever since...)
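For reference, the build goes along these lines (a sketch only: the install prefixes under /opt, the GCC 7.3 location, and the patch file name are placeholders for whatever your environment uses):

```shell
# Build GCC 5.5.0, using a separately installed GCC 7.3 as the host compiler.
# /opt/gcc-7.3 and /opt/gcc-5.5 are example paths; adjust to your setup.
wget https://ftp.gnu.org/gnu/gcc/gcc-5.5.0/gcc-5.5.0.tar.xz
tar xf gcc-5.5.0.tar.xz
cd gcc-5.5.0
./contrib/download_prerequisites           # fetch GMP/MPFR/MPC etc.
patch -p1 < ../gcc-5.5.0-build-fix.patch   # the patch mentioned above (file name is a placeholder)
mkdir build && cd build
../configure CC=/opt/gcc-7.3/bin/gcc CXX=/opt/gcc-7.3/bin/g++ \
    --prefix=/opt/gcc-5.5 --enable-languages=c,c++ --disable-multilib
make -j"$(nproc)"
sudo make install
```

Building out of a separate build directory with an explicit --prefix keeps the 5.5 toolchain isolated, so it can be handed to nvcc later without touching the system GCC.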


Building Bazel


Build Bazel from source (the repository version may not match the version TensorFlow expects). This time I built 0.25.3 from source.
See the following for how to build from source.


Once the build finishes, either add the output directory to PATH or copy output/bazel into /usr/local/bin.
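The bootstrap itself follows Bazel's documented compile-from-source procedure; a sketch (assumes a JDK 8, zip, and unzip are already installed; the dist archive URL is the standard release asset):

```shell
# Bootstrap Bazel 0.25.3 from the release dist archive.
wget https://github.com/bazelbuild/bazel/releases/download/0.25.3/bazel-0.25.3-dist.zip
mkdir bazel-0.25.3 && cd bazel-0.25.3
unzip ../bazel-0.25.3-dist.zip
env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
# The resulting binary is output/bazel.
sudo cp output/bazel /usr/local/bin/
```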


Installing CUDA and cuDNN


Install CUDA 10.1 and cuDNN 7.6.3, following the RPM Fusion Howto/CUDA guide. Install CUDA and cuDNN from the [CUDA Toolkit] and [Machine Learning repository] repositories.

$ dnf list cuda
Installed Packages
cuda.x86_64            10.1.243-1                  @cuda

$ dnf list libcudnn7
Installed Packages
libcudnn7.x86_64       7.6.3.30-1.cuda10.1         @nvidia-machine-learning


(I had previously downloaded cuDNN from the download site and installed it that way, but for some reason the build then failed with errors. I never figured out why...)
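Before configuring the TensorFlow build, it is worth confirming the versions it will actually pick up. A small sketch (the header location is an assumption on my part: with the NVIDIA machine-learning repo, cudnn.h typically lands in /usr/include):

```shell
# Print the cuDNN version by parsing the CUDNN_* defines out of cudnn.h.
cudnn_version() {
  awk '/^#define CUDNN_MAJOR/      {maj=$3}
       /^#define CUDNN_MINOR/      {min=$3}
       /^#define CUDNN_PATCHLEVEL/ {pat=$3}
       END {printf "%s.%s.%s\n", maj, min, pat}' "$1"
}

# Report the CUDA toolkit version, if nvcc is on PATH.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
fi
# Report the cuDNN version, if the header is present.
if [ -f /usr/include/cudnn.h ]; then
  cudnn_version /usr/include/cudnn.h
fi
```

With the packages shown above, this should report CUDA release 10.1 and cuDNN 7.6.3.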


Building TensorFlow


This follows the official instructions: first create a Python virtual environment for TensorFlow with virtualenv (virtualenvwrapper) and install the required modules.

$ mkvirtualenv  --python=python3 tf2.0rc1
$ pip install pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install keras_applications==1.0.6 --no-deps
$ pip install keras_preprocessing==1.0.5 --no-deps

Fetch the source from GitHub, specifying the v2.0.0-rc1 tag.

git clone -b v2.0.0-rc1 https://github.com/tensorflow/tensorflow.git

Run configure.
The key points are to enable CUDA support and to specify the path of the GCC 5.5 gcc as the host compiler.

$ ./configure 
WARNING: Running Bazel server needs to be killed, because the startup options are different.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.25.3- (@non-git) installed.
Please specify the location of python. [Default is /home/xxxx/.virtualenvs/tf2.0rc1/bin/python]: 




Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'site' has no attribute 'getsitepackages'
Found possible Python library paths:
  /home/xxxx/.virtualenvs/tf2.0rc1/lib/python3.7/site-packages
Please input the desired Python library path to use.  Default is [/home/xxxx/.virtualenvs/tf2.0rc1/lib/python3.7/site-packages]


Do you wish to build TensorFlow with XLA JIT support? [Y/n]: 
XLA JIT support will be enabled for TensorFlow.


Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.


Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.


Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.


Do you wish to build TensorFlow with TensorRT support? [y/N]: 
No TensorRT support will be enabled for TensorFlow.


Found CUDA 10.1 in:
    /usr/local/cuda/lib64
    /usr/local/cuda/include
Found cuDNN 7 in:
    /usr/local/cuda/lib64
    /usr/local/cuda/include




Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 6.1]: 6.1




Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.


Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/lib64/ccache/gcc]: /xxx/xxx/gcc/5.5/bin/gcc




Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.


Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: 




Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.


Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl             # Build with MKL support.
    --config=monolithic      # Config for mostly static monolithic build.
    --config=gdr             # Build with GDR support.
    --config=verbs           # Build with libverbs support.
    --config=ngraph          # Build with Intel nGraph support.
    --config=numa            # Build with NUMA support.
    --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
    --config=v2              # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws           # Disable AWS S3 filesystem support.
    --config=nogcp           # Disable GCP support.
    --config=nohdfs          # Disable HDFS support.
    --config=noignite        # Disable Apache Ignite support.
    --config=nokafka         # Disable Apache Kafka support.
    --config=nonccl          # Disable NVIDIA NCCL support.
Configuration finished

Build.
For TensorFlow 2.0, add the --config=v2 option.
Also add --config=nonccl to disable NVIDIA NCCL (I had installed the NCCL library, but the build failed with it, so I dropped it).

$ bazel build --config=opt --config=v2 --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --config=cuda --config=nonccl  --verbose_failures //tensorflow/tools/pip_package:build_pip_package

...

INFO: Elapsed time: 9788.259s, Critical Path: 378.10s
INFO: 25250 processes: 25250 local.
INFO: Build completed successfully, 34867 total actions

Once the build completes, create the pip package and install it!

$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-2.0.0rc1-cp37-cp37m-linux_x86_64.whl 

Post-install checks

  • tf.__version__ is 2.0.0-rc1.
  • The GPU device is recognized.

$ python
Python 3.7.4 (default, Jul  9 2019, 16:32:37) 
[GCC 9.1.1 20190503 (Red Hat 9.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.0.0-rc1'
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2019-09-15 22:22:42.479984: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2993965000 Hz
2019-09-15 22:22:42.480879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5604fd404b10 executing computations on platform Host. Devices:
2019-09-15 22:22:42.480911: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2019-09-15 22:22:42.483667: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-09-15 22:22:42.637759: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.638947: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5604fd4ae4b0 executing computations on platform CUDA. Devices:
2019-09-15 22:22:42.638973: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2019-09-15 22:22:42.639193: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.639825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:09:00.0
2019-09-15 22:22:42.644617: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-09-15 22:22:42.697319: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-09-15 22:22:42.731915: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-09-15 22:22:42.741549: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-09-15 22:22:42.796043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-09-15 22:22:42.803769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-09-15 22:22:42.896828: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-15 22:22:42.897112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.898295: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.899517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-09-15 22:22:42.899595: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-09-15 22:22:42.901482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-15 22:22:42.901502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2019-09-15 22:22:42.901509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2019-09-15 22:22:42.901613: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.902209: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.902767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 7079 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5999673194408362367
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 3782286546875987284
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 2769759803836561573
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7423911527
locality {
  bus_id: 1
  links {
  }
}
incarnation: 9925876743775063780
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1"
]
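For a quicker check than the full device_lib dump above, a one-liner along these lines also works (tf.config.experimental.list_physical_devices is part of the 2.0 API; the log output will differ by machine):

```shell
# Print the TensorFlow version and the visible GPU devices.
python -c 'import tensorflow as tf; print(tf.__version__); print(tf.config.experimental.list_physical_devices("GPU"))'
```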


