Saturday, March 28, 2020

Building TensorFlow 2.2-rc1 (CUDA 10.2, cuDNN 7.6.5) on Fedora 31

Purpose


Build TensorFlow 2.2-rc1 from source on Fedora 31.
This is both preparation for the official 2.2 release and a note for future reference.


Environment


  • Fedora 31 x86_64
  • Python 3.7.6 (virtualenv)
  • CUDA 10.2 + cuDNN 7.6.5
  • CPU AMD Ryzen 7 1700
  • GPU GeForce GTX 1070


Preparation


Building GCC 8


As in the previous article, the newest GCC that CUDA 10.2 supports is GCC 8, so the build fails with Fedora 31's GCC 9. Therefore, build GCC 8 first.
For details on building GCC, see the earlier article.


Downloading and building the GCC 8.4 sources


Build as follows.
# GCC build from source
$ wget https://ftp.gnu.org/gnu/gcc/gcc-8.4.0/gcc-8.4.0.tar.gz
$ tar xf gcc-8.4.0.tar.gz
$ cd gcc-8.4.0
$ ./contrib/download_prerequisites
$ mkdir build && cd build
$ ../configure \
--enable-bootstrap \
--enable-languages=c,c++ \
--prefix=/home/xxx/gcc/8.4 \
--enable-shared \
--enable-threads=posix \
--enable-checking=release \
--disable-multilib \
--with-system-zlib \
--enable-__cxa_atexit \
--disable-libunwind-exceptions \
--enable-gnu-unique-object \
--enable-linker-build-id \
--with-gcc-major-version-only \
--with-linker-hash-style=gnu \
--enable-plugin \
--enable-initfini-array \
--with-isl \
--enable-libmpx \
--enable-gnu-indirect-function \
--build=x86_64-redhat-linux
$ make -j$(nproc)
$ make install


Creating the specs file


Modify the specs file so that the correct shared library (libstdc++.so) is linked at run time by binaries built with the newly compiled GCC 8.
$ /home/xxx/gcc/8.4/bin/gcc -dumpspecs > specs
$ vi specs
# before
*link_libgcc:
%D
# after
*link_libgcc:
%{!static:%{!static-libgcc:-rpath /home/xxx/gcc/8.4/lib64/}} %D
$ mv specs /home/xxx/gcc/8.4/lib/gcc/x86_64-redhat-linux/8/
$ diff -up specs /home/xxx/gcc/8.4/lib/gcc/x86_64-redhat-linux/8/specs
--- specs 2020-03-22 15:27:41.626467627 +0900
+++ /home/xxx/gcc/8.4/lib/gcc/x86_64-redhat-linux/8/specs 2020-03-22 15:25:53.875378926 +0900
@@ -107,7 +107,7 @@ collect2
*link_libgcc:
-%D
+%{!static:%{!static-libgcc:-rpath /home/xxx/gcc/8.4/lib64/}} %D
*md_exec_prefix:


Configuring Environment Modules


Make GCC 8 switchable via Environment Modules by creating a file named gcc8x under /etc/modulefiles.
Note: the author's machine also has source-built GCC 5 and GCC 7, hence the conflict line below.
#%Module 1.0
#
# gcc-8.X module for use with 'environment-modules' package:
#
conflict gcc5x gcc7x
prepend-path PATH /home/xxx/gcc/8.4/bin/
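The prepend-path line simply puts the GCC 8 bin directory at the front of PATH while the module is loaded, so its gcc shadows /usr/bin/gcc. The mechanism can be demonstrated with plain shell; /tmp/fakebin and the dummy gcc below are stand-ins for /home/xxx/gcc/8.4/bin, used only so the example is self-contained.

```shell
# Demonstrate PATH precedence, the mechanism behind `prepend-path PATH`.
# /tmp/fakebin and its dummy gcc stand in for /home/xxx/gcc/8.4/bin.
set -e
mkdir -p /tmp/fakebin
printf '#!/bin/sh\necho "gcc (custom build) 8.4.0"\n' > /tmp/fakebin/gcc
chmod +x /tmp/fakebin/gcc
# With the directory prepended, the dummy gcc wins the lookup:
PATH=/tmp/fakebin:$PATH gcc --version
```

This prints "gcc (custom build) 8.4.0"; module load gcc8x / module unload gcc8x switch compilers by editing PATH in exactly this way, without touching the system gcc.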


Building Bazel


Build Bazel from source, because the version in the distribution repository may not match the version TensorFlow expects. TensorFlow 2.2-rc1 requires Bazel 2.0.0, so it was built from source.

For the official source-build instructions, see here. The build follows those steps as written, so the details are omitted.


Installing CUDA and cuDNN


Install CUDA 10.2 and cuDNN 7.6.5. For CUDA, follow the RPM Fusion Howto/CUDA guide. Download cuDNN from NVIDIA's download site and install it.


Building TensorFlow


Now for the main topic: building TensorFlow 2.2-rc1.


Setting up virtualenv


First, create a dedicated Python virtual environment for TensorFlow with virtualenv (virtualenvwrapper) and install the required modules.
$ mkvirtualenv -p python3 tf2.2-rc1
$ pip install pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install keras_applications --no-deps
$ pip install keras_preprocessing --no-deps


Build


Fetch the sources from GitHub, run the configure script, and build.

  • Enable CUDA support.
  • Point the host compiler setting at the GCC 8 gcc binary.
  • Pass "--config=v2" and "--config=nonccl" as build options (the NCCL library was installed but caused a build error, so NCCL support is disabled).
  • In a Japanese locale the dependency download step failed, so set "LANG=C" before building as a workaround.
$ wget https://github.com/tensorflow/tensorflow/archive/v2.2.0-rc1.tar.gz
$ tar xf v2.2.0-rc1.tar.gz
$ cd tensorflow-2.2.0-rc1/
$ ./configure
You have bazel 2.0.0- (@non-git) installed.
Please specify the location of python. [Default is /home/xxx/.virtualenvs/tf2.2-rc1/bin/python]:
Found possible Python library paths:
/home/xxx/.virtualenvs/tf2.2-rc1/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/home/xxx/.virtualenvs/tf2.2-rc1/lib/python3.7/site-packages]
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
Found CUDA 10.2 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Found cuDNN 7 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 6.1]:
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /home/xxx/gcc/8.4/bin/gcc
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
$ export LANG=C
$ bazel build \
--config=opt \
--config=v2 \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
--config=cuda \
--config=nonccl \
--verbose_failures \
//tensorflow/tools/pip_package:build_pip_package
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-2.2.0rc1-cp37-cp37m-linux_x86_64.whl

For reference, the error that occurred before applying the LANG=C workaround was the following:

ERROR: An error occurred during the fetch of repository 'local_config_cuda':
Traceback (most recent call last):
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 1210
_create_local_cuda_repository(<1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
Traceback (most recent call last):
File "script.py", line 88, in <module>
main()
File "script.py", line 77, in main
check_cuda_lib(path, check_soname=args[i + 1] == "True")
File "script.py", line 62, in check_cuda_lib
output = subprocess.check_output([objdump, "-p", path]).decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 46: ordinal not in range(128)
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 1210
_create_local_cuda_repository(<1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
Traceback (most recent call last):
File "script.py", line 88, in <module>
main()
File "script.py", line 77, in main
check_cuda_lib(path, check_soname=args[i + 1] == "True")
File "script.py", line 62, in check_cuda_lib
output = subprocess.check_output([objdump, "-p", path]).decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 46: ordinal not in range(128)
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 1210
_create_local_cuda_repository(<1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
Traceback (most recent call last):
File "script.py", line 88, in <module>
main()
File "script.py", line 77, in main
check_cuda_lib(path, check_soname=args[i + 1] == "True")
File "script.py", line 62, in check_cuda_lib
output = subprocess.check_output([objdump, "-p", path]).decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 46: ordinal not in range(128)
INFO: Elapsed time: 17.516s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
currently loading: tensorflow/tools/pip_package
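The decisive line is the UnicodeDecodeError at the bottom of each traceback: the configure helper captures the output of objdump -p and decodes it with the ASCII codec, and under a Japanese locale objdump can emit UTF-8 messages whose bytes (such as 0xe3) fall outside ASCII. LANG=C forces ASCII-only output, which is why the workaround helps. The failure can be reproduced in isolation; the Japanese string below is arbitrary, standing in for localized objdump output.

```shell
# Reproduce the UnicodeDecodeError in isolation: UTF-8-encoded Japanese
# text cannot be decoded with the ASCII codec.
python3 - <<'EOF'
data = "セクション".encode("utf-8")    # first byte is 0xe3, outside ASCII
try:
    data.decode("ascii")
except UnicodeDecodeError as e:
    print("UnicodeDecodeError:", e.reason)   # → ordinal not in range(128)
print(data.decode("utf-8"))                  # fine with the right codec
EOF
```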


Verifying the installation


  • tf.__version__ reports 2.2.0-rc1.
  • The GPU device is recognized.

$ python
Python 3.7.6 (default, Jan 30 2020, 09:44:41)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-03-28 20:33:39.876383: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
>>> tf.__version__
'2.2.0-rc1'
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2020-03-28 20:33:53.907523: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2993890000 Hz
2020-03-28 20:33:53.909094: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f4490000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-28 20:33:53.909175: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-03-28 20:33:53.913388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-28 20:33:54.328519: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:54.329111: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5636ff009c50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-28 20:33:54.329200: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2020-03-28 20:33:54.330516: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:54.331671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2020-03-28 20:33:54.331729: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-03-28 20:33:54.383662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-28 20:33:54.414185: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-28 20:33:54.421131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-28 20:33:54.487881: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-28 20:33:54.495069: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-28 20:33:54.600519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-28 20:33:54.600790: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:54.602014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:54.602748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-03-28 20:33:54.603373: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-03-28 20:33:55.428570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-28 20:33:55.428613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-03-28 20:33:55.428620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-03-28 20:33:55.429484: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:55.430019: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:55.430531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 7019 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1662223253576067142
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 987987008933275735
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 6863376769852385113
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7360981248
locality {
bus_id: 1
links {
}
}
incarnation: 11906225243831981649
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1"
]
>>>

OK!
