Purpose
Build TensorFlow 2.2-rc1 from source on Fedora 31.
This is preparation for the official 2.2 release, and a memo for future reference.
Environment
- Fedora 31 x86_64
- Python 3.7.6 (virtualenv)
- CUDA 10.2 + cuDNN 7.6.5
- CPU: AMD Ryzen 7 1700
- GPU: GeForce GTX 1070
Preparation
Building GCC 8
As in the previous article, CUDA 10.2 supports GCC only up to version 8, so the build cannot be done with Fedora 31's GCC 9. GCC 8 is therefore built first.
See the earlier article for details on building GCC.
Downloading and building the GCC 8.4 source
Build it as follows.
# GCC build from source
$ wget https://ftp.gnu.org/gnu/gcc/gcc-8.4.0/gcc-8.4.0.tar.gz
$ tar xf gcc-8.4.0.tar.gz
$ cd gcc-8.4.0
$ ./contrib/download_prerequisites
$ mkdir build && cd build      # configure from a separate build directory
$ ../configure \
--enable-bootstrap \
--enable-languages=c,c++ \
--prefix=/home/xxx/gcc/8.4 \
--enable-shared \
--enable-threads=posix \
--enable-checking=release \
--disable-multilib \
--with-system-zlib \
--enable-__cxa_atexit \
--disable-libunwind-exceptions \
--enable-gnu-unique-object \
--enable-linker-build-id \
--with-gcc-major-version-only \
--with-linker-hash-style=gnu \
--enable-plugin \
--enable-initfini-array \
--with-isl \
--enable-libmpx \
--enable-gnu-indirect-function \
--build=x86_64-redhat-linux
$ make -j$(nproc)
$ make install
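As a quick sanity check (using the install prefix from the configure step above), confirm that the new compiler runs and reports the expected version:
$ /home/xxx/gcc/8.4/bin/gcc --version   # expect: gcc (GCC) 8.4.0
$ /home/xxx/gcc/8.4/bin/g++ --version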
Creating the specs file
Modify the specs file so that binaries built with the newly compiled GCC 8 are linked against the correct shared library (libstdc++.so) at run time.
$ /home/xxx/gcc/8.4/bin/gcc -dumpspecs > specs
$ vi specs
# before
*link_libgcc:
%D
# after
*link_libgcc:
%{!static:%{!static-libgcc:-rpath /home/xxx/gcc/8.4/lib64/}} %D
$ mv specs /home/xxx/gcc/8.4/lib/gcc/x86_64-redhat-linux/8/
$ diff -up specs /home/xxx/gcc/8.4/lib/gcc/x86_64-redhat-linux/8/specs
--- specs 2020-03-22 15:27:41.626467627 +0900
+++ /home/xxx/gcc/8.4/lib/gcc/x86_64-redhat-linux/8/specs 2020-03-22 15:25:53.875378926 +0900
@@ -107,7 +107,7 @@ collect2
*link_libgcc:
-%D
+%{!static:%{!static-libgcc:-rpath /home/xxx/gcc/8.4/lib64/}} %D
*md_exec_prefix: |
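To confirm that the rpath added through the specs file actually takes effect, a minimal check (the hello.cpp file name is just an example) is to build a trivial C++ program and inspect the binary:
$ cat > hello.cpp << 'EOF'
#include <iostream>
int main() { std::cout << "hello" << std::endl; return 0; }
EOF
$ /home/xxx/gcc/8.4/bin/g++ hello.cpp -o hello
$ readelf -d hello | grep -i rpath     # should show /home/xxx/gcc/8.4/lib64/
$ ldd hello | grep libstdc++           # should resolve to the GCC 8.4 lib64 directory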
Setting up Environment Modules
Make GCC 8 switchable with Environment Modules by creating a gcc8x file under /etc/modulefiles.
Note: the author's environment also contains source-built GCC 5 and GCC 7.
#%Module 1.0
#
# gcc-8.X module for use with 'environment-modules' package:
#
conflict gcc5x gcc7x
prepend-path PATH /home/xxx/gcc/8.4/bin/
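With this module file in place, switching compilers looks roughly like the following (module names taken from the conflict line above):
$ module load gcc8x
$ which gcc            # /home/xxx/gcc/8.4/bin/gcc
$ gcc --version        # gcc (GCC) 8.4.0
$ module unload gcc8x  # or switch back to gcc5x/gcc7x as needed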
Building Bazel
Build Bazel from source (the version in the distribution's repositories may not match the version TensorFlow expects). TensorFlow 2.2-rc1 requires Bazel 2.0.0, so it was built from source.
See the official instructions here for building Bazel from source; the build simply follows those steps, so the details are omitted.
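For reference, a minimal sketch of the bootstrap build from the release dist archive (this assumes a JDK is already installed; the ~/bin install location is just an example):
$ wget https://github.com/bazelbuild/bazel/releases/download/2.0.0/bazel-2.0.0-dist.zip
$ mkdir bazel-2.0.0 && cd bazel-2.0.0
$ unzip ../bazel-2.0.0-dist.zip
$ env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
$ cp output/bazel ~/bin/        # put the resulting binary somewhere on PATH
$ bazel --version               # expect: bazel 2.0.0- (@non-git)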
Installing CUDA and cuDNN
Install CUDA 10.2 and cuDNN 7.6.5. CUDA is installed by following the RPM Fusion Howto/CUDA. cuDNN is downloaded from the NVIDIA download site and installed from there.
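Before moving on, it is worth checking that the toolkit and cuDNN the TensorFlow build will pick up are the expected versions (paths assume the default /usr/local/cuda layout):
$ /usr/local/cuda/bin/nvcc --version                              # expect: release 10.2
$ grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda/include/cudnn.h  # expect 7 / 6 / 5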
Building TensorFlow
Now for the main topic: building TensorFlow 2.2-rc1.
Setting up virtualenv
First, create a dedicated Python virtual environment for TensorFlow with virtualenv (virtualenvwrapper) and install the required modules.
$ mkvirtualenv -p python3 tf2.2-rc1
$ pip install pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install keras_applications --no-deps
$ pip install keras_preprocessing --no-deps
Build
Fetch the source from GitHub, run the configure script, and build.
- Enable CUDA support.
- Set the host compiler to the path of the GCC 8 gcc.
- Pass "--config=v2" and "--config=nonccl" as build options (the NCCL library was installed but caused a build error, so NCCL support is disabled).
- In a Japanese locale, downloading the dependencies failed, so set LANG=C before the build to work around it.
$ wget https://github.com/tensorflow/tensorflow/archive/v2.2.0-rc2.tar.gz
$ tar xf v2.2.0-rc2.tar.gz
$ cd tensorflow-2.2.0-rc2/
$ ./configure
You have bazel 2.0.0- (@non-git) installed.
Please specify the location of python. [Default is /home/xxx/.virtualenvs/tf2.2-rc1/bin/python]:
Found possible Python library paths:
/home/xxx/.virtualenvs/tf2.2-rc1/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/home/xxx/.virtualenvs/tf2.2-rc1/lib/python3.7/site-packages]
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
Found CUDA 10.2 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Found cuDNN 7 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 6.1]:
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /home/xxx/gcc/8.4/bin/gcc
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
$ export LANG=C
$ bazel build \
--config=opt \
--config=v2 \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
--config=cuda \
--config=nonccl \
--verbose_failures \
//tensorflow/tools/pip_package:build_pip_package
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-2.2.0rc1-cp37-cp37m-linux_x86_64.whl
For reference, the following error occurred when building in the Japanese locale (before setting LANG=C):
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
Traceback (most recent call last):
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 1210
_create_local_cuda_repository(<1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
Traceback (most recent call last):
File "script.py", line 88, in <module>
main()
File "script.py", line 77, in main
check_cuda_lib(path, check_soname=args[i + 1] == "True")
File "script.py", line 62, in check_cuda_lib
output = subprocess.check_output([objdump, "-p", path]).decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 46: ordinal not in range(128)
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 1210
_create_local_cuda_repository(<1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
Traceback (most recent call last):
File "script.py", line 88, in <module>
main()
File "script.py", line 77, in main
check_cuda_lib(path, check_soname=args[i + 1] == "True")
File "script.py", line 62, in check_cuda_lib
output = subprocess.check_output([objdump, "-p", path]).decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 46: ordinal not in range(128)
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 1210
_create_local_cuda_repository(<1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/nobuo/tensorflow-2.2.0-rc2/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
Traceback (most recent call last):
File "script.py", line 88, in <module>
main()
File "script.py", line 77, in main
check_cuda_lib(path, check_soname=args[i + 1] == "True")
File "script.py", line 62, in check_cuda_lib
output = subprocess.check_output([objdump, "-p", path]).decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 46: ordinal not in range(128)
INFO: Elapsed time: 17.516s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
currently loading: tensorflow/tools/pip_package
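The likely root cause is that the CUDA configuration script decodes the output of objdump -p as ASCII, while binutils localizes its messages in a Japanese locale, so the output contains UTF-8 Japanese text. This is only a guess based on the traceback above, but it can be checked roughly like this (libcudart.so as an example library):
$ LANG=ja_JP.UTF-8 objdump -p /usr/local/cuda/lib64/libcudart.so | head -n 5   # localized, non-ASCII output
$ LANG=C objdump -p /usr/local/cuda/lib64/libcudart.so | head -n 5             # plain ASCII output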
Verifying the installation
- tf.__version__ is 2.2.0-rc1.
- The GPU device is recognized.
$ python
Python 3.7.6 (default, Jan 30 2020, 09:44:41)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-03-28 20:33:39.876383: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
>>> tf.__version__
'2.2.0-rc1'
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2020-03-28 20:33:53.907523: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2993890000 Hz
2020-03-28 20:33:53.909094: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f4490000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-28 20:33:53.909175: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-03-28 20:33:53.913388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-28 20:33:54.328519: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:54.329111: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5636ff009c50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-28 20:33:54.329200: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2020-03-28 20:33:54.330516: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:54.331671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2020-03-28 20:33:54.331729: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-03-28 20:33:54.383662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-28 20:33:54.414185: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-28 20:33:54.421131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-28 20:33:54.487881: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-28 20:33:54.495069: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-28 20:33:54.600519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-28 20:33:54.600790: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:54.602014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:54.602748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-03-28 20:33:54.603373: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-03-28 20:33:55.428570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-28 20:33:55.428613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-03-28 20:33:55.428620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-03-28 20:33:55.429484: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:55.430019: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 20:33:55.430531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 7019 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1662223253576067142
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 987987008933275735
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 6863376769852385113
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7360981248
locality {
bus_id: 1
links {
}
}
incarnation: 11906225243831981649
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1"
]
>>>
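As an additional quick check, the TF 2.x API can be used as a one-liner:
$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# should list the GeForce GTX 1070 as /physical_device:GPU:0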
OK!