Purpose
Build TensorFlow 2.0.0rc1 from source on Fedora 30.
This is preparation for the official 2.0.0 release, and a memo for myself.
Environment
- Fedora 30 x86_64
- Python 3.7.4 (virtualenv)
- CUDA 10.1 + cuDNN 7.6.3
- CPU AMD Ryzen 7 1700
- GPU GeForce GTX 1070
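Preparation
Build GCC 5.5 from source beforehand.
Reason
"Building TensorFlow 1.13.1 on Fedora 29 fails with an XLA-related error (https://t.co/IPCEgEZsqS), but the build succeeded once I switched to GCC 5.5." (tweet by nb.o (@Nextremer_nb_o), March 9, 2019)
The problem appears to depend on the GCC version.
- GCC 8 (package): NG
- GCC 7 (source build): NG
- GCC 5 (source build): OK!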
Building GCC 5.5
In fact, GCC 9.2.1 on Fedora 30 cannot build GCC 5.5 (the same is true of GCC 8 on Fedora 29).
So I prepare GCC 7.3 first and use it to build GCC 5.5.
(I think I first ran into this around Fedora 28... and I have kept that environment around ever since...)
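Building Bazel
Build Bazel from source (the version in the distribution repository does not always match the version TensorFlow expects). This time I built 0.25.3 from source.
For the source build procedure, see the following.
When the build finishes, either add the output directory to PATH or copy output/bazel into /usr/local/bin.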
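Since the whole point of the source build is to match the Bazel version TensorFlow expects, it may be worth confirming which binary actually ends up on PATH. The following is only a minimal sketch: the helper names are my own, and it assumes that `bazel version` prints a `Build label:` line (a self-built binary reports a label such as `0.25.3- (@non-git)`).

```python
import re
import subprocess

EXPECTED_BAZEL = "0.25.3"  # the version built in this article


def parse_bazel_label(version_output):
    """Return the numeric version from a 'Build label:' line, or None."""
    match = re.search(r"^Build label:\s*(\d+(?:\.\d+)*)",
                      version_output, re.MULTILINE)
    return match.group(1) if match else None


def installed_bazel_version(bazel="bazel"):
    """Run `bazel version` for the binary found on PATH and parse its label."""
    result = subprocess.run([bazel, "version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_bazel_label(result.stdout)


# Illustrative input only: the label format of a self-built binary.
sample = "Build label: 0.25.3- (@non-git)\n"
print(parse_bazel_label(sample) == EXPECTED_BAZEL)
```

Calling `installed_bazel_version()` after updating PATH (or copying the binary) should then return `0.25.3` if the self-built binary is the one being picked up.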
Installing CUDA and cuDNN
Install CUDA 10.1 and cuDNN 7.6.3, following the RPM Fusion Howto/CUDA page. CUDA and cuDNN are installed as described in its [CUDA Toolkit] and [Machine Learning repository] sections.
Check after installation.
$ dnf list cuda
Installed Packages
cuda.x86_64    10.1.243-1    @cuda
$ dnf list libcudnn7
Installed Packages
libcudnn7.x86_64    7.6.3.30-1.cuda10.1    @nvidia-machine-learning
(I first downloaded cuDNN from the download site and installed it that way, but for some reason the TensorFlow build then failed with an error. I do not know why...)
Building TensorFlow
This follows the official procedure: first create a virtual Python environment for TensorFlow with virtualenv (virtualenvwrapper) and install the required modules.
$ mkvirtualenv --python=python3 tf2.0rc1
$ pip install pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install keras_applications==1.0.6 --no-deps
$ pip install keras_preprocessing==1.0.5 --no-deps
Fetch the source from GitHub, specifying the v2.0.0-rc1 tag.
$ git clone -b v2.0.0-rc1 https://github.com/tensorflow/tensorflow.git
Run configure.
The two points to watch are enabling CUDA support and giving the path to the GCC 5 gcc as the host compiler.
$ ./configure
WARNING: Running Bazel server needs to be killed, because the startup options are different.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.25.3- (@non-git) installed.
Please specify the location of python. [Default is /home/xxxx/.virtualenvs/tf2.0rc1/bin/python]:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'site' has no attribute 'getsitepackages'
Found possible Python library paths:
  /home/xxxx/.virtualenvs/tf2.0rc1/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/home/xxxx/.virtualenvs/tf2.0rc1/lib/python3.7/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.

Found CUDA 10.1 in:
    /usr/local/cuda/lib64
    /usr/local/cuda/include
Found cuDNN 7 in:
    /usr/local/cuda/lib64
    /usr/local/cuda/include

Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 6.1]: 6.1

Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/lib64/ccache/gcc]: /xxx/xxx/gcc/5.5/bin/gcc

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl             # Build with MKL support.
    --config=monolithic      # Config for mostly static monolithic build.
    --config=gdr             # Build with GDR support.
    --config=verbs           # Build with libverbs support.
    --config=ngraph          # Build with Intel nGraph support.
    --config=numa            # Build with NUMA support.
    --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
    --config=v2              # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws           # Disable AWS S3 filesystem support.
    --config=nogcp           # Disable GCP support.
    --config=nohdfs          # Disable HDFS support.
    --config=noignite        # Disable Apache Ignite support.
    --config=nokafka         # Disable Apache Kafka support.
    --config=nonccl          # Disable NVIDIA NCCL support.
Configuration finished
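Answering these prompts by hand on every rebuild gets tedious. As far as I know, TensorFlow's configure script takes its answers from environment variables when they are set, so the session above could probably be reproduced non-interactively. The sketch below is untested against this exact tag: the variable names are my reading of configure.py and should be checked against the copy in the checked-out tree, `run_configure` is a hypothetical helper, and the gcc path is the same masked placeholder as in the transcript.

```python
import os
import subprocess

# Answers mirroring the interactive session above.
# Variable names are assumptions; confirm them in configure.py.
CONFIGURE_ENV = {
    "TF_ENABLE_XLA": "1",
    "TF_NEED_OPENCL_SYCL": "0",
    "TF_NEED_ROCM": "0",
    "TF_NEED_CUDA": "1",
    "TF_NEED_TENSORRT": "0",
    "TF_CUDA_COMPUTE_CAPABILITIES": "6.1",
    "TF_CUDA_CLANG": "0",
    # Masked placeholder from the transcript; point this at the GCC 5.5 gcc.
    "GCC_HOST_COMPILER_PATH": "/xxx/xxx/gcc/5.5/bin/gcc",
    "TF_NEED_MPI": "0",
    "CC_OPT_FLAGS": "-march=native -Wno-sign-compare",
    "TF_SET_ANDROID_WORKSPACE": "0",
}


def run_configure(source_dir, extra_env=None):
    """Run ./configure in source_dir with the preset answers applied."""
    env = dict(os.environ)
    env.update(CONFIGURE_ENV)
    env.update(extra_env or {})
    subprocess.run(["./configure"], cwd=source_dir, env=env, check=True)
```

Any prompt that has no matching variable set (the Python location, for example) would presumably still be asked interactively, so this is best run from inside the activated virtualenv, e.g. `run_configure("tensorflow")`.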
Build.
For TensorFlow 2.0, add --config=v2 to the options.
To disable NVIDIA NCCL, also add --config=nonccl (I had installed the NCCL library, but the build failed with it, so I leave NCCL out).
$ bazel build --config=opt --config=v2 --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --config=cuda --config=nonccl --verbose_failures //tensorflow/tools/pip_package:build_pip_package
...
INFO: Elapsed time: 9788.259s, Critical Path: 378.10s
INFO: 25250 processes: 25250 local.
INFO: Build completed successfully, 34867 total actions
Once the build completes, create the pip package and install it!
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-2.0.0rc1-cp37-cp37m-linux_x86_64.whl
Post-install check
- tf.__version__ is 2.0.0-rc1.
- The GPU device is recognized.
$ python
Python 3.7.4 (default, Jul 9 2019, 16:32:37)
[GCC 9.1.1 20190503 (Red Hat 9.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.0.0-rc1'
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2019-09-15 22:22:42.479984: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2993965000 Hz
2019-09-15 22:22:42.480879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5604fd404b10 executing computations on platform Host. Devices:
2019-09-15 22:22:42.480911: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-09-15 22:22:42.483667: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-09-15 22:22:42.637759: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.638947: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5604fd4ae4b0 executing computations on platform CUDA. Devices:
2019-09-15 22:22:42.638973: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2019-09-15 22:22:42.639193: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.639825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:09:00.0
2019-09-15 22:22:42.644617: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-09-15 22:22:42.697319: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-09-15 22:22:42.731915: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-09-15 22:22:42.741549: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-09-15 22:22:42.796043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-09-15 22:22:42.803769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-09-15 22:22:42.896828: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-15 22:22:42.897112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.898295: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.899517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-09-15 22:22:42.899595: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-09-15 22:22:42.901482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-15 22:22:42.901502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-09-15 22:22:42.901509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-09-15 22:22:42.901613: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.902209: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-15 22:22:42.902767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 7079 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5999673194408362367
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 3782286546875987284
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 2769759803836561573
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7423911527
locality {
  bus_id: 1
  links {
  }
}
incarnation: 9925876743775063780
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1"
]
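Picking the two checks out of that much log output by eye is easy to get wrong, so they could also be scripted. This is a minimal sketch, not part of the original procedure: `check_install` and `check_current_environment` are names I made up, and the TensorFlow calls are the same ones used in the session above.

```python
def check_install(version, device_types, expected_version="2.0.0-rc1"):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if version != expected_version:
        problems.append("unexpected version: %s" % version)
    if "GPU" not in device_types:
        problems.append("no GPU device found")
    return problems


def check_current_environment():
    """Collect the version and device types from the installed TensorFlow."""
    # Imported lazily so check_install() can be used without TensorFlow.
    import tensorflow as tf
    from tensorflow.python.client import device_lib
    device_types = [d.device_type for d in device_lib.list_local_devices()]
    return check_install(tf.__version__, device_types)
```

Inside the virtualenv, `check_current_environment()` returning an empty list would correspond to the result shown above (version 2.0.0-rc1 and a `GPU` entry in the device list).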