nb.oの日記: Building TensorFlow 2.5-rc2 (CUDA 11.3, cuDNN 8.2.0) on Fedora 34

Purpose

Build TensorFlow 2.5-rc2 from source on Fedora 34.

These are preparation steps and personal notes ahead of the official 2.5 release.

Environment

Fedora 34 x86_64
Python 3.9.4 (virtualenv)
GCC 11.0.1 20210324
CUDA 11.3 + cuDNN 8.2.0
CPU AMD Ryzen 7 1700
GPU GeForce GTX 1070

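Preparation

Building GCC 10.2

As in the previous article, CUDA 11.3 supports GCC 10.2.1, and the build does not work with Fedora 34's GCC 11. So the first step is to build GCC 10.

For details on building GCC, see the earlier article.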
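
The compiler constraint can be checked mechanically before starting a long build. The following is a minimal sketch, not part of the original procedure: the helper names are hypothetical, and the upper bound of 10.2.1 is simply the figure quoted in this article for CUDA 11.3.

```python
import re

# Assumption taken from this article: CUDA 11.3 accepts host GCC up to 10.2.1.
MAX_SUPPORTED_GCC = (10, 2, 1)


def parse_gcc_version(text):
    """Extract the first dotted version number (e.g. '11.0.1') as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_host_gcc(version_text, max_supported=MAX_SUPPORTED_GCC):
    """Return True if the GCC version is not newer than max_supported."""
    return parse_gcc_version(version_text) <= max_supported


# The first string mimics Fedora 34's `gcc --version` banner.
print(is_supported_host_gcc("gcc (GCC) 11.0.1 20210324 (Red Hat 11.0.1-0)"))
print(is_supported_host_gcc("gcc (GCC) 10.2.0"))
```

Feeding it the first line of `gcc --version` is enough; the system GCC 11 is rejected and the self-built 10.2.0 passes.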

Downloading and building the GCC 10.2 source

Download the source and build it.

$ wget http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-10.2.0/gcc-10.2.0.tar.gz
$ tar xf gcc-10.2.0.tar.gz
$ cd gcc-10.2.0/
$ ./contrib/download_prerequisites 
$ mkdir build
$ cd build/
$ ../configure \
    --enable-bootstrap \
    --enable-languages=c,c++ \
    --prefix=/home/xxxxx/gcc/10.2 \
    --enable-shared \
    --enable-threads=posix \
    --enable-checking=release \
    --disable-multilib \
    --with-system-zlib \
    --enable-__cxa_atexit \
    --disable-libunwind-exceptions \
    --enable-gnu-unique-object \
    --enable-linker-build-id \
    --with-gcc-major-version-only \
    --with-linker-hash-style=gnu \
    --enable-plugin \
    --enable-initfini-array \
    --with-isl \
    --enable-libmpx \
    --enable-gnu-indirect-function \
    --build=x86_64-redhat-linux
$ make -j16
$ make install

Creating the specs file

Edit the specs file so that anything built with the newly compiled GCC 10 links against the matching shared library (libstdc++.so).

$ /home/xxxxx/gcc/10.2/bin/gcc -dumpspecs > specs
$ vi specs

# before
*link_libgcc:
%D

# after
*link_libgcc:
%{!static:%{!static-libgcc:-rpath /home/xxxxx/gcc/10.2/lib64/}} %D

$ mv specs /home/xxxxx/gcc/10.2/lib/gcc/x86_64-redhat-linux/10/
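
The manual edit in vi can also be scripted. This is only a sketch of the same before/after change: `add_rpath_to_specs` is a hypothetical helper, and it assumes the `*link_libgcc:` section body sits on the line immediately after the section header, as in the dump shown above.

```python
def add_rpath_to_specs(specs_text, lib_dir):
    """Prefix the body of the *link_libgcc section with an -rpath entry for lib_dir."""
    rpath = "%{!static:%{!static-libgcc:-rpath " + lib_dir + "}} "
    lines = specs_text.split("\n")
    for i, line in enumerate(lines[:-1]):
        # The section body is the line right after the "*link_libgcc:" header.
        if line.strip() == "*link_libgcc:" and "-rpath" not in lines[i + 1]:
            lines[i + 1] = rpath + lines[i + 1]
            break
    return "\n".join(lines)


# A tiny stand-in for the output of `gcc -dumpspecs`.
sample = "*link_libgcc:\n%D\n\n*libgcc:\n-lgcc\n"
print(add_rpath_to_specs(sample, "/home/xxxxx/gcc/10.2/lib64/"))
```

Running it twice leaves the text unchanged, because a body that already contains `-rpath` is skipped.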

Configuring Environment Modules

Make GCC 10 switchable through Environment Modules. After installing environment-modules, create a file named gcc10x under /etc/modulefiles.

$ sudo dnf install environment-modules

$ sudo vi /etc/modulefiles/gcc10x

#%Module 1.0
#
#  gcc-10.X module for use with 'environment-modules' package:
#

prepend-path    PATH                    /home/xxxxx/gcc/10.2/bin/
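
After loading the module (for example with `module load gcc10x`), it is worth confirming that `gcc` really resolves to the self-built compiler. A small sketch of that check follows; `active_gcc_is_under` is a hypothetical helper, and `/home/xxxxx` is the same placeholder used throughout this article.

```python
import os
import shutil


def active_gcc_is_under(prefix, search_path=None):
    """Return True if the first `gcc` on the search path lives under prefix."""
    found = shutil.which("gcc", path=search_path)
    if found is None:
        return False
    prefix_real = os.path.realpath(prefix)
    found_real = os.path.realpath(found)
    return os.path.commonpath([found_real, prefix_real]) == prefix_real


# Substitute the real install prefix for the placeholder before relying on this.
print(active_gcc_is_under("/home/xxxxx/gcc/10.2"))
```

With `search_path` left as `None`, the current `PATH` is used, so the result reflects whatever the module has prepended.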

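Building Bazel

Build Bazel from source. I built 3.7.4, the latest version at the time.

The official instructions for building from source are in the Bazel documentation. I followed them as written, so the details are omitted here.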

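Installing CUDA and cuDNN

Install CUDA 11.3 and cuDNN 8.2.0.

Install CUDA by following the RPM Fusion Howto/CUDA page.
Download cuDNN from NVIDIA's download site and install it manually.
The reason is that, as of May 1, 2021, the cuDNN package available from RPM Fusion (Machine Learning repository) is an older one paired with CUDA 11.1.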
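
Since cuDNN is installed by hand here, a quick way to confirm which version actually landed is to read the version macros from `cudnn_version.h`. The sketch below parses the header text; the function name is hypothetical, and where the header lives (often under the CUDA include directory) depends on how cuDNN was installed, so the sample uses an inline string instead of a real path.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) read from the text of cudnn_version.h."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} not found")
        fields[name] = int(match.group(1))
    return fields["CUDNN_MAJOR"], fields["CUDNN_MINOR"], fields["CUDNN_PATCHLEVEL"]


# Stand-in for the relevant lines of a cuDNN 8.2.0 header.
sample_header = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 2
#define CUDNN_PATCHLEVEL 0
"""
print(parse_cudnn_version(sample_header))
```

To use it on a real install, read the header file into a string and pass that in.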

Building TensorFlow

Now for the main topic: building TensorFlow 2.5-rc2.

Setting up virtualenv

First, create a virtual Python environment for TensorFlow with virtualenv (virtualenvwrapper) and install the required modules.

$ mkvirtualenv -p python3  tf2.5-rc2
$ pip install pip numpy wheel
$ pip install keras_preprocessing --no-deps

Build

Fetch the source from GitHub, run the configure script, and build.

Enable CUDA support.
Specify the path to the self-built GCC 10 gcc as the host compiler.
Pass "--config=v2" and "--config=nonccl" as build options.

(tf2.5-rc2) $ git clone -b r2.5 https://github.com/tensorflow/tensorflow.git
(tf2.5-rc2) $ cd tensorflow
(tf2.5-rc2) $ ./configure 
(tf2.5-rc2) $ bazel build \
    --config=cuda \
    --config=v2 \
    --config=nonccl \
    --config=opt \
    //tensorflow/tools/pip_package:build_pip_package
(tf2.5-rc2) $ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
(tf2.5-rc2) $ pip install /tmp/tensorflow_pkg/tensorflow-2.5.0rc2-cp39-cp39-linux_x86_64.whl
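
The answers given to ./configure interactively can also be supplied through environment variables, which makes the build easier to repeat. The sketch below is an assumption-laden illustration rather than what I actually ran: the helper names are hypothetical, the variable names `TF_NEED_CUDA`, `GCC_HOST_COMPILER_PATH` and `TF_CUDA_COMPUTE_CAPABILITIES` should be checked against `configure.py` in the checkout, and 6.1 is the compute capability of the GTX 1070 used here.

```python
import os
import subprocess


def tf_configure_env(gcc_path, compute_capability="6.1", base_env=None):
    """Build an environment that pre-answers the CUDA-related configure questions."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "TF_NEED_CUDA": "1",                        # enable CUDA support
        "GCC_HOST_COMPILER_PATH": gcc_path,         # host compiler used by nvcc
        "TF_CUDA_COMPUTE_CAPABILITIES": compute_capability,
    })
    return env


def run_configure(source_dir, env):
    """Run ./configure in the TensorFlow checkout; unanswered questions are still asked."""
    subprocess.run(["./configure"], cwd=source_dir, env=env, check=True)


# /home/xxxxx is the article's placeholder path for the self-built GCC.
env = tf_configure_env("/home/xxxxx/gcc/10.2/bin/gcc")
for key in ("TF_NEED_CUDA", "GCC_HOST_COMPILER_PATH", "TF_CUDA_COMPUTE_CAPABILITIES"):
    print(key, "=", env[key])
```

`run_configure` is deliberately not called in the example, since it needs a real TensorFlow checkout to run against.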

Verifying the installation

tf.__version__ should report 2.5.0-rc2.
The GPU device should be recognized.

(tf2.5-rc2) $ python
Python 3.9.4 (default, Apr  6 2021, 00:00:00) 
[GCC 11.0.1 20210324 (Red Hat 11.0.1-0)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-05-01 21:07:37.359112: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
INFO:tensorflow:Enabling eager execution
INFO:tensorflow:Enabling v2 tensorshape
INFO:tensorflow:Enabling resource variables
INFO:tensorflow:Enabling tensor equality
INFO:tensorflow:Enabling control flow v2
>>> tf.__version__
'2.5.0-rc2'
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2021-05-01 21:08:05.854665: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-01 21:08:05.865237: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-05-01 21:08:05.908808: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-01 21:08:05.909879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2021-05-01 21:08:05.909938: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-01 21:08:05.942599: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-05-01 21:08:05.942733: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-05-01 21:08:05.953447: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-05-01 21:08:05.970201: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-05-01 21:08:05.982505: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-05-01 21:08:05.994212: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-05-01 21:08:05.995767: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-05-01 21:08:05.995901: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-01 21:08:05.996970: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-01 21:08:05.997553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-05-01 21:08:05.997592: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-01 21:08:06.441157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-01 21:08:06.441208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-05-01 21:08:06.441217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-05-01 21:08:06.441408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-01 21:08:06.441949: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-01 21:08:06.442510: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-01 21:08:06.442984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 6992 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:0a:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17887149015766682436
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7332429824
locality {
  bus_id: 1
  links {
  }
}
incarnation: 2150787946068778776
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:0a:00.0, compute capability: 6.1"
]
>>>

OK!
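
The same two checks can be wrapped in a small function instead of reading through the log by eye. This is a sketch with hypothetical helper names; it uses `tf.config.list_physical_devices`, and it takes the imported module as an argument so that the function itself does not depend on TensorFlow being importable.

```python
def summarize_install(tf):
    """Return (version string, GPU device names) for an imported tensorflow module."""
    gpus = tf.config.list_physical_devices("GPU")
    return tf.__version__, [gpu.name for gpu in gpus]


def install_looks_ok(tf, expected_version="2.5.0-rc2"):
    """True if the version matches and at least one GPU is visible."""
    version, gpu_names = summarize_install(tf)
    return version == expected_version and len(gpu_names) > 0
```

Inside the virtualenv, run `import tensorflow as tf` and then call `install_looks_ok(tf)`.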

Saturday, May 1, 2021