nb.o's Diary: Building TensorFlow 2.4-rc2 (CUDA 11.1, cuDNN 8.0.5) on Fedora 33

Goal

Build TensorFlow 2.4-rc2 from source on Fedora 33.

These notes are preparation for the official 2.4 release and a memo for myself.

Environment

Fedora 33 x86_64
Python 3.9.0 (virtualenv)
GCC 10.2.1
CUDA 11.1 + cuDNN 8.0.5
CPU AMD Ryzen 7 1700
GPU GeForce GTX 1070

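Preparation

Building GCC 9

As in the previous post, the GCC version that CUDA 11.1 supports is 9, and the build does not work with Fedora 33's GCC 10. So the first step is to build GCC 9.

For details on building GCC, see my earlier post.

Downloading and building the GCC 9.3 source

Download the source, unpack it, and build it.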

$ wget https://ftp.gnu.org/gnu/gcc/gcc-9.3.0/gcc-9.3.0.tar.gz
$ tar xf gcc-9.3.0.tar.gz
$ cd gcc-9.3.0/
$ ./contrib/download_prerequisites 
$ mkdir build
$ cd build/
$ ../configure \
    --enable-bootstrap \
    --enable-languages=c,c++ \
    --prefix=/home/xxxx/gcc/9.3 \
    --enable-shared \
    --enable-threads=posix \
    --enable-checking=release \
    --disable-multilib \
    --with-system-zlib \
    --enable-__cxa_atexit \
    --disable-libunwind-exceptions \
    --enable-gnu-unique-object \
    --enable-linker-build-id \
    --with-gcc-major-version-only \
    --with-linker-hash-style=gnu \
    --enable-plugin \
    --enable-initfini-array \
    --with-isl \
    --enable-libmpx \
    --enable-gnu-indirect-function \
    --build=x86_64-redhat-linux
$ make -j16
$ make install

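Creating the specs file

Edit the specs file so that anything built with the newly compiled GCC 9 is linked against the matching shared library (libstdc++.so) from the GCC 9 install rather than the system one.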

$ /home/xxxx/gcc/9.3/bin/gcc -dumpspecs > specs
$ vi specs

# before
*link_libgcc:
%D

# after
*link_libgcc:
%{!static:%{!static-libgcc:-rpath /home/xxxx/gcc/9.3/lib64/}} %D

$ mv specs /home/xxxx/gcc/9.3/lib/gcc/x86_64-redhat-linux/9/
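
The manual edit above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name patch_link_libgcc is made up for illustration, and the rpath directory is the same placeholder path used in this post. It assumes the dumped specs file contains a "*link_libgcc:" header followed directly by the rule line, as in the before/after excerpt.

```python
import re

# Placeholder install path, same as in the post; adjust to your own prefix.
RPATH_DIR = "/home/xxxx/gcc/9.3/lib64/"

# Matches the "*link_libgcc:" header line unless the rule below it already
# starts with the -rpath spec, so applying the patch twice changes nothing.
_HEADER = re.compile(
    r"(^\*link_libgcc:\n)(?!%\{!static:%\{!static-libgcc:-rpath)",
    re.MULTILINE,
)


def patch_link_libgcc(specs_text: str, rpath_dir: str = RPATH_DIR) -> str:
    """Prefix the link_libgcc rule with an -rpath option; leave other rules alone."""
    rpath = "%{!static:%{!static-libgcc:-rpath " + rpath_dir + "}} "
    # A function is used as the replacement so the spec text is inserted literally.
    return _HEADER.sub(lambda m: m.group(1) + rpath, specs_text, count=1)


if __name__ == "__main__":
    sample = "*link_libgcc:\n%D\n"
    print(patch_link_libgcc(sample))
```

To use it on a real file, read the output of "gcc -dumpspecs", pass the text through patch_link_libgcc, and write the result back before moving it into place.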

Configuring Environment Modules

Make GCC 9 switchable through Environment Modules. Create a file named gcc9x under /etc/modulefiles.

#%Module 1.0
#
#  gcc-9.X module for use with 'environment-modules' package:
#

conflict        gcc5x gcc7x gcc9x
prepend-path    PATH                    /home/xxxx/gcc/9.3/bin/
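
After loading the module (with Environment Modules this is normally "module load gcc9x"), it is worth confirming that the gcc found first on PATH is really the 9.x build. The sketch below is one way to check; the helper names gcc_major and active_gcc are hypothetical and not from the original post.

```python
import re
import shutil
import subprocess


def gcc_major(version_text: str) -> int:
    """Extract the major version from `gcc -dumpversion` output such as '9' or '9.3.0'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized gcc version: {version_text!r}")
    return int(match.group(1))


def active_gcc():
    """Return (path, major version) of the first gcc on PATH, or None if there is none."""
    path = shutil.which("gcc")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpversion"], capture_output=True, text=True, check=True
    )
    return path, gcc_major(result.stdout)


if __name__ == "__main__":
    # With the module loaded, the path should point into /home/xxxx/gcc/9.3/bin/
    # and the major version should be 9.
    print(active_gcc())
```

Because GCC was configured with --with-gcc-major-version-only, -dumpversion prints only the major number, which is why the parser accepts both "9" and "9.3.0".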

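Building Bazel

Build Bazel from source. I built the latest release, 3.7.

See the official documentation for how to build from source. I followed it as written, so I will skip the details.

Installing CUDA and cuDNN

Install CUDA 11.1 and cuDNN 8.0.5. For CUDA, follow the RPM Fusion Howto/CUDA page. As its "Which driver package" section points out, both the RPM Fusion repository and the CUDA repository ship an NVIDIA driver, and the two can end up at mismatched versions. If you do not follow the steps exactly, CUDA becomes unusable because of a mismatch between CUDA and the NVIDIA driver.

Download cuDNN from NVIDIA's download site and install it.

Building TensorFlow

Now for the main topic: building TensorFlow 2.4-rc2.

Setting up virtualenv

First, create a virtual Python environment for TensorFlow with virtualenv (virtualenvwrapper) and install the required modules.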

$ mkvirtualenv -p python3  tf2.4-rc2
$ pip install pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install keras_applications --no-deps
$ pip install keras_preprocessing --no-deps

Build

Get the source from GitHub, run the configure script, and build.

Enable CUDA support.
For the host compiler, specify the path to the GCC 9 gcc.
Pass "--config=v2" and "--config=nonccl" as build options.
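
The configure script asks these questions interactively. As far as I know, it also reads some answers from environment variables, so the CUDA-related choices above can be preset. The sketch below builds such an environment. The variable names TF_NEED_CUDA, GCC_HOST_COMPILER_PATH, and TF_CUDA_COMPUTE_CAPABILITIES are an assumption that should be checked against configure.py in the source tree you are building. The helper name configure_env is hypothetical, and the compute capability 6.1 is the value for the GTX 1070 used here.

```python
import os


def configure_env(gcc_path: str, compute_capability: str = "6.1") -> dict:
    """Return a copy of the current environment with CUDA build answers preset."""
    env = dict(os.environ)
    env.update(
        {
            # Assumed variable names; verify them in configure.py before relying on this.
            "TF_NEED_CUDA": "1",
            "GCC_HOST_COMPILER_PATH": gcc_path,
            "TF_CUDA_COMPUTE_CAPABILITIES": compute_capability,
        }
    )
    return env


if __name__ == "__main__":
    env = configure_env("/home/xxxx/gcc/9.3/bin/gcc")
    # Show only the values this sketch sets.
    for key in ("TF_NEED_CUDA", "GCC_HOST_COMPILER_PATH", "TF_CUDA_COMPUTE_CAPABILITIES"):
        print(f"{key}={env[key]}")
    # To actually run configure with these presets, from the source directory:
    #   import subprocess
    #   subprocess.run(["./configure"], env=env, check=True)
```

Any question that is not covered by a preset is still asked interactively, so this only shortens the dialogue rather than replacing it.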

(tf2.4-rc2) $ wget https://github.com/tensorflow/tensorflow/archive/v2.4.0-rc2.tar.gz
(tf2.4-rc2) $ tar xf v2.4.0-rc2.tar.gz
(tf2.4-rc2) $ cd tensorflow-2.4.0-rc2/
(tf2.4-rc2) $ ./configure
(tf2.4-rc2) $ bazel build \
                --config=opt \
                --config=v2 \
                --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
                --config=cuda \
                --config=nonccl \
                --verbose_failures \
                //tensorflow/tools/pip_package:build_pip_package
(tf2.4-rc2) $ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
(tf2.4-rc2) $ pip install /tmp/tensorflow_pkg/tensorflow-2.4.0rc2-cp39-cp39-linux_x86_64.whl
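
The exact wheel file name depends on the TensorFlow version and the Python ABI tag, so it is easy to mistype. A small sketch like the following lists whatever build_pip_package actually wrote into the output directory. The helper name find_wheels is hypothetical, and /tmp/tensorflow_pkg is the output directory used above.

```python
from pathlib import Path


def find_wheels(package_dir: str) -> list:
    """Return the sorted paths of TensorFlow wheel files found in package_dir."""
    return sorted(str(p) for p in Path(package_dir).glob("tensorflow-*.whl"))


if __name__ == "__main__":
    # Prints nothing if the directory does not exist or holds no wheel yet.
    for wheel in find_wheels("/tmp/tensorflow_pkg"):
        print(wheel)
```

The printed path can then be passed to pip install as is.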

Verifying the installation

Check that tf.__version__ reports 2.4.0-rc2.
Check that the GPU device is recognized.

(tf2.4-rc2) $ python
Python 3.9.0 (default, Oct  6 2020, 00:00:00)
[GCC 10.2.1 20200826 (Red Hat 10.2.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-11-21 09:17:05.361081: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2020-11-21 09:17:17.759422: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-21 09:17:17.800045: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:17.803501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:09:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2020-11-21 09:17:17.803527: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-21 09:17:17.805845: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-11-21 09:17:17.805916: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2020-11-21 09:17:17.806788: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-11-21 09:17:17.816671: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-11-21 09:17:17.854541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2020-11-21 09:17:17.862372: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2020-11-21 09:17:17.961227: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2020-11-21 09:17:17.961410: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:17.962245: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:17.962836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-11-21 09:17:17.963571: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-21 09:17:18.745397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-21 09:17:18.745448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2020-11-21 09:17:18.745463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2020-11-21 09:17:18.747106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:18.747776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:18.748358: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:18.748909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 7120 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
2020-11-21 09:17:18.751666: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 12838272530603917041
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7466015808
locality {
  bus_id: 1
  links {
  }
}
incarnation: 3621828384593309124
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1"
]

OK!
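
The two checks can also be run as a short script instead of an interactive session. This is a minimal sketch: the helper name check_install and the expected version prefix "2.4.0" are my own choices for illustration, while tf.__version__ and tf.config.list_physical_devices are the public TensorFlow 2.x calls.

```python
def check_install(version: str, gpu_names: list, expected_prefix: str = "2.4.0") -> bool:
    """Return True if the version matches the expected release and a GPU was found."""
    return version.startswith(expected_prefix) and len(gpu_names) > 0


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed in this environment")
    else:
        # Lists only physical GPU devices visible to TensorFlow.
        gpus = [device.name for device in tf.config.list_physical_devices("GPU")]
        print("version:", tf.__version__)
        print("GPUs:", gpus)
        print("OK" if check_install(tf.__version__, gpus) else "NG")
```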

Saturday, November 21, 2020
