nb.o's Diary: 2020.11

Goal

Build TensorFlow 2.4-rc2 from source on Fedora 33.

These are preparation notes and a memo to myself ahead of the official 2.4 release.

Environment

Fedora 33 x86_64
Python 3.9.0 (virtualenv)
GCC 10.2.1
CUDA 11.1 + cuDNN 8.0.5
CPU: AMD Ryzen 7 1700
GPU: GeForce GTX 1070

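Preparation

Building GCC 9

As in the previous post, the newest GCC that CUDA 11.1 supports is GCC 9, so the build fails with Fedora 33's default GCC 10. The first step is therefore to build GCC 9.

See my earlier post for details on building GCC.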
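
To make the compiler constraint concrete, here is a minimal sketch that checks whether the gcc found on PATH is new enough to cause trouble. The limit of major version 9 is taken from the statement above; the function names are my own and purely illustrative.

```python
import re
import shutil
import subprocess

# Assumption: CUDA 11.1 accepts a host GCC up to major version 9, as stated above.
MAX_SUPPORTED_GCC_MAJOR = 9


def gcc_major(version_text):
    """Return the major version from text such as '10.2.1' or '9'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"cannot parse GCC version: {version_text!r}")
    return int(match.group(1))


def is_supported(version_text, max_major=MAX_SUPPORTED_GCC_MAJOR):
    """True if this GCC version is not newer than the supported major version."""
    return gcc_major(version_text) <= max_major


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        # -dumpversion prints only the version number.
        version = subprocess.run(
            [gcc, "-dumpversion"], capture_output=True, text=True
        ).stdout.strip()
        verdict = "OK" if is_supported(version) else "too new for CUDA 11.1"
        print(f"{gcc}: GCC {version} ({verdict})")
```

Running it before and after switching compilers is a quick way to confirm which gcc the build will actually pick up.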

Downloading and building the GCC 9.3 source

Download the source and build it.

$ wget https://ftp.gnu.org/gnu/gcc/gcc-9.3.0/gcc-9.3.0.tar.gz
$ tar xf gcc-9.3.0.tar.gz
$ cd gcc-9.3.0/
$ ./contrib/download_prerequisites 
$ mkdir build
$ cd build/
$ ../configure \
    --enable-bootstrap \
    --enable-languages=c,c++ \
    --prefix=/home/xxxx/gcc/9.3 \
    --enable-shared \
    --enable-threads=posix \
    --enable-checking=release \
    --disable-multilib \
    --with-system-zlib \
    --enable-__cxa_atexit \
    --disable-libunwind-exceptions \
    --enable-gnu-unique-object \
    --enable-linker-build-id \
    --with-gcc-major-version-only \
    --with-linker-hash-style=gnu \
    --enable-plugin \
    --enable-initfini-array \
    --with-isl \
    --enable-libmpx \
    --enable-gnu-indirect-function \
    --build=x86_64-redhat-linux
$ make -j16
$ make install

Creating the specs file

Edit the specs file so that binaries built with the self-compiled GCC 9 link against the matching shared library (libstdc++.so).

$ /home/xxxx/gcc/9.3/bin/gcc -dumpspecs > specs
$ vi specs

# before
*link_libgcc:
%D

# after
*link_libgcc:
%{!static:%{!static-libgcc:-rpath /home/xxxx/gcc/9.3/lib64/}} %D

$ mv specs /home/xxxx/gcc/9.3/lib/gcc/x86_64-redhat-linux/9/
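
The manual edit in vi can also be scripted. The following is only a sketch of the same before/after change applied to the specs text. The function name is hypothetical, and it assumes the body of the *link_libgcc: section sits on the line directly after the section header, as in the dump shown above.

```python
def add_rpath_to_specs(specs_text, lib_dir):
    """Prefix the body of the *link_libgcc: section with an -rpath option."""
    rpath = "%{!static:%{!static-libgcc:-rpath " + lib_dir + "}} "
    lines = specs_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "*link_libgcc:":
            body = lines[i + 1]
            if "-rpath" not in body:  # leave an already patched file untouched
                lines[i + 1] = rpath + body
            break
    else:
        raise ValueError("*link_libgcc: section not found")
    return "\n".join(lines) + "\n"


before = "*link_libgcc:\n%D\n"
print(add_rpath_to_specs(before, "/home/xxxx/gcc/9.3/lib64/"), end="")
```

Because the function skips a section that already contains -rpath, running it twice on the same file does not duplicate the option.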

Configuring Environment Modules

Make GCC 9 switchable through Environment Modules. Create a file named gcc9x under /etc/modulefiles.

#%Module 1.0
#
#  gcc-9.X module for use with 'environment-modules' package:
#

conflict        gcc5x gcc7x gcc9x
prepend-path    PATH                    /home/xxxx/gcc/9.3/bin/

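Building Bazel

Build Bazel from source. I built the latest release, 3.7.

See the official instructions for building from source. I followed them as written, so the details are omitted here.

Installing CUDA and cuDNN

Install CUDA 11.1 and cuDNN 8.0.5. For CUDA, follow the RPM Fusion Howto/CUDA page. As its "Which driver Package" section notes, the NVIDIA driver is available from both the RPM Fusion and the CUDA repositories, and the two versions can get out of sync. If you do not follow the steps exactly, CUDA and the NVIDIA driver end up mismatched and CUDA is unusable.

For cuDNN, download the package from NVIDIA's download site and install it.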

Building TensorFlow

Now for the main topic: building TensorFlow 2.4-rc2.

Setting up virtualenv

First, create a virtual Python environment for TensorFlow with virtualenv (virtualenvwrapper) and install the required modules.

$ mkvirtualenv -p python3  tf2.4-rc2
$ pip install pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install keras_applications --no-deps
$ pip install keras_preprocessing --no-deps

Build

Fetch the source from GitHub, run the configure script, and build.

Enable CUDA support.
For the host compiler, specify the path to the GCC 9 gcc.
Pass "--config=v2" and "--config=nonccl" as build options.

(tf2.4-rc2) $ wget https://github.com/tensorflow/tensorflow/archive/v2.4.0-rc2.tar.gz
(tf2.4-rc2) $ tar xf v2.4.0-rc2.tar.gz
(tf2.4-rc2) $ cd tensorflow-2.4.0-rc2/
(tf2.4-rc2) $ ./configure
(tf2.4-rc2) $ bazel build \
                --config=opt \
                --config=v2 \
                --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
                --config=cuda \
                --config=nonccl \
                --verbose_failures \
                //tensorflow/tools/pip_package:build_pip_package
(tf2.4-rc2) $ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
(tf2.4-rc2) $ pip install /tmp/tensorflow_pkg/tensorflow-2.4.0rc2-cp39-cp39-linux_x86_64.whl
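
The wheel file name encodes the package version and the Python it was built for, so it is worth reading back what the build actually produced before typing the name by hand. This is a small sketch based on the standard wheel naming scheme (PEP 427). The helper names are hypothetical, and /tmp/tensorflow_pkg is the output directory used in the commands above.

```python
import sys
from pathlib import Path


def parse_wheel_name(filename):
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel name: {filename}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def running_python_tag():
    """CPython tag of the current interpreter, e.g. 'cp39' for Python 3.9."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    out_dir = Path("/tmp/tensorflow_pkg")
    for wheel in sorted(out_dir.glob("*.whl")):
        info = parse_wheel_name(wheel.name)
        same = info["python"] == running_python_tag()
        note = "" if same else " (built for another Python)"
        print(f"{wheel.name}: version {info['version']}, {info['python']}{note}")
```

Run inside the virtualenv, this lists each wheel with its version and flags any that do not match the active interpreter.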

Verifying the installation

tf.__version__ should be 2.4.0-rc2.
The GPU device should be recognized.

(tf2.4-rc2) $ python
Python 3.9.0 (default, Oct  6 2020, 00:00:00) 
[GCC 10.2.1 20200826 (Red Hat 10.2.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-11-21 09:17:05.361081: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2020-11-21 09:17:17.759422: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-21 09:17:17.800045: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:17.803501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:09:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2020-11-21 09:17:17.803527: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-21 09:17:17.805845: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-11-21 09:17:17.805916: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2020-11-21 09:17:17.806788: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-11-21 09:17:17.816671: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-11-21 09:17:17.854541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2020-11-21 09:17:17.862372: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2020-11-21 09:17:17.961227: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2020-11-21 09:17:17.961410: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:17.962245: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:17.962836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-11-21 09:17:17.963571: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-21 09:17:18.745397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-21 09:17:18.745448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2020-11-21 09:17:18.745463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2020-11-21 09:17:18.747106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:18.747776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:18.748358: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-21 09:17:18.748909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 7120 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
2020-11-21 09:17:18.751666: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 12838272530603917041
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7466015808
locality {
  bus_id: 1
  links {
  }
}
incarnation: 3621828384593309124
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1"
]

OK!
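
The same two checks can be put into a short script instead of an interactive session. This is a sketch, not the procedure used above: it uses tf.config.list_physical_devices rather than device_lib, and summarize is a hypothetical helper of my own.

```python
def summarize(version, gpu_names):
    """Build a one-line report from a version string and GPU device names."""
    status = "GPU visible" if gpu_names else "no GPU visible"
    return f"TensorFlow {version}: {status} {list(gpu_names)}"


def check_tensorflow():
    # Imported lazily so that summarize() can be used without TensorFlow.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return summarize(tf.__version__, [gpu.name for gpu in gpus])


if __name__ == "__main__":
    try:
        print(check_tensorflow())
    except ImportError:
        print("tensorflow is not installed in this environment")
```

An empty device list here usually points back to the CUDA and driver mismatch described earlier, so that is the first thing to recheck.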

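Goal

Create a Yocto recipe that builds and installs the TensorFlow Lite pip package.

Run the bitbaked image on a Raspberry Pi 4.

Motivation

Recipes for TensorFlow exist (I blogged about one earlier), but there was no recipe for the TensorFlow Lite Python interpreter, and installing it afterwards with pip is a hassle, so I decided to write one.

meta-tensorflow-lite

The recipe is published on GitHub as meta-tensorflow-lite. It supports v2.3.1.

References

The following are useful references for building TensorFlow Lite.

Build

Follow the repository's README.

Verified on Raspberry Pi 4, both 32-bit and 64-bit.
The image is core-image-weston (others should work too).
Supports dunfell and zeus.
For zeus I received the first pull request of my life! (Support also zeus version #1)

Clone the required repositories.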

$ git clone git://git.yoctoproject.org/poky.git
$ git clone git://git.yoctoproject.org/meta-raspberrypi
$ git clone git://git.openembedded.org/meta-openembedded
$ git clone https://github.com/NobuoTsukamoto/meta-tensorflow-lite.git
$ source poky/oe-init-build-env rpi-build

Add the layers.

$ bitbake-layers add-layer ../meta-openembedded/meta-oe/
$ bitbake-layers add-layer ../meta-openembedded/meta-python/
$ bitbake-layers add-layer ../meta-openembedded/meta-networking/
$ bitbake-layers add-layer ../meta-openembedded/meta-multimedia/
$ bitbake-layers add-layer ../meta-raspberrypi/
$ bitbake-layers add-layer ../meta-tensorflow-lite/

Add python3-tensorflow-lite in conf/local.conf. Adding OpenCV's Python bindings and git as well makes things easier.

MACHINE ?= "raspberrypi4-64"
IMAGE_INSTALL_append = " python3-tensorflow-lite"

Then run BitBake and write the image to an SD card.

$ bitbake core-image-weston

I confirmed that TensorFlow Lite models (float and INT8) run with this repository.

(Note: Edge TPU does not work.)
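
For reference, running a model on the target looks roughly like the sketch below. It assumes the installed pip package exposes the tflite_runtime module and that numpy is available on the image, and that the model has a single input and output. The helper names and the flat-list input format are hypothetical choices of mine. For a quantized input tensor, real values are mapped to integers with the scale and zero point reported by the interpreter.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map real values to integers with q = round(x / scale) + zero_point."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def run_once(model_path, flat_input):
    """Run one inference; flat_input is a flat list matching the input size."""
    # Lazy imports: both modules are assumed to be present on the target image.
    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    scale, zero_point = inp["quantization"]
    if scale:  # a non-zero scale marks a quantized input tensor
        info = np.iinfo(inp["dtype"])
        flat_input = quantize(flat_input, scale, zero_point, info.min, info.max)
    data = np.array(flat_input, dtype=inp["dtype"]).reshape(inp["shape"])

    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])
```

Float models report a scale of 0, so the same function covers both kinds of model without a separate code path.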

I wrote a Yocto recipe that builds the TensorFlow Lite Python interpreter.

Raspberry Pi 4 AArch64, core-image-weston
(arm 32-bit is probably OK too)

I finally got as far as confirming that picamera + OpenCV Python works (I struggled for a week because picamera was not detected...). https://t.co/nW38Oe58MQ pic.twitter.com/DdetVOi7Tz
— nb.o (@Nextremer_nb_o) July 12, 2020

Miscellaneous

What I did

Wrote a recipe that creates the pip package with tensorflow/lite/tools/pip_package/build_pip_package.sh and installs it.

curl is not available, so replace it with wget.
001-Change-curl-to-wget-command.patch
The cross compiler is hard-coded to "arm-linux-gnueabihf-g++" and "aarch64-linux-gnu-g++", so make the build use Yocto's cross compiler instead.
001-Remove-toolchain-setup-and-pybind11.patch
Also patch tensorflow/lite/tools/make/Makefile accordingly.
001-TensorFlow-Lite_Makefile.patch
I could not figure out where the generated pip package ended up... the path is very long...
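
When the output location is buried deep in the build tree, searching for the wheel is quicker than guessing the path. A minimal sketch follows. The function name is hypothetical, and tmp/work is only the usual BitBake work directory inside the build directory, not a path confirmed by the recipe.

```python
from pathlib import Path


def find_wheels(root, pattern="*.whl"):
    """Return wheel files below root, newest first."""
    root = Path(root)
    if not root.is_dir():
        return []
    wheels = [p for p in root.rglob(pattern) if p.is_file()]
    return sorted(wheels, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Assumption: run from the build directory; adjust if TMPDIR is customized.
    for wheel in find_wheels("tmp/work"):
        print(wheel)
```

Sorting by modification time puts the package from the most recent build at the top.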

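PiCamera would not turn on

Even with PiCamera enabled in the meta-raspberrypi settings, it was not detected at all, and I struggled with this for a week...

Mitsukin-san (@yusuke_mitsuki), who writes the "My Original Linux" how-to series ("My オリジナルLinuxの作り方") in Interface magazine, kindly gave me advice, but the camera still was not detected... (Thank you for your help back then!)

About picamera not being detected: Mitsukin-san (@yusuke_mitsuki) gave me a lot of advice.
Thank you very much!!

(I am reading your series in Interface magazine!) https://t.co/R3THlZuxmd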
— nb.o (@Nextremer_nb_o) July 12, 2020

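In the end, it seems that the camera is no longer detected when "start_x=1", the boot/config.txt setting that enables PiCamera, is written on or after roughly line 643(?).

It looks like other people have run into the same problem...

The camera is sometimes detected anyway (whether that depends on the individual board or something else, I do not know), so if you hit the same problem on a Raspberry Pi 4, pay attention to the line number.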
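
To see where the setting actually lands in a generated config.txt, a few lines of Python are enough. This is a sketch with a hypothetical helper name. It reports only the line number and leaves the judgement to you, since the exact threshold is uncertain.

```python
def find_setting_line(config_text, key="start_x"):
    """Return (line_number, value) of the last active 'key=value' line, or None."""
    found = None
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines that are not settings
        name, value = stripped.split("=", 1)
        if name.strip() == key:
            found = (number, value.strip())
    return found


sample = "# Enable the camera\n\nstart_x=1\ngpu_mem=128\n"
print(find_setting_line(sample))  # (3, '1')
```

On the real file, pass the contents of boot/config.txt from the SD card and compare the reported line number with the behaviour you observe.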

Closing

Next, I would like to get Edge TPU running too...

Saturday, November 21, 2020

Building TensorFlow 2.4-rc2 (CUDA 11.1, cuDNN 8.0.5) on Fedora 33
