Saturday, March 28, 2020

Building TensorFlow 1.15.2 (CUDA 10.2, cuDNN 7.6.5) on Fedora 31

Goal


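Following on from the previous post, this time I build TensorFlow 1.15.2 from source. It is handy to have both the 1.x and 2.x series available. Docker would do the job, but I want something I can use without extra steps, so I build it myself.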


Environment


Same as last time.
  • Fedora 31 x86_64
  • Python 3.7.6 (virtualenv)
  • CUDA 10.2 + cuDNN 7.6.5
  • CPU AMD Ryzen 7 1700
  • GPU GeForce GTX 1070


Preparation


This is also the same as in the previous post, so I will skip it here.
See the previous post for details. Note, however, that Bazel 0.26.1 is used this time.


Building TensorFlow


Now for the main topic: building TensorFlow 1.15.2. The points below are specific to the 1.x series.

Setting up the virtualenv


$ mkvirtualenv -p python3 tf1.15.2
$ pip install pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install keras_applications --no-deps
$ pip install keras_preprocessing --no-deps


Build


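After extracting the source, delete line 116 of third_party/nccl/build_defs.bzl.tpl, the "--bin2c-path=%s" argument. With that line removed, the build works with CUDA 10.2 + cuDNN 7.6.5. In the diff below the edited file is listed first and the original (.org) second, so the line marked "+" is the one that was deleted.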

$ diff -up third_party/nccl/build_defs.bzl.tpl third_party/nccl/build_defs.bzl.tpl.org
--- third_party/nccl/build_defs.bzl.tpl 2020-03-28 21:06:18.313022179 +0900
+++ third_party/nccl/build_defs.bzl.tpl.org 2020-03-28 21:06:05.835143967 +0900
@@ -113,6 +113,7 @@ def _device_link_impl(ctx):
"--cmdline=--compile-only",
"--link",
"--compress-all",
+ "--bin2c-path=%s" % bin2c.dirname,
"--create=%s" % tmp_fatbin.path,
"--embedded-fatbin=%s" % fatbin_h.path,
] + images,
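The same edit can be scripted instead of done by hand. The following is only a minimal sketch: the function name remove_bin2c_path_arg is made up for this example, and it assumes the option appears on a single line, as in the diff above.

```shell
#!/bin/sh
# Hypothetical helper (not part of TensorFlow): drop the line that passes
# --bin2c-path, keeping the untouched file next to it with the .org suffix.
remove_bin2c_path_arg() {
    tpl="$1"
    # Keep the first backup so that re-running never overwrites the original.
    [ -f "$tpl.org" ] || cp -p "$tpl" "$tpl.org"
    # Always regenerate the edited file from the backup.
    sed '/--bin2c-path=/d' "$tpl.org" > "$tpl"
}

# Run from the top of the extracted tensorflow-1.15.2 tree.
if [ -f third_party/nccl/build_defs.bzl.tpl ]; then
    remove_bin2c_path_arg third_party/nccl/build_defs.bzl.tpl
fi
```

Running diff -up on the edited file and its .org backup afterwards should show the same one-line difference as above.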


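Run the build. It fails partway through because gRPC's log_linux.cc defines its own gettid(), which conflicts with the gettid() declared in /usr/include/bits/unistd_ext.h.
Note that /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f is Bazel's cache directory; its name differs from one environment to another, so adjust the path accordingly.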

$ wget https://github.com/tensorflow/tensorflow/archive/v1.15.2.tar.gz
$ tar xf v1.15.2.tar.gz
$ cd tensorflow-1.15.2/
$ bazel build \
--config=opt \
--config=v1 \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
--config=cuda \
--config=nonccl \
--verbose_failures \
//tensorflow/tools/pip_package:build_pip_package

ERROR: /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f/external/grpc/BUILD:507:1: C++ compilation of rule '@grpc//:gpr_base' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f/execroot/org_tensorflow && \
exec env - \
PATH=/home/xxx/.virtualenvs/tf2.2-rc1/bin:/home/xxx/.local/bin:/home/xxx/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin \
PWD=/proc/self/cwd \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/host/bin/external/grpc/_objs/gpr_base/log_linux.pic.d '-frandom-seed=bazel-out/host/bin/external/grpc/_objs/gpr_base/log_linux.pic.o' '-DGRPC_ARES=0' -iquote external/grpc -iquote bazel-out/host/bin/external/grpc -isystem external/grpc/include -isystem bazel-out/host/bin/external/grpc/include '-std=c++11' -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -g0 '-march=native' -g0 -c external/grpc/src/core/lib/gpr/log_linux.cc -o bazel-out/host/bin/external/grpc/_objs/gpr_base/log_linux.pic.o)
Execution platform: @bazel_tools//platforms:host_platform
external/grpc/src/core/lib/gpr/log_linux.cc:43:13: error: ambiguating new declaration of ‘long int gettid()’
static long gettid(void) { return syscall(__NR_gettid); }
^~~~~~
In file included from /usr/include/unistd.h:1170,
from external/grpc/src/core/lib/gpr/log_linux.cc:41:
/usr/include/bits/unistd_ext.h:34:16: note: old declaration ‘__pid_t gettid()’
extern __pid_t gettid (void) __THROW;
^~~~~~
external/grpc/src/core/lib/gpr/log_linux.cc:43:13: warning: ‘long int gettid()’ defined but not used [-Wunused-function]
static long gettid(void) { return syscall(__NR_gettid); }
^~~~~~
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 2333.653s, Critical Path: 275.29s
INFO: 7193 processes: 7193 local.
FAILED: Build did NOT complete successfully

Once the build error has occurred, edit the offending external/grpc/src/core/lib/gpr/log_linux.cc: rename the function "gettid" to "sys_gettid" so that the names no longer conflict.

--- /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f/external/grpc/src/core/lib/gpr/log_linux.cc.org 2020-03-28 21:51:12.510625621 +0900
+++ /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f/external/grpc/src/core/lib/gpr/log_linux.cc 2020-03-28 21:50:36.429895235 +0900
@@ -40,7 +40,7 @@
#include <time.h>
#include <unistd.h>
-static long gettid(void) { return syscall(__NR_gettid); }
+static long sys_gettid(void) { return syscall(__NR_gettid); }
void gpr_log(const char* file, int line, gpr_log_severity severity,
const char* format, ...) {
@@ -70,7 +70,7 @@ void gpr_default_log(gpr_log_func_args*
gpr_timespec now = gpr_now(GPR_CLOCK_REALTIME);
struct tm tm;
static __thread long tid = 0;
- if (tid == 0) tid = gettid();
+ if (tid == 0) tid = sys_gettid();
timer = static_cast<time_t>(now.tv_sec);
final_slash = strrchr(args->file, '/');
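Because the cache path is different on every machine, it can be easier to ask Bazel for it than to copy it out of the error message. The sketch below assumes it is run from the tensorflow-1.15.2 directory after the first failed build. rename_gettid is a hypothetical helper name, and the two sed expressions simply mirror the two changed lines in the diff above.

```shell
#!/bin/sh
# Hypothetical helper: apply the gettid -> sys_gettid rename from the diff.
rename_gettid() {
    src="$1"
    # Keep the first backup so that re-running never overwrites the original.
    [ -f "$src.org" ] || cp -p "$src" "$src.org"
    # Only the definition and the single call site are touched;
    # __NR_gettid is left alone.
    sed -e 's/static long gettid(void)/static long sys_gettid(void)/' \
        -e 's/tid = gettid();/tid = sys_gettid();/' \
        "$src.org" > "$src"
}

# `bazel info output_base` prints the cache directory for this workspace.
if command -v bazel >/dev/null 2>&1 && [ -f WORKSPACE ]; then
    target="$(bazel info output_base)/external/grpc/src/core/lib/gpr/log_linux.cc"
    if [ -f "$target" ]; then
        rename_gettid "$target"
    fi
fi
```

The patched file lives in Bazel's cache, not in the source tree, so the rename has to be repeated if the external gRPC sources are fetched again.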

Run the build again. Once it completes, create the pip package and install it.

$ bazel build \
--config=opt \
--config=v1 \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
--config=cuda \
--config=nonccl \
--verbose_failures \
//tensorflow/tools/pip_package:build_pip_package
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-1.15.2-cp37-cp37m-linux_x86_64.whl

Verifying the installation



  • tf.__version__ is 1.15.2.
  • The GPU device is recognized.

    $ python
    Python 3.7.6 (default, Jan 30 2020, 09:44:41)
    [GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> tf.__version__
    '1.15.2'
    >>> from tensorflow.python.client import device_lib
    >>> device_lib.list_local_devices()
    2020-03-28 21:40:55.825351: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2993890000 Hz
    2020-03-28 21:40:55.827348: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a6744ec540 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-03-28 21:40:55.827545: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
    2020-03-28 21:40:55.878251: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
    2020-03-28 21:40:56.375602: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-03-28 21:40:56.379531: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a67459c1d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
    2020-03-28 21:40:56.379602: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
    2020-03-28 21:40:56.379917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-03-28 21:40:56.380660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7085
    pciBusID: 0000:09:00.0
    2020-03-28 21:40:56.381097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
    2020-03-28 21:40:56.383815: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
    2020-03-28 21:40:56.390800: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
    2020-03-28 21:40:56.391482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
    2020-03-28 21:40:56.400449: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
    2020-03-28 21:40:56.407601: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
    2020-03-28 21:40:56.424543: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-03-28 21:40:56.424919: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-03-28 21:40:56.426401: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-03-28 21:40:56.430032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
    2020-03-28 21:40:56.432332: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
    2020-03-28 21:40:56.439218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-03-28 21:40:56.439362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
    2020-03-28 21:40:56.439453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
    2020-03-28 21:40:56.442431: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-03-28 21:40:56.444038: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2020-03-28 21:40:56.444975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 6983 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
    [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 1257751299983349578
    , name: "/device:XLA_CPU:0"
    device_type: "XLA_CPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 7465901702980950830
    physical_device_desc: "device: XLA_CPU device"
    , name: "/device:XLA_GPU:0"
    device_type: "XLA_GPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 14187726965059283561
    physical_device_desc: "device: XLA_GPU device"
    , name: "/device:GPU:0"
    device_type: "GPU"
    memory_limit: 7323051623
    locality {
    bus_id: 1
    links {
    }
    }
    incarnation: 12656403030500723883
    physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1"
    ]
    >>>

    OK!
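The device list is long, so for a quick yes/no answer the same check can be reduced to a one-liner. This is a rough sketch: has_gpu_type is a hypothetical helper name, and it relies on the device_type field shown in the output above, where the real GPU is reported as "GPU" and the XLA device as "XLA_GPU".

```shell
#!/bin/sh
# Hypothetical helper: reads one device type per line on stdin and succeeds
# only if some line is exactly "GPU" (so "XLA_GPU" alone does not count).
has_gpu_type() {
    grep -qx 'GPU'
}

# Only attempt the check when TensorFlow can be imported.
if python -c 'import tensorflow' >/dev/null 2>&1; then
    # TensorFlow writes its log lines to stderr, so stdout holds only the types.
    if python -c 'from tensorflow.python.client import device_lib
print("\n".join(d.device_type for d in device_lib.list_local_devices()))' \
        2>/dev/null | has_gpu_type; then
        echo "GPU device found"
    else
        echo "no GPU device reported"
    fi
fi
```

Run it inside the tf1.15.2 virtualenv so that python refers to the interpreter the wheel was installed into.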
