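Purpose
Following on from the previous post, this one builds TensorFlow 1.15.2 from source. It is useful to have both the 1.x and 2.x series available. Docker would also work, but I want something that is simple to use, so I build it myself.
Environment
Same as last time.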
- Fedora 31 x86_64
- Python 3.7.6 (virtualenv)
- CUDA 10.2 + cuDNN 7.6.5
- CPU AMD Ryzen 7 1700
- GPU GeForce GTX 1070
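Preparation
This is also the same as in the previous post, so I will skip it.
See that post for details. Note, however, that Bazel 0.26.1 is used this time.
Building TensorFlow
Now for the main topic: building TensorFlow 1.15.2. There are two caveats for the 1.x series.
- CUDA 10.2 + cuDNN 7.6.5 is not supported, so a workaround is needed.
  - tensorflow: build fails with cuda 10.2 #76935
  - build failed with cuda 10.2 #34429
- The gettid() defined in the bundled gRPC sources clashes with the gettid() declared by glibc 2.30, so a workaround is needed.
  - Update grpc dependency for glibc 2.30 compatibility #33758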
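Whether the second caveat applies depends on the glibc version of the build machine. As a rough sketch (the helper names `parse_version`, `needs_gettid_workaround` and `local_glibc_version` are made up for this example), the check could look like this in Python, using the standard library's `platform.libc_ver()`:

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.30' into a tuple of ints, e.g. (2, 30)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def needs_gettid_workaround(glibc_version):
    """glibc 2.30 started declaring gettid(); older releases do not have it."""
    return parse_version(glibc_version) >= (2, 30)


def local_glibc_version():
    """Return the glibc version seen by this interpreter, or '' if it is not glibc."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" else ""
```

Calling `needs_gettid_workaround(local_glibc_version())` on the build machine tells you whether to expect the gRPC compile error described later in this post.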
Setting up the virtualenv
$ mkvirtualenv -p python3 tf1.15.2
$ pip install pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install keras_applications --no-deps
$ pip install keras_preprocessing --no-deps
Build
After extracting the source, delete line 116 of third_party/nccl/build_defs.bzl.tpl (the "--bin2c-path=..." argument). With that line removed, the build works with CUDA 10.2 + cuDNN 7.6.5.
$ diff -up third_party/nccl/build_defs.bzl.tpl third_party/nccl/build_defs.bzl.tpl.org
--- third_party/nccl/build_defs.bzl.tpl 2020-03-28 21:06:18.313022179 +0900
+++ third_party/nccl/build_defs.bzl.tpl.org 2020-03-28 21:06:05.835143967 +0900
@@ -113,6 +113,7 @@ def _device_link_impl(ctx):
"--cmdline=--compile-only",
"--link",
"--compress-all",
+ "--bin2c-path=%s" % bin2c.dirname,
"--create=%s" % tmp_fatbin.path,
"--embedded-fatbin=%s" % fatbin_h.path,
] + images,
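If you prefer not to depend on the exact line number, the same edit can be made by matching the content of the line instead. The following is only a sketch of that idea; `drop_bin2c_path` is a hypothetical helper, not something shipped with TensorFlow.

```python
def drop_bin2c_path(text):
    """Return text with every line that passes --bin2c-path removed."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if "--bin2c-path=" not in line
    ]
    return "".join(kept)


# Usage, run from the top of the extracted source tree:
#   path = "third_party/nccl/build_defs.bzl.tpl"
#   with open(path) as f:
#       patched = drop_bin2c_path(f.read())
#   with open(path, "w") as f:
#       f.write(patched)
```

Keeping a copy of the original file (as the .org file in the diff above does) makes it easy to confirm that only this one line changed.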
Run the build. It fails partway through because of the gettid conflict.
Note that /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f is Bazel's cache directory. The hash part differs from one environment to another, so substitute your own path.
After the build error occurs, edit the offending file, external/grpc/src/core/lib/gpr/log_linux.cc, and rename "gettid" to "sys_gettid" so that it no longer conflicts.
Once the build completes, create the pip package and install it.
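Rather than copying the hashed directory name by hand, you can ask Bazel for it with `bazel info output_base`. The sketch below wraps that command; the function names and the `GRPC_LOG_LINUX` constant are invented for this example, and it assumes `bazel` is on your PATH and that the function is called from inside the TensorFlow source tree.

```python
import os
import subprocess

# Location of the gRPC file relative to Bazel's output base (taken from the error log).
GRPC_LOG_LINUX = os.path.join(
    "external", "grpc", "src", "core", "lib", "gpr", "log_linux.cc"
)


def bazel_output_base(workspace="."):
    """Run `bazel info output_base` in the workspace and return the printed path."""
    result = subprocess.run(
        ["bazel", "info", "output_base"],
        cwd=workspace,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


def grpc_log_linux_path(output_base):
    """Build the full path of log_linux.cc below the given output base."""
    return os.path.join(output_base, GRPC_LOG_LINUX)
```

For example, `grpc_log_linux_path(bazel_output_base("tensorflow-1.15.2"))` gives the file that has to be edited after the first build attempt has downloaded gRPC.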
$ wget https://github.com/tensorflow/tensorflow/archive/v1.15.2.tar.gz
$ tar xf v1.15.2.tar.gz
$ cd tensorflow-1.15.2/
$ bazel build \
--config=opt \
--config=v1 \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
--config=cuda \
--config=nonccl \
--verbose_failures \
//tensorflow/tools/pip_package:build_pip_package
ERROR: /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f/external/grpc/BUILD:507:1: C++ compilation of rule '@grpc//:gpr_base' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f/execroot/org_tensorflow && \
exec env - \
PATH=/home/xxx/.virtualenvs/tf2.2-rc1/bin:/home/xxx/.local/bin:/home/xxx/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin \
PWD=/proc/self/cwd \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/host/bin/external/grpc/_objs/gpr_base/log_linux.pic.d '-frandom-seed=bazel-out/host/bin/external/grpc/_objs/gpr_base/log_linux.pic.o' '-DGRPC_ARES=0' -iquote external/grpc -iquote bazel-out/host/bin/external/grpc -isystem external/grpc/include -isystem bazel-out/host/bin/external/grpc/include '-std=c++11' -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -g0 '-march=native' -g0 -c external/grpc/src/core/lib/gpr/log_linux.cc -o bazel-out/host/bin/external/grpc/_objs/gpr_base/log_linux.pic.o)
Execution platform: @bazel_tools//platforms:host_platform
external/grpc/src/core/lib/gpr/log_linux.cc:43:13: error: ambiguating new declaration of ‘long int gettid()’
static long gettid(void) { return syscall(__NR_gettid); }
^~~~~~
In file included from /usr/include/unistd.h:1170,
from external/grpc/src/core/lib/gpr/log_linux.cc:41:
/usr/include/bits/unistd_ext.h:34:16: note: old declaration ‘__pid_t gettid()’
extern __pid_t gettid (void) __THROW;
^~~~~~
external/grpc/src/core/lib/gpr/log_linux.cc:43:13: warning: ‘long int gettid()’ defined but not used [-Wunused-function]
static long gettid(void) { return syscall(__NR_gettid); }
^~~~~~
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 2333.653s, Critical Path: 275.29s
INFO: 7193 processes: 7193 local.
FAILED: Build did NOT complete successfully
The fix to external/grpc/src/core/lib/gpr/log_linux.cc, renaming "gettid" to "sys_gettid", looks like this as a diff:
--- /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f/external/grpc/src/core/lib/gpr/log_linux.cc.org 2020-03-28 21:51:12.510625621 +0900
+++ /home/xxx/.cache/bazel/_bazel_xxx/aca0f394050ca263374306622f61644f/external/grpc/src/core/lib/gpr/log_linux.cc 2020-03-28 21:50:36.429895235 +0900
@@ -40,7 +40,7 @@
#include <time.h>
#include <unistd.h>
-static long gettid(void) { return syscall(__NR_gettid); }
+static long sys_gettid(void) { return syscall(__NR_gettid); }
void gpr_log(const char* file, int line, gpr_log_severity severity,
const char* format, ...) {
@@ -70,7 +70,7 @@ void gpr_default_log(gpr_log_func_args*
gpr_timespec now = gpr_now(GPR_CLOCK_REALTIME);
struct tm tm;
static __thread long tid = 0;
- if (tid == 0) tid = gettid();
+ if (tid == 0) tid = sys_gettid();
timer = static_cast<time_t>(now.tv_sec);
final_slash = strrchr(args->file, '/');
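The rename can also be scripted instead of done in an editor. The sketch below is one possible way, assuming a plain whole-word replacement is enough for this file; `rename_gettid` is a name chosen for this example. The word-boundary pattern leaves `__NR_gettid` alone, because the underscore in front of it counts as a word character.

```python
import re

# Match gettid only as a whole identifier, so __NR_gettid and sys_gettid are left alone.
_GETTID = re.compile(r"\bgettid\b")


def rename_gettid(source):
    """Rename the local gettid() helper to sys_gettid() in C++ source text."""
    return _GETTID.sub("sys_gettid", source)


# Usage (path is the log_linux.cc file inside Bazel's cache):
#   with open(path) as f:
#       patched = rename_gettid(f.read())
#   with open(path, "w") as f:
#       f.write(patched)
```

Because an already renamed `sys_gettid` no longer matches the pattern, running the function twice gives the same result as running it once.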
$ bazel build \
--config=opt \
--config=v1 \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
--config=cuda \
--config=nonccl \
--verbose_failures \
//tensorflow/tools/pip_package:build_pip_package
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip install /tmp/tensorflow_pkg/tensorflow-1.15.2-cp37-cp37m-linux_x86_64.whl
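The wheel file name encodes the package version, the Python and ABI tags, and the platform, so it will differ if you build with another Python version. As an illustration only (both helper names are invented here, and the parser ignores the optional build tag that wheel names may carry), the name can be taken apart like this:

```python
def parse_wheel_name(filename):
    """Split a wheel file name into name, version, and the three compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are the python, ABI, and platform tags.
    head, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    name, version = head.split("-")[:2]
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def expected_python_tag(version_info):
    """CPython tag for an interpreter version, e.g. (3, 7, 6) gives 'cp37'."""
    return "cp%d%d" % (version_info[0], version_info[1])
```

Comparing `parse_wheel_name(...)["python_tag"]` with `expected_python_tag(sys.version_info)` is a quick way to confirm that the wheel was built for the interpreter in the active virtualenv.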
Verifying the installation
$ python
Python 3.7.6 (default, Jan 30 2020, 09:44:41)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.15.2'
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2020-03-28 21:40:55.825351: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2993890000 Hz
2020-03-28 21:40:55.827348: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a6744ec540 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-28 21:40:55.827545: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-03-28 21:40:55.878251: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-28 21:40:56.375602: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 21:40:56.379531: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a67459c1d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-28 21:40:56.379602: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2020-03-28 21:40:56.379917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 21:40:56.380660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:09:00.0
2020-03-28 21:40:56.381097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-03-28 21:40:56.383815: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-28 21:40:56.390800: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-28 21:40:56.391482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-28 21:40:56.400449: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-28 21:40:56.407601: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-28 21:40:56.424543: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-28 21:40:56.424919: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 21:40:56.426401: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 21:40:56.430032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-28 21:40:56.432332: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-03-28 21:40:56.439218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-28 21:40:56.439362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-03-28 21:40:56.439453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-03-28 21:40:56.442431: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 21:40:56.444038: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-28 21:40:56.444975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 6983 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1257751299983349578
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 7465901702980950830
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 14187726965059283561
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7323051623
locality {
bus_id: 1
links {
}
}
incarnation: 12656403030500723883
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:09:00.0, compute capability: 6.1"
]
>>>
OK!
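The same check can be run non-interactively. This is a minimal sketch rather than an official test: `gpu_device_names` and `check_tensorflow_gpu` are names made up for this post, and only `tf.__version__` and `device_lib.list_local_devices()` from the session above are used. TensorFlow is imported inside the function so that the filtering helper can be reused without it.

```python
def gpu_device_names(devices):
    """Return the names of entries whose device_type is exactly 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def check_tensorflow_gpu():
    """Return (TensorFlow version, list of GPU device names) for this install."""
    import tensorflow as tf
    from tensorflow.python.client import device_lib

    names = gpu_device_names(device_lib.list_local_devices())
    return tf.__version__, names
```

On the machine used in this post, `check_tensorflow_gpu()` should report version 1.15.2 and a single `/device:GPU:0` entry, matching the device list shown above; the XLA_GPU entry is filtered out because its device_type differs.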