Python
JackBoner, 2019-04-26 20:20:49

TensorFlow runs faster on CPU than on GPU. How do I set it up correctly?

I cannot understand why the model trains 2-3 times faster on the CPU than on the GPU.
Windows 10
TensorFlow 1.13.1
Keras 2.2.4
CUDA 10.1
The model:

from keras import models, layers, regularizers

network = models.Sequential()
network.add(layers.Dense(5, activation='relu', input_shape=(5,),
                         kernel_regularizer=regularizers.l2(0.05),
                         activity_regularizer=regularizers.l1(0.01)))
network.add(layers.Dense(2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
network.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
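
For reference, the speed comparison can be reproduced with a quick synthetic benchmark like the sketch below. It continues from the model defined above; the data shapes follow input_shape=(5,) and the 2-unit output layer, and the epoch and batch numbers are arbitrary.

import time
import numpy as np

# Synthetic data matching the model: 5 input features, 2 outputs
x = np.random.rand(100000, 5).astype('float32')
y = np.random.rand(100000, 2).astype('float32')

start = time.time()
network.fit(x, y, epochs=5, batch_size=128, verbose=0)
print('training took %.1f s' % (time.time() - start))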

Log:
Using TensorFlow backend.
2019-04-26 19:42:22.001733: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-04-26 19:42:22.242292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2019-04-26 19:42:22.242786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:22.858856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:22.859063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:22.859197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:22.859446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9401497665143581718
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6620742943
locality {
  bus_id: 1
  links {
  }
}
incarnation: 3794371743575443843
physical_device_desc: "device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
2019-04-26 19:42:22.871318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:22.871539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:22.871806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:22.871938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:22.872124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-04-26 19:42:22.874432: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5
WARNING:tensorflow:From C:\Program Files\Miniconda\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From C:\Program Files\Miniconda\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-04-26 19:42:24.455242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:24.455451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:24.455650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:24.455810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:24.455997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-04-26 19:42:24.846946: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_90.dll locally

I added the CUDA toolkit and CUPTI paths to %PATH%:
CUDA\lib64
CUDA\include
CUDA\bin

During training the GPU is only about 10% loaded, yet almost all of its memory is occupied, while the CPU sits at 60-70%, as if training were happening there rather than on the GPU. Where is the execution actually taking place? And if it is on the GPU, why is it several times slower than on the CPU?
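
For what it's worth, per-op placement can be inspected directly with the TF 1.x log_device_placement flag. The sketch below assumes the TensorFlow 1.13 / Keras 2.2.4 setup from the question and has to run before the model is built.

import tensorflow as tf
from keras import backend as K

# Print every op's device assignment to the console when the session runs the graph
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))

# Build and fit the Keras model after this point; each op will report
# whether it was placed on /device:CPU:0 or /device:GPU:0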

1 answer
Maxim Galaktionov, 2019-09-02
@AkumeiNiHao

You need to install tensorflow-gpu.
Then check that everything is OK:
# test.py
import tensorflow as tf

# allow_growth makes TensorFlow claim GPU memory gradually instead of grabbing it all at once
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
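
As a quick sanity check that the installed build can actually use the GPU, something like the sketch below should work on TF 1.13 (the exact output will differ from machine to machine):

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if this TensorFlow build was compiled with CUDA support
print(tf.test.is_built_with_cuda())

# True if a CUDA-capable GPU is visible and registered
print(tf.test.is_gpu_available())

# Full device list; this is what the log in the question already shows
print(device_lib.list_local_devices())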
