Python
FoxBoost, 2018-08-19 17:56:55

Why does a convolutional neural network only reach 70% correct answers?

For educational purposes, I wrote a digit-classification neural network trained on the good images from the Chars74k dataset. I mainly used TensorFlow; Keras is used only for reading images into NumPy arrays. The network has one convolutional layer with 8x8 kernels and 16 feature maps, ReLU as the non-linearity, followed by a subsampling (max-pooling) layer. Then comes an ordinary fully connected layer with 4096 inputs and 10 outputs. However I tune the hyperparameters, the accuracy peaks at about 70% and then just oscillates up and down. What could be the problem? Is it the network architecture (should I add or change layers) and/or the choice of hyperparameters? The source code with the dataset can be viewed here: https://bitbucket.org/smolyardev/chars74k

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
import os


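# Simple wrap-around iterator that yields fixed-size mini-batches from the training list.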
class Batch:
    def __init__(self, data):
        self.i = 0
        self.data = data
        self.data_len = len(data)

    def next(self, step=100):
        if self.i + step < self.data_len:
            res = self.data[self.i:self.i + step]
            self.i += step
        else:
            res = self.data[self.i:]
            self.i = step - (self.data_len - self.i)
            res.extend(self.data[:self.i])
        return res


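# Load every 32x32 PNG from the SampleNNN class folders and one-hot encode its label.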
folder = 'data'
i = 0
train = []
for path in sorted(os.listdir(folder)):  # sorted: listdir order is arbitrary, keep class indices stable
    imgs = "{}/{}".format(folder, path)
    if os.path.isdir(imgs) and path.startswith("Sample"):
        for ipath in os.listdir(imgs):
            img = "{}/{}".format(imgs, ipath)
            if os.path.isfile(img) and img.endswith(".png"):
                proc = image.load_img(img, target_size=(32, 32))
                data = image.img_to_array(proc)
                data /= 255
                y_data = np.zeros(10)
                y_data[i] = 1.
                train.append({"x": data, "y": y_data})
        i += 1

np.random.shuffle(train)

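# Model: one 8x8 conv layer with 16 filters + ReLU + 2x2 max-pooling, then a fully connected layer.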
x = tf.placeholder(dtype=tf.float32, shape=(None, 32, 32, 3))
y = tf.placeholder(dtype=tf.float32, shape=(None, 10))

w1 = tf.Variable(tf.truncated_normal([8, 8, 3, 16], stddev=0.1))
b1 = tf.Variable(tf.constant(0., shape=[16]))

conv1 = tf.nn.conv2d(x, w1, strides=[1, 1, 1, 1], padding="SAME") + b1
h_conv1 = tf.nn.relu(conv1)
pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
flat1 = tf.reshape(pool1, [-1, 4096])

w2 = tf.Variable(tf.truncated_normal([4096, 10], stddev=0.1))
b2 = tf.Variable(tf.constant(0., shape=[10]))
h2 = tf.matmul(flat1, w2) + b2

y_conv = tf.nn.softmax(h2)
# softmax_cross_entropy_with_logits_v2 expects the one-hot targets as labels and the raw scores h2 as logits (it applies softmax internally)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=h2))
train_step = tf.train.GradientDescentOptimizer(0.25).minimize(cross_entropy)
correct = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

batch = Batch(train)

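# Train for 2000 steps with mini-batches of 100; print the training-batch accuracy every 100 steps.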
for i in range(2000):
    next_tr = batch.next(100)
    n_x = [t['x'] for t in next_tr]
    n_y = [t['y'] for t in next_tr]
    _, res = sess.run((train_step, accuracy), feed_dict={x: n_x, y: n_y})
    if i % 100 == 0:
        print(res)

4 answers
Arseny Kravchenko, 2018-08-19
@Arseny_Info

There are not enough training logs to answer such a question, and you need them not only for the training set but also for a validation set (judging by the code, there is no validation at all right now).
Hypotheses:
- it is better to use at least 2-3 convolutional layers with 3x3 kernels than a single 8x8 one, because that provides more non-linearity (see the sketch below);
- learning rate = 0.25 may be too high, so in the end the network just oscillates around a local minimum.
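
A minimal sketch of the first hypothesis in the same TF 1.x style as the question (untested; the 16/32 filter counts and the 0.01 learning rate are illustrative assumptions, and x and y are the placeholders defined in the question):

# Hypothetical replacement: two stacked 3x3 conv layers instead of one 8x8 layer.
w1 = tf.Variable(tf.truncated_normal([3, 3, 3, 16], stddev=0.1))
b1 = tf.Variable(tf.constant(0., shape=[16]))
h1 = tf.nn.relu(tf.nn.conv2d(x, w1, strides=[1, 1, 1, 1], padding="SAME") + b1)
p1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")  # 32x32 -> 16x16

w2 = tf.Variable(tf.truncated_normal([3, 3, 16, 32], stddev=0.1))
b2 = tf.Variable(tf.constant(0., shape=[32]))
h2 = tf.nn.relu(tf.nn.conv2d(p1, w2, strides=[1, 1, 1, 1], padding="SAME") + b2)
p2 = tf.nn.max_pool(h2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")  # 16x16 -> 8x8

flat = tf.reshape(p2, [-1, 8 * 8 * 32])
w_fc = tf.Variable(tf.truncated_normal([8 * 8 * 32, 10], stddev=0.1))
b_fc = tf.Variable(tf.constant(0., shape=[10]))
logits = tf.matmul(flat, w_fc) + b_fc

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)  # lower rate than 0.25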

Sergey, 2018-08-19
@begemot_sun

Let's start with the question: what is a neural network?
It's very simple: it is a function of a set of variables, f(x1, ..., xn, k1, ..., kn).
What do you do when you train a neural network?
You find k1, ..., kn such that the function's output deviates as little as possible from the target on all the examples.
In other words, you are solving an approximation problem in a multidimensional space.
Now a simple example on the plane: you have a parabola, and you try to approximate it with a linear function f(x) = ax + b.
What do you think your accuracy will be? Certainly not 100%.
So any training is a process of fitting one function to a set of points (or to another function).
If you are not satisfied with the training accuracy, it is enough to increase the complexity of the network, in width or in depth; that is, you increase the complexity of the approximating function.
But then you have to watch out for a phenomenon called overfitting. Roughly speaking, that is when you try to approximate a line with a parabola: the parabola will degenerate into a line every time, but that extra capacity is not what you actually need. You want the approximating function to generalize the properties of the function being approximated (see the toy sketch below).
How do you like this explanation? :)
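
To make the line-vs-parabola point concrete, here is a tiny NumPy sketch (the sampled points and the two polynomial degrees are illustrative choices, not from the question):

import numpy as np

xs = np.linspace(-1, 1, 50)
ys = xs ** 2  # the parabola we want to approximate

for degree in (1, 2):
    coeffs = np.polyfit(xs, ys, degree)  # least-squares polynomial fit
    mse = np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    print("degree", degree, "mse", mse)
# degree 1 (a line) leaves a large error; degree 2 fits the parabola almost exactly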

Tati_m, 2018-08-20
@Tati_m

Tell me, would you be interested in participating in a neural network training project (image recognition)?

Alexander Skusnov, 2018-08-20
@AlexSku

AlexNet has 25 layers (counting activation, pooling, and normalization layers; only 8 of them carry weights).
