Machine learning
intTosha, 2018-04-29 18:28:11

Restoring 3D objects using machine learning methods, what am I doing wrong?

Good evening. For three months I have been struggling with the task of reconstructing a 3D model of a hand from a photo.
I generated a dataset of 10,000 hand images rendered in Blender, each with different bone positions. As the output vector I decided to take the positions of the 3D model's vertices. (Yes, I understand it would have been more sensible to predict the bone positions, but I decided to try it this way.)
I set up data augmentation so that it is almost impossible to find two identical images in the dataset: the hand image is composited onto an arbitrary background photo, filters are applied so that the hand does not stand out against the background, and noise imitating poor shooting quality is added. One of the photos looks something like this:
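Since the network's output layer below uses a tanh activation, the vertex coordinates used as targets have to be scaled into [-1, 1] first, or the network can never reach them. A minimal sketch of that normalization, assuming NumPy arrays (the function name and `scale` convention are illustrative, not from the original code):

```python
import numpy as np

def normalize_vertices(vertices, scale):
    """Map vertex coordinates into [-1, 1] so they match a tanh output layer.

    `scale` should be the maximum absolute coordinate over the whole dataset,
    so every normalized value lands inside the reachable output range.
    """
    return np.asarray(vertices, dtype=np.float32) / scale

# Two example vertices (x, y, z); the largest absolute coordinate is 2.0.
verts = [[0.5, -2.0, 1.0],
         [2.0,  0.0, -1.0]]
flat = normalize_vertices(verts, scale=2.0).ravel()  # flatten to one output vector
```

The inverse transform (multiplying predictions by `scale`) recovers coordinates in the original Blender units.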
[image: 5ae5e080b2dae428024412.jpeg]
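The compositing-plus-noise augmentation described above can be sketched roughly like this, assuming the Blender renders carry an alpha channel; all names and the noise level here are illustrative guesses, not the original pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(hand_rgba, background):
    """Composite a rendered hand (RGBA float arrays, 0-255) onto a background
    photo, then add Gaussian noise imitating poor shooting quality."""
    alpha = hand_rgba[..., 3:4] / 255.0                       # (H, W, 1), broadcasts
    composed = alpha * hand_rgba[..., :3] + (1 - alpha) * background
    noisy = composed + rng.normal(0, 8, composed.shape)       # sensor-noise imitation
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Dummy 64x64 render and background stand in for the real data.
hand = rng.integers(0, 255, (64, 64, 4)).astype(np.float32)
bg = rng.integers(0, 255, (64, 64, 3)).astype(np.float32)
img = augment(hand, bg)
```

Colour/contrast filters that blend the hand into the background would slot in before the noise step.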
I built a convolutional neural network model in Keras (I can't provide a diagram because I couldn't install graphviz):

from keras.layers import (Input, BatchNormalization, Conv2D, MaxPooling2D,
                          Flatten, Dense, Dropout)
from keras.models import Model

inp = Input(shape=(res, res, 3))
# With channels-last input, BatchNormalization should normalize the channel
# axis (axis=-1, the default); axis=1 normalized over image rows instead.
bath_0 = BatchNormalization()(inp)
# Keras 2 renamed border_mode= to padding=
x1 = Conv2D(primitives, kernel_size=(9, 9), padding='same', activation='relu')(bath_0)
pool_1 = MaxPooling2D(pool_size=(2, 2))(x1)
bath_1 = BatchNormalization()(pool_1)
x2 = Conv2D(primitives*2, kernel_size=(3, 3), padding='same', activation='relu')(bath_1)
x3 = Conv2D(primitives*2, kernel_size=(3, 3), padding='same', activation='relu')(x2)
x4 = Conv2D(primitives*2, kernel_size=(3, 3), padding='same', activation='relu')(x3)
pool_2 = MaxPooling2D(pool_size=(2, 2))(x4)
bath_2 = BatchNormalization()(pool_2)
x5 = Conv2D(primitives*4, kernel_size=(3, 3), padding='same', activation='relu')(bath_2)
x6 = Conv2D(primitives*4, kernel_size=(3, 3), padding='same', activation='relu')(x5)
x7 = Conv2D(primitives*4, kernel_size=(3, 3), padding='same', activation='relu')(x6)
pool_3 = MaxPooling2D(pool_size=(2, 2))(x7)
x8 = Flatten()(pool_3)
x9 = Dense(1700, activation='relu')(x8)
d_1 = Dropout(0.5)(x9)
x10 = Dense(1700, activation='relu')(d_1)
d_2 = Dropout(0.5)(x10)
x11 = Dense(1700, activation='relu')(d_2)
out = Dense(out_size, activation='tanh')(x11)
model = Model(inputs=inp, outputs=out)
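
With a tanh output layer, training this as plain regression means minimizing mean squared error between predicted and normalized true vertex coordinates (what Keras applies when compiled with loss='mse'). A minimal NumPy sketch of that loss, just to make the objective concrete:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over flattened vertex-coordinate vectors,
    i.e. the loss Keras computes when compiled with loss='mse'."""
    y_true = np.asarray(y_true, dtype=np.float32)
    y_pred = np.asarray(y_pred, dtype=np.float32)
    return float(np.mean((y_true - y_pred) ** 2))

# Three coordinates, two of them off by 0.5 each:
loss = mse([0.0, 1.0, -1.0], [0.0, 0.5, -0.5])  # ≈ 0.1667
```

One caveat with this objective: the palm and wrist account for far more coordinates (and larger motions) than the fingers, so the average error is dominated by the gross pose, which is consistent with the fingers getting stuck at a mean position.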

What I have managed to achieve: the trained network has learned to bend the hand in the right direction, but the fingers always stay in the same position regardless of the photo.
Incidentally, at first there was another problem: the network always produced exactly the same model (absolutely identical output). I solved it by adding photos without a hand to the dataset; for those, the target for every output neuron was zero.
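The "empty" samples described above can be sketched like this; the function name and sizes are hypothetical, not from the original code:

```python
import numpy as np

out_size = 6  # illustrative length of the flattened vertex-coordinate vector

def empty_sample(res, rng=np.random.default_rng(1)):
    """A background-only image paired with an all-zero target vector,
    so the network cannot collapse to a single mean hand pose."""
    image = np.random.default_rng(1).random((res, res, 3), dtype=np.float32)
    target = np.zeros(out_size, dtype=np.float32)
    return image, target

img, tgt = empty_sample(32)
```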
"So what is the problem? Just explain it already!"
The problem is that the fingers are in the same position in every predicted model; only the wrist bends. Like here:
[image: 5ae5e3e4e2d15891957013.png]
Please answer the following questions, because I no longer know what to think.
1. What is my mistake? What am I doing wrong?
2. Should I consider using Conv3D?
3. How would you solve this problem in my place?
Thank you for your attention.


1 answer
Arseny Kravchenko, 2018-05-07
@Arseny_Info

The dataset may be too complex: start with more photos and less aggressive augmentation (you could begin with a plain black background).
Also, the architecture is not really suitable; read up on what has been done on this task in recent years: https://github.com/xinghaochen/awesome-hand-pose-e...
