What network architecture should be used to find an object in a photo?


101-s, 2020-04-28 20:47:59

Machine learning


Given: photos of people's faces. The task is to find, say, the eyes.
There is a dataset with the eye boundaries in each photo, given as two corners (x1, y1) and (x2, y2).
Is this a regression task for the network, i.e. predicting suitable coordinates?
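
To make the regression framing concrete, here is a minimal sketch of how one pixel box could be turned into a 4-value target in the range 0 to 1. It assumes the dataset stores the corners in pixels and that the image width and height are known; `normalize_box` and the example numbers are hypothetical, not taken from the dataset.

```python
def normalize_box(box, width, height):
    """Scale a pixel box (x1, y1, x2, y2) into the [0, 1] range."""
    x1, y1, x2, y2 = box
    # x coordinates are divided by the image width, y coordinates by the height
    return (x1 / width, y1 / height, x2 / width, y2 / height)


# Hypothetical example: an eye box inside a 192x192 face photo.
target = normalize_box((48, 72, 96, 96), width=192, height=192)
print(target)  # (0.25, 0.375, 0.5, 0.5)
```

With targets scaled like this, each of the 4 network outputs corresponds to one coordinate.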

Is a model like the one below suitable? The output is 4 values.

import tensorflow as tf

# Pretrained MobileNetV2 backbone without its classification head, kept frozen
mobile_net = tf.keras.applications.MobileNetV2(input_shape=(192, 192, 3), include_top=False)
mobile_net.trainable = False
model = tf.keras.Sequential([
    mobile_net,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(4, activation='sigmoid')  # sigmoid activation? The targets are in the range 0 to 1
])

model.compile(optimizer="rmsprop", loss='mean_squared_error', metrics=['mae'])
# Since this is a regression task, mean squared error (MSE) is a convenient loss.
# Mean absolute error (MAE) is used as the metric.
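
Since the MAE above is measured on 0 to 1 values, it may be easier to judge in pixels. The sketch below shows one way to scale a predicted box back to pixel coordinates and compute the absolute error there. It assumes the targets were obtained by dividing by the image width and height; the helper names and the numbers are hypothetical.

```python
def denormalize_box(box, width, height):
    """Scale a [0, 1] box (x1, y1, x2, y2) back to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (x1 * width, y1 * height, x2 * width, y2 * height)


def mean_absolute_error(pred, true):
    """Average absolute difference over the 4 coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


# Hypothetical network output for one 192x192 photo, and a made-up ground truth.
pred_px = denormalize_box((0.25, 0.375, 0.5, 0.5), width=192, height=192)
true_px = (50, 70, 100, 98)

print(pred_px)                                # (48.0, 72.0, 96.0, 96.0)
print(mean_absolute_error(pred_px, true_px))  # 2.5
```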