Image recognition only with NumPy?

A

alan312021-11-02 06:36:18

Python

alan31, 2021-11-02 06:36:18

I want to make a neural network that recognizes objects in images, but everywhere it says that you need to use the opencv and other libraries, that is, which are already prepared for this.
But I just want to make such a neural network only with the help of numpy , that is, from scratch, is it possible?

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

W

WhiteApfel, 2021-11-02
@WhiteApfel

Numpy itself does not know how to process images, build connections between "neurons" and in general it is more for managing data in arrays.
But is it purely theoretically possible? Maybe. How? Well...
First of all, you need to somehow get an array of pixels. This depends on the context. If you take a picture from a file, then there is the good old PIL library for reading images. No, of course, you can write a decoder, but I'm not sure that you need it so "your own".

from PIL import Image
import numpy

img = Image.open('image.png')
pixel_map = np.array(img)

This will be an array of string elements, where each string element is represented by an array of [r, g, b] or [r, g, b, a] if transparent.

# Например, изображение с такими пикселями

# [красный][зелёный]
#  [синий]  [белый]

# будет представлено массивом
[, ]

It is worth recognizing that this format is poorly suited for learning. Usually images are converted to B/W, compressed in size, and that's it. Compression, like reading, is best done through the PIL library, it works faster than pure python (it seems to be written in C, but this is not accurate).
For training, they usually unpack a two-dimensional array of pixels (our pixel was also an array, but we consider it a single object, because we could turn it into something else) into a one-dimensional one. For example, put pixels sequentially from rows:

pixel_sequence = [pixel for row in pixel_map for pixel in row]

Important. If we take pieces of code, we get list[list[r, g, b (, a)]]. This is not something that would be suitable for direct distribution networks (I can confuse the terminology, correct me), you can unpack the nested array again, of course, but the complexity will increase many times due to the increase in the number of input data

And then it is already possible to write logic in pure python for processing values from a numpy array. There is a good article on proglib . As a result, there will be a neural network "on numpy", although ordinary python sheets would be enough