Are there software products for determining human actions from video?

Pattern recognition

Victorius, 2019-02-23 23:31:24

Good afternoon!
I've heard there are ready-made software products that can recognize what people are doing from a video stream. I'm interested in activities in a restaurant or café setting. Can anyone recommend a product that can be installed and used out of the box?
The usage scenario is as follows:

The barista takes the order
The barista grinds the beans
The barista brews the coffee
The barista whips the cream or froths the milk
The barista accepts payment
The barista hands over the coffee

1 answer

rPman, 2019-02-24
@rPman

No ready-made software exists that determines what a person is doing in the general case. Over the last decade, though, GPU clusters and neural networks have made it possible to build such software for narrow domains, in your case tracking the actions of a bartender.
But for that you need a substantial amount of video already collected, covering every type of bartending activity (performed by different people, by the way), each recorded many times and preferably from several angles. Then you either sit down yourself or hire a team of annotators to label every frame (!) of the captured video with what is happening in it (most likely you do not need to detect objects in the video, only whether an action is taking place). Quite possibly you will first have to hire a programmer to strip unnecessary information from the video (for example, crop out the part of the image where customers appear) and perhaps track the bartender's position (libraries for this exist). The same programmer will choose the software and train a neural network on the collected data on your GPU cluster (or, if we are talking about cloud services, a little cheaper on Google TPUs).
As a result, if you are lucky, you will get a neural network that can classify what the person is doing in any frame of a real-time video stream on cheap hardware.
All in all, it is an enormous amount of work, and it will be very, very expensive.
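To give a feel for the labeling step, here is a minimal sketch of one way to get a label for every frame without clicking through each frame by hand: annotate time intervals and expand them. The interval format, the label names and the `intervals_to_frame_labels` helper are all assumptions made for illustration, not part of any particular annotation tool.

```python
from typing import List, Tuple

# Hypothetical label used for frames where no annotated action is happening.
NO_ACTION = "none"


def intervals_to_frame_labels(
    intervals: List[Tuple[float, float, str]],
    duration_s: float,
    fps: float,
) -> List[str]:
    """Expand (start_s, end_s, label) intervals into one label per frame."""
    n_frames = int(round(duration_s * fps))
    labels = [NO_ACTION] * n_frames
    for start_s, end_s, label in intervals:
        first = max(0, int(round(start_s * fps)))
        last = min(n_frames, int(round(end_s * fps)))  # exclusive end index
        for i in range(first, last):
            labels[i] = label
    return labels


# Made-up annotations for a 6-second clip sampled at 2 frames per second.
annotations = [
    (0.0, 2.0, "take_order"),
    (2.0, 3.0, "grind_beans"),
    (3.5, 5.0, "brew_coffee"),
]
frame_labels = intervals_to_frame_labels(annotations, duration_s=6.0, fps=2.0)
print(frame_labels)
```

The resulting list has one entry per frame and could serve as the training targets for a per-frame classifier; frames that fall outside every interval keep the `none` label.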
P.S. If you are not afraid of a large number of false positives, you can simply track the bartender's position (several cameras and a simple OpenCV application): if he stays next to the coffee maker for longer than a threshold time, he is brewing coffee; if he is at the cash register, he is accepting payment; and so on.
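The position-based heuristic could look roughly like the sketch below. It assumes the position has already been extracted elsewhere (for example by an OpenCV-based tracker, not shown) as a list of `(time, x, y)` samples. The zone rectangles, the action names and the 3-second threshold are made-up values for illustration.

```python
from typing import Dict, List, Optional, Tuple

Zone = Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max

# Hypothetical zones; real coordinates depend on the camera setup.
ZONES: Dict[str, Zone] = {
    "brew_coffee": (0.0, 0.0, 2.0, 2.0),     # area around the coffee maker
    "accept_payment": (5.0, 0.0, 7.0, 2.0),  # area around the cash register
}


def zone_of(x: float, y: float) -> Optional[str]:
    """Return the action whose zone contains the point, or None."""
    for action, (x0, y0, x1, y1) in ZONES.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return action
    return None


def detect_actions(
    track: List[Tuple[float, float, float]], min_dwell_s: float
) -> List[Tuple[str, float, float]]:
    """Report (action, start, end) for each zone visit of at least min_dwell_s."""
    events: List[Tuple[str, float, float]] = []
    current: Optional[str] = None
    start_t = last_t = 0.0
    for t, x, y in track:  # samples must be sorted by time
        zone = zone_of(x, y)
        if zone != current:
            # The visit to the previous zone ended at the previous sample.
            if current is not None and last_t - start_t >= min_dwell_s:
                events.append((current, start_t, last_t))
            current, start_t = zone, t
        last_t = t
    if current is not None and last_t - start_t >= min_dwell_s:
        events.append((current, start_t, last_t))
    return events


# Made-up track: 4 s at the coffee maker, a short stop at the register,
# then a longer stay at the register.
track = [
    (0.0, 1.0, 1.0), (1.0, 1.0, 1.0), (2.0, 1.0, 1.0), (3.0, 1.0, 1.0),
    (4.0, 3.5, 1.0), (5.0, 6.0, 1.0), (6.0, 6.0, 1.0), (7.0, 3.5, 1.0),
    (8.0, 6.0, 1.0), (9.0, 6.0, 1.0), (10.0, 6.0, 1.0), (11.0, 6.0, 1.0),
]
events = detect_actions(track, min_dwell_s=3.0)
print(events)
```

The short stop at the register is dropped because it stays under the threshold, which is exactly where this approach trades accuracy for simplicity: standing near the coffee maker counts as brewing whether or not any coffee is being made.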