A few clarifications on SIFT based on an article on Habrahabr. Can anyone help?
I would like to ask a few questions, based on the same article https://habrahabr.ru/post/106302/
First, let's define the window (neighborhood) of the key point within which the gradients are considered. In essence, this is the window needed for convolution with a Gaussian kernel: it is round, and the blur radius of this kernel (sigma) is 1.5*keypoint_scale. The so-called "three sigma" rule applies to the Gaussian kernel: its value is very close to zero at distances greater than 3*sigma. The window radius is therefore defined as [3*sigma].
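To make the quoted rule concrete, here is a minimal sketch of how I read it. The function names `orientation_window` and `gaussian_weight` are my own, and I am assuming that the square brackets in [3*sigma] mean "integer part"; the article does not spell that out.

```python
import math

def orientation_window(keypoint_scale):
    """Return (sigma, radius) of the round window used for the direction."""
    sigma = 1.5 * keypoint_scale           # blur radius from the quoted text
    radius = int(math.floor(3.0 * sigma))  # "[3*sigma]" read as integer part (assumption)
    return sigma, radius

def gaussian_weight(dx, dy, sigma):
    """Unnormalized Gaussian weight at offset (dx, dy) from the key point."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

sigma, radius = orientation_window(2.0)
edge_weight = gaussian_weight(radius, 0, sigma)  # weight at the window border
```

At the border of the window the weight is already a small fraction of the weight at the center, which is what the "three sigma" rule is about.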
The direction of the key point is found from the histogram of directions O. The histogram consists of 36 bins that evenly cover the full 360 degrees, and it is built as follows: each point (x, y) of the window contributes m*G(x, y, sigma), where m is the gradient magnitude at that point, to the bin whose range contains the gradient direction theta(x, y).
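As far as I understand this step, it could be sketched roughly as follows. The function name, the `[y][x]` list layout and angles in degrees are my assumptions, not something taken from the article.

```python
import math

def orientation_histogram(magnitude, theta_deg, cx, cy, sigma, radius, bins=36):
    """Accumulate m * G into the bin that contains theta.

    magnitude, theta_deg: 2-D lists indexed [y][x] (assumed layout).
    (cx, cy): integer pixel coordinates of the key point.
    """
    hist = [0.0] * bins
    bin_width = 360.0 / bins
    height, width = len(magnitude), len(magnitude[0])
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            dx, dy = x - cx, y - cy
            if dx * dx + dy * dy > radius * radius:
                continue  # the window is round, so skip the corners
            weight = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            index = int((theta_deg[y][x] % 360.0) // bin_width) % bins
            hist[index] += magnitude[y][x] * weight
    return hist
```

Each pixel votes into exactly one bin here; no smoothing between neighboring bins is applied in this sketch.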
The direction of the key point lies in the range covered by the largest bin of the histogram. The values of the largest bin (max) and its two neighbors are interpolated with a parabola, and the maximum of this parabola is taken as the direction of the key point. If the histogram contains other bins with values of at least 0.8*max, they are interpolated in the same way and assigned to the key point as additional directions.
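Here is how I would sketch the parabola step, under a few assumptions of my own: the function name `keypoint_directions` is hypothetical, the angle of a bin is taken at its center, and only strict local maxima are treated as candidates (the quoted text only says "not less than 0.8*max").

```python
def keypoint_directions(hist, peak_ratio=0.8):
    """Return interpolated directions (degrees) for all strong histogram peaks."""
    bins = len(hist)
    bin_width = 360.0 / bins
    threshold = peak_ratio * max(hist)
    directions = []
    for i, center in enumerate(hist):
        left = hist[(i - 1) % bins]    # the histogram is circular
        right = hist[(i + 1) % bins]
        if center <= 0.0 or center < threshold:
            continue
        if not (center > left and center > right):
            continue  # assumption: only strict local maxima are peaks
        # Vertex of the parabola through (-1, left), (0, center), (1, right)
        offset = 0.5 * (left - right) / (left - 2.0 * center + right)
        directions.append(((i + 0.5 + offset) * bin_width) % 360.0)
    return directions
```

The offset always lies between -0.5 and 0.5 of a bin and shifts the direction toward the larger of the two neighbors.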
The figure schematically shows a part of the image (on the left) and the descriptor obtained from it (on the right). First, let's look at the left side. The pixels are represented by small squares. They are taken from the square descriptor window, which in turn is divided into four equal parts (called regions below). The small arrow in the center of each pixel represents that pixel's gradient. Interestingly, the center of this window lies between the pixels; it should be chosen as close as possible to the exact coordinates of the key point. The last detail to note is the circle representing the window for convolution with a Gaussian kernel (similar to the window used to compute the direction of the key point). For this kernel, sigma is set to half the width of the descriptor window.
The key point descriptor consists of all the resulting histograms. As already mentioned, the descriptor in the figure has 32 components (2x2x8), but in practice descriptors with 128 components (4x4x8) are used.
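To check my understanding of where the 2x2x8 and 4x4x8 numbers come from, here is a simplified sketch. The function name `build_descriptor` and the `[y][x]` layout are my assumptions. Each pixel votes into one bin of one region, and the window is not rotated by the key point direction, so this shows only the layout, not a full implementation.

```python
import math

def build_descriptor(magnitude, theta_deg, regions=4, bins=8):
    """Concatenate one `bins`-bin histogram per region of a square window.

    magnitude, theta_deg: square 2-D lists indexed [y][x] (assumed layout).
    """
    size = len(magnitude)
    if size % regions != 0:
        raise ValueError("window size must be divisible by the number of regions")
    cell = size // regions
    sigma = size / 2.0          # half the width of the descriptor window
    center = (size - 1) / 2.0   # lies between pixels when size is even
    bin_width = 360.0 / bins
    descriptor = [0.0] * (regions * regions * bins)
    for y in range(size):
        for x in range(size):
            dx, dy = x - center, y - center
            weight = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            region = (y // cell) * regions + (x // cell)
            b = int((theta_deg[y][x] % 360.0) // bin_width) % bins
            descriptor[region * bins + b] += magnitude[y][x] * weight
    return descriptor
```

With `regions=2` the result has 32 components, as in the figure; with `regions=4` it has 128.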
The resulting descriptor is normalized, then every component greater than 0.2 is clipped to 0.2, and the descriptor is normalized again. In this form, the descriptors are ready for use.
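The normalize, clip, normalize sequence can be sketched like this. I am assuming that "normalized" means scaling to unit Euclidean length, which the quoted text does not state explicitly; the function names are mine.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0.0 else list(vec)

def finalize_descriptor(raw, clip=0.2):
    """Normalize, clip every component above `clip`, then normalize again."""
    unit = normalize(raw)
    clipped = [min(v, clip) for v in unit]
    return normalize(clipped)
```

Note that after the second normalization a component can end up above 0.2 again; the clipping only limits how much a single large gradient can dominate the descriptor.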