How do I measure the distance (similarity) between two ads?
For example, take these three ads:
1) Apartment for sale
type: 1-room apartment
cost: 1 million rubles
area: 30 m²
place: Butyrsky district
2) Apartment for sale
type: 1-room apartment
cost: 1.2 million rubles
area: 28 m²
place: Marfino district
3) Apartment for sale
type: 3-room apartment
cost: 4.2 million rubles
area: 128 m²
place: Marfino district
What methods could produce roughly the following result?
Similarity:
1 and 2 = 80%
1 and 3 = 2%
2 and 3 = 5%
Great question!
Let's work through it with a test case.
Suppose we have two identical ads with five parameters:
1. title
2. type
3. cost
4. area
5. place
If all of them match, the similarity is 100%.
Now change the area by 2 m². Say 122 m² is the nominal (reference) value and 124 m² is the compared value.
122 / 100 = 1.22 (this is 1% of the nominal value)
|124 - 122| = 2 (the absolute difference)
2 / 1.22 ≈ 1.64 (the difference as a percentage of the nominal value)
100 - 1.64 = 98.36% match
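The per-field calculation above can be written as a small function. This is a minimal sketch: `percent_match` is a hypothetical name, and clamping the result at 0% when the difference exceeds the nominal value is an added assumption, not part of the calculation above.

```python
def percent_match(nominal: float, compared: float) -> float:
    """Similarity of one numeric field, in percent, relative to the nominal value."""
    if nominal == 0:
        raise ValueError("nominal value must be non-zero")
    one_percent = abs(nominal) / 100        # e.g. 122 / 100 = 1.22
    difference = abs(compared - nominal)     # e.g. |124 - 122| = 2
    diff_percent = difference / one_percent  # e.g. 2 / 1.22 ≈ 1.64
    # Assumption: a difference larger than the nominal value gives 0%, not a negative score.
    return max(0.0, 100 - diff_percent)


print(round(percent_match(122, 124), 2))  # area: 98.36
print(round(percent_match(1.0, 1.2), 2))  # cost in millions of rubles: 80.0
```

The same function covers the cost, since it is just another numeric field.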
The same applies to the cost.
For the place, it helps to split the location into parts (district/subway/street).
Then use a mapping API (for example, Google Maps) to compare distances according to the following rules:
1. If there is a nominal district, how far is it from the compared district?
2. If there is a nominal street, how far is it from the compared street?
3. If there is a nominal subway station, how far is it from the compared subway station?
4. If there is a nominal district and a compared street, is the street in that district?
5. If there is a nominal subway station, is it in the compared district?
And so on.
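Once each location has been geocoded (for example through a maps API), a distance can be turned into a similarity score. This sketch assumes the coordinates are already known: the coordinates in the usage line are placeholders, not real listing addresses, and the 10 km cutoff (`max_km`) is an arbitrary, hypothetical parameter. It uses the haversine great-circle formula, i.e. straight-line distance rather than road or transit distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def place_similarity(p1: tuple[float, float], p2: tuple[float, float], max_km: float = 10.0) -> float:
    """100% at the same point, falling linearly to 0% at max_km (hypothetical cutoff)."""
    distance = haversine_km(*p1, *p2)
    return max(0.0, 100.0 * (1 - distance / max_km))


# Placeholder coordinates, not real addresses.
print(round(place_similarity((55.80, 37.58), (55.83, 37.58)), 1))
```

A linear falloff is only one choice; a lookup table of district-to-district distances would plug into the same function.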
The title can optionally be split into words or characters and compared.
First count all the characters (or words) of one title and take that number as 100%, then compare the other title against it character by character or word by word.
Then give the percentage of matches for each comparison.
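A word-by-word version of the title comparison: the nominal title's words count as 100%, and the score is the share of them found in the other title. The function names are hypothetical. For the character-level variant, Python's standard `difflib.SequenceMatcher.ratio()` is swapped in; it scores matching characters as 2·M / T (M matches, T total characters in both strings), which differs from a straight character count.

```python
from difflib import SequenceMatcher


def word_similarity(nominal: str, compared: str) -> float:
    """Percent of the nominal title's words that also occur in the compared title."""
    nominal_words = nominal.lower().split()
    if not nominal_words:
        return 0.0
    compared_words = set(compared.lower().split())
    matched = sum(1 for word in nominal_words if word in compared_words)
    return 100.0 * matched / len(nominal_words)


def char_similarity(nominal: str, compared: str) -> float:
    """Character-level similarity in percent via difflib (a substituted technique)."""
    return 100.0 * SequenceMatcher(None, nominal.lower(), compared.lower()).ratio()


print(word_similarity("apartment for sale", "apartment for sale"))       # 100.0
print(round(word_similarity("apartment for sale", "room for rent"), 1))  # 33.3
```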
That is the approach I would recommend.
A single total number will be very uninformative and will too often be misleading.
For your example, you could come up with a function that produces the desired numbers from the price alone.
If all the parameters are taken into account, you need to bring them to a common denominator and then simply add them up for each apartment. Then you can compute the difference, the ratio, or whatever you need.
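A sketch of the "common denominator" idea using the three listings from the question. Dividing each numeric field by its maximum across the listings is one possible choice of scale, assumed here; the scaled values are summed per apartment, and apartments are compared by the difference or the ratio of their sums. Two quite different apartments can still end up with similar sums, so this single number does lose information.

```python
listings = {
    1: {"cost_mln": 1.0, "area_m2": 30},
    2: {"cost_mln": 1.2, "area_m2": 28},
    3: {"cost_mln": 4.2, "area_m2": 128},
}

# Assumption: scale each field by its maximum so every field falls in (0, 1].
fields = ["cost_mln", "area_m2"]
scale = {f: max(item[f] for item in listings.values()) for f in fields}

# One number per apartment: the sum of its scaled fields.
totals = {key: sum(item[f] / scale[f] for f in fields) for key, item in listings.items()}


def difference(a: int, b: int) -> float:
    """Absolute difference of the normalized sums; smaller means more similar."""
    return abs(totals[a] - totals[b])


def ratio(a: int, b: int) -> float:
    """Smaller sum divided by larger sum; closer to 1 means more similar."""
    low, high = sorted((totals[a], totals[b]))
    return low / high


for a, b in [(1, 2), (1, 3), (2, 3)]:
    print(a, b, round(difference(a, b), 3), round(ratio(a, b), 3))
```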
The right solution depends on the problem you are solving. I assume you are showing the user similar listings. But then the user has to decide which factors are decisive for them and which are not.
For example, the user runs a search and indicates in the filters that place and price matter to them. In that case the importance weight of floor area is zero, and it plays no role. For place and price you should already have chosen weights and calculation methods; keep those. As for place, it makes sense to use real distance: a neighboring district is closer than one at the other end of the city. This requires an extra table of constants or a more complex algorithm just for this component (ideally, take the apartment's address into account and measure in kilometers).
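The filter-driven weighting can be sketched as a weighted average of per-field similarities. The field names, the weights and the per-field scores below are hypothetical; in practice the per-field scores would come from whatever methods you have already chosen (for example, a percentage match for price and a distance-based score for place).

```python
def weighted_similarity(field_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-field similarities (each 0-100); zero-weight fields are ignored."""
    total_weight = sum(w for w in weights.values() if w > 0)
    if total_weight == 0:
        raise ValueError("at least one field must have a positive weight")
    return sum(field_scores[f] * w for f, w in weights.items() if w > 0) / total_weight


# Hypothetical case: the user's filters mark place and price as important, floor area not.
weights = {"place": 1.0, "price": 1.0, "area": 0.0}
scores = {"place": 70.0, "price": 80.0, "area": 10.0}
print(weighted_similarity(scores, weights))  # 75.0
```

Because area has weight 0, its low score of 10 does not affect the result.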
Place plus the absolute differences of the other numeric fields: the smaller the result, the higher the listing ranks (the more relevant it is).
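One reading of that rule: the score is the distance between places plus the absolute differences of the numeric fields, and candidates are sorted so the smallest score comes first. The divisors that put kilometres, millions of rubles and square metres on a comparable footing are assumptions, as are the distances, which in practice would come from geocoding the addresses.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    name: str
    km_from_target: float  # hypothetical: distance to the reference listing's address
    cost_mln: float
    area_m2: float


# Assumed scale factors so that each term is roughly comparable.
KM_SCALE, COST_SCALE, AREA_SCALE = 5.0, 1.0, 20.0


def relevance_score(target: Listing, candidate: Listing) -> float:
    """Place term + absolute differences of numeric fields: lower means more relevant."""
    return (
        candidate.km_from_target / KM_SCALE
        + abs(candidate.cost_mln - target.cost_mln) / COST_SCALE
        + abs(candidate.area_m2 - target.area_m2) / AREA_SCALE
    )


target = Listing("1", 0.0, 1.0, 30)
# Placeholder distances: both candidates are in Marfino, so the same value is used.
candidates = [Listing("3", 4.0, 4.2, 128), Listing("2", 4.0, 1.2, 28)]
ranked = sorted(candidates, key=lambda c: relevance_score(target, c))
print([c.name for c in ranked])  # ['2', '3']
```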