Roger Martino, 2017-10-21 16:24:05
Parsing

How can I find ads without being tied to a specific site?

Hello!
The task is to collect a large number of relevant real estate ads.
There are no other criteria at this stage.
What I need is a mechanism that runs continuously on a server, crawls as deeply as possible through everything on the Internet related to real estate, and saves what it finds in some form (say, as a URL string).
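A minimal sketch of such a continuously running mechanism as a breadth-first crawler, assuming the caller supplies two functions: `fetch` (download a page) and `is_ad` (decide whether a page is an ad). These names, along with `crawl` and `max_pages`, are hypothetical and only show the shape of the loop: a queue of URLs to visit, a set of URLs already seen, and a list of ad URLs saved as strings.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterable, Optional
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],  # hypothetical: URL -> HTML, or None on failure
    is_ad: Callable[[str, str], bool],      # hypothetical: (URL, HTML) -> is this an ad page?
    max_pages: int = 1000,
) -> list[str]:
    """Breadth-first crawl from the seed URLs; returns the URLs judged to be ads."""
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated, order preserved
    seen = set(frontier)
    found: list[str] = []
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        if is_ad(url, html):
            found.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return found
```

To run continuously on a server, `fetch` would presumably wrap an HTTP client with timeouts, rate limiting and robots.txt checks (the standard library has `urllib.robotparser`), and `seen` plus the frontier would be kept in persistent storage rather than in memory.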
The problems I have:
1) I need a starting point for the search. I could take a query such as "property ad", get a batch of sites from any search engine, and then crawl those sites for the information I need. Is this approach optimal?
2) What does "searching for information" actually mean here? The parser gets the HTML, but every site has its own layout. What algorithms can I use to detect whether a page contains an ad and to extract its parameters (title, picture, description, location, etc.)?
I need something intelligent. What can you tell me about this?
3) Less pressing at the moment, but still: how do I check whether an ad is still current, i.e. that it has not been taken down? Here again I run into different site layouts. I need a parser that can approach this question "intelligently".
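One partial answer to item 2, which avoids per-layout parsing on sites that support it: many listing pages embed machine-readable schema.org data in `<script type="application/ld+json">` blocks. Whether a particular board does so has to be checked by looking at its pages. The sketch below uses only the Python standard library to collect those blocks and keep the objects whose `@type` is in an assumed set `AD_TYPES`; the function names and the type set are hypothetical. Pages without such markup would still need site-specific selectors or a trained classifier.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator

# Assumed set of schema.org types that mark a listing; tune this per site.
AD_TYPES = {"RealEstateListing", "Offer", "Product", "Apartment", "House", "Residence"}


class JsonLdCollector(HTMLParser):
    """Gathers the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _types(obj: dict) -> set:
    """Normalizes "@type", which may be a string or a list of strings."""
    value = obj.get("@type")
    if isinstance(value, str):
        return {value}
    if isinstance(value, list):
        return {t for t in value if isinstance(t, str)}
    return set()


def _find_ads(node: Any) -> Iterator[dict]:
    """Yields the outermost JSON objects whose @type looks like a listing."""
    if isinstance(node, dict):
        if _types(node) & AD_TYPES:
            yield node
            return  # nested Offer/Place objects belong to this ad; don't count them twice
        for value in node.values():
            yield from _find_ads(value)
    elif isinstance(node, list):
        for item in node:
            yield from _find_ads(item)


def extract_ads(html: str) -> list[dict]:
    """Returns name/image/description/address for each listing found in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    ads = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed structured data is common; skip the block
        for obj in _find_ads(data):
            ads.append({key: obj.get(key) for key in ("name", "image", "description", "address")})
    return ads
```

The same result can double as the "is this an ad?" test: a page for which `extract_ads` returns a non-empty list is a candidate listing.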
I'm asking knowledgeable people for help: where should I start digging?
I'll be happy with any suggestions :)
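For item 3, a simple heuristic is to re-fetch each saved URL and combine the HTTP status with a search for "taken down" phrases. The phrases in `REMOVED_MARKERS`, the User-Agent string and the function names are placeholders: each board words its removal notice differently (and not in English), so the list would have to be filled in per site. This is a sketch rather than a reliable detector, since some sites keep removed ads at status 200 or redirect them to a search page.

```python
import urllib.error
import urllib.request

# Hypothetical phrases a site shows on a taken-down listing; collect them per site.
REMOVED_MARKERS = ("ad removed", "listing is no longer available")


def classify(status: int, html: str, markers=REMOVED_MARKERS) -> str:
    """Returns 'removed', 'active' or 'unknown' for one fetched ad page."""
    if status in (404, 410):
        return "removed"
    if status != 200:
        return "unknown"  # e.g. 403 from anti-bot protection or a 5xx: retry later
    text = html.lower()
    if any(marker in text for marker in markers):
        return "removed"
    return "active"


def check_ad(url: str, timeout: float = 10.0) -> str:
    """Fetches the ad page and classifies it; urlopen follows redirects itself."""
    # Placeholder User-Agent; a real crawler should identify itself properly.
    request = urllib.request.Request(url, headers={"User-Agent": "ad-checker/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return classify(response.status, body)
    except urllib.error.HTTPError as err:
        return classify(err.code, "")
    except OSError:
        return "unknown"  # DNS failure, refused connection, timeout, ...
```

Keeping `classify` separate from the network call makes the per-site rules easy to test on saved pages.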


2 answers
Roger Martino, 2017-10-26
@rojermartino

Does anyone have any thoughts? :)

Stepan, 2017-10-30
@steff

Starting point: real estate bulletin boards such as Domofond, CIAN, Avito ...
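If the crawl starts from the bulletin boards listed above, one simple way to keep it focused is to follow only links on those sites and their subdomains. A minimal sketch; `make_site_filter` is a hypothetical name, and any URLs passed to it in testing are placeholders rather than the boards' real addresses.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit


def make_site_filter(seeds: Iterable[str]) -> Callable[[str], bool]:
    """Returns a predicate that accepts only URLs on the seed sites or their subdomains."""
    hosts = set()
    for seed in seeds:
        host = (urlsplit(seed).hostname or "").lower()
        # Treat www.site and site as the same board so m.site etc. also match.
        hosts.add(host[4:] if host.startswith("www.") else host)
    hosts.discard("")

    def allowed(url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        return any(host == h or host.endswith("." + h) for h in hosts)

    return allowed
```

Each discovered link would be passed through the returned predicate before being queued; links to other sites are simply dropped.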
