How to search for ads without being tied to a specific site?
Hello!
The task is to get a large number of relevant real estate ads.
Initially, there are no other criteria.
All I need is a mechanism that runs continuously on a server, crawls everything it can find on the Internet related to real estate, and saves it in some form (a URL string will do).
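To make the idea concrete, here is a minimal sketch of the kind of mechanism I have in mind: a breadth-first crawl that starts from seed URLs and keeps the URLs of pages that look related to real estate. Everything in it is an assumption for illustration: the `KEYWORDS` list, the `crawl` function, and the `fetch` callable (which would wrap a real HTTP client) are hypothetical names, and a keyword match is only a crude stand-in for real relevance detection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

# Hypothetical keyword list; a real system would need a better relevance signal.
KEYWORDS = ("for sale", "for rent", "apartment", "real estate")


class LinkParser(HTMLParser):
    """Collects href values of <a> tags and the text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs whose text mentions a keyword.

    `fetch(url)` must return the page HTML as a string, or None on failure.
    """
    queue = deque(seeds)
    seen = set(seeds)
    found = []
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages += 1
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        text = " ".join(parser.text).lower()
        if any(keyword in text for keyword in KEYWORDS):
            found.append(url)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

Passing `fetch` in as a parameter keeps the crawl logic separate from networking, so it can be tried against a plain dict of pages (`crawl(seeds, pages.get)`) before pointing it at real sites. A real crawler would also need to respect robots.txt and rate limits, which this sketch ignores.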
Problems that I have:
1) I need a starting point for the search. I could define it as a query such as "property ad", get a list of sites from any search engine, and then search those sites for the information I need. Is that the best approach?
2) What does it mean to "search for information"? The parser receives HTML, but every site has a different layout. What algorithms can I use to detect that a page contains an ad and to extract its parameters (ad title, picture, description, location, etc.)?
I need something intelligent. What can you tell me about this issue?
3) Not so pressing at the moment, but still: how do I check that an ad is still current, that is, that it has not been withdrawn from publication? Here again I run into the different site layouts. I need a parser that can approach this question with "intelligence".
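For points 2 and 3, one layout-independent starting point I can imagine is to read the machine-readable metadata that some sites embed: JSON-LD blocks with schema.org types, and Open Graph `<meta>` tags. Below is a minimal sketch under that assumption. The names `extract_ad` and `looks_removed`, the `LISTING_TYPES` set, and the `REMOVED_PHRASES` list are all hypothetical, and many sites expose none of this metadata, so this covers only part of the problem.

```python
import json
from html.parser import HTMLParser

# schema.org types that plausibly describe a listing (my assumption, not a standard list).
LISTING_TYPES = {"RealEstateListing", "Offer", "Product", "Apartment", "House", "Residence"}
# Hypothetical phrases that sites might show for withdrawn ads.
REMOVED_PHRASES = ("no longer available", "ad has been removed", "listing expired")


class MetadataParser(HTMLParser):
    """Collects JSON-LD script bodies and Open Graph <meta> tags."""

    def __init__(self):
        super().__init__()
        self.json_ld = []
        self.og = {}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.og[attrs["property"][3:]] = attrs.get("content") or ""

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.json_ld.append("".join(self._buffer))


def extract_ad(html):
    """Returns a dict of ad fields, or None if the page exposes no usable metadata."""
    parser = MetadataParser()
    parser.feed(html)
    for raw in parser.json_ld:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            types = types if isinstance(types, list) else [types]
            if LISTING_TYPES & {t for t in types if isinstance(t, str)}:
                return {"name": item.get("name"),
                        "description": item.get("description"),
                        "image": item.get("image"),
                        "source": "json-ld"}
    # Weaker signal: Open Graph tags describe any page, not only ads.
    if "title" in parser.og:
        return {"name": parser.og.get("title"),
                "description": parser.og.get("description"),
                "image": parser.og.get("image"),
                "source": "opengraph"}
    return None


def looks_removed(status_code, html):
    """Heuristic: HTTP 404/410 or a 'removed' phrase suggests the ad is gone."""
    if status_code in (404, 410):
        return True
    text = html.lower()
    return any(phrase in text for phrase in REMOVED_PHRASES)
```

The `source` field records which signal produced the result, so Open Graph matches can be treated with more suspicion than JSON-LD ones. Location is left out because schema.org addresses are nested objects that vary between sites. Pages without any metadata would still need something smarter, which is the part I am asking about.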
I am asking knowledgeable people for help. Where should I dig?
I'll be happy with any suggestions :)