Help with the creation of a specialized search engine
I'm going to write something like a specialized search engine. It will crawl a large number of resources on the open Internet (not only the web, and not only [hyper]text resources), extract the information I need, and add it to a database with a clear structure: one field needs exact-match search, another needs full-text search.
Requirements:
- minimize the delay between a resource changing and its re-indexing;
- maximize the speed of retrieving useful data from the database for different queries (some queries will be asked more often than others, which may help).
I want to start with a proof of concept: a software solution that, running on a single server (physical or in the cloud), would prove that extracting this kind of information is viable at all. Then, if everything works out, I would expand and deepen the service.
Please point me to materials on the topic: ready-made solutions, libraries, frameworks, languages, or at least suitable keywords to search for.
A ready-made open source solution: Nutch. It has everything you need for search, including scalability if you hook it up to Hadoop.
Copy Google's architecture; your requirements are too vague for more specific advice.
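For a single-server proof of concept, the two requirements from the question can be illustrated with a rough sketch. This is not how Nutch or Google work internally; it is a toy in-memory index, and every name in it (`TinyIndex`, `update`, `search`, the example URLs) is made up for illustration. It assumes each resource has one exact-match field (here called `kind`) and one full-text field, skips re-indexing when the content hash is unchanged, and caches query results so that frequently asked queries are answered without touching the index again.

```python
import hashlib
import re
from collections import defaultdict


class TinyIndex:
    """Toy in-memory index: one exact-match field plus full-text search."""

    def __init__(self):
        self.docs = {}                    # url -> (kind, text)
        self.hashes = {}                  # url -> content hash at last indexing
        self.by_kind = defaultdict(set)   # exact-match field value -> urls
        self.by_word = defaultdict(set)   # word -> urls (inverted index)
        self.cache = {}                   # (kind, query) -> cached result list

    @staticmethod
    def _words(text):
        # Lowercased set of word tokens; a real system would use a proper analyzer.
        return set(re.findall(r"\w+", text.lower()))

    def update(self, url, kind, text):
        """Re-index url only if its content changed. Returns True if re-indexed."""
        digest = hashlib.sha256((kind + "\0" + text).encode("utf-8")).hexdigest()
        if self.hashes.get(url) == digest:
            return False  # unchanged since the last crawl, nothing to do
        if url in self.docs:
            # Remove the stale postings before adding the new ones.
            old_kind, old_text = self.docs[url]
            self.by_kind[old_kind].discard(url)
            for word in self._words(old_text):
                self.by_word[word].discard(url)
        self.docs[url] = (kind, text)
        self.hashes[url] = digest
        self.by_kind[kind].add(url)
        for word in self._words(text):
            self.by_word[word].add(url)
        self.cache.clear()  # cached answers may now be out of date
        return True

    def search(self, kind, query):
        """Exact match on kind, then require every query word in the text."""
        key = (kind, query)
        if key not in self.cache:
            hits = set(self.by_kind.get(kind, set()))
            for word in self._words(query):
                hits &= self.by_word.get(word, set())
            self.cache[key] = sorted(hits)
        return self.cache[key]


index = TinyIndex()
index.update("ftp://example.org/a.txt", "ftp", "Quarterly sales report")
index.update("http://example.org/b", "web", "Sales figures and report archive")
print(index.search("web", "sales report"))  # prints ['http://example.org/b']
```

Clearing the whole cache on every change is the simplest correct choice for a sketch; a real service would invalidate more selectively and keep the index on disk.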