Help with the creation of a specialized search engine

Yuri Syrovetsky, 2011-05-06 16:53:01

Programming

I'm going to write something like a specialized search engine. It will crawl a large number of resources on the open Internet (not only the web, and not only [hyper]text resources), extract the information I need, and add it to a database. The database will have a well-defined structure: some fields need exact-match search, others need full-text search.
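To make the "exact match in one field, full text in another" idea concrete, here is a minimal in-memory sketch. The `Record` fields (`source`, `body`), the `TinyIndex` class, and the tokenizer are all hypothetical names chosen for illustration; a real proof of concept would more likely sit on top of an existing search library or database.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """One extracted item; the field names are illustrative assumptions."""
    doc_id: int
    source: str  # searched by exact match
    body: str    # searched as full text


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Hypothetical index: a hash map for the exact field, an inverted index for text."""

    def __init__(self):
        self.records = {}
        self.by_source = defaultdict(set)  # source value -> doc ids
        self.by_token = defaultdict(set)   # word token   -> doc ids

    def add(self, record):
        # Drop any previous version first so re-indexing leaves no stale entries.
        self.remove(record.doc_id)
        self.records[record.doc_id] = record
        self.by_source[record.source].add(record.doc_id)
        for token in set(tokenize(record.body)):
            self.by_token[token].add(record.doc_id)

    def remove(self, doc_id):
        old = self.records.pop(doc_id, None)
        if old is None:
            return
        self.by_source[old.source].discard(doc_id)
        for token in set(tokenize(old.body)):
            self.by_token[token].discard(doc_id)

    def search(self, source=None, text=None):
        """Return sorted doc ids matching the exact source and all words in text."""
        result = set(self.records)
        if source is not None:
            result &= self.by_source.get(source, set())
        if text is not None:
            for token in tokenize(text):
                result &= self.by_token.get(token, set())
        return sorted(result)
```

Calling `search(source="web", text="linux")` on such an index would return only the ids of records whose `source` is exactly `"web"` and whose `body` contains the word `linux`.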

Requirements:
- minimize the delay between a change to a resource and its re-indexing;
- maximize the speed of retrieving useful data from the database for different queries (some queries will be asked more often than others, which may help).
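Both requirements can be sketched in a few lines, assuming a polling crawler and a read-mostly index. `ChangeTracker` and `CachedSearch` are hypothetical names, not part of any library: the first compares content hashes so that only resources that actually changed are re-indexed, the second keeps answers to frequent queries in an LRU cache. Note that hashing only avoids wasted work; the re-indexing delay is still bounded by how often each resource is polled.

```python
import hashlib
from functools import lru_cache


class ChangeTracker:
    """Remembers a SHA-256 digest per resource to detect content changes."""

    def __init__(self):
        self._digests = {}

    def has_changed(self, resource_id, content):
        """Return True if content (bytes) differs from the last version seen."""
        digest = hashlib.sha256(content).hexdigest()
        if self._digests.get(resource_id) == digest:
            return False
        self._digests[resource_id] = digest
        return True


class CachedSearch:
    """Wraps a search callable so that repeated queries are served from memory."""

    def __init__(self, backend, max_entries=1024):
        # backend is any callable taking a hashable query; it should return
        # an immutable result (e.g. a tuple), since cached values are shared.
        self._cached = lru_cache(maxsize=max_entries)(backend)

    def query(self, text):
        return self._cached(text)

    def invalidate(self):
        # Call after re-indexing so stale answers are not served.
        self._cached.cache_clear()
```

A crawler loop would call `has_changed(url, data)` after each fetch, re-index only when it returns `True`, and then call `invalidate()` on the query cache.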

I want to start with a proof of concept: a program that, running on a single server (physical or in the cloud), would prove that extracting this kind of information is viable at all. Then, if everything works out, I will broaden and deepen the service.

Please point me to materials on the topic: ready-made solutions, libraries, frameworks, languages, or at least suitable keywords to search for.

2 answers

b0n3Z, 2011-05-07
@b0n3Z

A ready-made open-source solution: Nutch. It has everything you need for search, including scalability if you hook it up to Hadoop.

Puma Thailand, 2011-05-06
@opium

Just copy Google's architecture; beyond that, your requirements are too vague to say more.