Yuri Syrovetsky, 2011-05-06 16:53:01
Programming

Help with the creation of a specialized search engine

I'm going to write something like a specialized search engine. It will crawl a large number of resources on the open Internet (not only the web, and not only [hyper]text resources), extract the information I need, and add it to a database with a clear structure: some fields need exact-match search, others need full-text search.
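To make the "exact match in one field, full text in another" idea concrete, here is a minimal in-memory sketch. The class name `TinyIndex` and the field names `kind` and `body` are made up for illustration; a real system would sit on a proper database or search library rather than Python dictionaries.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy index: exact match on `kind`, full-text match on `body`."""

    def __init__(self):
        self.docs = {}                    # doc_id -> stored record
        self.by_kind = defaultdict(set)   # exact-match value -> doc ids
        self.by_term = defaultdict(set)   # lowercased word -> doc ids

    @staticmethod
    def tokenize(text):
        # Very naive tokenizer: lowercase runs of word characters.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, kind, body):
        if doc_id in self.docs:
            self.remove(doc_id)  # re-indexing replaces the old version
        self.docs[doc_id] = {"kind": kind, "body": body}
        self.by_kind[kind].add(doc_id)
        for term in self.tokenize(body):
            self.by_term[term].add(doc_id)

    def remove(self, doc_id):
        rec = self.docs.pop(doc_id)
        self.by_kind[rec["kind"]].discard(doc_id)
        for term in self.tokenize(rec["body"]):
            self.by_term[term].discard(doc_id)

    def search(self, text, kind=None):
        """Return ids of documents containing every word of `text`,
        optionally restricted to an exact `kind` value."""
        terms = self.tokenize(text)
        if not terms:
            return []
        hits = set.intersection(*(self.by_term.get(t, set()) for t in terms))
        if kind is not None:
            hits &= self.by_kind.get(kind, set())
        return sorted(hits)
```

For example, after `idx.add("a", "ftp", "Linux kernel sources")`, calling `idx.search("linux", kind="ftp")` filters the full-text hits by the exact-match field.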

Requirements:
- minimize the delay between a resource changing and its re-indexing;
- maximize the speed of retrieving useful data from the database for different queries (some queries will be issued more often than others, which may help).
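One possible way to approach both requirements, sketched under assumptions: re-check each resource on an adaptive interval (shorter when it changed last time, longer when it did not), and cache results of frequently repeated queries. The names `RecrawlScheduler`, `run_query`, `cached_search`, and the interval bounds are hypothetical; fetching and the real query backend are left out.

```python
import hashlib
import heapq
from functools import lru_cache

MIN_INTERVAL, MAX_INTERVAL = 60, 86400  # seconds; illustrative bounds only


def fingerprint(content: bytes) -> str:
    # A content hash is a cheap way to detect that a resource changed.
    return hashlib.sha256(content).hexdigest()


class RecrawlScheduler:
    def __init__(self):
        self.queue = []   # heap of (next_check_time, url)
        self.state = {}   # url -> (last fingerprint, current interval)

    def add(self, url, now, interval=3600):
        self.state[url] = (None, interval)
        heapq.heappush(self.queue, (now, url))

    def pop_due(self, now):
        """Return URLs whose next check time has arrived."""
        due = []
        while self.queue and self.queue[0][0] <= now:
            due.append(heapq.heappop(self.queue)[1])
        return due

    def report(self, url, content, now):
        """Record a fetch; return True if the content needs re-indexing."""
        old_fp, interval = self.state[url]
        new_fp = fingerprint(content)
        changed = new_fp != old_fp
        if changed:
            interval = max(MIN_INTERVAL, interval // 2)  # check sooner
        else:
            interval = min(MAX_INTERVAL, interval * 2)   # back off
        self.state[url] = (new_fp, interval)
        heapq.heappush(self.queue, (now + interval, url))
        return changed


query_log = []  # records backend hits, only to show the cache working


def run_query(query):
    # Stand-in for the real (slow) database lookup.
    query_log.append(query)
    return query.lower().split()


@lru_cache(maxsize=1024)
def cached_search(query):
    # Frequent queries are answered from memory; results must be hashable.
    return tuple(run_query(query))
```

After a re-index, `cached_search.cache_clear()` drops stale results; a finer-grained invalidation scheme would be needed if re-indexing is frequent.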

I want to start with a proof of concept: a software solution that, running on a single server (physical or in the cloud), would prove the viability of extracting this kind of information. Then, if everything works out, I would expand and deepen the service.

Please point me to materials on the topic, ready-made solutions, libraries, frameworks, languages, or at least suitable keywords to search for.

2 answers
b0n3Z, 2011-05-07
@b0n3Z

A ready-made open-source solution: Nutch. It has everything you need for search, including scalability if you hook it up to Hadoop.

Puma Thailand, 2011-05-06
@opium

Copy the architecture of, say, Google; as it stands, your requirements are too vague.
