big data
tibitibidoh, 2015-10-03 14:05:49

How big would a Runet search index be? How fast could it be crawled?

A couple of questions have been bugging me for a couple of years now, so I decided to ask:
1. Roughly how large (in bytes) would a search index similar to Yandex's be?
2. How long would one full crawl round take with 50 servers on a good connection (say, at Selectel)?
3. The same two questions, if only the home pages of Russian-language sites are crawled?
P.S. An important condition: heavy media and image files are excluded from the index (200 KB limit).
I would be grateful for estimates or data from knowledgeable sources!
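For question 2, a minimal back-of-envelope sketch in Python. The page count, average page size, per-server bandwidth, concurrent fetches per server and average response latency are all hypothetical inputs, not measured Selectel or Yandex figures. It takes the slower of two lower bounds: the time to push every byte through the combined uplinks, and the time to issue every request at a fixed concurrency. DNS lookups, per-host politeness delays, retries and parsing are ignored, so a real crawl round would take longer.

```python
def crawl_time_seconds(pages: int, avg_page_bytes: int, servers: int,
                       mbit_per_server: float, fetches_per_server: int = 200,
                       avg_latency_s: float = 1.0) -> float:
    """Rough lower bound on the duration of one crawl round, in seconds.

    The defaults for fetches_per_server and avg_latency_s are hypothetical
    assumptions, not measured values.
    """
    # Bandwidth bound: total bits divided by the combined uplink of all servers.
    total_bits = pages * avg_page_bytes * 8
    bandwidth_s = total_bits / (servers * mbit_per_server * 1_000_000)

    # Request-rate bound: each server keeps `fetches_per_server` requests in
    # flight, and each request takes `avg_latency_s` seconds on average.
    pages_per_s = servers * fetches_per_server / avg_latency_s
    rate_s = pages / pages_per_s

    # The slower of the two bounds determines the round time.
    return max(bandwidth_s, rate_s)


if __name__ == "__main__":
    # Hypothetical scenario: home pages only, 50 servers at 100 Mbit/s each.
    t = crawl_time_seconds(pages=2_837_959, avg_page_bytes=100_000,
                           servers=50, mbit_per_server=100)
    print(f"{t:.0f} s (~{t / 60:.1f} min)")
```

With the figure quoted in the first answer (2,837,959 sites, one 100 KB home page each) and an assumed 100 Mbit/s per server, this gives about 454 s, roughly 7.6 minutes of pure transfer time. Since the 200 KB limit caps each file, plugging in `avg_page_bytes=200_000` gives a worst case for the home-page-only scenario; per-host politeness limits in a real crawler are likely to matter more than raw bandwidth.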



2 answers
Curly Brace, 2015-10-03
@stasuss

If we take this as a basis:
track.ruward.ru/health
we get 2,837,959 sites. The average page weighs 100 kilobytes (pure HTML).
Then think about how each page will be parsed and what will be extracted from it, and on that basis estimate how much space it will take in a particular index. You don't have Yandex's index-building algorithm, do you? :)
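To make the arithmetic concrete: 2,837,959 sites × 100 KB is about 283.8 GB of raw HTML for home pages alone, taking 1 KB = 1000 bytes (an assumption). How much of that ends up in an index depends on what the parser extracts. The sketch below is a toy inverted index in Python (word → sorted list of document IDs), a standard textbook structure and not a reproduction of Yandex's actual index format; the regex tag stripping is a deliberate simplification rather than real HTML parsing.

```python
import re
from collections import defaultdict

# Figures quoted in the answer; 1 KB is taken as 1000 bytes (an assumption).
SITES = 2_837_959
AVG_PAGE_BYTES = 100_000

raw_bytes = SITES * AVG_PAGE_BYTES  # raw HTML volume, one page per site

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, not a real HTML parser
WORD_RE = re.compile(r"\w+")     # \w is Unicode-aware in Python 3, so Cyrillic matches


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each lowercased word to the sorted IDs of the documents containing it."""
    postings: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, html in docs.items():
        # Replace tags with spaces so words on either side of a tag stay separate.
        text = TAG_RE.sub(" ", html).lower()
        for word in WORD_RE.findall(text):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


if __name__ == "__main__":
    print(f"raw HTML, home pages only: {raw_bytes / 1e9:.1f} GB")
    print(build_inverted_index({1: "<p>Big data</p>", 2: "<b>big</b> index"}))
```

A real engine would likely also store term positions and frequencies, compress its postings lists and keep text for snippets, so the final index could end up well below or above the raw HTML volume depending on those choices.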

Viverov, 2015-10-05
@Viverov

There isn't just one index :)
There may well be more than 15 indices (within a single search engine), but I don't know for sure!
As for how long it takes to build one, ask Kalinin (head of the Mail.ru search department). They started out using Google's index and gradually built their own, steadily reducing Google's share in their search results.
