How to store data scraped from a site?
What is the right way to store a large volume of scraped data? For example, I have 5,000 links, and each one contains a table with 5 columns and 5,000 rows.
Each link needs to be re-parsed once every n days, and each new result must be saved without deleting the old one.
In other words, the data will add up to a lot over time.
What is the correct way to store all of it?
You probably need the same approach as in version control systems: each new result for a URL is recorded as an update (a commit) without overwriting the previous one, so the space used in the database grows only by the size of the change (the diff).
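For illustration, a minimal sketch of that diff idea, assuming each scrape result is serialized as CSV text and kept in SQLite; the schema, the table names, and the save_snapshot helper are all assumptions made up for this example, not anything from the answer:

```python
# Keep the newest snapshot of each URL in full; store older versions
# only as unified diffs. All identifiers here are illustrative.
import difflib
import sqlite3

conn = sqlite3.connect("scrapes.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS current (
        url        TEXT PRIMARY KEY,
        scraped_at TEXT,
        body       TEXT                -- latest full snapshot (CSV text)
    );
    CREATE TABLE IF NOT EXISTS history (
        url        TEXT,
        scraped_at TEXT,
        patch      TEXT                -- diff upgrading this version to the next
    );
""")

def save_snapshot(url: str, scraped_at: str, csv_text: str) -> None:
    """Store the first result in full; replace it on later scrapes,
    demoting the old version to a diff in the history table."""
    row = conn.execute(
        "SELECT scraped_at, body FROM current WHERE url = ?", (url,)
    ).fetchone()
    with conn:
        if row is not None:
            old_at, old_body = row
            patch = "".join(difflib.unified_diff(
                old_body.splitlines(keepends=True),
                csv_text.splitlines(keepends=True),
                fromfiledate=old_at, tofiledate=scraped_at))
            conn.execute("INSERT INTO history VALUES (?, ?, ?)",
                         (url, old_at, patch))
        conn.execute("INSERT OR REPLACE INTO current VALUES (?, ?, ?)",
                     (url, scraped_at, csv_text))
```

Restoring an old version would mean replaying the stored diffs backwards from the current snapshot (difflib produces diffs but does not apply them, so a small patch routine would be needed); the trade-off is slower historical reads in exchange for a large space saving whenever most rows are unchanged between scrapes.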
A quick estimate of the volume: 5,000 links * 5,000 rows = 25,000,000 rows per day if all the links are updated daily, and 25,000,000 * 365 = 9,125,000,000 rows per year.
On each update, archive all the records as they were before the update into a second table; that way, the second table holds the entire history of changes.
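A hedged sketch of that two-table current/archive scheme, again in Python with SQLite for brevity; the five columns c1..c5 and every identifier are placeholders standing in for the real schema:

```python
# Current data stays in one table; on each re-scrape the old rows for
# that URL are moved to the archive before the new rows go in.
import sqlite3

conn = sqlite3.connect("archive.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS current_rows (
        url TEXT, c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, c5 TEXT,
        scraped_at TEXT
    );
    CREATE TABLE IF NOT EXISTS archive_rows (
        url TEXT, c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, c5 TEXT,
        scraped_at TEXT, archived_at TEXT
    );
""")

def replace_snapshot(url, rows, scraped_at):
    """Archive the previous rows for this URL, then insert the new ones.
    `rows` is an iterable of 5-tuples, one per table row on the page."""
    with conn:  # one transaction: archive + delete + insert
        conn.execute("""
            INSERT INTO archive_rows
            SELECT url, c1, c2, c3, c4, c5, scraped_at, datetime('now')
            FROM current_rows WHERE url = ?
        """, (url,))
        conn.execute("DELETE FROM current_rows WHERE url = ?", (url,))
        conn.executemany(
            "INSERT INTO current_rows VALUES (?, ?, ?, ?, ?, ?, ?)",
            [(url, *r, scraped_at) for r in rows],
        )
```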
Accordingly, partition the archive table into roughly two-week ranges. That way the current data is served quickly; the archived data takes longer, but that is to be expected.
Data for past periods can then be backed up, keeping only the most recent change records online (for example, keep only six months of data in the archive).
If needed, everything can be pulled back out and recomputed, but in practice such enormous volumes take a very long time to process once exported, and given their age they are often no longer relevant; obviously nobody is going to run analytics on last year's links.
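SQLite has no native partitioning, so here is a sketch of the two-week range partitioning under the assumption that the archive lives in PostgreSQL (version 10+), accessed via psycopg2 against a running server; the connection string and all identifiers are illustrative:

```python
# Range-partition the archive by scrape timestamp so that queries on
# recent data touch only small partitions, and old partitions can be
# detached, dumped, and dropped to enforce the ~6-month retention.
import psycopg2

conn = psycopg2.connect("dbname=scrapes")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS archive_rows (
            url TEXT, c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, c5 TEXT,
            scraped_at timestamptz NOT NULL
        ) PARTITION BY RANGE (scraped_at);
    """)
    # One partition per two-week window; create these on a schedule.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS archive_2024_w01_w02
        PARTITION OF archive_rows
        FOR VALUES FROM ('2024-01-01') TO ('2024-01-15');
    """)
```

Retention then becomes cheap: detach a partition older than six months with ALTER TABLE ... DETACH PARTITION, back it up with pg_dump, and drop it, all without touching the hot data.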