SQL
mastangt, 2017-03-06 16:00:19

What is the best way to insert and store a large amount of data in a database?

Hello.
There is a site where discounts on goods constantly appear and disappear.
I am building a project that should store the price history of every product.
The parser will run every 4 hours. There is another idea: instead of accumulating this huge set of observations, keep several records per product, each holding a price and the time period during which that price did not change, and so on for every discount/price of every product. Tell me the best way to implement this; if you have ideas of your own, I will be happy to hear them. Also, which database is best for storing and constantly querying such volumes of data?
Even if I take not all products but only a few categories, that is about 400,000 products, and they need to be pulled 6 times a day, which comes to about 2,400,000 rows per day.
The question is how best to insert them into the database. I am thinking of batching, say, 1,000 products into a single query to reduce the load on the database. And how and where is it best to store them? That is roughly 72,000,000 records per month, and statistics will need to be collected over a long period.
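
To illustrate the batching idea, here is a minimal sketch of a multi-row INSERT: one statement per batch of parsed products instead of one statement per product. The table name, column names and sample values (price_history, product_id, observed_at, the ids and prices) are illustrative and not taken from the question.

-- Hypothetical table for raw price observations
CREATE TABLE price_history (
    product_id  BIGINT        NOT NULL,
    price       DECIMAL(12,2) NOT NULL,
    observed_at TIMESTAMP     NOT NULL
);

-- One INSERT per batch; each parser run sends a series of these
INSERT INTO price_history (product_id, price, observed_at)
VALUES
    (1001, 199.99, '2017-03-06 16:00:00'),
    (1002,  49.50, '2017-03-06 16:00:00'),
    -- ... up to roughly 1,000 rows per statement
    (1003, 999.00, '2017-03-06 16:00:00');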

2 answers
entermix, 2017-03-06
@mastangt

You do not need to write every price to the database on each check; you only need to record a price when it changes.

products
id, name, created, ...

product_prices
id, product_id, price, created, ...
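
A minimal sketch of this schema, written as PostgreSQL for concreteness (the answer itself does not name a specific database); the extra constraints, the example product id 42 and the price 199.99 are illustrative.

CREATE TABLE products (
    id      BIGSERIAL PRIMARY KEY,
    name    TEXT NOT NULL,
    created TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE product_prices (
    id         BIGSERIAL PRIMARY KEY,
    product_id BIGINT NOT NULL REFERENCES products (id),
    price      NUMERIC(12,2) NOT NULL,
    created    TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Fast lookup of a product's latest price and of its full history
CREATE INDEX product_prices_product_created_idx
    ON product_prices (product_id, created DESC);

-- Placeholder product so the example below has something to reference
INSERT INTO products (id, name) VALUES (42, 'Example product');

-- On each parser run, insert a row only when the newly parsed price
-- differs from the most recent stored price for that product
INSERT INTO product_prices (product_id, price)
SELECT 42, 199.99
WHERE 199.99 IS DISTINCT FROM (
    SELECT price
    FROM product_prices
    WHERE product_id = 42
    ORDER BY created DESC
    LIMIT 1
);

Run twice with the same price, the second statement inserts nothing; run with a different price, it adds a new history row.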

Denis Smirnov, 2017-03-06
@darthunix

I would use PostgreSQL, because I know it) In more detail: first I would create a custom composite type describing a price at a point in time, consisting of a timestamp and a price. Then I would create a products table with a column holding an array of this new price-point type. On each new data load I would append the time and price for the matching product to that array. I would also add a primary key on the products and a GIN index on the array of price points. In theory the table stays reasonably small this way, and you keep the ability to aggregate the data quickly. The data would also be easy to shard if that ever becomes necessary.
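
A minimal sketch of this approach; the type and column names (price_point, price_hist) and the sample values are illustrative, not taken from the answer.

-- Composite type: a price at a point in time
CREATE TYPE price_point AS (
    observed_at timestamptz,
    price       numeric(12,2)
);

-- One row per product; the whole price history lives in an array column
CREATE TABLE products (
    id         bigserial PRIMARY KEY,
    name       text NOT NULL,
    price_hist price_point[] NOT NULL DEFAULT '{}'
);

-- GIN index on the array of price points, as the answer suggests
-- (supports containment queries such as @> and &&)
CREATE INDEX products_price_hist_gin ON products USING gin (price_hist);

-- Placeholder product
INSERT INTO products (id, name) VALUES (42, 'Example product');

-- On each parser run, append the new observation to the product's history
UPDATE products
SET price_hist = array_append(price_hist, ROW(now(), 199.99)::price_point)
WHERE id = 42;

Aggregation over the history would then typically go through unnest(price_hist).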
