Which shard key to choose in MongoDB?

Dmitry Labutin, 2015-08-14 13:34:13

MongoDB

I want to store prices in MongoDB. The significant fields are:
- price list ID (about 3,000 different price lists, slowly growing by about 20-30 per month)
- manufacturer ID (about 4,000 different manufacturers)
- article (string)
- price
- quantity
- name

There are about 100 million rows in total. A price list can contain anywhere from 100 rows to several million. A product is a unique combination of manufacturer (brand) ID + article (SKU). There will be about 15-20 million unique products, that is, the same product can appear in several price lists. The number of products of one brand can range from 1,000 to several million.

Typical database query:

Return the list of offers (rows from price lists) for a given list of goods.
That is, the input is N goods, and the output is every row that contains these goods: which price lists they appear in, at what prices, and in what quantities.
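To make the query shape concrete, here is a minimal sketch of how such a filter could be built, assuming one document per price-list row. The field names (`price_list_id`, `manufacturer_id`, `article`, and so on) and the helper `build_offers_filter` are hypothetical and not taken from the question.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row document: one per price-list line (field names are assumed).
sample_row = {
    "price_list_id": 17,
    "manufacturer_id": 42,
    "article": "A-100",
    "price": 12.5,
    "quantity": 3,
    "name": "Oil filter",
}


def build_offers_filter(products: Iterable[Tuple[int, str]]) -> Dict[str, List[dict]]:
    """Build a MongoDB-style filter matching all rows for the given products.

    Each product is a (manufacturer_id, article) pair.
    """
    clauses = [{"manufacturer_id": m, "article": a} for m, a in products]
    if not clauses:
        # An empty product list has nothing to match, so reject it early.
        raise ValueError("at least one product is required")
    return {"$or": clauses}


query = build_offers_filter([(42, "A-100"), (42, "B-200")])
print(query)
```

With PyMongo, a dict like this could be passed as the filter argument of `Collection.find`. Every clause carries both the manufacturer ID and the article, so the filter contains whichever of these fields ends up in the shard key.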
Now about sharding.
Basic requirements: read speed is the priority; write speed is secondary.
Now my thoughts.
Since a query may include goods from only 1-2 brands, I would like it to go to just one shard. This suggests sharding by manufacturer ID. But then there is a problem with chunk size. As I wrote above, some brands have several million items (articles), and a chunk cannot hold less than all the documents with one particular manufacturer ID. If you do not raise the chunk size, MongoDB starts complaining that it cannot move chunks that are larger than the maximum chunk size (and they will be).
If you do raise the chunk size, individual chunks become large and the data is distributed unevenly across the shards.
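A rough back-of-the-envelope check of the chunk-size problem, as a sketch only. The average document size of 200 bytes is an assumption made for illustration, and 64 MB is used as the maximum chunk size on the assumption that the default of that period is in effect; neither number comes from the question.

```python
# Assumptions (not from the question): average row size and maximum chunk size.
AVG_DOC_BYTES = 200               # hypothetical average size of one price row
MAX_CHUNK_BYTES = 64 * 1024 ** 2  # 64 MB, assumed default chunk size


def min_chunk_bytes(rows_for_one_key: int, avg_doc_bytes: int = AVG_DOC_BYTES) -> int:
    """Smallest possible chunk for one shard-key value.

    All documents sharing the same shard-key value must live in one chunk,
    so the chunk is at least as large as all of them together.
    """
    return rows_for_one_key * avg_doc_bytes


def is_oversized(rows_for_one_key: int) -> bool:
    """True if a single manufacturer's rows alone exceed the maximum chunk size."""
    return min_chunk_bytes(rows_for_one_key) > MAX_CHUNK_BYTES


for rows in (1_000, 100_000, 3_000_000):
    print(rows, min_chunk_bytes(rows), is_oversized(rows))
```

Under these assumptions a brand with a few million rows cannot fit in one chunk, while small brands fit easily, which is the imbalance described above.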
If you use a compound shard key, for example manufacturer ID + article, then a query for N products of the same brand may well go to all shards, which I would like to avoid.
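The effect of a compound ranged key can be illustrated with a toy model of chunk routing. This is a simplified simulation, not MongoDB's actual router: the split points, the chunk-to-shard assignment, and the function names are all made up, and strings are ordered by Python's comparison rules.

```python
from bisect import bisect_right

# Hypothetical split points for a ranged shard key (manufacturer_id, article).
# Chunk i covers keys in [SPLITS[i-1], SPLITS[i]); chunk 0 starts at the minimum key.
# Brand 42 is large, so its key range has been split into three chunks.
SPLITS = [(42, "M"), (42, "T"), (43, ""), (100, "")]

# Hypothetical assignment of the five resulting chunks to three shards.
CHUNK_TO_SHARD = ["shard0", "shard1", "shard2", "shard0", "shard1"]


def shard_for(key):
    """Return the shard that owns the chunk containing this key."""
    return CHUNK_TO_SHARD[bisect_right(SPLITS, key)]


def shards_touched(products):
    """Return the set of shards a query for these products has to visit."""
    return {shard_for(p) for p in products}


big_brand = [(42, "A-100"), (42, "N-7"), (42, "Z-9")]
small_brand = [(7, "A-1"), (7, "Q-2"), (7, "Z-3")]

print(sorted(shards_touched(big_brand)))    # ['shard0', 'shard1', 'shard2']
print(sorted(shards_touched(small_brand)))  # ['shard0']
```

In this model only the large brand is spread out. A brand whose key range fits inside one chunk is still served by a single shard, so the fan-out concern applies mainly to the brands that are too big for one chunk anyway.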
So the question is: what would you advise in my situation?


1 answer(s)

lega, 2015-08-14
@lega

"That is, the input is N goods"

product == SKU?
Since write speed is not that important, you could consider this option: "collapse" all the data for a product into one document, i.e. one document per article holds the list of prices + manufacturers in which it appears. Shard on the article.
Plus it saves memory: there are about 5 times fewer documents, and the article (and name) are stored only once.
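A minimal sketch of what such "collapsing" could look like. The answer does not spell out the document layout, so this assumes the product key from the question (manufacturer ID + article) and an embedded `offers` list; the field names and the helper `collapse_rows` are hypothetical. The "about 5 times" figure is consistent with the question's numbers: 100 million rows over 15-20 million unique products is roughly 5 to 7 rows per product.

```python
def collapse_rows(rows):
    """Group price-list rows into one document per (manufacturer_id, article).

    Shared fields (article, name) are stored once; per-price-list data
    goes into the embedded "offers" list.
    """
    docs = {}
    for row in rows:
        key = (row["manufacturer_id"], row["article"])
        doc = docs.setdefault(key, {
            "manufacturer_id": row["manufacturer_id"],
            "article": row["article"],
            "name": row["name"],  # taken from the first row seen for this product
            "offers": [],
        })
        doc["offers"].append({
            "price_list_id": row["price_list_id"],
            "price": row["price"],
            "quantity": row["quantity"],
        })
    return list(docs.values())


# Made-up sample rows: two price lists offer the same product.
rows = [
    {"price_list_id": 1, "manufacturer_id": 42, "article": "A-100",
     "price": 12.5, "quantity": 3, "name": "Oil filter"},
    {"price_list_id": 2, "manufacturer_id": 42, "article": "A-100",
     "price": 11.9, "quantity": 10, "name": "Oil filter"},
    {"price_list_id": 1, "manufacturer_id": 7, "article": "Q-2",
     "price": 4.0, "quantity": 1, "name": "Gasket"},
]

products = collapse_rows(rows)
print(len(products), [len(p["offers"]) for p in products])
```

The trade-off is on the write side: updating one price list now means touching many product documents instead of replacing a contiguous set of rows, which fits the stated priority of reads over writes.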