CMS
Vladimir Boliev, 2012-05-21 11:56:18

How to list 1 million items?

I have a task: put 1 million products into an online store. As I see it, there are 3 options:
1. Write my own CMS
2. Customize an existing solution
3. Use a third-party platform, for example webstore.amazon.com

With the first option everything is clear: you can tailor everything to your needs, set up the necessary indexes, use Sphinx, etc. But development will take a long time.
With the second option, the fear is that we will hit some kind of ceiling: the time spent customizing the system will be lost, and we will still have to write everything from scratch.
With the third option, there is no way to configure things more flexibly or write custom modules.

Ready-made CMSs cannot be used as they are, because they are not designed for such volumes of data. In openCart and OsCommerce, a product search takes 8 minutes. In Magento I could not even import the products: the save method of the product model takes 1.5-2 seconds per call. I tested everything on a local computer; on a server it will of course be faster, but the numbers are still huge.
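To put the Magento figure in perspective, here is a rough estimate. It assumes products are saved strictly one at a time with no parallelism and a constant cost per save; the function name is made up for the illustration.

```python
def import_days(n_products: int, seconds_per_save: float) -> float:
    """Days needed to save n_products sequentially at a fixed cost per save."""
    return n_products * seconds_per_save / 86_400  # 86,400 seconds in a day


for cost in (1.5, 2.0):
    print(f"{cost} s per save -> {import_days(1_000_000, cost):.1f} days")
# 1.5 s per save -> 17.4 days
# 2.0 s per save -> 23.1 days
```

Under these assumptions, a sequential import of 1 million products would run for roughly 17 to 23 days.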

Does anyone have experience with data at this scale? Any ideas?

9 answers
Alexey Sundukov, 2012-05-21
@alekciy

Let me share my experience.
My situation is similar, though initially we needed 250 products. An analysis of boxed solutions (which I did not carry out myself) showed that either a boxed product cannot guarantee fast operation at such volumes, or its vendor wants so much money for the enterprise edition that it is cheaper to build our own. Which is what we are busy with right now.
It takes time, but it gives you full control over the engine and confidence in the load it can handle. It also makes modifying the engine cheaper, i.e. support becomes easier both technically and financially, and support is actually the main expense item for software.
What we have so far is the catalog: the category tree, product cards, and an admin panel for managers (create a product, add attributes to it). The number of products is not limited; the number and type of product characteristics are not limited either and are managed through the admin panel (i.e. no additional coding is required).
Load tests showed that with ~200 MB of RAM for PHP the engine sustains 300 requests/sec for a long time (on a cache hit a page is generated in 10-15 ms), i.e. up to about 25 million hits per day. It can hold a peak of 1000 requests, but for no longer than 5 seconds, after which 50x errors start to appear. This is with a catalog of 250 items with 10 characteristics per product. Overall, the whole stack (web server, DBMS, cache) uses 1-1.5 GB of RAM.
At the same time, data and templates are completely decoupled, so you can have as many layout variants as you like, i.e. there is no PHP + HTML mix.
Therefore, in our case, building our own has proved to be a justified strategy. I don't see how such figures could be achieved with boxed solutions, even customized ones: there is no point in reworking the core of a boxed product, and without reworking it the architecture cannot be changed. Besides, modifying your own product is still easier than supporting someone else's and adding modules to it.
So if you have the technical background, it makes sense to build your own. Depending on experience and general requirements, I would estimate this work at three months to a year for one full-time developer. That is up to the first release; after that comes the usual engine support.

int03e, 2012-05-21
@int03e

"In openCart and OsCommerce, the product search function takes 8 minutes"

Are you happy with everything else? I don't know for sure, but I suspect that Solr or Sphinx can be plugged in there, which should improve the situation significantly.

edogs, 2012-05-21
@edogs

1 million items is not scary in itself; what is scary is the number of properties that must be searchable. 2-3 properties is one situation, 20-30 is another, 200-300 is a third.
A ready-made CMS will not handle such a volume out of the box. However, a reasonable option is to take a ready-made CMS as the base, as a "body kit": payments, static pages, statistics, categories, backend, delivery, template engine, etc.
But write the catalog plugin/module entirely yourself, with direct access to the database (the product ID being the only link to the rest of the CMS). With proper design, on a 100-euro Hetzner server even the most complex search will fit within 10 seconds (if it is not plain-text search), a set of more or less standard filters (the usual store search) will take 2-3 seconds, and categories and tags will simply fly.
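A minimal sketch of what such a standalone catalog module might look like. Everything here is an assumption made for illustration: the table and column names are hypothetical, SQLite stands in for whatever database the store actually uses, and attributes are stored as simple name/value rows with an index that serves the filters.

```python
import sqlite3


def make_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with indexes for category and attribute filters."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE product (
            id          INTEGER PRIMARY KEY,  -- the only link to the rest of the CMS
            category_id INTEGER NOT NULL,
            name        TEXT NOT NULL
        );
        CREATE TABLE product_attr (
            product_id INTEGER NOT NULL REFERENCES product(id),
            attr       TEXT NOT NULL,
            value      TEXT NOT NULL,
            PRIMARY KEY (product_id, attr)
        );
        CREATE INDEX idx_product_category ON product(category_id);
        CREATE INDEX idx_attr_value ON product_attr(attr, value, product_id);
    """)
    return db


def find_products(db: sqlite3.Connection, category_id: int,
                  filters: dict[str, str]) -> list[int]:
    """Return ids of products in a category that match every attribute filter."""
    sql = "SELECT p.id FROM product p"
    params: list = []
    # One indexed join per requested attribute.
    for i, (attr, value) in enumerate(filters.items()):
        sql += (f" JOIN product_attr a{i} ON a{i}.product_id = p.id"
                f" AND a{i}.attr = ? AND a{i}.value = ?")
        params += [attr, value]
    sql += " WHERE p.category_id = ? ORDER BY p.id"
    params.append(category_id)
    return [row[0] for row in db.execute(sql, params)]


db = make_catalog()
db.executemany("INSERT INTO product VALUES (?, ?, ?)",
               [(1, 10, "Red mug"), (2, 10, "Blue mug"), (3, 20, "Red cap")])
db.executemany("INSERT INTO product_attr VALUES (?, ?, ?)",
               [(1, "color", "red"), (1, "size", "M"),
                (2, "color", "blue"), (2, "size", "M"),
                (3, "color", "red")])
print(find_products(db, 10, {"color": "red", "size": "M"}))  # [1]
```

Each additional filter adds one more join, which is one way to see why the number of searchable properties matters more than the number of rows.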

Artemy Averin, 2012-05-21
@DeDraw

If you don't have the time and skills to write a CMS (which is not the best option anyway), then:
Maybe open the MySQL database of a ready-made store and write a small third-party script that writes the products into it directly?
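A sketch of such a loader script, under loud assumptions: SQLite stands in for MySQL, and the single `product` table with `sku`, `name`, `price` columns is hypothetical. A real store schema usually spreads a product over several related tables, all of which would have to be filled consistently.

```python
import sqlite3
from itertools import islice


def bulk_insert(db: sqlite3.Connection, rows, batch_size: int = 10_000) -> int:
    """Insert (sku, name, price) rows in batches, one transaction per batch."""
    it = iter(rows)
    total = 0
    while batch := list(islice(it, batch_size)):
        with db:  # commits the batch, rolls it back on error
            db.executemany(
                "INSERT INTO product (sku, name, price) VALUES (?, ?, ?)", batch)
        total += len(batch)
    return total


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")

# Generated sample data; a real script would read the supplier's feed instead.
rows = ((f"SKU-{i:07d}", f"Product {i}", 9.99) for i in range(100_000))
inserted = bulk_insert(db, rows)
count = db.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(inserted, count)
```

Batching many rows into one transaction avoids paying the per-row overhead of the CMS model layer, but it also bypasses whatever validation and hooks that layer performs.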

Dmitry, 2012-05-21
@DedalX

“The product search function works out for 8 minutes” - so maybe the problem is not the CMS but the capabilities and ceiling of the hardware? In any case, for this many products you will need a serious server, whether the CMS is self-written or not.

Chii, 2012-05-21
@Chii

Take a ready-made CMS and write the store module yourself (unfortunately, there are no good-quality stores in the public domain).
By the way, you can run the module as an open project and, after the release, gain a significant number of free open-source developers (they simply have nowhere to go, since there are no high-quality store modules). That way you will pay only for the initial development and save a huge amount of resources on support.

Denis Turenko, 2012-05-21
@Dennion

By the way, is this auto parts by any chance? There are a lot of pitfalls there with substitute part numbers and spare-part cross-references, and all of that combined with 1 million products will be extremely difficult. Will the search be by part number, or something more complex? I lean toward the idea of taking a ready-made CMS, building your own online store module, and keeping the product data in a separate storage.

Vampiro, 2012-05-22
@Vampiro

IMHO, with 1k records it is not the CMS that slows things down but the database. Optimizing the tables (building indexes, splitting data across tables, etc.) is the minimum that is needed. If there is no index on the field, the search will crawl for 8 minutes; if there is, 8 seconds. In general, I find it hard to imagine a query taking longer than 1 second on 1k records. These are not volumes at which anything starts to slow down. Buy a boxed product, optimize the queries and tables, and see what happens.
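A small way to check whether a lookup uses an index is to ask the database for its query plan. The sketch below uses SQLite as a stand-in (MySQL has its own `EXPLAIN`), and the `product` table, `sku` column and index name are made up for the illustration.

```python
import sqlite3


def query_plan(db: sqlite3.Connection, sql: str, params=()) -> list[str]:
    """Return the human-readable steps of SQLite's plan for a query."""
    # The description is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, sku TEXT, name TEXT)")
db.executemany("INSERT INTO product (sku, name) VALUES (?, ?)",
               ((f"SKU-{i}", f"Product {i}") for i in range(10_000)))

lookup = "SELECT id FROM product WHERE sku = ?"

plan_before = query_plan(db, lookup, ("SKU-42",))  # full table scan expected
db.execute("CREATE INDEX idx_product_sku ON product(sku)")
plan_after = query_plan(db, lookup, ("SKU-42",))   # index search expected

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan should report a scan of the whole table and the second a search through `idx_product_sku`.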

Dmitry, 2012-05-22
@Neir0

I agree with the previous commenter. 8 minutes is something fantastic. On my five-year-old computer, that is about how long a substring search takes on a table of 100 records without indexes. If the indexes are there, then the problem is in the CMS, and you should try another one or bluntly cut out part of the search (or, well, try to optimize it :) ). Does the catalog itself work?
