PHP
seowin555, 2015-11-26 18:49:46

What is the best way to do the selection?

The task is as follows: about 1 million HTML files need to be generated on the fly, as quickly as possible and with as little load as possible, and saved to disk.
Each file is generated from a template plus 30-50 lines of text.
Question: where is it better to select these lines (possibly consecutive ones), from a database or from a txt file, so that selection and generation are as fast as possible and the server does not fall over under the load?
A new selection of 30-50 lines is made for every generated file.
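A minimal sketch of the generation step itself, assuming the lines for one file have already been selected and that there is a simple template with a {{lines}} placeholder (the template file name, placeholder, and output path are made up for illustration):

```php
<?php
// Hypothetical template file containing a {{lines}} placeholder; both the
// file name and the placeholder are assumptions for this sketch.
$template = file_get_contents(__DIR__ . '/template.html');

/**
 * Render one HTML file from the template and a set of text lines.
 */
function renderPage(string $template, array $lines, string $outPath): void
{
    $body = implode("\n", array_map(
        fn(string $line): string => '<p>' . htmlspecialchars($line) . '</p>',
        $lines
    ));
    file_put_contents($outPath, str_replace('{{lines}}', $body, $template));
}

@mkdir(__DIR__ . '/out', 0755, true);
renderPage($template, ['first line', 'second line'], __DIR__ . '/out/page1.html');
```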



4 answer(s)
romy4, 2015-11-26
@romy4

1 million files is already a bad idea.
a) You will put a heavy load on the file system if you distribute them incorrectly. For example, putting more than ~4k files in one directory kills performance.
b) You can exhaust all inodes, and then everything stops.
Store them in some NoSQL store or in your favorite database instead.
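If the files do end up on disk anyway, a common workaround for the per-directory problem is to shard paths by a hash prefix. A rough sketch, where the base directory and naming scheme are just an example:

```php
<?php
/**
 * Return a sharded path like out/ab/cd/page123.html so that
 * no single directory accumulates millions of files.
 */
function shardedPath(string $baseDir, string $fileName): string
{
    $hash = md5($fileName);
    $dir  = $baseDir . '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true); // create the shard directories on demand
    }
    return $dir . '/' . $fileName;
}

// 1M files spread over 256*256 shard directories is roughly 15 files each.
file_put_contents(shardedPath(__DIR__ . '/out', 'page123.html'), '<html>...</html>');
```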

Dmitry Belyaev, 2015-11-26
@bingo347

If the rows are selected by some criterion from the overall list, then it is definitely a database. If it is just a different set of rows for each page, you can pull them from a file (provided each page has its own file; that will be faster).
You have PHP in the tags, which is probably not the best fit here. If you need it really fast, look towards something with asynchronous IO (for example node.js or Go); that lets the processing avoid waiting on saving data to disk and fetching data from the database. A PHP sketch of the two selection options follows below.
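A rough illustration of the two selection options mentioned above; the DSN, credentials, table, column, and file names are all hypothetical:

```php
<?php
// Option 1: selection by a criterion -> let the database do the filtering.
$db = new PDO('mysql:host=localhost;dbname=content', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$stmt = $db->prepare('SELECT line FROM lines WHERE category = ? LIMIT 50');
$stmt->execute(['news']);
$linesFromDb = $stmt->fetchAll(PDO::FETCH_COLUMN);

// Option 2: no criterion, just "some rows per page" -> read a plain file.
// Assumes one pre-split text file per page, e.g. data/page_0001.txt.
$linesFromFile = file(
    __DIR__ . '/data/page_0001.txt',
    FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES
);
```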

Alexander Melnichenko, 2015-11-26
@alex87melnichenko

If I understand correctly, the situation is as follows:
you want to create about a million files, each with 30-50 lines. If you fetch the records for each file with a separate database query, that adds up to a million queries. Pulling everything out in one huge query is not an option either.
With that many requests, the database will simply fall over.
It's better to keep it all in a file and read it line by line.
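A minimal sketch of the "read line by line" approach, consuming a single source file sequentially in 30-50 line chunks with no database queries at all (file names and the chunk size of 40 are placeholders):

```php
<?php
@mkdir(__DIR__ . '/out', 0755, true);
$source = fopen(__DIR__ . '/lines.txt', 'r'); // one big text file with all lines
$chunk  = [];
$page   = 0;

while (($line = fgets($source)) !== false) {
    $chunk[] = htmlspecialchars(rtrim($line, "\r\n"));
    if (count($chunk) === 40) {               // 30-50 lines per file; 40 as an example
        $page++;
        file_put_contents(
            sprintf(__DIR__ . '/out/page_%06d.html', $page),
            '<html><body><p>' . implode('</p><p>', $chunk) . '</p></body></html>'
        );
        $chunk = [];                           // start collecting the next page
    }
}
fclose($source);
```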

seowin555, 2015-11-26
@seowin555

The lines are completely random. The only requirement is that lines must not repeat within one file; they may repeat across different files.
The database or file is a single shared one (not a separate one per page).
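Given that constraint (no repeats within one file, repeats allowed across files), one simple approach is to load the shared line list once and draw a unique random sample per file. A sketch, with the source file name as a placeholder:

```php
<?php
// Load the shared pool of lines once; reuse it for every generated file.
$pool = file(__DIR__ . '/lines.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

/**
 * Pick $count distinct lines from the pool.
 * array_rand() returns unique keys, so no line repeats within one file.
 */
function pickLines(array $pool, int $count): array
{
    $keys = array_rand($pool, $count);
    return array_map(fn($k) => $pool[$k], (array) $keys);
}

$linesForThisFile = pickLines($pool, random_int(30, 50));
```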
------------
upd
In general, the task has now largely been reduced to how to speed up the generation process.
I realized that creating a huge number of HTML files on the server is not a good idea, so I decided it would be better to put everything into an SQLite database and then generate each page on the fly from the template.
The purpose of all this is to generate pages for sites, i.e. each site will have 250k-1kk pages.
The question is how to speed up writing to the SQLite database (each site has its own database) and not bring the server down if we generate, say, 50 sites at a time.
I assume that writing to each site's SQLite database in turn will take a very long time.
So the scheme looks like this:
There is an SQL database running MySQL. It contains several tables, each with 1kk-3kk (1-3 million) rows.
50 sites need to be generated at a time, each with 250k-1kk pages.
For each page, a selection is made from several tables in the MySQL database, the text is processed, and the result is written to that site's SQLite database.
What is the best way to optimize all of this so that the speed is as high as possible and the server does not crash?
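One common way to speed up the SQLite writes is to batch the inserts into large transactions and relax the journaling settings on each per-site database. A sketch, where the database path, table name, batch size, and PRAGMA choices are all assumptions (the PRAGMAs trade durability for speed, which only makes sense because the data can be regenerated from MySQL):

```php
<?php
// One SQLite database per site; the path and table name are placeholders.
$site = new PDO('sqlite:' . __DIR__ . '/site_001.sqlite');
$site->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Relax durability for a bulk import that can be regenerated from MySQL
// anyway (assumption: losing the file on a crash is acceptable).
$site->exec('PRAGMA journal_mode = WAL');
$site->exec('PRAGMA synchronous = OFF');

$site->exec('CREATE TABLE IF NOT EXISTS pages (id INTEGER PRIMARY KEY, url TEXT, html TEXT)');
$insert = $site->prepare('INSERT INTO pages (url, html) VALUES (?, ?)');

// $pages would normally come from the MySQL selection + text processing step.
$pages = [
    ['/page-1', '<html>...page 1...</html>'],
    ['/page-2', '<html>...page 2...</html>'],
];

// Commit in large batches: one transaction per row is what makes naive
// SQLite imports slow; one transaction per few thousand rows is fast.
$batchSize = 5000;
$site->beginTransaction();
foreach ($pages as $i => [$url, $html]) {
    $insert->execute([$url, $html]);
    if (($i + 1) % $batchSize === 0) {
        $site->commit();
        $site->beginTransaction();
    }
}
$site->commit();
```

Since SQLite allows only one writer per database file at a time, generating the 50 sites in separate worker processes (one or a few sites per worker) keeps each database with a single writer and spreads the load across cores; the batch size and PRAGMA settings above are starting points to tune, not fixed recommendations.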
