PostgreSQL
tsegorah, 2014-01-24 10:33:43

How to optimize the performance of a large database in postgresql?

There is a PostgreSQL database under RHEL for very specific software (the specificity being that, for example, relationships between tables may be stored not in the database but in the application code).
The database volume is up to 100-200 gigabytes.
The hardware is fairly fast: about 200+ GB of RAM and four Xeon E5-4640 CPUs.
95% of the queries will be reads; trivial SELECTs and JOINs over several tables with complex conditions occur with roughly equal frequency, and the data is queried roughly evenly across the different tables.
5% of the queries will write small amounts of data.
The question is how best to optimize everything for such a workload: simply loading the data into the database with minimal DBMS tuning does not give the desired performance.
Several options immediately come to mind.
The first option is to limit yourself to thoughtful tuning of the DBMS alone.
Fortunately, there is enough documentation on the official site (for example, wiki.postgresql.org/wiki/Performance_Optimization ).
But there are questions about whether the DBMS will actually be able to use all the resources, and whether there will be problems with a long "warm-up".
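For reference, a minimal sketch of the postgresql.conf parameters such tuning usually revolves around; the numbers are only assumptions for a machine with ~200 GB of RAM and a read-heavy load, not values taken from the question:

    # postgresql.conf -- hypothetical starting point for ~200 GB RAM, read-mostly load
    shared_buffers = 48GB              # PostgreSQL's own buffer cache
    effective_cache_size = 160GB       # planner hint: how much data the OS page cache can hold
    work_mem = 256MB                   # per sort/hash operation, multiplied per query
    maintenance_work_mem = 4GB         # VACUUM, CREATE INDEX
    checkpoint_completion_target = 0.9 # spread out checkpoint I/O
    random_page_cost = 1.1             # data is effectively served from RAM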
The second option follows from the fact that the entire database fits in RAM: you can mount a RAM disk and configure a tablespace on it.
But then the question of durability and data consistency arises, because the memory is volatile. As a workaround, you could replicate to a copy of the database on disk,
or give the application software a single entry point that sends all requests to both databases.
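A minimal sketch of what the RAM-disk variant could look like, assuming a tmpfs is already mounted at a hypothetical path and owned by postgres (anything placed there disappears on reboot, hence the need for the copy on disk):

    -- /mnt/pg_ram is a hypothetical tmpfs mount point owned by postgres
    CREATE TABLESPACE ram_space LOCATION '/mnt/pg_ram';
    -- hypothetical table placed entirely in the RAM-backed tablespace
    CREATE TABLE hot_data (
        id      bigint PRIMARY KEY,
        payload text
    ) TABLESPACE ram_space;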
The third option follows from the fact that most of the requests are reads: split the hardware resources between several database replicas and route read and write requests accordingly.
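For that read/write split, the built-in streaming replication with hot standby is the obvious building block; a rough sketch of the settings involved on a 9.x-era server (host addresses, the replication user and the number of senders are assumptions):

    # primary, postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 4

    # primary, pg_hba.conf -- let the standby connect (hypothetical address/user)
    host  replication  replicator  10.0.0.2/32  md5

    # standby, postgresql.conf
    hot_standby = on

    # standby, recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=10.0.0.1 user=replicator'

Read-only SELECTs then go to the standby, while the 5% of writes go to the primary; the routing can live in the application or in a pooler such as pgpool-II.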
Please do not suggest partitioning: this is already a separate piece of the database, and it makes no sense to split it further.
It is clear that all the options will be tested, the load on the machine will be emulated in different configurations, and so on, in order to pick the best one.
I would like to hear advice from anyone who has solved similar problems. Perhaps there are other options, or ways to develop these, or someone has already stepped on the same rake.
So I would be grateful for any hints on this matter.
I will also be grateful for manuals on the topic, as long as they are not from the first page of Google results :)
Thank you.


2 answers
hell, 2014-01-24
@tsegorah

Actually, you need to raise shared_buffers high enough that your largest working set fits into them (the total size of all the tables in your biggest join), set work_mem so that all sorts happen in RAM (check with EXPLAIN ANALYZE), and then (or rather, in parallel) get the machine to actually work with all of this by tuning the kernel.
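As an illustration of the EXPLAIN ANALYZE part, with made-up query and table names: the line to look at in the output is Sort Method, which shows whether the sort stayed within work_mem or spilled to disk.

    -- hypothetical query and table names
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT a.id, b.value
    FROM big_table a
    JOIN other_table b ON b.a_id = a.id
    ORDER BY b.value;

    -- "Sort Method: external merge  Disk: 512000kB" -> work_mem is too small
    -- "Sort Method: quicksort  Memory: 65536kB"     -> the sort fit in RAM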

Ruslan Kasymov, 2014-01-24
@HDAPache

The data structure is not entirely clear from the question. That is, is strong data consistency critical? Do the queries require transactions?
If an "eventual consistency" approach is acceptable, then it probably makes sense to look towards NoSQL and use MapReduce there.
