What is the best way to implement a high-load table?


PostgreSQL

Anton Ivanov, 2014-01-29 05:13:16

Hello.
I'm using PostgreSQL 9.3.
The table has the fields id (int), started (timestamp), ended (timestamp), task (varchar 2048), active (int), comment (varchar 1024), retries (int).
About 2 million new rows are inserted into this table per day.
The number of updates is the same, about 2 million per day.
A row is (ideally) relevant for only about 1 minute. That is, the lifecycle looks like this:
00:00. The task (a row) is added as active ("active" = 1).
00:01. The task has been processed ("active" = 0).
Completed tasks must remain available for viewing (deleting them is not an option), and lookups by id + active must stay fast (before a task is queued, a check is made to see whether it is already in the table).
I see 3 options:
1. All records in one table, index on active.
2. Create a second table (tasks_done) and move completed tasks into it.
3. After a task is completed, delete its row from the table and write it to a NoSQL log (Elasticsearch).
Which option is preferable in terms of the speed of finding a task that has not been completed yet and of finding one that already has? Assume tasks can be deleted after 2 months, so the table holds at most about 60 million records.

2 answer(s)

zxmd, 2014-02-26
@zxmd

If the table has a ts field (or whatever you call it) that stores a datetime, you can use table partitioning.
https://blog.engineyard.com/2013/scaling-postgresq...
www.postgresql.org/docs/9.3/static/ddl-partitionin...
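A rough sketch of what partitioning could look like on PostgreSQL 9.3, which has no declarative partitioning and relies on table inheritance, CHECK constraints and an insert trigger instead. The column list comes from the question; the table name tasks, the monthly granularity, the choice of started as the partition key, and all index, function and trigger names are assumptions made for illustration.

```sql
-- Parent table. Columns are from the question; the name "tasks" is assumed.
CREATE TABLE tasks (
    id       integer       NOT NULL,
    started  timestamp     NOT NULL,
    ended    timestamp,
    task     varchar(2048) NOT NULL,
    active   integer       NOT NULL DEFAULT 1,
    comment  varchar(1024),
    retries  integer       NOT NULL DEFAULT 0
);

-- One child table per month (assumed granularity). The CHECK constraints let
-- the planner skip children that cannot match (constraint_exclusion = partition).
CREATE TABLE tasks_2014_02 (
    CHECK (started >= TIMESTAMP '2014-02-01' AND started < TIMESTAMP '2014-03-01')
) INHERITS (tasks);

CREATE TABLE tasks_2014_03 (
    CHECK (started >= TIMESTAMP '2014-03-01' AND started < TIMESTAMP '2014-04-01')
) INHERITS (tasks);

-- Indexes are not inherited, so each child needs its own.
CREATE INDEX tasks_2014_02_id_idx ON tasks_2014_02 (id);
CREATE INDEX tasks_2014_03_id_idx ON tasks_2014_03 (id);

-- Route rows inserted into the parent to the matching child.
CREATE OR REPLACE FUNCTION tasks_insert_router() RETURNS trigger AS $$
BEGIN
    IF NEW.started >= TIMESTAMP '2014-02-01' AND NEW.started < TIMESTAMP '2014-03-01' THEN
        INSERT INTO tasks_2014_02 VALUES (NEW.*);
    ELSIF NEW.started >= TIMESTAMP '2014-03-01' AND NEW.started < TIMESTAMP '2014-04-01' THEN
        INSERT INTO tasks_2014_03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for started = %', NEW.started;
    END IF;
    RETURN NULL;  -- the row has already been stored in a child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tasks_insert_trigger
    BEFORE INSERT ON tasks
    FOR EACH ROW EXECUTE PROCEDURE tasks_insert_router();

-- Expiring a month of old data then becomes a cheap metadata operation:
-- DROP TABLE tasks_2014_02;
```

With this layout, removing tasks older than 2 months means dropping a child table rather than deleting millions of rows. Note that with inheritance-based partitioning a unique index is enforced only within each child table, not across the whole hierarchy.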

Mikhail Beschetnov, 2014-02-28
@TerminusMKB

If you have the requirement "only one task with a given value of the task field may be in the active = 1 status" (i.e. no duplicates are allowed), you can create a composite unique partial index on the task + active fields with a condition like "active = 1". This will allow you to:
1) Skip the check for an existing task before adding it. Instead, handle the exception raised when a duplicate value is inserted. Your approach of checking whether the task exists before adding it will cause serious problems if you ever decide to process tasks in two or more threads, because two threads can both pass the check and then insert the same task.
2) Search quickly by id and active.
Regularly moving done tasks out is a good idea in any case: it keeps the main tasks table small.
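The suggestions above could be sketched roughly as follows for PostgreSQL 9.3, assuming a plain (unpartitioned) table. The table name tasks, the index name and the enqueue_task function are hypothetical; tasks_done is the name used in the question. PostgreSQL 9.3 has no INSERT ... ON CONFLICT, so the duplicate is caught as a unique_violation exception.

```sql
-- Columns are from the question; the table name "tasks" is assumed.
CREATE TABLE IF NOT EXISTS tasks (
    id       integer       NOT NULL,
    started  timestamp     NOT NULL,
    ended    timestamp,
    task     varchar(2048) NOT NULL,
    active   integer       NOT NULL DEFAULT 1,
    comment  varchar(1024),
    retries  integer       NOT NULL DEFAULT 0
);

-- Unique partial index: at most one row per task value while active = 1.
-- Completed rows (active = 0) are not indexed, so the index stays small.
CREATE UNIQUE INDEX tasks_task_active_uniq
    ON tasks (task, active)
    WHERE active = 1;

-- Hypothetical helper: insert without a prior existence check.
-- Returns false if the same task is already active.
CREATE OR REPLACE FUNCTION enqueue_task(p_id integer, p_task varchar)
RETURNS boolean AS $$
BEGIN
    INSERT INTO tasks (id, started, task, active, retries)
    VALUES (p_id, now(), p_task, 1, 0);
    RETURN true;
EXCEPTION WHEN unique_violation THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql;

-- Archive table with the same columns as tasks.
CREATE TABLE IF NOT EXISTS tasks_done (LIKE tasks INCLUDING DEFAULTS);

-- Periodic job: move completed rows in one atomic statement.
WITH moved AS (
    DELETE FROM tasks
    WHERE active = 0
    RETURNING *
)
INSERT INTO tasks_done
SELECT * FROM moved;
```

Calling SELECT enqueue_task(1, 'some task'); twice in a row should return true the first time and false the second, as long as the first row is still active. Because uniqueness is enforced by the index rather than by a separate check, this stays correct when several workers insert concurrently.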