PostgreSQL
Anton Ivanov, 2014-01-29 05:13:16

What is the best way to implement a high-load table?

Hello.
I'm using PostgreSQL 9.3.
The table has the following fields: id (int), started (timestamp), ended (timestamp), task (varchar 2048), active (int), comment (varchar 1024), retries (int).
About 2 million new rows are added to this table per day.
The number of updates is the same: 2 million.
A row stays relevant for (ideally) about 1 minute. That is, it goes like this:
00:00. A task (string) is added as "active".
00:01. The task is processed ("active" = 0).
Completed tasks must remain available for viewing (deleting them is not an option), and lookups by id + active must stay fast (before a task is queued, a check is made whether this task is already in the table or not).
I see 3 options:
1. Keep all records in one table, with an index on active.
2. Create a second table (tasks_done) and move completed tasks into it.
3. After a task is completed, delete the row from the table and write it to a NoSQL log (Elasticsearch).
Which option is preferable in terms of the speed of finding a task that has not been completed yet and of finding one that has already been completed? Let's say tasks can be deleted after 2 months, so the table holds at most about 120 million rows (2 million per day for roughly 60 days).


2 answers
zxmd, 2014-02-26
@zxmd

If the table has a timestamp column (ts, or whatever you call it), you can use table partitioning.
https://blog.engineyard.com/2013/scaling-postgresq...
www.postgresql.org/docs/9.3/static/ddl-partitionin...
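
To make the partitioning suggestion concrete: PostgreSQL 9.3 has no declarative partitioning, so partitions are usually built as child tables that inherit from the parent and carry a CHECK constraint on the timestamp range. Below is a minimal sketch that only generates such DDL. The table name tasks, the choice of started as the partitioning column, the monthly granularity, and the helper names are all assumptions for illustration, not something stated in the thread.

```python
from datetime import date


def next_month(d: date) -> date:
    """Return the first day of the month following d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def partition_ddl(parent: str, column: str, month_start: date) -> str:
    """Build DDL for one monthly child table (inheritance-based partitioning).

    Hypothetical helper: the naming scheme <parent>_YYYY_MM is an assumption.
    """
    end = next_month(month_start)
    child = f"{parent}_{month_start:%Y_%m}"
    return (
        f"CREATE TABLE {child} (\n"
        f"    CHECK ({column} >= DATE '{month_start.isoformat()}'"
        f" AND {column} < DATE '{end.isoformat()}')\n"
        f") INHERITS ({parent});"
    )


# Print the DDL for the January 2014 partition of the assumed "tasks" table.
print(partition_ddl("tasks", "started", date(2014, 1, 1)))
```

With this scheme, rows still have to be routed to the right child table (by a trigger on the parent or by the application inserting into the child directly), and an expired month can be removed with a single DROP TABLE instead of a large DELETE, which fits the two-month retention mentioned in the question.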

Mikhail Beschetnov, 2014-02-28
@TerminusMKB

If you have the requirement that only one task with a given value of the task field may be in the active = 1 state (i.e. duplicates are not allowed), you can create a composite unique partial index on the task + active fields with a condition like "active = 1". This will allow you to:
1) Skip the check for an existing task before adding it. Instead, handle the exception raised when a duplicate value is inserted. Your approach of checking for existence before inserting will cause serious problems (a race between the check and the insert) as soon as you process tasks in two or more threads.
2) Search quickly by id and active.
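
A minimal sketch of the "insert and handle the exception" pattern follows. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; SQLite also supports partial unique indexes, and the CREATE UNIQUE INDEX ... WHERE statement has the same shape in PostgreSQL. The index name, the add_task helper, and the sample task string are hypothetical. With a PostgreSQL driver the exception class would differ (a unique-violation error rather than sqlite3.IntegrityError).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY,"
    " task TEXT NOT NULL,"
    " active INTEGER NOT NULL DEFAULT 1)"
)
# Partial unique index: uniqueness is enforced only among rows with active = 1,
# so any number of completed (active = 0) rows may share the same task value.
conn.execute(
    "CREATE UNIQUE INDEX tasks_task_active_uniq"
    " ON tasks (task, active) WHERE active = 1"
)


def add_task(conn: sqlite3.Connection, task: str) -> bool:
    """Insert a task; return False if an active duplicate already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tasks (task, active) VALUES (?, 1)", (task,)
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the row: the same task is already active.
        return False


first = add_task(conn, "resize image 42")   # accepted
second = add_task(conn, "resize image 42")  # rejected, still active
conn.execute("UPDATE tasks SET active = 0 WHERE task = ?", ("resize image 42",))
third = add_task(conn, "resize image 42")   # accepted, old row is done
```

Because the database itself enforces the rule, two workers inserting the same task at the same moment cannot both succeed, which a separate "select, then insert" check cannot guarantee.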
Regularly moving completed tasks out of the main table is a good idea in any case: it keeps the main tasks table small.
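
Moving completed tasks could be sketched as a periodic job that copies finished rows into tasks_done (the name used in the question) and deletes them from the main table in one transaction. Again sqlite3 stands in for PostgreSQL so the example is runnable; the reduced column set, the sample rows, and the archive_done helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, task TEXT, active INTEGER);
    CREATE TABLE tasks_done (id INTEGER PRIMARY KEY, task TEXT, active INTEGER);
    INSERT INTO tasks (task, active) VALUES ('a', 0), ('b', 1), ('c', 0);
    """
)


def archive_done(conn: sqlite3.Connection) -> int:
    """Move all completed rows (active = 0) to tasks_done; return the count."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO tasks_done (id, task, active)"
            " SELECT id, task, active FROM tasks WHERE active = 0"
        )
        cur = conn.execute("DELETE FROM tasks WHERE active = 0")
    return cur.rowcount


moved = archive_done(conn)
```

In PostgreSQL itself (9.1 and later) the same move can be written as a single statement, WITH moved AS (DELETE FROM tasks WHERE active = 0 RETURNING *) INSERT INTO tasks_done SELECT * FROM moved, which avoids the window between the copy and the delete in which another session could finish a task.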
