K
K
kyckos2015-04-17 18:00:00
PostgreSQL
kyckos, 2015-04-17 18:00:00

How to properly group data by time boundaries?

Good afternoon.
We have a model with sample data

id  DateTime                  Value
133 "2015-04-17 12:12:54 +04" 23
134 "2015-04-17 12:13:54 +04" 23
135 "2015-04-17 12:23:54 +04" 23
136 "2015-04-17 12:44:54 +04" 25
137 "2015-04-17 14:52:54 +04" 42
138 "2015-04-17 15:45:54 +04" 42

We need to query all records whose DateTime does not overlap with the nearest records within 15 minutes.
Ie 133 is suitable
136 is suitable because 44 - 23 > 15 minutes
137 and 138 yes the rest are not.
This is probably a clustering task, that is, you need to form groups of records with boundaries of 15 minutes. If more than 15 minutes, then this is a new cluster. The problem is that I don't know how to implement it in the form of a correct algorithm :/.
You probably need something like this.
Pseudo python code
last_row_date_time = datetime.datetime()
groups = [] #Distinct groups
for row in Data.objects.all().order_by('datetime')
     if (row.datetime - last_row_date_time) > datetime.timedelta(minutes=15)
         groups.append(row)
         print "New group"
     else:
         print "Old group"

But I don't understand how correct this is. Perhaps there are some better ways to solve the problem?

Answer the question

In order to leave comments, you need to log in

2 answer(s)
M
Maxim Kudryavtsev, 2015-04-17
@kumaxim

To begin with, I will start from the fact that you have some kind of journal and all entries are in ascending time. For simplicity, I assume that you have these records in an array.
The decision in a forehead - to organize search. Step by step:

  1. We take element N
    You will separately need to think about what to do if you work with the first and last element, because in this case, one of your semi_ variables will definitely be false.

S
sakuradaj, 2015-04-17
@sakuradaj

If the variable -+15 minutes does not change and you don’t have a huge amount of data in the database now and you can rewrite adding them, then:
The current data in the database can be clustered stupidly by running it through python, I think any algorithm can be, not the point.
And cluster new records when adding:
before adding a new record, make a selection of records that are "+ -15 minutes from now".
Check the found records for the presence of groups:
If there are groups, then look for those in which all records are "+ -15 minutes from now", if not found, then create a new one and bind the record.
A group is an M2M connection.
You may need some locks at the time of adding and searching for groups.
The solution that immediately came to mind may have missed something.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question