How do I count the number of simultaneous calls in a given minute?


Leks, 2015-09-10 10:30:29

Analytics


Good afternoon.
I need to determine the number of simultaneous calls in a given minute, using logs of the form:
11:02:25 250
11:02:14 60
11:03:08 33
11:03:10 99
where 11:02:25 is the connection time and 250 is the call duration.
Naturally, the logs are large: there can be more than 500 entries per minute, so solutions like Excel are not suitable. I would like to automate this with bash or Python. The output should be a number, for example: 5 simultaneous calls in minute 11:02.
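
A minimal Python sketch of the literal question, assuming the duration is in seconds, all entries come from a single day, and a call counts toward a minute if it is active at any moment within it. The names `parse_log_line` and `calls_in_minute` are hypothetical. Note that "active at some point in the minute" can be larger than the number of calls active at the same instant.

```python
from datetime import datetime, timedelta


def parse_log_line(line):
    """Parse '11:02:25 250' into (start, end) datetimes.

    Assumes the duration is in seconds; strptime puts every time on the
    same reference date (1900-01-01), so the logs must cover a single day.
    """
    time_str, duration = line.split()
    start = datetime.strptime(time_str, "%H:%M:%S")
    return start, start + timedelta(seconds=int(duration))


def calls_in_minute(lines, minute):
    """Count calls whose [start, end) interval overlaps the 'HH:MM' minute."""
    minute_start = datetime.strptime(minute, "%H:%M")
    minute_end = minute_start + timedelta(minutes=1)
    count = 0
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        start, end = parse_log_line(line)
        # Two half-open intervals overlap if each one starts before the other ends.
        if start < minute_end and end > minute_start:
            count += 1
    return count


log = ["11:02:25 250", "11:02:14 60", "11:03:08 33", "11:03:10 99"]
print(calls_in_minute(log, "11:02"))  # 2
print(calls_in_minute(log, "11:03"))  # 4
```

This scans the whole log for every queried minute, so it is only a reference for checking faster approaches on small samples.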




5 answers


angru, 2015-09-10
@Leksnsk

Interesting problem; I have done something like this. I solved it for myself, but it might come in handy:

# -*- coding: utf-8 -*-
from pprint import pprint
from datetime import datetime, timedelta
from collections import defaultdict


INTERVAL = 15  # slot length in seconds; use a divisor of 60 (e.g. 15, 30, 60) so slots stay aligned to the minute


assert 0 < INTERVAL <= 60, "the algorithm does not work quite correctly for intervals greater than 60"


def get_time_list(start, end, interval):
    """
        List of time slots that the call falls into
    """
    t = start + timedelta(seconds=(interval - ((start.second % interval) or interval)))  # first slot boundary at or after the call start
    res = []

    while t <= end:  # compare full datetimes so that calls crossing midnight are still counted
        res.append(t.time().isoformat())

        t = t + timedelta(seconds=interval)

    return res


with open('calls.log', 'r') as f:
    res = defaultdict(int)

    for line in f:
        if not line.strip():  # skip blank lines
            continue
        start_time, duration = line.split()
        start_time = datetime(1, 1, 1, *map(int, start_time.split(':')))  # use datetime instead of time, because a timedelta cannot be added to a time
        end_time = start_time + timedelta(seconds=int(duration))
        time_list = get_time_list(start_time, end_time, INTERVAL)

        for t in time_list:
            res[t] += 1

    pprint(dict(res))  # convert to a plain dict so pprint prints it without the defaultdict wrapper

Result on your sample data:
{'11:02:15': 1,
'11:02:30': 2,
'11:02:45': 2,
'11:03:00': 2,
'11:03:15': 3,
'11:03:30': 3,
'11:03:45': 2,
'11:04:00': 2,
'11:04:15': 2,
'11:04:30': 2,
'11:04:45': 2,
'11:05:00': 1,
'11:05:15': 1,
'11:05:30': 1,
'11:05:45': 1,
'11:06:00': 1,
'11:06:15': 1,
'11:06:30': 1}


asdz, 2015-09-10
@asdz

Convert the list of calls into a list of events with two fields: time and event type. There are two event types: call start and call end. Sort this list by time. Walk through it in order, adding +1 to a counter for each call-start event and -1 for each call-end event. After each step, compare the counter with the maximum seen so far, and save the new maximum together with the time of the corresponding event. This way you can find the peaks not only within a day, but over any time range.
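
A minimal Python sketch of the sweep-line approach described above, assuming "HH:MM:SS duration" lines from a single day with durations in seconds; `peak_concurrency` is a hypothetical name. Ordering call ends before call starts at the same timestamp is a design choice: back-to-back calls are not treated as overlapping.

```python
from datetime import datetime, timedelta

START, END = 1, -1  # event types: +1 when a call begins, -1 when it ends


def peak_concurrency(lines):
    """Return (max_simultaneous_calls, time_when_first_reached).

    Assumes each call is busy on the half-open interval [start, start + duration).
    """
    events = []
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        time_str, duration = line.split()
        start = datetime.strptime(time_str, "%H:%M:%S")
        events.append((start, START))
        events.append((start + timedelta(seconds=int(duration)), END))

    # Sort by time; at equal times END (-1) sorts before START (+1).
    events.sort()

    current = best = 0
    best_time = None
    for when, kind in events:
        current += kind
        if current > best:  # new maximum: remember it and when it happened
            best, best_time = current, when
    return best, best_time


log = ["11:02:25 250", "11:02:14 60", "11:03:08 33", "11:03:10 99"]
best, when = peak_concurrency(log)
print(best, when.time())  # 4 11:03:10
```

On the sample from the question, all four calls overlap between 11:03:10 and 11:03:14, so the exact peak is 4; sampling at fixed 15-second points can miss such short overlaps. Sorting makes this O(n log n) in the number of calls.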


Saboteur, 2015-09-10
@saboteur_kiev

I would do this in Perl or Python, but not in bash: large arrays are awkward there, and it is inconvenient for cross-platform use.
Just go through all the lines and, for each call, increment a counter for every minute it covers, in an array indexed by minute.
Then, at the end, generate a report from the array.
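
A minimal Python sketch of this per-minute array, assuming durations in seconds and one day of logs; the 1440-slot list and the `calls_per_minute` / `report` names are illustrative. Every minute a call touches is incremented, so each counter is the number of calls active at any point during that minute.

```python
MINUTES_PER_DAY = 24 * 60


def calls_per_minute(lines):
    """Return 1440 counters, one per minute of the day.

    A call that runs past midnight wraps around into the early minutes.
    """
    counts = [0] * MINUTES_PER_DAY
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        time_str, duration = line.split()
        h, m, s = map(int, time_str.split(":"))
        start_sec = h * 3600 + m * 60 + s
        end_sec = start_sec + int(duration)  # exclusive end
        first_minute = start_sec // 60
        last_minute = max(first_minute, (end_sec - 1) // 60)
        for minute in range(first_minute, last_minute + 1):
            counts[minute % MINUTES_PER_DAY] += 1
    return counts


def report(counts):
    """Print only the minutes that had at least one active call."""
    for minute, n in enumerate(counts):
        if n:
            print("%02d:%02d %d" % (minute // 60, minute % 60, n))


log = ["11:02:25 250", "11:02:14 60", "11:03:08 33", "11:03:10 99"]
report(calls_per_minute(log))
# 11:02 2
# 11:03 4
# 11:04 2
# 11:05 1
# 11:06 1
```

A fixed-size list keeps memory constant regardless of how many log lines there are; only the work per call grows with its duration.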


Andrew, 2015-09-10
@OLS

I can share my old Delphi source code and a compiled build for this task.
Note, however, that it reports only the maximum number of concurrently busy lines per minute/hour.
If you are interested, please leave your e-mail.
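
The Delphi program itself is not available here. As a hypothetical illustration of the metric it reports (peak simultaneous busy lines per minute or per hour), this Python sketch sweeps over start/end events and records, for each time bucket, the highest concurrency level that holds during any part of it. It assumes single-day logs with durations in seconds; `peak_per_bucket` is an illustrative name.

```python
from collections import defaultdict


def peak_per_bucket(lines, bucket_seconds=60):
    """Peak simultaneous calls in each bucket (60 = per minute, 3600 = per hour).

    Keys are bucket indices counted from midnight; each call is assumed busy
    on the half-open interval [start, start + duration).
    """
    events = []
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        time_str, duration = line.split()
        h, m, s = map(int, time_str.split(":"))
        start = h * 3600 + m * 60 + s
        events.append((start, 1))
        events.append((start + int(duration), -1))
    events.sort()  # at the same second, ends (-1) sort before starts (+1)

    peaks = defaultdict(int)
    level = 0
    for i, (t, delta) in enumerate(events):
        level += delta
        if i + 1 == len(events):
            break
        next_t = events[i + 1][0]
        if next_t == t or level == 0:
            continue  # no time passes, or no calls are busy, before the next event
        # `level` calls are busy on [t, next_t); credit every bucket it touches.
        for bucket in range(t // bucket_seconds, (next_t - 1) // bucket_seconds + 1):
            peaks[bucket] = max(peaks[bucket], level)
    return dict(peaks)


log = ["11:02:25 250", "11:02:14 60", "11:03:08 33", "11:03:10 99"]
for bucket, peak in sorted(peak_per_bucket(log).items()):
    print("%02d:%02d %d" % (bucket // 60, bucket % 60, peak))
print(peak_per_bucket(log, 3600))  # {11: 4}
```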


Roman Mirilaczvili, 2015-09-10
@2ord

Ideally, "logs" should be written straight into a DBMS. For example, if some program writes its data to stdout, another program can collect that output and immediately write it to SQLite/MySQL.
Something like this: call_center | sql_collector.
The analytics then "hooks in" through its own access to the database.
A simple SELECT query is written with GROUP BY and COUNT, and the resulting sample is then analyzed by a smarter analyzer program.
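
A minimal sketch of the database route using Python's built-in `sqlite3` module and an in-memory database. The `calls` table, its columns, and the query are illustrative assumptions, with durations taken to be in seconds. A plain GROUP BY over the calls table cannot see which minutes a call spans, so the query first generates one row per minute with a recursive CTE and joins every call that overlaps it.

```python
import sqlite3


def build_db(lines):
    """Load 'HH:MM:SS duration' lines into an in-memory SQLite table.

    start_sec/end_sec are seconds since midnight (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (start_sec INTEGER, end_sec INTEGER)")
    rows = []
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        time_str, duration = line.split()
        h, m, s = map(int, time_str.split(":"))
        start = h * 3600 + m * 60 + s
        rows.append((start, start + int(duration)))
    conn.executemany("INSERT INTO calls VALUES (?, ?)", rows)
    return conn


# One row per minute of the day, joined to each call whose [start, end)
# overlaps that minute, then GROUP BY / COUNT per minute.
QUERY = """
WITH RECURSIVE minutes(m) AS (
    SELECT 0 UNION ALL SELECT m + 1 FROM minutes WHERE m < 1439
)
SELECT printf('%02d:%02d', m / 60, m % 60) AS minute, COUNT(*) AS calls
FROM minutes
JOIN calls ON calls.start_sec < (m + 1) * 60 AND calls.end_sec > m * 60
GROUP BY m
ORDER BY m
"""

log = ["11:02:25 250", "11:02:14 60", "11:03:08 33", "11:03:10 99"]
conn = build_db(log)
for minute, calls in conn.execute(QUERY):
    print(minute, calls)
```

In a real pipeline, a collector reading `call_center` output would insert rows as they arrive instead of loading a list, and an index on `start_sec` would help the join on large tables.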