Which database is best for doing aggregations?

MySQL

yiicoder, 2015-09-03 14:11:57

The task is to process raw statistics data.
The queries are simple aggregations: SELECT SUM/AVG ... GROUP BY (AGE, SEX, DAY, SOURCE), usually with 10-20 GROUP BY parameters for the intermediate data. The aggregated data is written to a separate table, which is then searched with a WHERE clause over the same 10-20 parameters.
Right now MongoDB (the aggregation framework) handles all of this, and I'm not happy with the performance. (All the indexes are in place and fit in memory; there is clearly nothing left to optimize in Mongo.)
Is there a database better suited to this kind of task?
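
For readers who want to see the workload concretely, here is a minimal sketch of the two-step pattern described above: roll the raw rows up with GROUP BY into a separate table, then look up one combination of parameters with WHERE. It uses Python's built-in sqlite3 purely for illustration; the table names (raw_stats, agg_stats), the value column, and the sample rows are assumptions, not part of the original setup, and only 4 of the 10-20 grouping parameters are shown.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_stats (age INTEGER, sex TEXT, day TEXT, source TEXT, value REAL);
    CREATE TABLE agg_stats (age INTEGER, sex TEXT, day TEXT, source TEXT,
                            total REAL, average REAL);
""")

# Made-up sample events.
rows = [
    (25, "m", "2015-09-01", "web", 10.0),
    (25, "m", "2015-09-01", "web", 20.0),
    (25, "f", "2015-09-01", "web", 5.0),
    (30, "m", "2015-09-02", "app", 7.0),
]
conn.executemany("INSERT INTO raw_stats VALUES (?, ?, ?, ?, ?)", rows)

# Step 1: aggregate the raw data into the separate table.
conn.execute("""
    INSERT INTO agg_stats
    SELECT age, sex, day, source, SUM(value), AVG(value)
    FROM raw_stats
    GROUP BY age, sex, day, source
""")

# Step 2: search the aggregated table by the same parameters.
result = conn.execute(
    "SELECT total, average FROM agg_stats "
    "WHERE age = ? AND sex = ? AND day = ? AND source = ?",
    (25, "m", "2015-09-01", "web"),
).fetchall()
print(result)  # [(30.0, 15.0)]
```

The sketch only shows the shape of the queries; it says nothing about how any particular database performs on them.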

2 answer(s)

Lesha Kiselev, 2015-09-03
@Yakud

ElasticSearch
https://www.elastic.co/guide/en/elasticsearch/refe...
https://www.elastic.co/guide/en/elasticsearch/refe...
From my own experience I can say that it does a very good job. We currently run a small cluster with 300+ GB of statistics events, and everything works very fast.
Here are some more links to help avoid common mistakes in cluster setup.
radar.oreilly.com/2015/04/10-elasticsearch-metrics...
https://www.loggly.com/blog/nine-tips-configuring-...
https://www.elastic.co/blog/found-optimizing-elast...
I recently ran into the pitfall described in this article myself:
https://www.elastic.co/blog/support-in-the-wild-my...
When setting up the index mapping, specify this parameter for non-analyzed (not_analyzed) fields:
"doc_values" : true
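
To show where that parameter goes, here is a hedged sketch of a mapping and an aggregation request body, built as plain Python dicts so they can be serialized to JSON and sent with any HTTP client. It assumes the Elasticsearch 1.x-era mapping syntax that was current when this answer was written (string fields with "index": "not_analyzed"); later versions changed the string field types and the doc_values defaults, so check the documentation for your version. The type name (event) and the field names (source, sex, age, value) are hypothetical.

```python
import json


def not_analyzed_field():
    """Mapping for a string field stored as-is, with doc values enabled."""
    return {"type": "string", "index": "not_analyzed", "doc_values": True}


# Index mapping (1.x-style, assumed); "event" and all field names are hypothetical.
mapping = {
    "mappings": {
        "event": {
            "properties": {
                "source": not_analyzed_field(),
                "sex": not_analyzed_field(),
                "age": {"type": "integer", "doc_values": True},
                "value": {"type": "double", "doc_values": True},
            }
        }
    }
}

# Aggregation request: group by source, then compute SUM and AVG of value.
# "size": 0 asks for aggregation results only, without the matching documents.
agg_query = {
    "size": 0,
    "aggs": {
        "by_source": {
            "terms": {"field": "source"},
            "aggs": {
                "total": {"sum": {"field": "value"}},
                "average": {"avg": {"field": "value"}},
            },
        }
    },
}

print(json.dumps(mapping, indent=2))
print(json.dumps(agg_query, indent=2))
```

Grouping by more parameters would mean nesting further terms aggregations inside by_source; the sketch stops at one level to stay short.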

lPolar, 2015-09-09
@lPolar

Alternatively, you can use Impala or Hive on Tez with a Hadoop cluster. Scalability will not be a concern, and a distribution such as CDH or HDP is quite easy to deploy.
If you have a large budget and CPU is not a constraint, you can use Spark SQL on top of that same Hive setup.