Database
Anton, 2019-05-21 17:33:18

Where to store raw data from external sources?

I need to collect and store raw data from external sources: exports from the Yandex.Metrica and Google Analytics APIs and a few similar sources. This data has to be collected and kept for as long as the company operates, so that analysts can run selections on it and so that the necessary data can be loaded into the ClickHouse OLAP store, processed, and displayed in Power BI. The volume is small: about 10-15 thousand rows per day for each source, roughly 40 GB over 3 years. Hadoop-based products are therefore a poor fit; the volume is too small for them.
I have considered several options myself. CSV files for each day are inconvenient for later use when I need to explore the accumulated data, run queries, or search for something. MongoDB is another option, but for some reason many people are wary of using it. There are also Cassandra, Elasticsearch, and ClickHouse. The company does not use cloud services yet.
It is important to me that the storage is reliable and that I can occasionally run search queries against it. I have very little experience with data storage so far. Please tell me what would work best for this task.
Thank you all in advance for your replies.


2 answers
Dimonchik, 2019-05-21
@iskinn

Put it in ClickHouse and keep it there. You can copy it, including partition by partition, and it compresses the data itself. Why invent anything else?
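
A minimal sketch of what this suggestion might look like, assuming a hypothetical landing table that keeps each exported row as an unparsed JSON string. The table name `raw_events` and all column names are illustrative and do not come from the thread; `MergeTree`, `PARTITION BY`, `ORDER BY`, and `CODEC(ZSTD)` are standard ClickHouse DDL. The Python below only builds the statement; executing it requires whichever ClickHouse client you use.

```python
def raw_table_ddl(table: str = "raw_events") -> str:
    """Build a CREATE TABLE statement for a raw-data landing table.

    Table and column names are hypothetical, chosen for illustration only.
    """
    return f"""
CREATE TABLE IF NOT EXISTS {table}
(
    source      LowCardinality(String),  -- e.g. 'yandex_metrica', 'google_analytics'
    load_date   Date,                    -- day the export covers
    loaded_at   DateTime DEFAULT now(),  -- when the row was ingested
    payload     String CODEC(ZSTD)       -- raw JSON row, stored unparsed
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(load_date)         -- monthly partitions, easy to copy or drop
ORDER BY (source, load_date)
""".strip()


if __name__ == "__main__":
    print(raw_table_ddl())
```

Monthly partitions are one reasonable choice at 10-15 thousand rows per day; daily partitions would create many small parts for no clear benefit at this volume.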

Ivan Shumov, 2019-05-21
@inoise

Store the data as JSON files in AWS S3 and use AWS Athena for OLAP. It takes up almost no space, you get everyone's favorite SQL, it works quite fast, and it is serverless, which means you pay only for what you use.
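
A rough sketch of the file layout this approach usually implies, assuming one JSON object per line and Hive-style `key=value` directory names so that a query engine such as Athena can treat `source` and `dt` as partitions. The function names, the `source=`/`dt=` keys, and the `data.json` file name are assumptions for illustration; the sketch writes to a local directory and leaves the upload to S3 out.

```python
import json
from datetime import date
from pathlib import Path


def partition_path(root: Path, source: str, day: date) -> Path:
    """Return the Hive-style partition path for one source and one day."""
    return root / f"source={source}" / f"dt={day.isoformat()}" / "data.json"


def write_daily_dump(root: Path, source: str, day: date, rows: list[dict]) -> Path:
    """Write rows as JSON Lines (one object per line) and return the file path."""
    path = partition_path(root, source, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps non-ASCII text readable in the raw file
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path
```

For example, `write_daily_dump(Path("raw"), "google_analytics", date(2019, 5, 21), rows)` would produce `raw/source=google_analytics/dt=2019-05-21/data.json`, and the same directory tree can later be synced to a bucket unchanged.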
