Amazon Web Services
Arman, 2015-01-23 09:47:01

How to properly organize a system for storing a large amount of data (logs, counters)?

Good afternoon.
We have a small task that we would rather spend time on today so that it does not give us a headache tomorrow.
So, while the design is still on paper, we would like to hear the opinion of more knowledgeable and experienced people, or simply get some advice.
Conditions:
- minimal administration of the database and the server
- a large number of records
- reads are mostly of the last n records (~1000), but with a condition (something like: SELECT * FROM tbl WHERE author = 'pabel' ORDER BY `id` DESC LIMIT 100). There will also be a very large number of read operations, since we serve the data through a public API
- grouping (records have fields that we group by or use in selection conditions)
- simple full-text search (very rarely, but it will be needed in several languages)
- maximum retention. We want to store the data for a long time, which means the table / collection will hold a lot of data.
The data is essentially an HTTP request log. Fields: counter owner, resource, HTTP data, etc.
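To make the "last n records with a condition" read pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a relational store such as RDS. The table layout, the index name idx_author_id, and the sample rows are assumptions made for illustration only. The idea is a composite index with the filter column first and the sort column second, so the database does not have to sort the whole table on every read.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table shaped like the log described above (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " author TEXT NOT NULL,"
        " resource TEXT NOT NULL)"
    )
    # Composite index: equality column (author) first, sort column (id) second.
    conn.execute("CREATE INDEX idx_author_id ON tbl (author, id)")
    # Hypothetical sample data: every third record belongs to 'pabel'.
    rows = [
        ("pabel" if i % 3 == 0 else "other", f"/page/{i}")
        for i in range(1, 1001)
    ]
    conn.executemany("INSERT INTO tbl (author, resource) VALUES (?, ?)", rows)
    return conn


def last_n(conn, author, n=100):
    """Return the newest n records for one author, newest first."""
    return conn.execute(
        "SELECT id, author, resource FROM tbl "
        "WHERE author = ? ORDER BY id DESC LIMIT ?",
        (author, n),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in last_n(conn, "pabel", 5):
        print(row)
```

The same query shape carries over to MySQL or PostgreSQL; whether a single indexed table stays fast enough at the expected volume is something only a load test can show.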
The application itself is being written in PHP for now; the bottlenecks will be rewritten if necessary. But we can't decide where to store the data. We want a very simple and flexible tool, and we are looking at Amazon RDS and Amazon DynamoDB. We expect a huge queue of writes and of reads of the last n records.
Cloud hosting covers the "minimal administration" condition, i.e. in practice we just add resources. The one open question: do these services cope with the load on their own, or do we have to set up database replication ourselves when the read volume is very large?
If the storage model (SQL / NoSQL) is not particularly important, which handles the load better: RDS or DynamoDB?
Could we use something else?
Has anyone seen articles about this? Can you recommend reading material?
Thanks in advance.

2 answer(s)
index0h, 2015-01-23
@index0h

Elasticsearch
For logs: some_logs_source > Logstash [ > Redis ] > Elasticsearch > Kibana
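
As a rough illustration of how the "last n records for one author" read could look once the logs are in Elasticsearch, here is a sketch that only builds the search request body. The field name author is an assumption about how the log documents would be mapped (the term query expects it to be indexed as an exact, non-analyzed value); @timestamp is the field Logstash adds to events by default. Sending the body to a cluster is left out on purpose.

```python
import json


def last_n_query(author, n=100, author_field="author", time_field="@timestamp"):
    """Build a search body: exact match on the author, newest documents first."""
    return {
        "query": {"term": {author_field: author}},
        "sort": [{time_field: {"order": "desc"}}],
        "size": n,
    }


if __name__ == "__main__":
    # Print the JSON that would be sent to the index's _search endpoint.
    print(json.dumps(last_n_query("pabel", 100), indent=2))
```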

Aristarkh Zagorodnikov, 2015-01-24
@onyxmaster

I agree about Elasticsearch for logs. Instead of Logstash you could look at Graylog2 (I'm not saying it's better, but it's worth a look).
As for counters and time series in general, I recently came across http://influxdb.com. I haven't gotten around to experimenting with it yet, but since this is usually not critical data, you can afford to play around with it.
