Where and how to store many files of different sizes?
There are many files of different sizes (compressed, from 1 KB to 100 MB). The files are text logs of user traffic, compressed with gzip; one file holds one user's logs for one day.
One day is about 60k files (roughly 15 GB compressed). Currently I store them in directories by day, and within each day I split them into directories by filename prefix. On the one hand this is convenient: you can quickly get the logs for any user. On the other hand, working with that many files is inconvenient: they copy slowly and the filesystem is slow. Since the partition holds the last 3-4 months of logs, that comes to 7-8 million files.
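(For reference, a minimal sketch of how such a layout could be built; the root directory, date format, and two-character prefix length are assumptions, not necessarily the exact scheme used here.)

```python
import gzip
import os

def log_path(root: str, day: str, user_id: str) -> str:
    """Build a per-day, per-prefix path like root/2016-05-01/ab/abuser.gz.
    The prefix length of 2 is an assumed value."""
    prefix = user_id[:2]
    return os.path.join(root, day, prefix, user_id + ".gz")

def append_log(root: str, day: str, user_id: str, text: str) -> None:
    path = log_path(root, day, user_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # gzip append mode creates a multi-member archive;
    # gzip.open reads it back transparently as one stream
    with gzip.open(path, "at", encoding="utf-8") as f:
        f.write(text)
```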
I am looking for a storage (or a way of organizing the files) that reduces the number of files so they are easier to work with, while still providing compression. I considered saving the files in PostgreSQL (logs in a text field so that compression works): there would be fewer files, but writing large files is a problem. A 1 GB text log could not be written because the script (written in Python) ran out of memory.
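(As an aside: the out-of-memory failure likely comes from reading the whole 1 GB file into a Python string before the INSERT. A hedged sketch below streams the already-gzipped file into a PostgreSQL large object in fixed-size chunks instead, so memory use stays flat; the `user_logs` table and its columns are hypothetical. Note that large objects are not compressed by TOAST the way a text column is, which is why this stores the gzipped bytes as-is.)

```python
import psycopg2

CHUNK = 1 << 20  # 1 MiB per read; memory use is constant regardless of file size

def store_log(conn, day, user_id, gz_path):
    """Stream an already-gzipped log file into a PostgreSQL large object,
    so a 1 GB file never has to fit in memory at once."""
    lobj = conn.lobject(0, "wb")  # oid=0 lets the server assign a new OID
    with open(gz_path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            lobj.write(chunk)
    lobj.close()
    with conn.cursor() as cur:
        # Hypothetical metadata table mapping (day, user) to the large object
        cur.execute(
            "INSERT INTO user_logs (day, user_id, log_oid) VALUES (%s, %s, %s)",
            (day, user_id, lobj.oid),
        )
    conn.commit()
```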
Are there any other options for storing these kinds of files?
To solve the log storage problem, I decided to use ClickHouse from Yandex. Columnar storage, compression, and fast access make it convenient to work with the data: you can quickly get all the traffic for a specific user. Physically, not many files are created, and you can back those files up directly (after detaching them from the database). For now this is the best solution for me.
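As an illustration of the approach (not the answerer's actual schema; the table, column names, and partitioning are assumptions), here is a sketch using the clickhouse-driver Python package. Instead of one file per user per day, each log line becomes a row; ClickHouse compresses each column on disk (LZ4 by default), and ordering by `(user_id, day)` makes per-user reads fast.

```python
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client("localhost")

# Hypothetical schema: one row per log line instead of one file per user/day.
client.execute("""
    CREATE TABLE IF NOT EXISTS traffic_logs (
        day      Date,
        user_id  String,
        line     String
    ) ENGINE = MergeTree()
    PARTITION BY toYYYYMM(day)
    ORDER BY (user_id, day)
""")

# Fetch all traffic for one user -- the fast-access case from the answer.
rows = client.execute(
    "SELECT day, line FROM traffic_logs WHERE user_id = %(u)s ORDER BY day",
    {"u": "user123"},
)
```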
Have a look at
https://github.com/reverbrain/historydb
https://github.com/reverbrain/elliptics
Note that it works well only with medium-sized and large files, i.e. a few tens of kilobytes or more.