big data
ryzhikovas, 2015-04-05 01:04:47

What is the best way to store small bitmaps with a total volume of more than 400 GB: a DB or the FS?

The task is to store a large number of images of the Earth's surface. The current implementation represents the entire surface (except the polar regions) in the Mercator projection. This virtual raster is divided into 256x256 fragments (tiles), and this is done for each of the predefined zoom levels.
At the moment, attribute information about the images is stored in a simple way using SQLite. For the tiles, I developed a storage layer based on the FS directory structure. The directory layout corresponds to B-tree indexing (a la Google Maps, Bing). The speed of fetching a raster fragment is quite satisfactory. However, I spent a lot of time reinventing the wheel: a transaction mechanism, logging. The question is: how efficiently (by efficiency I primarily mean the speed of reading raster data) could this be implemented with standard PostgreSQL / MySQL / etc. tools? Which database features (apart from interprocess communication) would slow down reads compared to accessing small "raw" files in the FS?
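For readers who have not worked with tiles: a minimal sketch of how a point maps to a tile and a file path. It assumes the spherical "Web Mercator" tiling used by most web maps, and the z/x/y directory layout and the names `latlon_to_tile` and `tile_path` are hypothetical, not the layout described above.

```python
import math
from pathlib import PurePosixPath

TILE_SIZE = 256  # pixels per tile side, as in the question


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return (x, y) of the tile that contains the point at this zoom level.

    Assumes spherical Web Mercator; latitudes near the poles are clamped
    to the edge tiles.
    """
    n = 2 ** zoom  # number of tiles along each axis
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_path(root, zoom, x, y, ext="png"):
    """Hypothetical layout: <root>/<zoom>/<x>/<y>.<ext>."""
    return PurePosixPath(root) / str(zoom) / str(x) / f"{y}.{ext}"
```

With this layout a read is a single path computation plus one file open, which is the baseline any database has to compete with.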


4 answer(s)
Pavel K, 2015-04-05
@PavelK

The FS, of course!
The main thing is to distribute the files sensibly =)
I.e. do not dump all 400 GB into one folder; split them by some criterion (by name, for instance, or anything else).
Oops, sorry for stating the obvious, the post had not finished loading.
After all, a database also stores its data in files on disk, and SQLite keeps everything in a single file.
I did not dig into the subtleties because my volume is still small, but spreading the tiles across files was easier for me; perhaps some optimizations will help.
I used XFS.
Another plus is that you can easily serve the tiles from several threads.
But you probably know all this already; most likely this answer is for people like me =)
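One possible way to split files "by name" is to take a hash of the file name and use its first characters as nested directory names. This is only a sketch: the function `sharded_path` and its parameters are made up for illustration, and MD5 is used purely to spread names evenly, not for security.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, name, levels=2, width=2):
    """Place `name` under `levels` nested directories derived from its hash.

    With levels=2 and width=2 (hex digits) there are 256 * 256 = 65,536
    leaf directories, so no single directory grows very large.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, name)
```

The same name always maps to the same directory, so no lookup table is needed to find a file again.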

xmoonlight, 2015-04-05
@xmoonlight

1. Paths to tiles: in the database.
2. Images: on disk, with prefetching and caching in RAM. Since they are static, serve them with nginx.
In principle, you need to do the math: it may well make sense to load the "hills" (or the entire tile matrix) straight onto a RAM drive at startup.
"Hills": their peaks are the frequently used tiles. As a rule, these are the centers of large cities (you can collect usage statistics to find them).
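To illustrate the "keep the frequently used tiles in RAM" idea, here is a minimal least-recently-used cache sketch. The class `TileCache` and the `loader` callback are assumptions for the example; in practice nginx or the OS page cache may already do much of this.

```python
from collections import OrderedDict


class TileCache:
    """Keep at most `max_tiles` recently used tiles in memory."""

    def __init__(self, loader, max_tiles=1024):
        self._loader = loader        # callable: key -> tile bytes (e.g. reads a file)
        self._max = max_tiles
        self._items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._loader(key)
        self._items[key] = data
        if len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used tile
        return data
```

The `hits` and `misses` counters give the usage statistics needed to check whether the "hills" really fit into the chosen amount of RAM.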

lega, 2015-04-05
@lega

You could take MongoDB. The advantages are:
* Under heavy load or with a large volume, you can spread the data across shards. This can also save money: for example, instead of one DO server for $480 you can take 24 minimal virtual machines for $120, and get more cores and traffic as well.
* You can store additional parameters, tags (attribute information) and so on together with the file, so the tile and everything related to it sits in one data block, unlike with *SQL. This is good for performance, because there are fewer indexes and fewer FS accesses.
* You can create additional indexes.
* You can use geo-indexes, select tiles by radius, etc.
* It is also quite possible that atomic commits are enough for this task; they perform better than full-fledged transactions.

zed, 2015-04-05
@zedxxx

Personally, I lean towards a DB, for example SQLite. Not in a single file, of course, but split into "blocks". You can look at how this is done in SAS.Planet (the BerkeleyDB cache) or in SACS, where the cache is stored directly in SQLite.
As for the FS: not every file system can withstand such a load (Windows XP and 50 million files in the SASGIS cache), so you need to look at its type and test it under load.
If you have no doubts about the reliability of the FS, then you should think about backups, namely how convenient and fast they are to create and restore. IMHO, backing up millions of tiles is very inconvenient, so a database has a clear edge here.
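A rough sketch of the "SQLite split into blocks" idea, using only Python's standard `sqlite3` module. The block size (256x256 tiles per file), the file naming and the function names are assumptions made for this example; this is not how SAS.Planet actually lays out its cache.

```python
import sqlite3
from pathlib import Path

BLOCK_BITS = 8  # assumption: one SQLite file per 256x256 square of tiles


def block_file(root, zoom, x, y):
    """Path of the block database that holds tile (x, y) at this zoom."""
    return Path(root) / f"z{zoom}" / f"{x >> BLOCK_BITS}_{y >> BLOCK_BITS}.sqlite"


def put_tile(root, zoom, x, y, data):
    path = block_file(root, zoom, x, y)
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    try:
        with conn:  # one transaction: committed on success, rolled back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tiles ("
                "x INTEGER NOT NULL, y INTEGER NOT NULL, "
                "data BLOB NOT NULL, PRIMARY KEY (x, y))"
            )
            conn.execute(
                "INSERT OR REPLACE INTO tiles (x, y, data) VALUES (?, ?, ?)",
                (x, y, data),
            )
    finally:
        conn.close()


def get_tile(root, zoom, x, y):
    """Return the tile bytes, or None if the tile is not stored."""
    path = block_file(root, zoom, x, y)
    if not path.exists():
        return None
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT data FROM tiles WHERE x = ? AND y = ?", (x, y)
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None
```

Here transactions come from SQLite itself, and a backup copies one file per block instead of up to 65,536 separate tile files. A real service would probably keep connections open rather than reopening the file on every call.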
