Node.js
KonstantoS, 2015-08-09 23:34:40

How should I organize user file storage (virtual FS or native)?

I'm writing a simple system, and one item in the technical spec is user file storage: a sort of reinvented-wheel cloud.
The idea is that users upload files (of allowed types) and can create folders, and also share files/folders.
So far I have only two implementation ideas:
1. Dump everything into one place and store both folders (virtually) and file records in the database, each record pointing to its parent (a schema sketch follows below).
2. Give each user a physical folder and build the hierarchy on disk in a human-readable way. The database records stay the same type, but without the hierarchy information.
A bonus, I think, is that the second scheme could be exposed over FTP, which the first one cannot.
I have already searched but found nothing intelligible; maybe I just don't know how to phrase it for a search engine.
Tell me how the experienced folks do this, and what the reasoning is behind one decision or the other :)
Note: the implementation is Node.js + PostgreSQL
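For option 1, a minimal PostgreSQL schema sketch using the pg driver might look like this (table and column names are hypothetical, nothing in the question prescribes them; assumes the 'pg' package is installed):

```javascript
// Sketch of a virtual-FS schema for option 1 (hypothetical names).
// Every node (file or folder) points at its parent; NULL parent_id = root.
const { Client } = require('pg');

const ddl = `
  CREATE TABLE IF NOT EXISTS nodes (
    id        SERIAL PRIMARY KEY,
    owner_id  INTEGER NOT NULL,
    parent_id INTEGER REFERENCES nodes(id),
    name      TEXT NOT NULL,
    is_folder BOOLEAN NOT NULL DEFAULT FALSE,
    UNIQUE (owner_id, parent_id, name)
  );
`;

async function main() {
  const client = new Client(); // connection settings come from PG* env vars
  await client.connect();
  await client.query(ddl);

  // Listing the contents of one folder is a single indexed lookup.
  const { rows } = await client.query(
    'SELECT id, name, is_folder FROM nodes WHERE parent_id = $1',
    [42] // hypothetical folder id
  );
  console.log(rows);
  await client.end();
}

main().catch(console.error);
```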

4 answers
Timur Shemsedinov, 2015-08-10
@MarcusAurelius

It all depends on the number and size of the files, on the number of users, and on how the files are distributed among users. Let me explain: if there are many users (millions) and each has few files (dozens), you end up with many folders containing few files each. That wastes file-system space on directory tables, and the folder that holds all the user folders becomes slow to access. If there are few users with a lot of files each, you again get folders with huge directory tables. You can either pick a file system that solves these problems or balance the folder tree yourself so that lookups stay optimal.

To get optimal lookups, build a balanced folder structure in which each folder contains neither too many nor too few files, with well-distributed names. For example, you can make a 2- or 3-level folder hierarchy holding files renamed to HEX, e.g. /EA8D253F/2145AE32/F259C201. Generate random folder and file names and write the resulting path to the database. This works well for any file system and any number of files; just increase the name length, the alphabet, and the folder nesting depth as needed (this depends on the particulars of the file system and your requirements, so study it). It also solves a pile of problems at once: files with identical names, files with strange characters in their names (including Arabic, Chinese, and other UTF-8 names), executable files and security in general, relative depersonalization of the data, and so on...

Better forget about FTP; no user should be on FTP. It is an archaic protocol from the late wired era, used nowadays only by me and other perverts.

And if you additionally compute hashes for the files (several different hashes, just in case) and store them in the database together with the names and all the metadata, you can eliminate duplication on disk; there are cases where different users hold a large percentage of identical files. Here are some sketches: /lib/impress.files.js#L111-L174 where files on disk are even compressed with ZIP or GZIP depending on size. Take it, I'm giving the method away...
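A minimal sketch of the random-HEX-path idea described above (the depth of 3 levels and 8-character names are arbitrary choices here, not prescribed by the answer):

```javascript
// Sketch: generate a random 3-level HEX path like /EA8D253F/2145AE32/F259C201,
// write the file there, and return the path string to be stored in the database.
const crypto = require('crypto');
const path = require('path');
const fs = require('fs/promises');

function randomHexName() {
  return crypto.randomBytes(4).toString('hex').toUpperCase(); // 8 HEX chars
}

async function storeFile(buffer, rootDir) {
  const parts = [randomHexName(), randomHexName(), randomHexName()];
  const dir = path.join(rootDir, parts[0], parts[1]);
  await fs.mkdir(dir, { recursive: true });   // create the two folder levels
  const filePath = path.join(dir, parts[2]);  // third part is the file name
  await fs.writeFile(filePath, buffer);
  return '/' + parts.join('/');               // record this path in the DB
}
```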

xmoonlight, 2015-08-09
@xmoonlight

This immediately caught my eye:

"A bonus, I think, is that the second scheme could be exposed over FTP, which the first one cannot."

And who said you can't provide FTP access to a virtual structure?!) True, you would have to write your own FTP server that understands the virtual FS, and even then only if the existing ones don't suffice: they usually let you create virtual folders for users from config rules, but (for the most part) they can't reach into a database.
It is better to use a virtual FS: you won't be tied to file-system limits (total number of directories/files, maximum nesting depth, etc.), and you can easily shard the storage across multiple servers, with backups, fault tolerance/RAID, and so on.

Stanislav Makarov, 2015-08-10
@Nipheris

Hashes have already been mentioned here; I advise you to think about them carefully. Hashing is fast on modern hardware, so consider a content-addressable store. Mapping it onto a classic file system is elementary: the hash is split into the required number of pieces and a nested structure is formed from them, something like f5/c3/ab/1414... . As for where to keep the folder hierarchy, I don't know what is best; in theory a real file system is simply better suited to storing a nested structure than, say, a relational database, so you will need to research thoroughly which database to use for the file metadata. Then, alongside the rest of the meta-information, write down the hash of the content, and locate the actual content by that hash. Such a design should shard very well, and, as Timur Shemsedinov noted, you also gain from not duplicating files (if you implement a smarter system that counts references to a given hash). The probability of a hash collision is very, very low; for reliability you can additionally compare sizes, although you are unlikely to ever hit such a case in practice.
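A minimal sketch of that content-addressable mapping (SHA-256 is my assumption here; the answer does not name a specific hash):

```javascript
// Sketch: derive a nested storage path from the file's content hash,
// e.g. f5/c3/ab/f5c3ab... (the first bytes become folder levels).
const crypto = require('crypto');
const path = require('path');
const fs = require('fs/promises');

async function storeByContent(buffer, rootDir) {
  const hash = crypto.createHash('sha256').update(buffer).digest('hex');
  const dir = path.join(rootDir, hash.slice(0, 2), hash.slice(2, 4), hash.slice(4, 6));
  await fs.mkdir(dir, { recursive: true });
  const filePath = path.join(dir, hash);
  // Identical content yields an identical path, so duplicates collapse for free.
  try {
    await fs.writeFile(filePath, buffer, { flag: 'wx' }); // 'wx' fails if it exists
  } catch (err) {
    if (err.code !== 'EEXIST') throw err; // already stored: deduplicated
  }
  return hash; // store the hash in the metadata database
}
```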

Dan Ivanov, 2015-08-10
@ptchol

And why not take something ready-made like WebDAV? Write your own server implementation on top of it, routing clients to N storage servers as needed. And implement public/non-public sharing by creating symlinks in a public folder that point into the user's storage folder.
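A minimal sketch of the symlink-based sharing suggested here (the folder layout and function names are hypothetical):

```javascript
// Sketch: "publish" a file by symlinking it into a web-served public folder,
// and "unpublish" by removing the link; the original file is never touched.
const path = require('path');
const fs = require('fs/promises');

async function publish(userFile, publicDir, publicName) {
  const target = path.resolve(userFile);          // real file in user storage
  const link = path.join(publicDir, publicName);  // name visible to the public
  await fs.symlink(target, link);                 // share
  return link;
}

async function unpublish(publicDir, publicName) {
  await fs.unlink(path.join(publicDir, publicName)); // stop sharing
}
```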
