MySQL
nikitasius, 2011-04-30 00:35:23

Implementing link storage for a tinyurl-style service

I have an idea to build a short-link service (how many are there already?) like tinyurl or bit.ly. I already have a short, catchy domain of the form ***.**, but there are questions about the technical side...
In order: it will be used by forum users, by me personally, and by anyone who just likes it.
- Statistics collection will be implemented through Google App Engine (a hit-counter servlet + which links are popular + part of the statistical content will be generated there).
- nginx will act as the front end on the server; in addition to talking to the back end (if there is one), it will pull data from the App Engine servlet and insert it into the page.
Now the actual questions...
On the Internet and on Habr there are notes on how people implemented such things, either with ready-made solutions or self-written ones, all findable through search.
But all of them (at least everything I saw, and I saw little) work with a database. There are two kinds of database engine:
1) non-transactional
2) transactional

In the first case (MySQL/MyISAM) RAM consumption is small (for now), but a select or insert locks the whole table for writing or reading, respectively. In the second case (MySQL/InnoDB; we're not installing Oracle) locking is at the row level, but there is extra load from transactions, and the tables and the database will keep growing, because the idea is to store all links forever. At first the project will be "for myself and friends", and any real load will come, if anything, from bots or attackers.
But I want to design the project for high load from the start, to squeeze everything out of it. For example, 50-100 URL requests per second is roughly 4-8 million per day (for comparison, a picture posted in one Habr topic that made it into the top 24 was requested 0.5-2 times per second).
I'm counting on more modest numbers (10-20 requests per second), but the server already hosts projects that consume its resources.

I had an idea: what if we store the link data as local files in subfolders? ext4 allows up to ~64k subdirectories per directory. So it is quite feasible to lay things out as /a/aa, /a/ab, ..., where a swarm of files like abcd.ext will live (the .ext extension is just for convenience), which yields a link like ***.**/aaaabcd (naturally, nginx will map it via a regexp).
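As a rough illustration of that layout, here is a minimal sketch in Python (the storage root, extension, and function names are my own invention, not part of the plan above) mapping a short code like aaaabcd to the sharded path /a/aa/abcd.ext:

```python
import os

ROOT = "/var/links"   # assumed storage root; purely illustrative
EXT = ".ext"

def code_to_path(code: str) -> str:
    """Map 'aaaabcd' -> ROOT/a/aa/abcd.ext (1 + 2 chars of sharding)."""
    if len(code) < 4:
        raise ValueError("code too short to shard")
    return os.path.join(ROOT, code[0], code[1:3], code[3:] + EXT)

def store_link(code: str, long_url: str) -> str:
    """Create the shard directories and write the target URL to the file."""
    path = code_to_path(code)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(long_url)
    return path
```

With two sharding levels of one and two characters, each directory stays far below the ~64k subdirectory limit even with many millions of links.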

The links (files) will be generated by Perl on the nginx side (front end without a back end, via the perl module enabled at ./configure time), or by a back end in the form of fastcgi-php or tomcat/jboss + JSP, and then written to a file.

Will Linux (or the disk subsystem) suffer from frequent disk accesses? The page on disk will contain only the link and will be assembled by nginx according to the config plus the data from the App Engine servlet.

If there are ready-made solutions that avoid using a database, or articles explaining why using the disk subsystem for this is unreasonable, please point me to them.

A bit about the server: an off-the-shelf server from Hetzner, a 2-core AMD, 2 GB RAM and 400 GB RAID-1 (software); a switch to the EQ4 plan is likely in the future if the current one isn't enough (although it handles everything running there now).


8 answer(s)
Vitaly Peretyatko, 2011-04-30
@nikitasius

What if we store everything in Redis? There even seems to be a module for nginx that lets it read data directly from Redis.
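If that route is taken, the lookup could in principle be wired straight into the front end. A hypothetical config fragment, assuming the third-party ngx_http_redis module is compiled into nginx (directive names per that module; the key scheme and location names are my own invention):

```nginx
location ~ "^/(?<code>[0-9a-zA-Z]+)$" {
    set $redis_key $code;        # the short code is the Redis key
    redis_pass 127.0.0.1:6379;   # the value (the long URL) becomes the body
    default_type text/plain;
    error_page 404 = @miss;      # fall through to a backend on a miss
}
```

The module returns the raw value as the response body, so an actual 302 redirect would still need a small extra step (e.g. a backend or an embedded-Perl handler).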

fossdev, 2011-05-02
@fossdev

I risk collecting 100500 downvotes from the PHP crowd, but 100 requests per second is a serious load only for the Apache/PHP/SQL stack. The functionality here is simple: write a FastCGI app in C, use a key-value store like memcache for the links, and 2-3 thousand requests per second on an average server will not seem like anything outrageous. Done right, performance will be limited by channel bandwidth.
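The hot path being described is tiny: a key-value lookup followed by a 302. A sketch of that logic (in Python rather than C, with a plain dict standing in for memcache; all names are illustrative):

```python
# A dict stands in for the key-value store (memcache in the answer above).
store = {"abc123": "https://example.com/some/long/path"}

def handle_request(code: str) -> tuple:
    """Return (status, headers) for a short-code lookup: 302 on hit, 404 on miss."""
    long_url = store.get(code)
    if long_url is None:
        return 404, {}
    return 302, {"Location": long_url}
```

The point of the answer stands regardless of language: per request this is one hash lookup and one small response, so the bottleneck is the process model and the network, not the logic.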

XPilot, 2011-05-02
@XPilot

Maybe offtopic, but it might be interesting for you to see how the service itself was organized inside the now-defunct tr.im.

pietrovich, 2011-04-30
@pietrovich

What you are trying to invent on top of the FS is a kind of "sharding" of files across directories. What stops you from sharding the data in a database instead?
And the FS won't fall over from 20-30 requests per second.
If I were you, I would keep it simple: implement whichever storage option is easiest for you right now, and hide its implementation behind some kind of IStorageEngine interface. Later, if it turns out you are approaching the performance ceiling, migrate to another implementation of the same IStorageEngine. By then you will have collected statistics, the requirements will be clear, and there will surely be time to test and pick the appropriate storage. And you can always migrate the data, especially if you design the system to issue "keys" in defined ranges that do not overlap between versions.
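The suggested abstraction can be sketched in a few lines (Python here; the interface and class names follow the answer's IStorageEngine idea but are otherwise my own):

```python
from abc import ABC, abstractmethod
from typing import Optional

class StorageEngine(ABC):
    """The seam to hide storage behind, so the backend can be swapped later."""
    @abstractmethod
    def put(self, code: str, url: str) -> None: ...
    @abstractmethod
    def get(self, code: str) -> Optional[str]: ...

class InMemoryStorage(StorageEngine):
    """Simplest possible engine; a file-based or MySQL-backed engine would
    implement the same two methods, so calling code never changes."""
    def __init__(self) -> None:
        self._data = {}
    def put(self, code: str, url: str) -> None:
        self._data[code] = url
    def get(self, code: str) -> Optional[str]:
        return self._data.get(code)
```

Swapping MyISAM for InnoDB, the filesystem, or Redis then becomes a new subclass plus a data migration, not a rewrite.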

dom1n1k, 2011-04-30
@dom1n1k

Somewhat offtopic.
A year and a half ago I seriously considered building a similar service. I thought about it for a long time but eventually changed my mind. For all the outward simplicity of the idea, a practical implementation runs into many pitfalls.
1. The segment is very competitive; there are many services. Moreover, all the largest "demand generators" for short links (Twitter, Google Maps, etc.) have already acquired in-house shorteners. To win an audience, you need to offer some interesting specific feature of your own (for example, I planned to build super-detailed statistics).
2. Monetization options are limited while hosting costs are high. You need a powerful server that can handle a lot of requests per day (don't forget the statistics), but running ads there is pointless for obvious reasons. I planned to make the service fully paid (about 5 bucks a month), aiming it at a "professional" audience, since they are the ones who need statistics, reports, and so on. That, in turn, raised the issue of user support.
3. The technology behind an apparently simple service is very non-trivial, as your own topic shows.
4. The big spam problem.
In the end it turned out I would need very large (at least for one person) investments of labor, time, and money with extremely unclear prospects. What if it doesn't take off? And most likely it won't :) So this project was pushed out of my head by other ideas.
I'm not trying to dissuade you (it's clear you are thinking everything through quite seriously), just thinking out loud.

ertaquo, 2011-04-30
@ertaquo

What about storing everything in a database (SQL or NoSQL) and caching frequently requested links in memory (a MEMORY table in MySQL, memcached, or at least shared memory)? Statistics for cached addresses can be tracked with an extra counter in the same cache, incrementing it and periodically flushing its value to the main database.
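That caching-plus-deferred-counters idea can be sketched as follows (Python; the class and method names are illustrative, and a real deployment would use memcached or a MEMORY table rather than a dict):

```python
class LinkCache:
    """Hot links and their hit counters live in memory; counters are
    periodically flushed to the main database in one batch."""
    def __init__(self) -> None:
        self.links = {}   # code -> long URL (the hot subset)
        self.hits = {}    # code -> hits since the last flush

    def resolve(self, code: str):
        url = self.links.get(code)
        if url is not None:
            self.hits[code] = self.hits.get(code, 0) + 1
        return url

    def flush_hits(self) -> dict:
        """Return accumulated counters and reset them; the caller would add
        these to the persistent statistics table."""
        counts, self.hits = self.hits, {}
        return counts
```

This turns one database write per hit into one batched write per flush interval, which is the whole point of keeping the counter in the cache.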

ComodoHacker, 2011-05-02
@ComodoHacker

it will be used by users on the forum, by me personally, and by those who just like it.
I want to make a project focused on a large load in order to squeeze everything out.

This is an architectural error. If you want it to work, make it as simple as possible.

Stepan, 2017-05-03
@steff

How did it go? Did you build it?
I'm asking because I want to build a similar service as an experiment.
