How to properly organize the web server architecture?


Django

loly, 2016-04-05 01:36:26

I am currently a junior developer, so unfortunately I have to ask more experienced specialists for help (unfortunately, because I don't like to bother other people). The project is not commercial for now; the goal at this stage is simply to get it built. Its main feature is detailed search over a database. The database stores objects with at most 20 different characteristics, and it must be possible to search by each of them. User registration and login are required.
Sources that feed the database:
1) Every 5-10 seconds, a JSON document averaging 5.5 MB. The information can be either completely new or a replacement for existing records. Detailed statistics are not known yet, but roughly half of it will replace old data. Once the database has grown to a certain size, there will be almost no new information. The collector for the first source will be written within the next day (say, 24 hours). As soon as the approximate total size of the data is known, I will add it in a comment.
2) A parser that scrapes a certain page every minute (about 0.2 MB of relevant information).
An example of a search (entirely fictitious, the real subject is different):
[Speed > 90 and Speed < 150 and Lifetime < 5 and [Type "A" or Type "C"] and ... and Cost < 10 and Amount([something]) > 72] or [similar filter with its own parameters] and sorting by [parameter]
An example of an object (entirely fictitious, the real subject is different):
[Speed = 134, Lifetime = 2, Type = "A", ..., Cost = 80]
At the very beginning of development I don't want to make a mistake when choosing tools:
1) What is the best way to organize the structure? For example, the parsers should obviously be split out into a separate application, but I don't yet know how.
2) Which web server is better to use? Node.js or Django?
3) Most important: which database should I use? The result is needed instantly, i.e. without waiting: not after 5 minutes and not even after 10 seconds. The faster the better.

2 answer(s)

Dmitry Entelis, 2016-04-05
@loly

Oh. Well, let's dive in, starting from the end :-p
3) There is no such thing as instant search, so first you need to pin down the speed you actually require. For some people 1500 ms counts as "instant", while others want results in 10 ms.
Critical questions:
- how much data is in the database
- how many searches are run per second
- how selective the search queries are (does a query return a handful of records or tens of thousands)
- how consistent and up to date the search results must be, given the constant updates
- how much money there is for servers ;-)
If there isn't much data (under roughly 1 GB), I wouldn't overthink it: put everything in plain MySQL and hang a pile of indexes on it. Then measure write and read performance, and if it is good enough for you, stop there.
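A minimal sketch of that "measure writes and reads" step. Assumptions: Python's built-in SQLite stands in for MySQL so that the example runs anywhere, and the table name, column names, and row count are hypothetical, borrowed from the fictitious example in the question. Real numbers have to come from the real database and the real data volume.

```python
import random
import sqlite3
import time

# Assumption: in-memory SQLite is only a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, speed INTEGER, lifetime INTEGER, cost INTEGER)"
)
# One index per searchable column (hypothetical columns).
conn.execute("CREATE INDEX idx_items_speed ON items (speed)")
conn.execute("CREATE INDEX idx_items_cost ON items (cost)")


def timed(label, fn):
    """Run fn once, print the wall-clock time in milliseconds, return its result."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.1f} ms")
    return result


random.seed(0)
rows = [
    (random.randint(0, 200), random.randint(1, 10), random.randint(1, 100))
    for _ in range(50_000)
]


def write_batch():
    # A single transaction for the whole batch.
    with conn:
        conn.executemany(
            "INSERT INTO items (speed, lifetime, cost) VALUES (?, ?, ?)", rows
        )


def read_query():
    return conn.execute(
        "SELECT id FROM items "
        "WHERE speed > ? AND speed < ? AND lifetime < ? AND cost < ?",
        (90, 150, 5, 10),
    ).fetchall()


timed("write 50k rows", write_batch)
matches = timed("filtered read", read_query)
```

Run it with a realistic row count and realistic filters; if the printed times fit the latency you decided on above, there is no need to go further.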
If there is more data and you want to learn something new, I would look at Elasticsearch.
The guys at 2GIS rolled it out a couple of years ago (https://habrahabr.ru/company/2gis/blog/213765/), and there is a sea of documentation for it. The downside: search results will always lag behind hot data.
If you need to search hot data and MySQL's speed isn't enough for you, I don't have a good answer :) You could look at something like Cassandra, or you could roll your own solution, but personally I find it hard to give advice here.
2) Node, Python, Ruby, PHP: purely a matter of taste. The main load (unless we are talking about home-grown solutions) will fall on the database anyway. And home-grown solutions are better written in C++.
1) To be honest, your post leaves me with more questions than answers. What kind of JSON is it, where does it come from, how will conflicts be resolved if new JSON arrives faster than the old one is processed, and so on.
In general, though, this is a more trivial problem than fast search.
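One possible way to handle the "new JSON arrives before the old one is processed" conflict, as a sketch only. Assumptions not stated in the question: every object carries a unique `id` and an `updated_at` version number, and the table and field names are hypothetical. SQLite stands in for the real database. A batch is applied so that an older version never overwrites a newer one, whatever the processing order.

```python
import json
import sqlite3

# Assumption: in-memory SQLite as a stand-in; hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, speed INTEGER, cost INTEGER, "
    "updated_at INTEGER NOT NULL)"
)


def apply_batch(conn, payload):
    """Insert new objects and replace existing ones only if the batch is newer."""
    objects = json.loads(payload)
    with conn:  # one transaction per JSON batch
        for obj in objects:
            cur = conn.execute(
                "UPDATE items SET speed = ?, cost = ?, updated_at = ? "
                "WHERE id = ? AND updated_at < ?",
                (obj["speed"], obj["cost"], obj["updated_at"],
                 obj["id"], obj["updated_at"]),
            )
            if cur.rowcount == 0:
                # Either the row does not exist yet (insert it) or the stored
                # version is at least as new (the insert is ignored).
                conn.execute(
                    "INSERT OR IGNORE INTO items (id, speed, cost, updated_at) "
                    "VALUES (?, ?, ?, ?)",
                    (obj["id"], obj["speed"], obj["cost"], obj["updated_at"]),
                )


newer = json.dumps([{"id": 1, "speed": 134, "cost": 80, "updated_at": 200}])
older = json.dumps([
    {"id": 1, "speed": 120, "cost": 95, "updated_at": 100},
    {"id": 2, "speed": 60, "cost": 5, "updated_at": 100},
])

# The batches are applied out of order on purpose.
apply_batch(conn, newer)
apply_batch(conn, older)
result = conn.execute(
    "SELECT id, speed, cost, updated_at FROM items ORDER BY id"
).fetchall()
print(result)  # [(1, 134, 80, 200), (2, 60, 5, 100)]
```

If the source has no version or timestamp field, this approach does not apply as is, and the batches would have to be processed strictly in arrival order instead.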

ThunderCat, 2016-04-05
@ThunderCat

If the set of characteristics is finite and known, make one table, ideally with all the fields known up front. Then indexes are easier to work with and you don't need joins between tables. This will seriously speed up the database.
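A rough sketch of the single-table idea applied to the search example from the question. Everything here is an assumption for illustration: the column names come from the fictitious example, `build_search` is a hypothetical helper, and SQLite stands in for whichever database is chosen. Each inner list is one AND-group, and the groups are joined with OR.

```python
import sqlite3

# Hypothetical whitelist: only known columns and operators reach the SQL text.
ALLOWED_COLUMNS = {"speed", "lifetime", "item_type", "cost"}
ALLOWED_OPS = {"<", ">", "=", "in"}


def build_search(groups, order_by):
    """Turn [[(column, op, value), ...], ...] into (sql, params)."""
    if not groups or order_by not in ALLOWED_COLUMNS:
        raise ValueError("need at least one group and a known sort column")
    group_sql, params = [], []
    for conditions in groups:
        parts = []
        for column, op, value in conditions:
            if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
                raise ValueError(f"unsupported condition: {column} {op}")
            if op == "in":
                placeholders = ", ".join("?" for _ in value)
                parts.append(f"{column} IN ({placeholders})")
                params.extend(value)
            else:
                parts.append(f"{column} {op} ?")
                params.append(value)
        group_sql.append("(" + " AND ".join(parts) + ")")
    sql = (
        "SELECT id FROM items WHERE "
        + " OR ".join(group_sql)
        + f" ORDER BY {order_by}"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
# One flat table, one column per characteristic, no joins.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, speed INTEGER, "
    "lifetime INTEGER, item_type TEXT, cost INTEGER)"
)
for column in sorted(ALLOWED_COLUMNS):
    conn.execute(f"CREATE INDEX idx_items_{column} ON items ({column})")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [(1, 134, 2, "A", 80), (2, 120, 3, "C", 8),
     (3, 100, 4, "B", 5), (4, 40, 1, "A", 3)],
)

sql, params = build_search(
    [
        [("speed", ">", 90), ("speed", "<", 150), ("lifetime", "<", 5),
         ("item_type", "in", ["A", "C"]), ("cost", "<", 10)],
        [("speed", "<", 50), ("cost", "<", 5)],
    ],
    order_by="cost",
)
ids = [row[0] for row in conn.execute(sql, params)]
print(ids)  # [4, 2]
```

Values always go through placeholders and only whitelisted column names are interpolated, so user-supplied filters cannot inject SQL.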
1. The structure will depend on the environment: Django, Node, or PHP will each approach the task differently because of the limitations and advantages of the environment. For PHP, take a ready-made framework, for example Laravel, although the task has highload written all over it, and there you would ideally write your own smart custom solution. But MVC, without a doubt. As for the other stacks, I don't know what is fashionable now.
2. Node and Django are not servers, but frameworks.
3. I would look at MySQL. With the right settings, modern RDBMSs are almost identical in speed, and there is more than enough information on setting up and working with MySQL. The community is large and quite responsive, and there are specialists on the market. If you need something reliable, fast, and supported, I would not rush into exotic options such as in-memory databases. An instant result is something only a laxative delivers; any software component has latency for searching, indexing, internal data transfer, structuring, and other overhead.
PS: and who is this "login"???