Data synchronization
ChemAli, 2011-05-16 10:23:05

Intensive data transfer from a LAN to a website database (synchronization)

The goal is to keep the information on the website up to date. A unit of information is, roughly speaking, a page with a set of related attributes (say, title, tags, date, and so on; the specifics don't matter).

Only changed data is sent to the site, not the entire database. Information is updated live. Ideally, the site should reflect the same state of the database that exists on the server inside the corporate network. That is, each time a unit is edited, it is sent to the site.

I have no experience with this, so here is the solution I envision: changed units go into a queue, from which they are sent one by one or in batches to the web server as a POST request containing the unit(s) of information in XML or JSON, where an importing script picks them up. Nothing comes back except a signal that the data was received successfully. If the connection is lost, the queue fills up, and when the connection is restored, sending resumes. (A rough sketch follows the list below.)

  • Flow rate: up to 1,500 units per hour.
  • Data per unit: ~3 KB, plus XML/JSON overhead.
  • Stability requirements are minimal: a full catch-up within an hour or two after a connection break.
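
Roughly, the sender I have in mind would look something like this (a sketch only; the endpoint URL, batch size, and retry interval are all made up):

    # Sender side: changed units are queued and POSTed in batches as JSON.
    # Hypothetical endpoint and names; not a finished implementation.
    import json
    import queue
    import time
    import urllib.request

    outbox = queue.Queue()  # the editing code puts changed units here

    def send_batch(units, url="https://example.com/import.php"):
        """POST a batch of units as JSON; True means the importing script
        signalled success (the only thing we expect back)."""
        body = json.dumps(units).encode("utf-8")
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.status == 200
        except OSError:
            return False  # connection lost; the caller keeps the batch

    def sender_loop(batch_size=50):
        while True:
            batch = [outbox.get()]  # block until at least one unit exists
            while len(batch) < batch_size and not outbox.empty():
                batch.append(outbox.get())
            while not send_batch(batch):
                time.sleep(60)  # queue keeps filling; retry until the link is back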

Is this approach normal? If not, how can it be done better?

Perhaps there are standard, generally accepted solutions that have escaped my field of view?

5 answers
ComodoHacker, 2011-05-16
@ChemAli

The best replication is no replication. :) It is another potential point of failure in the system that needs monitoring, and another source of headaches for both admins and developers. So the first option: have the site read data directly from the corporate database.
If for some reason that doesn't suit you (think the reasons over again), then do the replication with the database's own facilities. It will be more reliable than anything you can write and test yourself in a reasonable amount of time.
If the database can't do what you need, or you just really want to reinvent the wheel, then go ahead. First choose a pull or a push model: given your requirements, pull is better, though less secure. Don't bother with XML/JSON; write directly to the database. Put monitoring code everywhere. The change queue can overflow during an extended outage, so in case of a serious desync it's good to have a "completely re-upload everything" routine.
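
For illustration only, a minimal pull-model sketch (the table and column names are made up, and sqlite3 stands in for both databases; a real setup would use the corporate DB's own driver):

    # Pull model: the web side periodically fetches units changed since the
    # last successful sync and writes them straight into its own database.
    import sqlite3

    def pull_changes(corp_db, site_db, since):
        src = sqlite3.connect(corp_db)
        dst = sqlite3.connect(site_db)
        rows = src.execute(
            "SELECT id, title, body, modified FROM units WHERE modified > ?",
            (since,),
        ).fetchall()
        with dst:  # one transaction: the whole batch lands, or none of it
            dst.executemany(
                "INSERT OR REPLACE INTO units (id, title, body, modified) "
                "VALUES (?, ?, ?, ?)",
                rows,
            )
        return max((r[3] for r in rows), default=since)  # new high-water mark

Run it from a scheduler every few minutes; re-pulling the same window is harmless because the writes overwrite by id.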

strib, 2011-05-16
@strib

Look into data replication using the database's own facilities, for example shipping the redo log and rolling it forward on the site's side.

Puma Thailand, 2011-05-16
@opium

Screw replication altogether: keep everything in one database and hold a backup just in case (better yet, two).
For availability, run a second channel to the office.
1,500 units at 3 KB each is about 4.5 MB per hour; that traffic is trivial, even a backup 3G or WiMAX link will handle it.

kader, 2011-05-16
@kader

I see a danger in this approach. If for some reason the server's response does not arrive but the POST actually got through, there is a high chance of duplicated sends: for example, two identical posts will be published.
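
The usual cure is to make the import idempotent: tag every transmission with the unit's id and a revision number, and have the importing script skip anything it has already applied. A sketch with made-up names:

    # Importer side: deduplicate by (id, revision), so a POST re-sent after
    # a lost acknowledgement is harmless. Names here are hypothetical.
    applied = {}  # id -> last applied revision (persist this in real life)

    def save_to_site_database(unit):
        pass  # write the unit into the site's database (stubbed here)

    def apply_unit(unit):
        uid, rev = unit["id"], unit["revision"]
        if applied.get(uid, -1) >= rev:
            return  # duplicate or stale resend: skip silently
        save_to_site_database(unit)
        applied[uid] = rev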

ChemAli, 2011-05-17
@ChemAli

The paradigm has changed: we will upload all the data in one go via cron. Thank you all, you are wonderful!
