Choose the architecture of a distributed Yii application?

V

Vitaly Petrov2013-06-13 07:21:46

MySQL

Vitaly Petrov, 2013-06-13 07:21:46

Good afternoon, I have such a task - to develop a web application that allows you to edit a certain database.
Basic technologies: Yii, PHP, MySql, Ubuntu, Apache2
In the future, this application will have to be installed by many customers, so there will be several copies of the databases.
All organizations in which this Software will be installed have a strict hierarchical structure. Let's say ... three-level. Where at the lowest level, the organization should have access only to its data, the organization located one step above - to its own and to the data of all subordinate organizations, and the highest in the hierarchy - in general, to all-all-all data.
From the beginning, of course, I wanted to try simple replication - but it won’t work, since organizations must enter data each at their own level, and it’s impossible to achieve normal master-master replication from MySql so far, as far as I know.
Now the only way I see is to do some kind of sharding. I liked, for example, the solution described in this post habrahabr.ru/post/129780/. But I don't know how to apply it nicely. It would seem that you can not bother with end-to-end numbering of tables, with guid and so on, but simply allow each level of users to work only with their own database. And then, at each new level of the hierarchy, it’s easy to define several connections to the database at the same time and search for data simultaneously in all subordinate databases through Sphinx (with which I’m also never familiar and don’t even know if it is possible to set up a connection to several databases located on it). multiple servers).
But this is all in theory, in practice, for some reason, no ideas come to mind. even the question could not be properly formulated somehow =)
What can you advise?

Reply

Answer the question

In order to leave comments, you need to log in

3 answer(s)

T

Timur, 2013-06-29
@vitaly_KF

I have a similar situation. I am developing a SaaS application, where there are many users, while each is his own king, each has its own numbering (id), etc. etc. I also thought about several databases, but as a result I decided to use one database
. In the database, I add (int) personalID and projectID fields to the tables, and then I inherit from this class

abstract class PersonalActiveRecord extends CActiveRecord
{
  public $usePersonalID = true; // Использовать ли персональную нумерацию
  public $personalID = null;
  public $projectID = null;

  protected function personalIdExpression(){
    $sql =
      'COALESCE((SELECT personalID FROM
        (SELECT (MAX(personalID)+1) as personalID FROM '.$this->tableName().' WHERE projectID = '.$this->projectID.')
      as personalID ),1)';
    return new CDbExpression($sql);
  }

  protected function beforeSave(){
    if (!parent::beforeSave()) return false;
      if ($this->isNewRecord) {
        $this->projectID = (int)Y::param('projectID');
        if ($this->usePersonalID)
          if ($this->personalID===null) $this->personalID = $this->personalIdExpression();
      }
    return true;
  }

  public function defaultScope() {
    return array(
      'condition'=>"projectID='".(int)Y::param('projectID')."'",
    );
  }

}

M

Maxim Dyachenko, 2013-06-13
@Mendel

The first thing you need to do is deal with the guid.
I haven't tried it in Yii, but as far as I can remember offhand, there shouldn't be any major barriers to using it.
If there are any developments, model modules, etc., then it will be necessary to accustom them to the new structure. Change the types of fields in the database (including related ones), change auto-increment to generate a guide according to your logic, etc.
Create yourself a base class derived from CActiveRecord, say CDistributedActiveRecord.
This class will be responsible for both guid and replication.
Actually, all ARs inherit from it.
Architecture design must begin with a clear base planning.
It is obvious that some of the tables you have will be relevant in all databases, without filtering. For example, a table containing a list of organizations in your hierarchy does not make sense. In general, the standard logic is usually this - we have common “reference books”, each has its own “documents”. For example, the nomenclature (goods, types of services, etc.) is better attributed to directories and replicated centrally, but for example, the price list from this nomenclature is better considered a document (but again it depends on the specifics). We don’t have to do “documents” and “reference books” as strictly as in 1C, because we are doing our own architecture. For example employees/users, logically, this is a reference book, but we can refer it to a document, and generally make two base classes for them. Again, this depends a lot on your task.
Next, you will need to create a criterion by which you will determine who owns a particular document, and whether it needs to be transferred. In the simplest case, this will be the enterprise/department identifier. It will be very good if you have it simple and universal. Those. that it in all tables which are replicated was identical. Let it be called the Enterprise Owner.
Please note that if higher departments will not only view the data of downstream ones, but also make changes to them, or even worse, create new documents related to them, then you must specify the correct EnterpriseOwner for these documents. If the document implies that it should be read in a subordinate company (for example, it is a document containing a resolution on an application received from a subordinate department), then it is necessary to indicate in such a document the Company Owner equal to the one indicated in the document related to this (i.e. in in our example, the Owner from the "bid"). If you specify an Owner equal to the enterprise in which the document was created, then you will have to fence complex criteria for which document to transfer during replication and to whom to transfer.
Tip - think hard about foreign keys. How will this affect your manual replications.
Now let's move on to replication itself. Here I will make a few assumptions. If they are incorrect, please state them. To begin with, I will assume that you do not need real-time replication. Those. it would be nice if all documents appeared in the same millisecond in another database, but if there is a delay of even a few minutes, then no one will die, data integrity will not be violated, etc. The second assumption is that you will most likely not have many small changes to large records. Those. if you don't create a record but make changes to it, you can still commit the entire modified record, not just the changes, and the extra overhead isn't very high. For example, if you even transfer a file in the database that weighs several megabytes, then you either change the file itself and the size of the change is large, or its description, which occupies a small part of the total volume. Ideally, if you expect a large number of changes to this very description, without changing the "file", then it is better to split this into two related tables than to fence complex replication.
If my restrictions suit you, or you are ready to adapt to them, then we can move on to replication.
For replication, we override the methods for creating and modifying records so that they save our changes to the logs.
Logs can be both for each replicated table, and common for all. Depends on the task. Our logs contain only: an
incremental record number, which is a “hidden time”, which is end-to-end for all tables, perhaps the name of the table, and the guid of the record being changed. We also indicate what happened to the record. for replication purposes, we have two actions - save and delete. We are not interested in whether the record was created, or the old one is being edited - we still transfer it.
Next, we create another table in which we will indicate the nodes with which we have a connection. Anything can be here, including details for communicating with the node, but the required fields should be - a sign of which records to transfer (since some data can be transferred to several nodes at once, it makes no sense to put it in the replication logs) and two numbers of processed records: incoming and outgoing (not to be confused with message numbers in 1s). In these numbers we will reflect which entries from the replication logs we received or sent (or if we have offline replication, what changes we sent and received confirmation of receipt about them). Further, when transferring information to neighbors, we will make a request to the replication log, which will select only data with the required EnterpriseOwner (or without limitation if we send "up") and with a record number greater than the number already processed. Having a list of records that we need to give, we look for them in the corresponding tables, form a package for synchronization, and send it. Upon receiving a packet, we check the records to see if these records have already been processed before (we check by number, especially critical for offline synchronization).
Please note that we do not store all changes, but only the numbers of the changed records. Upon receipt, we either modify the record, or create a new one if there is none, or delete it if it is specified to delete.
How to transfer information? yes whatever. Though by mail throw in the form of json.
Don't forget to clean the "replication logs" from entries that are no longer needed by any of the nodes with which there is a connection (that is, whose numbers are less than ALL outgoing numbers for all nodes). I also want to draw your attention to the dimension of the counter-identifier in this journal. Make it bigger :)
What else is important in a large, distributed system? Carefully considered user rights are important, and LOGS are important. Many large and different dens. Even more logs. Debriefing logs are best kept separate from replication logs. They require more detail than there, on the other hand, they are rarely read, but written often, so indexing is needed to a lesser extent. be sure to indicate who made the changes, when they made them, etc. Ideally, such logs should be kept in a separate database or even in a text file.
Seems like everything I remember so far.

M

Max, 2013-06-13
@7workers

I understand that the option with a single base for some reason is not suitable?