Which collective recognition method should be used to classify digits/characters?

Chichi, 2016-08-27 08:11:28

Neural networks

This question is structured as follows: first I introduce the concept of collective recognition, then I explain the various group classification methods I found, and finally I ask my question. Those who are already experienced in this area, and do not need an explanation of what it is or which methods exist, can just skim the method headings and skip to the question.
What is collective recognition/classification
Collective (group) recognition means using a set of classifiers, each of which decides the class of an entity, situation, or image, and then combining and reconciling the individual decisions with some algorithm. Using multiple classifiers generally gives higher recognition accuracy and better computational efficiency.
Some approaches to combining classifiers' decisions:
1) methods based on the concept of classifiers' areas of competence, using procedures that assess each classifier's competence for each input to the classification system;
2) methods that combine decisions using neural network technologies.
Competence area method
The idea of collective classification based on areas of competence is that each base classifier works well in some region of the feature space (its area of competence), where it surpasses the other classifiers in accuracy and reliability. The area of competence of each base classifier has to be estimated somehow; the program that does this is called a "referee". The classification problem is then solved so that each algorithm is used only in its own area of competence, i.e. where it gives the best results compared with the other classifiers. Only one classifier's decision is taken into account in each area, so an algorithm is needed that determines, for any input, which classifier is the most competent.
In one approach, each classifier is paired with a special algorithm (referee) designed to assess that classifier's competence. The competence of a classifier in a given region of the feature space is understood as its accuracy there, i.e. the probability of correctly classifying objects whose descriptions fall in that region.
The general scheme for training collective recognition based on competence assessment consists of two steps (Fig. 1). In the first step, each base classifier is trained and tested; this is no different from ordinary training. In the second step, the sample used to test a given classifier is divided into two subsets, L+ and L−. The first subset contains the test instances that the classifier classified correctly; the second contains the remaining instances, i.e. those it misclassified. Treating these sets as the classifier's areas of competence and incompetence, respectively, they can serve as training data for the "referee" algorithm. When classifying new data, the referee's task is to determine for each input whether it belongs to the classifier's area of competence and, if so, the probability that it will be classified correctly. The classification task is then handed to the most competent classifier.
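The two-step scheme can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the one-dimensional toy problem, the two base classifiers (clf_a, clf_b) and the k-nearest-neighbour referee are all hypothetical stand-ins chosen for brevity, not the method from the cited sources.

```python
def split_by_correctness(classifier, samples, labels):
    """Step 2: split the test sample into L+ (correct) and L- (misclassified)."""
    l_plus = [x for x, y in zip(samples, labels) if classifier(x) == y]
    l_minus = [x for x, y in zip(samples, labels) if classifier(x) != y]
    return l_plus, l_minus


def competence(referee, x, k=3):
    """Referee (assumed k-NN): share of L+ points among the k nearest test points."""
    l_plus, l_minus = referee
    tagged = [(abs(p - x), 1) for p in l_plus] + [(abs(p - x), 0) for p in l_minus]
    nearest = sorted(tagged)[:k]
    return sum(flag for _, flag in nearest) / len(nearest)


def classify(classifiers, referees, x):
    """Hand the input to the classifier whose referee reports the highest competence."""
    scores = [competence(r, x) for r in referees]
    return classifiers[scores.index(max(scores))](x)


# Hypothetical toy problem: the true class is 1 when 2 <= x < 8, otherwise 0.
samples = [i * 0.5 for i in range(21)]            # 0.0, 0.5, ..., 10.0
labels = [1 if 2 <= x < 8 else 0 for x in samples]


# Step 1: two already "trained" base classifiers, each wrong in a different region.
def clf_a(x):
    return 1 if x >= 2 else 0      # misclassifies x >= 8


def clf_b(x):
    return 1 if x < 8 else 0       # misclassifies x < 2


classifiers = [clf_a, clf_b]
referees = [split_by_correctness(c, samples, labels) for c in classifiers]

predictions = [classify(classifiers, referees, x) for x in samples]
```

In this toy setup the input x = 9.0 is routed to clf_b, because all of its nearest test points lie in the L− set of clf_a. A real referee would be a trained model evaluated on data it has not seen.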
Neural Network Approaches
Neural network approaches to collective classification fall into three groups: methods that combine classifiers by means of a neural network, ensembles of neural networks, and modular neural networks.
Neural network to combine classifiers
One approach uses a neural network to combine the decisions of the base classifiers (Fig. 2).

The output of each base classifier is a decision vector (a vector of “soft labels”) whose elements lie in some numerical interval [a, b]. These values are fed to the input of a neural network, which must be trained to combine the decisions of the base-level classifiers and whose output is a decision in favor of a particular class. The network output can also be a vector whose dimension equals the number of classes and whose elements hold a confidence measure for each class. In that case, the class with the maximum confidence is chosen as the decision.
The system for combining solutions operates as follows:
1) a set of basic classifiers is selected and trained;
2) meta-data are prepared for training the neural network. To do this, the base classifiers are tested on a labeled data sample, and for each test case a vector of the base classifiers' decisions is formed, to which the true class label of the test case is appended;
3) the meta-data sample is used to train the neural network that aggregates the decisions.
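The three steps above can be sketched as follows. This is a rough illustration, not the system from the cited sources: the base classifiers clf_a and clf_b, the toy data, and the single-layer softmax network used as the combiner are all assumptions made to keep the example short and free of external libraries.

```python
import math
import random


def build_metadata(base_classifiers, samples, labels):
    """Step 2: one row per test case = all soft labels plus the true class."""
    rows = []
    for x, y in zip(samples, labels):
        decisions = [p for clf in base_classifiers for p in clf(x)]
        rows.append((decisions, y))
    return rows


def scores(model, features):
    weights, biases = model
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_combiner(rows, n_classes, epochs=500, lr=0.2, seed=0):
    """Step 3: train a single-layer softmax network by gradient descent."""
    rng = random.Random(seed)
    n_inputs = len(rows[0][0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for features, y in rows:
            probs = softmax(scores((weights, biases), features))
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                for i, f in enumerate(features):
                    weights[c][i] -= lr * grad * f
                biases[c] -= lr * grad
    return weights, biases


def combine(model, base_classifiers, x):
    """Feed the base decisions to the combiner and pick the most confident class."""
    features = [p for clf in base_classifiers for p in clf(x)]
    s = scores(model, features)
    return s.index(max(s))


# Step 1 (assumed already done): two hypothetical base classifiers with soft labels.
def clf_a(x):
    return [0.9, 0.1] if x < 2 else [0.1, 0.9]


def clf_b(x):
    return [0.1, 0.9] if x < 8 else [0.9, 0.1]


base = [clf_a, clf_b]
samples = [i * 0.5 for i in range(21)]
labels = [1 if 2 <= x < 8 else 0 for x in samples]   # toy ground truth

rows = build_metadata(base, samples, labels)
model = train_combiner(rows, n_classes=2)
```

Neither base classifier is right everywhere on this toy data, but the combiner can learn that class 1 requires both of them to vote for it. In practice the meta-data should come from a sample the base classifiers were not trained on.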
Method of modular neural networks
For modular neural networks, it is proposed to use a so-called “gating network”: a neural network that assesses the competence of the classifiers for the specific input vector presented to them. This variant is a neural network paradigm for combining decisions based on competence scores, and the corresponding theory is known as "mixture of experts". Each classifier is assigned a “referee” that predicts its degree of competence for the specific input supplied to the set of base-level classifiers (Fig. 3).
Depending on the input vector X, the decisions of different classifiers can be selected and used to form the combined decision. The number of inputs of the predictive (gating) network equals the dimension of the input feature vector. The number of its outputs equals the number of classifiers, L. The predictive network is trained to predict the competence of each classifier for a specific input vector, i.e. an estimate of how likely the classifier is to produce the correct decision. The degree of competence is a number in the interval [0, 1].
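A minimal sketch of the forward pass of such a scheme is shown below. The gating weights are set by hand purely for illustration (in a real system they would be learned), and the two classifiers, their soft labels, and the competence-weighted averaging rule are assumptions rather than details from the cited sources.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gating_network(x, weights, biases):
    """One output per classifier: its predicted competence in [0, 1] for input x."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def combine(x, classifiers, weights, biases):
    """Weight each classifier's soft labels by its competence and pick the top class."""
    competences = gating_network(x, weights, biases)
    total = sum(competences)
    n_classes = len(classifiers[0](x))
    fused = [0.0] * n_classes
    for g, clf in zip(competences, classifiers):
        for c, p in enumerate(clf(x)):
            fused[c] += g * p / total
    return fused.index(max(fused)), competences


# Hypothetical base classifiers that return fixed soft labels over two classes.
def clf_a(x):
    return [0.2, 0.8]


def clf_b(x):
    return [0.7, 0.3]


classifiers = [clf_a, clf_b]

# Hand-set gating parameters: inputs = feature dimension (2), outputs = L (2).
# clf_a is treated as competent for large x[0], clf_b for small x[0].
gate_weights = [[4.0, 0.0], [-4.0, 0.0]]
gate_biases = [0.0, 0.0]

label_right, comp_right = combine([2.0, 0.0], classifiers, gate_weights, gate_biases)
label_left, comp_left = combine([-2.0, 0.0], classifiers, gate_weights, gate_biases)
```

For the input [2.0, 0.0] the gate gives almost all the weight to clf_a, so its preferred class wins; for [-2.0, 0.0] the decision follows clf_b instead.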
Ensembles of neural networks
A decision fusion architecture consisting of several experts (neural networks) has also been proposed. Combining the knowledge of neural networks in an ensemble has proved effective, which shows the promise of collective recognition for overcoming the problem of "fragility".
An ensemble of neural networks is a set of neural network models that makes a decision by averaging the results of individual models. Depending on how the ensemble is constructed, its use allows solving one of two problems: the tendency of the underlying neural network architecture to underfit (this problem is solved by the boosting meta-algorithm), or the tendency of the underlying architecture to overfit (the bagging meta-algorithm).
There are various universal voting schemes, which differ in how the winning class is chosen: 1) maximum: the class that receives the highest single response from any ensemble member; 2) averaging: the class with the highest average response over the ensemble members; 3) majority: the class with the largest number of votes from the ensemble members.
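The three schemes are easy to compare side by side. The sketch below assumes each ensemble member returns one response per class; the function names and the numbers are made up so that the three rules disagree on the same input.

```python
from collections import Counter


def max_rule(responses):
    """Winner: the class with the highest single response from any member."""
    n_classes = len(responses[0])
    best = [max(r[c] for r in responses) for c in range(n_classes)]
    return best.index(max(best))


def average_rule(responses):
    """Winner: the class with the highest mean response over all members."""
    n_classes = len(responses[0])
    means = [sum(r[c] for r in responses) / len(responses) for c in range(n_classes)]
    return means.index(max(means))


def majority_rule(responses):
    """Winner: the class that most members rank first."""
    votes = Counter(r.index(max(r)) for r in responses)
    return votes.most_common(1)[0][0]


# Three ensemble members scoring one input over three classes (illustrative values).
responses = [
    [0.90, 0.10, 0.00],
    [0.00, 0.48, 0.52],
    [0.00, 0.48, 0.52],
]

winners = (max_rule(responses), average_rule(responses), majority_rule(responses))
```

On this input the maximum rule picks class 0, averaging picks class 1, and majority voting picks class 2, so the choice of scheme can matter when the members disagree.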
Some other methods:
There are also ensemble machine learning algorithms such as:

Random forest (a committee, or ensemble, of decision trees)
AdaBoost (an algorithm for strengthening classifiers by grouping them into a committee, proposed by Yoav Freund and Robert Schapire)

My question
The question is: which collective recognition scheme is best for character/digit/license plate recognition? The sources from which I took the information about the various group classification schemes date back to 2006, and I am afraid that some of the methods may be out of date. Which scheme would be the most reasonable choice in terms of how current the method is?
Which of the approaches can potentially give the best accuracy and performance for character/digit/license plate recognition? Perhaps some methods are outdated or have proved ineffective in certain areas, and I would like to hear about this from people who know the field. Perhaps there are other, more effective and up-to-date methods of collective recognition (group classification) that I have not mentioned.
Sources with a more detailed description (from there I took information about various methods of collective recognition):
Methods and algorithms for collective recognition
Actual issues of using convolutional neurons ...

2 answers

Arseny Kravchenko, 2016-08-28
@ChicoId

Usually, for pattern recognition, including characters/digits, standalone neural networks are used (without ensembles). Specific implementation details depend on the particulars of the task (for example, where the images come from, how noisy they are, etc.).

Alexander Skusnov, 2016-08-28
@AlexSku

Example from MatLab.