Ensembles
"Ensembles" of methods allow machines to learn and correct each other's mistakes, currently used in the following areas:
- Replacement of classical algorithms (but they work more accurately)
- Search engines
- Computer vision
- Object recognition
In this section we will talk about the highest achievements of these methods. Ensembles and neural networks are the two main tools that deliver the most accurate results, and they are used by all large corporations and companies. Yet while many people have heard of neural networks, information about ensembles is far less widespread.
These methods are very effective, and the core idea is quite simple. As it turns out, if you take a few not-so-strong tools and teach them to correct each other's mistakes, the quality of the resulting system will be much higher than that of each one individually.
The effect is even stronger when the chosen algorithms are as unstable as possible and fluctuate strongly with the input data. That is why regression and decision trees are most often used: for them, a single strong anomaly in the input data is enough to throw the whole model off.
An ensemble can be assembled in any way you like: you can even take random classifiers and top them off with a regression. But no one can guarantee accuracy that way. That is why there are three proven ways to build ensembles.

Stacking
The idea is to train several different algorithms and pass their results to the input of a final one, which makes the decision.
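To make the idea concrete, here is a minimal stacking sketch using scikit-learn; the synthetic dataset and the particular choice of base models are purely illustrative assumptions, not part of the original example.

```python
# Minimal stacking sketch: several different algorithms feed their
# outputs into a final model that makes the decision.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)  # toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_models = [                      # the first-level algorithms
    ("tree", DecisionTreeClassifier(max_depth=3)),
    ("knn", KNeighborsClassifier()),
]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))
```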
Due to its lower accuracy, stacking is used in practice much less often than the other two methods, which, as a rule, perform better.
Bagging
The name is short for Bootstrap AGGregatING. One algorithm is trained many times on random samples drawn from the data, and at the end the answers are averaged.
In these random samples, data points may well repeat: from the set 1-2-3 you can draw samples like 2-2-3, 1-2-2 and so on. The same algorithm is trained several times on such samples, and at the end the answer is computed by simple voting.
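Here is a minimal bagging sketch, again with scikit-learn; the synthetic dataset and the choice of a decision tree as the unstable base algorithm are assumptions for illustration.

```python
# Minimal bagging sketch: one unstable algorithm (a decision tree) is
# trained on many bootstrap samples; answers are combined by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 50 trees sees its own random sample drawn with replacement.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0)
bag.fit(X_train, y_train)
print("bagging accuracy:", bag.score(X_test, y_test))
```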
The most popular example of bagging is the Random Forest algorithm. When you open the camera on your phone and see it outline the faces of people in the frame with rectangles, a neural network would be too slow for the job, and bagging copes much better here because all of its trees can be computed in parallel.
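As a rough illustration of that parallelism, scikit-learn's Random Forest can build its trees on all available CPU cores; the dataset below is synthetic and purely illustrative.

```python
# Random Forest is bagging over decision trees; n_jobs=-1 lets
# scikit-learn build the trees in parallel on all CPU cores.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)  # toy data
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
print("forest accuracy:", cross_val_score(forest, X, y).mean())
```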
Boosting
Algorithms in this method are trained sequentially, and each subsequent one pays special attention to the cases where the previous one was wrong.
As in bagging, samples are drawn from the source data, but this time not at random: each new sample includes part of the data on which the previous algorithm made mistakes. Thus, each new algorithm learns from the errors of its predecessor.
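A minimal boosting sketch with scikit-learn's gradient boosting; the dataset is synthetic and the hyperparameter values are illustrative assumptions.

```python
# Minimal boosting sketch: trees are built one after another, each new
# tree fitted to correct the mistakes of the ensemble built so far.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)  # toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                   max_depth=3, random_state=0)
boost.fit(X_train, y_train)  # sequential: cannot be parallelized
print("boosting accuracy:", boost.score(X_test, y_test))
```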
Pros: frantic, even illegal in some countries, classification accuracy that all the grandmothers by the entrance will envy. The disadvantage has already been named: it does not parallelize. Although it still works faster than neural networks, which next to nimble boosting are like KAMAZ trucks loaded with sand.
We can observe a real example of boosting in the Yandex search engine: it uses this method to rank the results it shows us.
Today there are three popular boosting implementations, and their differences are well covered in the article CatBoost vs. LightGBM vs. XGBoost.