Classical learning
The first machine learning algorithms appeared in the 1950s and came from the field of statistics. They solved formal problems: they looked for patterns in the data, evaluated how close points lay to one another in space, and calculated directions. Today much of how the Internet works rests on such classical algorithms: recommendations, blocking of suspicious transactions, and personalized advertising all run on classical learning. Classical algorithms are very popular and at the same time very simple; like the basics of arithmetic, they are always needed.
Supervised learning
Classical machine learning pays special attention to supervised learning, or "learning with a teacher". Here the machine has a teacher who shows it how to do things correctly: he splits up and explains the data, and the machine learns from specific examples. The role of the teacher is most often played by labeled data, which, as a rule, is prepared by a human.
In unsupervised learning, the machine is handed a mass of data without any explanations and has to find the patterns on its own.
Naturally, the machine learns faster and better with a teacher, which is why supervised learning is used far more often for business problems. These problems fall into two types: classification, predicting the category of an object, and regression, predicting a point on a numeric axis.
Classification
Classification separates objects by an attribute that is known in advance: colors with colors, documents by language, musical compositions by genre.
Today classification is used for:
- Spam filters
- Language detection
- Search for similar documents
- Identification of suspicious bank transactions
Popular algorithms: Naive Bayes, Decision Trees, Logistic Regression, K-Nearest Neighbors, Support Vector Machines
Spam Filters and Naive Bayes
Early on, all spam filters worked on the Naive Bayes algorithm. The machine counted how many times a suspicious word occurred in spam and how many times in normal emails, multiplied the two probabilities by the Bayes formula, summed the results over all the words, and marked the email as spam.
Bayes formula: P(A|B) = P(B|A) · P(A) / P(B)

Later, spammers learned to bypass Bayesian filters by simply inserting lots of words with "good" ratings at the end of the letter. The trick earned the ironic name Bayesian poisoning, and other algorithms took over spam filtering. But the method remained forever in the textbooks as the simplest, most elegant, and one of the first practically useful ones.
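As an illustration, here is a minimal sketch of the word-counting idea described above; the tiny training corpus and word lists are invented purely for demonstration, so this is a toy, not a production filter:

```python
from collections import Counter

# Invented toy training data: lists of words per email
spam_emails = [["buy", "viagra", "now"], ["cheap", "viagra", "offer"]]
ham_emails = [["meeting", "tomorrow", "at", "noon"], ["project", "report", "attached"]]

def word_probs(emails):
    """Count how often each word occurs, smoothed so unseen words keep a small probability."""
    counts = Counter(w for email in emails for w in email)
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

p_word_spam = word_probs(spam_emails)
p_word_ham = word_probs(ham_emails)

def is_spam(words, prior_spam=0.5):
    """Multiply per-word probabilities (the 'naive' independence assumption) via Bayes' rule."""
    spam_score, ham_score = prior_spam, 1 - prior_spam
    for w in words:
        spam_score *= p_word_spam(w)
        ham_score *= p_word_ham(w)
    return spam_score > ham_score

print(is_spam(["cheap", "viagra"]))    # True
print(is_spam(["project", "meeting"]))  # False
```

Real filters learn the word frequencies from millions of emails and sum log-probabilities to avoid numeric underflow, but the principle is the same.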
Classifying things is the most popular task in machine learning. The machine learns to separate objects according to given attributes.
Classification needs a teacher, who labels the data with features and categories; the machine then learns to assign data to those categories. Afterwards, it can classify anything.
Decision trees
For more complex tasks, where the machine's answer cannot be blindly trusted without an explanation, "decision trees" were invented. The principle: the machine automatically splits the data into questions whose answers are "yes" or "no". From a human point of view the questions may not always look sensible, but the machine composes them so that the split is as accurate as possible.
The result is a tree of questions: the higher the level, the more general the question. Decision trees are used in areas with high responsibility: diagnostics, medicine, finance.
"Decision trees" in their pure form are rarely used, as a rule, they are combined with larger systems, which in combination gives a result that can surpass neural networks.
Support Vector Machine (SVM)
The most popular method of classical classification is the Support Vector Machine (SVM). It has been used to classify everything: plant varieties, faces in photographs, documents by subject. For many years it was the leading classifier.
The idea of SVM is quite simple: it looks for how to draw two lines between the categories so that the largest possible gap appears between them.
This kind of classification has a flip side: the search for anomalies. When some feature of an object does not fit the learned picture at all, the object stands out and signals a deviation. By teaching the computer "how things should be", we automatically obtain the reverse classifier that knows "how they should not be".
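A minimal sketch of both uses, assuming scikit-learn; the 2D points are invented. SVC draws the widest-gap boundary between two categories, while OneClassSVM learns only the "normal" points and flags whatever does not fit:

```python
from sklearn.svm import SVC, OneClassSVM

# Invented 2D toy points for two categories
X = [[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [6, 6]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)  # the widest-gap separating line
print(clf.predict([[2, 2], [7, 6]]))  # expected: [0 1]

# The reverse use: learn only "normal" behavior, flag outliers as -1
normal = [[1, 1], [1, 2], [2, 1], [2, 2]]
detector = OneClassSVM(nu=0.1).fit(normal)
print(detector.predict([[1, 1], [9, 9]]))  # expected: [ 1 -1]
```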
Today, neural networks are more often used for classification, since they were invented specifically for such tasks.
Regression
Today regression is used to solve problems such as:
- Forecasting the value of securities
- Demand analysis
- Medical diagnoses
- Any dependence of a number on time
Regression is essentially classification that predicts a number instead of a category: the price of goods from their attributes, the volume of demand from the company's growth, and so on. Regression is an ideal fit for tasks where something depends on time.
Regression is a favorite in finance and analytics; it even exists in Excel. The principle of operation is simple: the machine tries to draw a line that reflects the average dependence. Unlike a human, it does the counting with mathematical accuracy: the average distance to every point is calculated, and the line tries to please them all.
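A minimal sketch of that line-drawing, assuming scikit-learn; the apartment-price numbers are invented:

```python
from sklearn.linear_model import LinearRegression

# Invented toy data: apartment size in square meters -> price
sizes = [[30], [45], [60], [80], [100]]
prices = [90, 135, 175, 240, 300]

# Fit the line that minimizes the (squared) distance to every point
model = LinearRegression().fit(sizes, prices)
print(model.predict([[70]]))  # price estimate for a 70-meter apartment
```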
Regression and classification are closely related, as confirmed by the fact that many classifiers turn into regressors after some tuning. For example, we can not only name the class an object belongs to, but also remember how close it is to the boundary; this is where regression appears.
You can read more about regression in the separate article "Regression".
Unsupervised learning
Unsupervised learning ("teaching without a teacher") was invented a little later, in the 1990s. In practice it is used less often, yet there are tasks where there is simply no other choice.
When there is no labeling and it is impossible to produce any, unsupervised learning is the remaining option, although in practice it usually shows weaker results.
Unsupervised learning is used more as a method of data analysis than as a basic algorithm: a person uploads a pile of data and watches whether clusters or dependencies appear.
Clustering
Clustering separates objects by an attribute that is not known in advance; the machine decides on its own how to do it "best".
Today clustering is used for:
- Market segmentation (types of customers, loyalty)
- Merging nearby points on a map
- Image compression
- Analysis and markup of new data
- Detectors of abnormal behavior
Clustering is classification, but without classes known in advance: the machine searches for similar objects itself and combines them into clusters. The number of clusters can either be set in advance or entrusted to the machine. The machine judges similarity by the features that were marked up for it.
Markers on maps are a good example of clustering: when many points need to be shown in one place, the algorithm groups them and displays a single marker instead.
Image compression is another, fairly popular example. When saving an image to PNG, you can restrict the palette to, say, 32 colors. Clustering will then find all the "approximately red" pixels, calculate their average, and replace all the reds with it, which shrinks the file.
Unfortunately, some colors cannot be attributed to one cluster so easily. This is where another popular clustering algorithm comes in: the K-Means method. It randomly throws 32 points, called centroids, onto the color palette, and assigns every other point to its nearest centroid. Searching for centroids is convenient and simple, but in real problems clusters can have completely different shapes. There is another method for that, DBSCAN, which finds dense accumulations of points and builds clusters around them.
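A minimal sketch of the 32-color palette idea with K-Means, assuming scikit-learn and NumPy; the random array stands in for real image pixels:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for an image: 10,000 random RGB pixels
pixels = np.random.randint(0, 256, size=(10_000, 3))

# Throw 32 centroids into the color space and pull each pixel to the nearest one
kmeans = KMeans(n_clusters=32, n_init=10).fit(pixels)

# Replace every pixel with its cluster's average color: a 32-color palette
quantized = kmeans.cluster_centers_[kmeans.labels_].astype(np.uint8)
print(quantized.shape)  # still (10000, 3), but with only 32 distinct colors
```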
Both classification and clustering can also be used as anomaly detectors: if a user's behavior differs from "normal", it can be blocked and checked for being a bot. The user's actions are simply unloaded into the machine, and it works out on its own what "normal" looks like.
This approach is less accurate than classification, but it remains applicable when no labels exist.
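A minimal sketch of such a detector using the DBSCAN method mentioned above, assuming scikit-learn; the "user action" numbers are invented. Points that fall into no cluster get the label -1, i.e. they are the anomalies:

```python
from sklearn.cluster import DBSCAN

# Invented user features: [requests per minute, pages per session]
actions = [[2, 5], [3, 4], [2, 4], [3, 5], [2, 6], [90, 1]]  # the last looks like a bot

labels = DBSCAN(eps=3, min_samples=2).fit_predict(actions)
print(labels)  # expected: [0 0 0 0 0 -1], where -1 marks the outlier
```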
Dimension reduction (generalization)
In generalization, specific features are assembled into a higher-level abstraction. Today such methods are used for:
- Recommendation systems
- Beautiful visualizations
- Determining the subject and searching for similar documents
- Analysis of fake images
- Risk management
In the early days, heavier methods were used: large arrays of numbers were loaded in, with the command to find whatever interesting information they contained.
When plotting graphs stopped bringing the desired results, it was decided to teach machines to search for information instead of people. This is how the methods later called Dimension Reduction or Feature Learning appeared.
The undeniable practical benefit of these methods is that several features can be combined into one, yielding an abstraction. Take a dog as an example: data on the shape of the ears, nose, tail, and body is taken, and from it the machine derives an abstract conclusion in favor of a particular breed.
Information about the specific ears is lost, but a new, more useful abstraction is obtained. As a bonus, learning on a smaller number of dimensions progresses much faster.
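A minimal sketch of squashing several features into one abstraction with PCA, assuming scikit-learn; the dog measurements are invented:

```python
from sklearn.decomposition import PCA

# Invented measurements: [ear length, nose length, tail length, body length]
dogs = [[5, 4, 30, 60], [6, 5, 32, 65], [12, 10, 20, 40], [13, 11, 22, 42]]

# Combine four correlated features into one abstract axis
pca = PCA(n_components=1)
abstract = pca.fit_transform(dogs)
print(abstract.ravel())  # one number per dog instead of four
```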
This tool is a perfect fit for determining the subject of a text (Topic Modeling). It proved possible to abstract texts up to the level of specific topics, and to do it even without a teacher holding a list of categories.
The algorithm was called Latent Semantic Analysis (LSA). Its idea is that the frequency with which a word appears in a text depends on the topic: scientific articles contain more technical terms, while in news about politics the names of politicians come up more often. We could simply take the words from the articles and cluster them, but then the specific connections between words would be lost: words such as "battery" and "accumulator" mean the same thing even though they appear in different documents.
Unfortunately, the accuracy of such a system leaves much to be desired.
To combine words and documents into a single attribute without losing these connections, a method called Singular Value Decomposition (SVD) was brought in. It copes with the task quite easily, revealing useful thematic clusters out of words that occur together.
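A minimal LSA sketch, assuming scikit-learn; the four tiny "documents" are invented. TF-IDF turns texts into word-frequency vectors, and truncated SVD compresses them into a couple of latent topics:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented toy corpus: two "science" and two "politics" documents
docs = [
    "neutron proton experiment laboratory",
    "laboratory experiment quantum neutron",
    "election parliament minister vote",
    "vote election president parliament",
]

tfidf = TfidfVectorizer().fit_transform(docs)  # word-frequency matrix
topics = TruncatedSVD(n_components=2).fit_transform(tfidf)
print(topics.round(2))  # each document as a mix of two latent topics
```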
Another popular application of dimensionality reduction was found in recommendation systems and collaborative filtering. It turned out that if you abstract user ratings in the same way, you get a good system for recommending anything: a film, a service, a product.
The resulting abstract features are hard for a human to interpret, yet when researchers looked at the new features more closely, they found that some of them clearly correlate with the user's age, with certain film genres, and so on.
A machine that knew nothing except user ratings was able to achieve very good results without understanding them at all.
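A minimal collaborative-filtering sketch in the same spirit, assuming scikit-learn and NumPy; the small user-by-film rating matrix is invented, with 0 standing for "not rated":

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Invented ratings: rows = users, columns = films, 0 = not rated yet
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
])

# Compress each user into two hidden "taste" features, then reconstruct
svd = TruncatedSVD(n_components=2)
tastes = svd.fit_transform(ratings)
predicted = tastes @ svd.components_

# Reconstructed values in the former 0 cells act as rating predictions
print(predicted.round(1))
```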
The method of searching for rules (associations)
The essence of this method is the search for associations. Today it is used for:
- Forecast of stocks and sales
- Analysis of goods purchased together
- Product placement on shelves
- Analysis of behavior patterns on websites
This approach covers all the methods of analyzing shopping carts, marketing strategies, and other similar sequences.
Suppose a customer picks up a bottle of water in the far corner of the store and heads to the checkout. Is it possible to place along the way the goods people usually buy together with water? There surely are goods that are bought together, even if the fact is not always obvious, and the right placement on the shelves can bring additional profit.
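A minimal sketch of mining such "bought together" pairs by simply counting co-occurrences; the toy receipts are invented, and real systems use dedicated algorithms such as Apriori or FP-Growth:

```python
from collections import Counter
from itertools import combinations

# Invented toy receipts
receipts = [
    {"water", "chips", "gum"},
    {"water", "chips"},
    {"bread", "milk"},
    {"water", "gum"},
]

# Count every pair of goods that appears in the same receipt
pairs = Counter()
for receipt in receipts:
    pairs.update(combinations(sorted(receipt), 2))

print(pairs.most_common(2))  # e.g. [(('chips', 'water'), 2), (('gum', 'water'), 2)]
```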
Online stores work on the same principle, but their task is to predict what product the customer will come back for next time.
The search for rules is the least developed of these methods. The classic way is to brute-force all the purchased goods using trees and sets, and the resulting algorithms do not know how to generalize or reproduce the rules on new examples.