Neural networks and deep learning
Neural networks and deep learning can outperform all of the algorithms listed above, but it is surprisingly hard to figure out where they are actually useful.
Currently used for:
- As a replacement for all of the algorithms above
- Object identification in photos and videos
- Speech recognition and synthesis
- Image processing, style transfer
- Machine translation
A neural network is a set of neurons and the connections between them. A neuron is best thought of as a function with many inputs and a single output. Its job is to take the numbers arriving at its inputs, apply a function to them, and pass the result to the output. The simplest example of a useful neuron: sum all the numbers from the inputs, and if the sum is greater than N, output one; otherwise, output zero.
Connections are the channels through which neurons send numbers to each other. Each connection has a weight, its only parameter, which can loosely be thought of as the strength of the connection. When the number 20 passes through a connection with weight 0.5, it turns into 10. The neuron itself does not understand what exactly arrived; it just sums everything up. The weights are what let us control which inputs the neuron should respond to and which it should ignore.
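Here is a minimal sketch of such a neuron in plain Python; the function name, the weights, and the threshold are all made up for illustration:

```python
# A toy neuron: weighted sum of the inputs, thresholded at N.
def neuron(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# A connection with weight 0.5 turns an incoming 20 into 10:
print(neuron([20], [0.5], 5))   # 20 * 0.5 = 10 > 5, so the output is 1
print(neuron([20], [0.1], 5))   # 20 * 0.1 = 2 <= 5, so the output is 0
```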
So that the network does not turn into chaos, neurons are connected in layers. Within one layer, neurons are not connected to each other, but each is connected to the neurons of the previous and the next layer. Data in such a network flows strictly in one direction, from the inputs of the first layer to the outputs of the last.
With enough layers and the weights set correctly, such a network produces the result we are after.
In practice nobody programs individual neurons; everything is represented as matrices and computed as matrix products, because that is fast. Working with matrices is the subject of linear algebra.
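A tiny numpy sketch of why this works: one matrix product computes the weighted sums of all the neurons in a layer at once (the numbers are invented):

```python
import numpy as np

# x is a vector of 3 inputs, W holds the weights of 2 neurons.
x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.5, -1.0, 0.25],
              [0.1,  0.2, 0.3]])

def layer(x, W):
    z = W @ x                      # every neuron's weighted sum at once
    return np.where(z > 0, 1, 0)   # the same threshold rule as above

print(layer(x, W))                 # [0 1]
```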
Once a network is built, the main task is to set the weights so that the neurons respond to the right signals.
With randomly placed weights the network gives random answers. We compare how far its output is from the one we need, then walk back through the network, from the outputs to the inputs, telling each neuron how to adjust.
After a hundred such cycles there is hope that the weights in the network settle so that the result has minimal error. This approach is called backpropagation, or "the backward propagation of errors". It took scientists twenty years to come up with it; before that, neural networks were trained however people could manage.
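A minimal sketch of that training loop in numpy, on the toy task of learning XOR; the layer sizes, learning rate, and cycle count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: learn XOR with one hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # random weights: the
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # network starts out random

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Forward pass: data flows strictly from the first layer to the last.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Compare the answer with the one we need.
    error = out - y

    # Backward pass: walk from outputs to inputs, nudging every weight.
    grad_out = error * out * (1 - out)
    grad_h = (grad_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ grad_out
    b2 -= 0.5 * grad_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * X.T @ grad_h
    b1 -= 0.5 * grad_h.sum(axis=0, keepdims=True)

print(out.round(2))  # typically close to [[0], [1], [1], [0]] by now
```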
A well-trained neural network can imitate any algorithm, and often does so more accurately. This versatility made networks very popular. But as it turned out, training a network with many layers required computing power that was unattainable at the time; today an ordinary gaming PC exceeds the capacity of the datacenters of that era. Since there was no hope of getting that much power, everyone became deeply disappointed in neural networks.
When a convolutional neural network showed what it could do at the ImageNet competition in 2012, the world suddenly remembered the deep learning methods that had been described back in the 1990s.
Deep learning differed from classical neural networks in new training methods that could cope with much larger networks. Today only theorists argue over which training counts as deep and which does not. In practice, libraries such as Keras, TensorFlow, and PyTorch are used; they are far more convenient than anything that existed before, and practitioners simply call all of this "neural networks".
There are two main neural network architectures today: convolutional and recurrent.
Convolutional neural networks (CNN)
Convolutional networks are at the peak of their popularity. They are used for finding objects in photos and videos, face recognition, and style transfer. CNNs are used wherever photos or videos are involved.
Images pose a particular problem: it is not clear how to extract features from them. Text can be broken into sentences and word properties looked up in dictionaries, but images had to be labeled by hand, with a human explaining to the machine where each object in the photo was. This approach was called "handcrafting features".
There are a number of problems with "handcrafting features":
Firstly, if the object in the photo is rotated or partially hidden, the neural network no longer sees anything.
Secondly, it is hard to name even ten features off the top of your head that distinguish one object from another. The human brain evaluates objects by many features it is not even aware of, which is why a person cannot explain them to a machine. The conclusion: the machine has to learn to find these features on its own. The operation that searches for these small features is called convolution, and it gives the method its name. Convolution can be represented as a layer of a neural network, since a neuron can be any function.
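A sketch of the convolution operation itself in plain numpy. The kernel here is handwritten just to show what "searching for a small feature" means; in a real CNN, the kernel values are weights the network learns:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image, computing a weighted sum
    at every position: the core operation of a convolutional layer."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A kernel that responds strongly to vertical edges:
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

image = np.zeros((5, 5))
image[:, :2] = 1.0            # bright left half, dark right half
print(convolve2d(image, kernel))   # strong response near the edge
```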
As a set of features passes through the network, it automatically picks out the combinations it has seen most often. It does not matter to the machine whether it is a straight line or a complex geometric shape; something will always activate strongly.
As a result, the machine sees which combinations were activated and works out which object, animal, or phenomenon they are characteristic of.
The beauty of this idea is that the end result is a neural network that finds the characteristic features of objects on its own. We no longer need to do it by hand: we can feed the machine data, and the network itself builds feature maps and learns to identify anything.
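As a sketch of what that looks like in practice, here is a small convolutional model in Keras (one of the libraries mentioned above); the input shape, layer sizes, and class count are assumptions for illustration:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),         # e.g. small grayscale images
    keras.layers.Conv2D(16, 3, activation="relu"),  # 16 learned feature maps
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(32, 3, activation="relu"),  # combinations of features
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),   # e.g. 10 object classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

The kernels of the Conv2D layers play the role of the handwritten edge detector above, except the network learns their values itself during training.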
Recurrent neural networks (RNN)
Recurrent neural networks are the second most popular architecture today. Thanks to them we have useful things like machine translation of texts and computer speech synthesis. They are used to solve everything related to sequences.
Windows XP once shipped with a voice synthesizer called Microsoft Sam. Today there are far more advanced systems, such as Alice from Yandex or Alexa from Amazon: they do not just pronounce words without mistakes, they place the stresses correctly within a sentence.
The thing is that modern voice assistants are taught to speak not in letters but in phrases. You cannot simply make a neural network output whole phrases, because then it would have to memorize every phrase in the language, and that is far too much. Here the fact that text, speech, and music are all sequences comes to the rescue: every word or sound is a unit of its own, but one that depends on the units before it. Lose that connection and everything falls apart.
Teaching a neural network to pronounce individual words or letters is not hard. You take a pile of audio files labeled word by word and train the network to produce, for an input word, a sequence of signals resembling its pronunciation; then you compare the output with the original from the speaker and push it as close to the ideal as possible. Even a perceptron is suitable for that.
But the perceptron does not remember what it generated earlier, so there is no sequence; every run is its first. So someone had the idea of adding memory to each neuron, and recurrent networks were invented: networks in which every neuron remembers its previous answers and, on the next run, uses them as an additional input. A neuron could thereby pass a note to its future self saying that this or that should be done differently.
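A sketch of one recurrent step in numpy: the weights are random and the sizes invented; the point is only that the previous answer comes back as an extra input:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy recurrent cell: its previous answer is fed back in as an
# additional input, so each step depends on what came before.
W_x = rng.normal(size=(3, 4))   # weights for the current input
W_h = rng.normal(size=(4, 4))   # weights for the remembered state

def rnn_step(x, h_prev):
    return np.tanh(x @ W_x + h_prev @ W_h)

h = np.zeros(4)                  # the memory starts empty
for x in [np.array([1.0, 0.0, 0.0]),
          np.array([0.0, 1.0, 0.0]),
          np.array([0.0, 0.0, 1.0])]:
    h = rnn_step(x, h)           # each step sees the previous state
    print(h.round(2))
```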
This created one problem: when every past result is remembered, the number of inputs grows enormously, and training such a volume of connections becomes impossible. It turned out the network could not be trained because it did not know how to forget.
At first the problem was solved bluntly, by cutting off the neurons' memory; later a better idea appeared: to use special cells as the memory, similar to computer memory or processor registers.
Each cell can write a number into itself, read it back, or reset it. The cells were called Long Short-Term Memory (LSTM) cells.
When a neuron needs to set a reminder for its future self, it writes the information into a cell; when the stored history is no longer needed, the cells are reset, leaving only the "long-term" connections, as in a classical perceptron. In other words, the network learns not just to form current connections, but also to leave itself reminders.
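In practice these cells are not wired by hand; a library LSTM layer is used. A minimal Keras sketch, with an invented vocabulary size and sequence length:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(50,)),                 # e.g. 50 tokens per phrase
    keras.layers.Embedding(input_dim=1000, output_dim=32),
    keras.layers.LSTM(64),      # cells that can write, keep, and reset memories
    keras.layers.Dense(1000, activation="softmax"),  # predict the next token
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```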
To train such networks, people took large amounts of recorded, transcribed speech from various sources, and imitating a voice turned out to be a fairly simple task for a neural network today. Video is harder, but work is under way in that direction too.
For more detail, read the article Neural Network Zoo, which describes practically all types of neural networks.