k-tree
E-learning book

Reinforcement learning

The essence of reinforcement learning is to place the machine in realistic conditions where it has to find a solution, or a way out, on its own.

Used for:

  • Self-driving cars
  • Robot vacuum cleaners
  • Games
  • Automatic trading
  • Enterprise resource management

To some extent, reinforcement learning can already be compared to real artificial intelligence. It is used for tasks where the goal is not just to analyze data but to act in a real environment.

The environment can be a video game: there are bots that learn to play games. It can also be the real world. For example, the autopilot in Tesla cars learns not to hit pedestrians, and a robot vacuum cleaner's main task is to clean the floor as efficiently as possible.

The knowledge preloaded into such robots is mostly of little use and serves only as reference material. No matter how much data a robot collects, it will never be able to foresee every situation it will face. That is why the goal is to minimize errors rather than to calculate all possible moves. The robot has to learn to exist in its space with maximum benefit.

The essence of reinforcement learning is to teach the robot to survive in the environment in which it is placed. The smartest robots learn exactly this way: a virtual space similar to real conditions is populated with random people and objects, and the robot begins to learn in it. Once the robot shows good results in the virtual space, it is sent into the real world.

The machine does not need to memorize the whole city; this approach is called Model-Free. In reinforcement learning the robot does not memorize every movement. Instead, it tries to generalize from each situation so that it can get out of it with maximum benefit.

This idea is the basis of an algorithm called Q-learning and of related algorithms such as SARSA and DQN. The letter Q stands for Quality: the machine learns to choose the highest-quality action in any situation, and the situations are treated as a random (Markov) process.
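The idea of learning a "quality" value for every situation and action can be illustrated with a minimal sketch of tabular Q-learning. Everything here is an assumption made for illustration and is not taken from the article: the toy corridor environment, the function names `step`, `train` and `greedy_policy`, and the hyperparameters `alpha` (learning rate), `gamma` (discount factor) and `epsilon` (exploration rate).

```python
import random

# Hypothetical toy environment: a corridor of 6 cells. The agent starts in
# cell 0 and receives a reward of +1 only when it reaches the last cell.
N_STATES = 6
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with the standard Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move Q(s, a) toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for every non-terminal state."""
    return [ACTIONS[row.index(max(row))] for row in q[:-1]]


q_table = train()
print(greedy_policy(q_table))
```

The agent is never given a map of the corridor. It only sees the reward after each step, which is what makes the approach model-free. After training, the greedy policy is expected to step right in every cell.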

The machine runs millions of simulations in the environment and remembers the situations, and the ways out of them, that brought the maximum benefit. But a natural question arises: how does the machine tell a situation it already knows how to profit from apart from a completely new one? There is no single answer to this question. Researchers are constantly working on it and inventing various approaches. In some cases, the possible situations are described manually, which makes it possible to handle certain exceptional cases. In other cases, the work is handed to neural networks so that they figure everything out on their own. This is how Deep Q-Network (DQN) appeared as a replacement for plain Q-learning.
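The step from a lookup table to a learned function can be sketched without any neural network library. The example below is a simplification and not a DQN: a linear function stands in for the neural network, and there is no replay buffer or target network. The `features`, `q_value` and `td_update` helpers, the position feature, and the numbers used are all hypothetical.

```python
# Hypothetical features: a bias term and the agent's normalized position.
def features(position):
    return [1.0, position]


def q_value(weights, position, action):
    """Q(s, a) computed as a dot product instead of a table lookup."""
    return sum(w * f for w, f in zip(weights[action], features(position)))


def td_update(weights, position, action, reward, next_position, done,
              alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step on the weights of `action`."""
    best_next = 0.0 if done else max(
        q_value(weights, next_position, a) for a in range(len(weights)))
    error = reward + gamma * best_next - q_value(weights, position, action)
    for i, f in enumerate(features(position)):
        weights[action][i] += alpha * error * f
    return error


weights = [[0.0, 0.0], [0.0, 0.0]]  # one weight vector per action
# The agent steps right (action 1) from position 0.8 and reaches the goal.
td_update(weights, 0.8, 1, reward=1.0, next_position=1.0, done=True)
# A position that was never visited now also has a non-zero estimate.
print(q_value(weights, 0.6, 1))
```

Because the estimate is a function of the features rather than a table entry, a single update also shifts the value of positions the agent has never seen. A neural network gives the same kind of generalization with a far more flexible function.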

To ordinary users, reinforcement learning looks like real intelligence, because the robot makes decisions on its own in real conditions.

Unfortunately, no one has yet come up with tasks in which such machines are much more efficient than other approaches, although reinforcement learning is great for all kinds of experiments.

