Reinforcement learning via supervised learning

Final Activity Report Summary - RLVSL (Reinforcement Learning via Supervised Learning)

Recall how you learned to ride a bicycle. It took several trials, most of which ended with a painful fall, and a lot of experimentation with the handlebars and the pedals, but eventually you were able to ride it anywhere you wanted. This kind of learning by trial and error is known as Reinforcement Learning (RL): the study of how an agent can learn to control its own actions through direct interaction with its environment in order to achieve a long-term goal. The agent may be physical (a robot, a vehicle, a plant, ...) or virtual (software crawling the web, playing a game, routing network packets, ...). The objective is to design autonomous agents capable of learning to accomplish difficult tasks, such as balancing a bipedal robot, driving a car, playing a game, or regulating a power plant.

Learning through interaction with an unknown stochastic environment is considerably harder than learning from a given set of correct training examples (supervised learning). The only feedback given in reinforcement learning is a numeric reinforcement signal that evaluates the performance of the agent without revealing which choices would have been optimal. The agent relies on experience acquired through interaction with the process and must make efficient use of its resources to achieve a satisfactory level of performance quickly.
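To make the nature of this feedback concrete, the following minimal sketch shows an agent learning from a numeric reward alone. It is an illustration only, not one of the project's algorithms: the three-action "bandit" environment, the function name `run_bandit`, the success probabilities and the epsilon-greedy exploration rule are all assumptions chosen for brevity.

```python
import random

def run_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning on a toy stochastic environment (hypothetical).

    The agent never sees success_probs; it only observes a numeric reward
    (1.0 or 0.0) for the action it actually tried.
    """
    rng = random.Random(seed)
    n_actions = len(success_probs)
    counts = [0] * n_actions        # how often each action was tried
    estimates = [0.0] * n_actions   # running mean reward per action

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns only a reward, never the optimal action.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0

        # Incremental update of the mean reward for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, [round(e, 2) for e in estimates])
```

A supervised learner would be told directly that the third action is the correct one; here the agent has to discover this by spending some of its trials on actions that turn out to be poor, which is why efficient use of experience matters.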

The aim of the project was to propose new algorithms for reinforcement learning (RL) that exploit existing, mature supervised learning (SL) methods (RL via SL), and to apply these algorithms to hard control learning problems in robotics. Overall, the project was successful: the new algorithms deliver outstanding performance using only a fraction of the resources, thanks to an intelligent management scheme. These and other algorithms were tested on hard control problems from the RoboCup domain (robotic soccer) by Team Kouretes of the Technical University of Crete. It is worth noting that the team received three awards at international RoboCup competitions.

The findings of the RLVSL project offer a suite of algorithms for performing efficient reinforcement learning on a variety of problems, the means for automatically exploiting new classification technology for reinforcement learning, extensive experimental evidence for the effectiveness of this approach on hard control problems, some theoretical guarantees that the agent's behaviour will gradually improve, and publicly available software to facilitate widespread distribution of this new learning technology.
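As a rough illustration of how a classifier can serve reinforcement learning, the sketch below shows one generic way to do it: simulate short rollouts to find which action looks best in a few sampled states, train a classifier on those (state, action) pairs, and use the classifier as the improved policy. This is a sketch under stated assumptions and should not be read as the project's actual algorithms or software. The line-world environment, every function name, and the one-nearest-neighbour classifier (a stand-in for any off-the-shelf classifier) are hypothetical.

```python
N_STATES = 8         # states 0..7 on a line; state 7 is the goal (toy assumption)
ACTIONS = (-1, +1)   # move left or move right
GAMMA = 0.9          # discount factor

def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def rollout_value(state, first_action, policy, horizon=20):
    """Discounted return of taking first_action, then following policy."""
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(horizon):
        state, reward, done = step(state, action)
        total += discount * reward
        if done:
            break
        discount *= GAMMA
        action = policy(state)
    return total

def fit_nearest_neighbour(examples):
    """Stand-in classifier: predict the label of the closest training state."""
    def policy(state):
        _, label = min(examples, key=lambda ex: abs(ex[0] - state))
        return label
    return policy

def improve(policy, sample_states):
    """One policy-improvement step expressed as a classification problem."""
    examples = []
    for s in sample_states:
        values = {a: rollout_value(s, a, policy) for a in ACTIONS}
        best = max(values, key=values.get)
        # Keep only states where one action is strictly better than the other.
        if values[best] > min(values.values()):
            examples.append((s, best))
    # With no informative examples, keep the previous policy unchanged.
    return fit_nearest_neighbour(examples) if examples else policy

policy = lambda state: -1   # poor initial policy: always move left
for _ in range(3):
    policy = improve(policy, sample_states=[0, 2, 4, 6])

learned = [policy(s) for s in range(N_STATES - 1)]
print(learned)
```

Rollouts are run from only four of the seven non-goal states, and the classifier generalises the resulting labels to the states in between. Spending simulation effort on a subset of states and letting a supervised learner fill in the rest is the kind of saving such a reduction aims for.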