CORDIS - EU research results

Data-Efficient Scalable Reinforcement Learning for Practical Robotic Environments

Periodic Reporting for period 1 - DESlRE (Data-Efficient Scalable Reinforcement Learning for Practical Robotic Environments)

Reporting period: 2018-04-01 to 2020-03-31

The initial aim of the project was to develop algorithms suitable for challenging control tasks. Current algorithms that perform well in simulation typically transfer poorly to real-world tests or robots. For example, one active research topic is sim-to-real transfer, which aims to carry the performance that algorithms achieve in computer simulations over to test time. If we can understand how to perform control in a highly stochastic environment, many problems in social decision-making could be addressed. The overall objectives are to advance our understanding of the difficulty of applying algorithms in such settings and to develop new ones.
The fellow has developed several new optimization and control algorithms designed to handle hybrid and stochastic environments. Our theory connects to the theory of robust optimization and robust control, as well as to the robustness of machine learning algorithms such as kernel methods. One of the most significant research outcomes is the newly proposed framework of kernel distributionally robust optimization, which combines principled robust optimization theory with kernel machine learning.
The fellow's work presents new insights that combine the theory of robust convex optimization with reproducing kernel Hilbert spaces (RKHS). It shows that the theory of kernel methods can be used to make robust decisions in general decision-making problems. The work contributes to both the distributionally robust optimization (DRO) literature and the kernel methods literature.

From a practical perspective, we have proposed easy-to-implement algorithms. As discussed in our recent works, one strength of our methods is their wide applicability. Many of today's learning tasks suffer from some form of distributional ambiguity. We believe that practitioners in industry and business who wish to gain robustness in their learning or decision-making tasks can apply our kernel distributionally robust optimization algorithms.
We aim to design optimization and control algorithms that can hedge against distribution shift.