CORDIS - EU research results

Provably Efficient Algorithms for Large-Scale Reinforcement Learning

Periodic Reporting for period 1 - SCALER (Provably Efficient Algorithms for Large-Scale Reinforcement Learning)

Reporting period: 2021-10-01 to 2023-03-31

The project aims to address the lack of theoretical guarantees in a key area of artificial intelligence research called reinforcement learning (RL). Before the project began, the majority of progress in this area had been empirical, and the existing theoretical guarantees were restricted to small problems of no practical interest, which hindered the applicability of this technology. The project aims to extend these guarantees to large-scale scenarios capturing more applications than has been possible before, thus making RL-based learning systems safer and more predictable for use in the real world.
The project has made good progress on all objectives outlined in the proposal, closely following the original work plan. In particular, the team has made several core contributions on the following topics:

A) Regret analysis of information-theoretic methods in contextual bandit problems. So far, this line of work has resulted in one published paper [Neu, Olkhovskaya, Papini and Schwartz, NeurIPS 2022], with a follow-up currently in development. The published results contribute to WP2 in Thread A of the project proposal, and the follow-up aims to also make progress on WP3.
B) Optimistic planning methods for large-scale infinite-horizon reinforcement learning under realizability. This line of work has resulted in one published paper [Moulin and Neu, ICML 2023], with a follow-up currently in development. These results are directly based on the methodology outlined in WP1 and WP4 of the proposal, and achieve several sub-goals set out in these work packages.
C) Large-scale planning methods for Markov decision processes under approximate realizability. The first set of results has already been published [Neu and Okolo, ALT 2023], with follow-up work currently under review. These results contribute to WP4 and WP5, and the follow-up makes steps toward addressing the challenges outlined in WP6.
D) Off-policy optimization in contextual bandit problems and reinforcement learning. The first set of results has already been published [Gabbianelli, Neu and Papini, ALT 2023], with follow-up work currently under review. These results do not fit directly into the work packages outlined in the original proposal, largely due to the evolution of the research field over the past years, but the methodology we followed was closely inspired by WP1 and WP4.
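To illustrate the notion of regret that underlies contributions A and D, the following toy sketch simulates a linear contextual bandit and measures the cumulative gap between an epsilon-greedy learner and the best action in hindsight. This is purely illustrative and is not the project's algorithm; all names, parameters, and the learner itself are made up for the example.

```python
# Illustrative only: a toy linear contextual bandit with an epsilon-greedy
# learner, measuring cumulative (pseudo-)regret against the per-round best
# action. Not the project's method; all parameters are invented.
import numpy as np

rng = np.random.default_rng(0)
n_rounds, n_actions, dim = 2000, 3, 5
theta = rng.normal(size=(n_actions, dim))  # true (unknown) reward parameters


def run():
    # Regularized least-squares statistics per action
    A = [np.eye(dim) for _ in range(n_actions)]
    b = [np.zeros(dim) for _ in range(n_actions)]
    total_reward, total_optimal = 0.0, 0.0
    for _ in range(n_rounds):
        x = rng.normal(size=dim)           # observed context
        means = theta @ x                  # expected reward of each action
        if rng.random() < 0.1:             # explore with probability 0.1
            a = int(rng.integers(n_actions))
        else:                              # exploit current estimates
            est = np.array(
                [np.linalg.solve(A[i], b[i]) @ x for i in range(n_actions)]
            )
            a = int(np.argmax(est))
        r = means[a] + rng.normal(scale=0.1)  # noisy observed reward
        A[a] += np.outer(x, x)             # update statistics for chosen arm
        b[a] += r * x
        total_reward += means[a]
        total_optimal += means.max()
    # Cumulative pseudo-regret: optimal expected reward minus ours
    return total_optimal - total_reward


regret = run()
print(f"regret over {n_rounds} rounds: {regret:.1f}")
```

Proving non-trivial bounds on exactly this kind of regret quantity, for far more sophisticated algorithms and in large-scale settings, is what the theoretical guarantees above refer to.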

Besides these activities of core importance to the original project proposal, the PI has also worked on developing a theory of generalization in statistical learning using techniques from online learning and convex analysis [Lugosi and Neu, COLT 2022]. These results have several potential applications in sequential decision-making, particularly in designing and analyzing information-theoretic methods for reinforcement learning, and they will serve as a strong foundation for achieving the objectives of the project in the remaining years.
All the above results progressed the state of the art in important areas of reinforcement learning. In the remaining period, we will continue to build on them along the lines explained in the description of action. Beyond the directions explained in that document, the progress made in the first period has led to discoveries that will strongly influence how the rest of the project takes shape. In particular, the results characterizing the generalization capabilities of statistical learning algorithms can be directly extended to studying the performance of so-called information-theoretic sequential decision-making methods, which will have a direct impact on the future research of the entire team. We expect these techniques to enable us to address a class of large-scale reinforcement learning problems that have so far been out of reach for traditional RL theory.