Periodic Reporting for period 1 - SCALER (Provably Efficient Algorithms for Large-Scale Reinforcement Learning)
Reporting period: 2021-10-01 to 2023-03-31
A) Regret analysis of information-theoretic methods in contextual bandit problems. So far, this project has resulted in one published paper [Neu, Olkhovskaya, Papini and Schwartz, NeurIPS 2022], with a follow-up currently in development. The published results contribute to WP2 in Thread A of the project proposal, and the follow-up also aims to make progress on WP3.
B) Optimistic planning methods for large-scale infinite-horizon reinforcement learning under realizability. This project has resulted in one published paper [Moulin and Neu, ICML 2023], with a follow-up currently in development. These results are directly based on the methodology outlined in WP1 and WP4 of the proposal, and achieve several sub-goals set out in these work packages.
C) Large-scale planning methods for Markov decision processes under approximate realizability. The first set of results has already been published [Neu and Okolo, ALT 2023], with follow-up work currently under review. These results contribute to WP4 and WP5, and the follow-up takes steps towards addressing the challenges outlined in WP6.
D) Off-policy optimization in contextual bandit problems and reinforcement learning. The first set of results has already been published [Gabbianelli, Neu and Papini, ALT 2023], with follow-up work currently under review. These results do not fit directly into the work packages outlined in the original proposal, largely because the research field has evolved in recent years, but the methodology we followed was closely inspired by WP1 and WP4.
Besides these activities, which are of core importance to the original project proposal, the PI has also worked on developing a theory of generalization in statistical learning using techniques from online learning and convex analysis [Lugosi and Neu, COLT 2022]. This theory has several potential applications in sequential decision-making, particularly in designing and analyzing information-theoretic methods for reinforcement learning, and it will serve as a strong foundation for achieving the objectives of the project in the remaining years.