Learning in dynamic environments
The overall goal of the project 'Plural reinforcement learning' (PLURELEARN) was to develop algorithms, theory and applications that use a large number of learning approaches and models in a synergetic way. To realise this goal, the project team identified three objectives: developing a learning approach that combines learning from a teacher with learning by trial and error; devising a structure discovery methodology for reasoning about uncertainty in high-dimensional Markov processes; and developing approaches for algorithm selection and mini-strategies. The team made progress on all three objectives.
Research on the first objective resulted in papers on how to use a tutor or expert advice in reinforcement learning paradigms. The work presented new algorithms for the problem of learning from multiple sources and showed how these algorithms perform in medium-scale applications.
The problem of structure discovery (objective 2) proved to be quite complex. After developing theoretical and applied aspects of model selection and structure discovery, which showed how difficult it is to detect dynamic structure, the team developed two approaches for mitigating risks. The first is based on policy gradients and is geared toward problems where a simulator is available. The second is based on a robust optimisation approach, where the focus is on a couple of uncertainties between states.
For the third objective, researchers designed two strategies that may lead to improved performance. The first was a way to modify existing options and thereby generate new, improved options. The second was a way to use 'randomly generated' options to expedite planning and learning.
The project succeeded in developing a new framework for planning and learning in data-driven, variable environments. The research has the potential to open up opportunities for large-scale optimisation of dynamic systems, which could significantly increase the scale of problems that can be solved.
Keywords
Learning, dynamic environments, uncertainty, reinforcement learning