Final Report Summary - HERL (Large Scale Machine Learning for Simultaneous Heterogeneous Tasks)
The HERL project addressed large-scale machine learning for simultaneous heterogeneous tasks, i.e., multi-task reinforcement learning problems. While numerous real-world applications for reinforcement learning methods exist, our focus is on foundational research into how these methods work and sometimes fail.
The project consisted of three phases. The first was to develop a rudimentary problem generator capable of testing several features relevant to multi-task learning that had not been represented in the existing literature. This generator was to focus on abstract problem representations while still offering some ability to tune the properties of the generated instances. The second phase was to build a software platform in which multi-task reinforcement learning algorithms could be applied to the generated benchmark instances and compared using a range of standard and novel performance metrics; a sketch of such an evaluation loop is given below. The final work package was to begin analysing the resulting data in order to develop an understanding of which features of the generated instances corresponded to better or worse performance from different types of learning algorithms. This analysis was slated to be done in preliminary form during the initial work period (ending in mid-2014), with the remainder of the project focused on iterating through a process of developing new algorithms informed by this understanding and testing them again on generated and real-world problems.
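To make the second phase concrete, here is a minimal sketch of the kind of evaluation loop such a platform runs: generate a benchmark instance, run competing learners on it, and compare them on a standard metric such as cumulative reward. All names below (make_random_mdp, q_learning, and so on) are illustrative placeholders under assumed tabular-MDP conventions, not the platform's actual API.

```python
# Illustrative sketch only: compare two learners on a generated tabular MDP.
import numpy as np

def make_random_mdp(n_states=20, n_actions=4, seed=0):
    """Generate transitions P[s, a, s'] (rows sum to 1) and rewards R[s, a]."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.normal(size=(n_states, n_actions))
    return P, R

def run_episode(P, R, policy, steps=200, seed=0):
    """Roll a policy forward from state 0 and return cumulative reward."""
    rng = np.random.default_rng(seed)
    s, total = 0, 0.0
    for _ in range(steps):
        a = policy(s)
        total += R[s, a]
        s = rng.choice(P.shape[0], p=P[s, a])
    return total

def q_learning(P, R, episodes=200, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular epsilon-greedy Q-learning; returns the greedy learned policy."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = 0
        for _ in range(200):
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2 = rng.choice(n_states, p=P[s, a])
            Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return lambda state: int(Q[state].argmax())

P, R = make_random_mdp()
rng = np.random.default_rng(1)
print("Q-learning :", run_episode(P, R, q_learning(P, R), seed=2))
print("random     :", run_episode(P, R, lambda s: int(rng.integers(R.shape[1])), seed=2))
```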
The key insight behind this progression of tasks is that in reinforcement learning, much as in search and optimization, problem structure should have a profound effect on the performance of different learning algorithms. In multi-task learning in particular, the inter-task correlation structure is believed to have one of the strongest effects on algorithm performance. The work package was developed specifically to build up ways to first generate known problem structures, then use the insights obtained from analysing these problems to design more effective learning algorithms capable of exploiting specific types of structure. As such, the research draws heavily from two fields: problem structure and problem understanding on the one hand, and machine learning, specifically reinforcement learning, on the other. The proposed work package and dissemination activities reflect this interaction with both research communities.
As a result, the Merlin generator is currently the only tool in existence able to produce multi-task learning problems with control over the structure of the inter-task relationships. Merlin also extends the state of the art in controlling other aspects of both single- and multi-task instances, such as the type of structure found in the state transition function and the distribution of rewards.
In particular, the current iteration of the Merlin generator allows for detailed control over the following properties of the generated instances:
- the size of the problem instance, in terms of the number of states and actions,
- the number of concurrent tasks to be learned,
- the interrelationships between the tasks, expressed as cross-correlations (one way to impose such a structure is sketched after this list),
- the connectivity structure of the state transition graph, and
- the distribution of rewards.
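To illustrate the cross-correlation control concretely, below is a minimal sketch (with hypothetical names; this is not Merlin's actual API) of one standard way to impose a target correlation structure on the reward functions of several concurrent tasks: draw independent normal rewards for each (state, action) pair and colour them with the Cholesky factor of the desired correlation matrix.

```python
# Illustrative sketch only: correlated reward tables for K concurrent tasks.
import numpy as np

def correlated_rewards(n_states, n_actions, corr, seed=0):
    """Return R of shape (n_tasks, n_states, n_actions) whose per-task
    reward tables exhibit (approximately) the given cross-correlation."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)           # corr must be positive definite
    z = rng.normal(size=(n_states * n_actions, corr.shape[0]))
    r = z @ L.T                            # each row now has covariance corr
    return r.T.reshape(corr.shape[0], n_states, n_actions)

# Three tasks: tasks 0 and 1 strongly related, task 2 nearly independent.
corr = np.array([[1.0, 0.8, 0.1],
                 [0.8, 1.0, 0.1],
                 [0.1, 0.1, 1.0]])
R = correlated_rewards(50, 4, corr)
print(np.corrcoef(R.reshape(3, -1)))       # empirical values near the target
```

Correlations near 1 produce tasks whose reward landscapes are nearly identical, while values near 0 produce essentially unrelated tasks; this is exactly the knob whose effect on transfer between tasks the project set out to study.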
In addition, Merlin supports continuous MDP generation as well as discrete generation. It does so by allowing the user to specify a class of generative models that are then used to learn a continuous functional representation of the underlying (and controlled) discrete dynamics of the system. For continuous problems, Merlin supports controlling all of the above variables governing the underlying dynamics, and adds controls for
- the dimensionality of the state space representation,
- the way in which real-valued state and action values are assigned to the nodes in the underlying graph,
- the degree of "ruggedness" in the state-value dynamics (i.e., how rugged or smooth the value of each state variable is as an agent moves through the state-action space), and
- the type of continuous generative model used to learn a real-valued approximation of the underlying discrete state/action and reward dynamics (a sketch of this idea follows the list).
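The sketch below illustrates the continuous-generation idea (again with hypothetical names; this is not Merlin's actual implementation): assign real-valued coordinates to the nodes of the discrete transition graph, then fit a simple regression model, here an RBF kernel ridge regressor standing in for the user-specified generative model, so that the learned map from (state embedding, action) to successor embedding approximates the underlying discrete dynamics.

```python
# Illustrative sketch only: a continuous approximation of discrete dynamics.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, dim = 30, 3, 2

# Underlying discrete dynamics: next_state[s, a] is a fixed random successor.
next_state = rng.integers(n_states, size=(n_states, n_actions))
# Real-valued coordinates assigned to each node of the transition graph.
embed = rng.normal(size=(n_states, dim))

# Training data: (state embedding, one-hot action) -> successor embedding.
X = np.array([np.concatenate([embed[s], np.eye(n_actions)[a]])
              for s in range(n_states) for a in range(n_actions)])
Y = np.array([embed[next_state[s, a]]
              for s in range(n_states) for a in range(n_actions)])

def rbf(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel ridge regression: weights = (K + lam*I)^-1 Y.
weights = np.linalg.solve(rbf(X, X) + 1e-3 * np.eye(len(X)), Y)

def dynamics(s_embed, a):
    """Continuous dynamics model: predicted successor coordinates."""
    x = np.concatenate([s_embed, np.eye(n_actions)[a]])[None, :]
    return (rbf(x, X) @ weights)[0]

# The learned map approximately reproduces a known discrete transition.
print(dynamics(embed[0], 1))      # predicted successor coordinates
print(embed[next_state[0, 1]])    # true successor coordinates
```

In this toy version the model essentially interpolates the training transitions; swapping in a different model class, or perturbing the node coordinates, changes how smoothly the continuous dynamics vary, which is the lever behind the "ruggedness" control listed above.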
The problem generator has been made available to the research community under an open-source licence at https://github.com/deong/merlin. We are
continually enhancing the generator to provide a wider and more varied range of problem structures, and work is currently underway to provide more comprehensive documentation so that other researchers may build on this work and/or use the generated problem instances to test their own algorithms.
Since one goal of the project was to draw on knowledge from the field of multi-objective optimization, a portion of the work has remained within that research community. One insight that has proven useful is the focus on problem understanding as a lens through which to view the characteristics of different learning algorithms. In the optimization community, it is well known that search algorithms interact with features of problem instances in many subtle ways, and understanding these interactions is viewed as an important step in the development of new algorithms.
In extending this work to the domain of reinforcement learning, we have developed a decomposition-based variant of the well-known XCS algorithm, and we are currently testing that algorithm on both the Merlin problems and existing multi-task learning environments from the literature. We plan to submit this work for publication at a later date.