Meta-learning Framework for Reinforcement Learning Algorithms

Matthew N. Henry

Modern reinforcement learning algorithms work by employing an update rule according to which the agent's parameters are continually adjusted based on observations of the current state of the environment. One possible way to increase the effectiveness of these algorithms is the automated discovery of update rules from available data, while also adapting the algorithms to specific environmental conditions. This direction of research, however, poses many challenges.
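To make the idea of an update rule concrete, here is a minimal sketch (my own illustration, not taken from the paper) of a classic hand-designed rule, tabular TD(0): the value estimate of the current state is nudged toward the observed reward plus the discounted value of the next state.

```python
import collections

def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.99):
    """One TD(0) step: nudge values[state] toward the bootstrapped target."""
    td_target = reward + gamma * values[next_state]       # what to predict
    values[state] += alpha * (td_target - values[state])  # how to update

# Value estimates default to 0.0 and are refined from each observed transition.
values = collections.defaultdict(float)
td0_update(values, state="s0", reward=1.0, next_state="s1")
print(values["s0"])  # 0.1 after one step
```

It is exactly this kind of hand-designed prediction target and update step that the work described below tries to discover automatically.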

In a recent paper published on arXiv.org, the authors propose a meta-learning system that can learn an entire update rule, including the prediction targets (an alternative to value functions) and the ways to learn from them, by interacting with a set of environments. In their experiments, the researchers use a set of three different meta-training environments to meta-learn a full reinforcement learning update rule, demonstrating the feasibility of such an approach and its potential to automate and speed up the discovery of new machine learning algorithms.
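The following is a deliberately tiny, self-contained sketch of that two-level idea (my own illustration under strong simplifications, not the paper's actual method): the "update rule" is collapsed to a single meta-parameter (a step size), an inner loop trains bandit agents with that rule, and a random-search outer loop stands in for meta-learning by scoring candidate rules on the agents' returns across a set of environments.

```python
import random

def train_agent(step_size, arm_probs, rng, steps=200):
    """Inner loop: train a two-armed bandit agent with the given update rule."""
    q = [0.0, 0.0]  # value estimates for each arm
    total_reward = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        arm = q.index(max(q)) if rng.random() > 0.1 else rng.randrange(2)
        reward = 1 if rng.random() < arm_probs[arm] else 0
        q[arm] += step_size * (reward - q[arm])  # the one-parameter "update rule"
        total_reward += reward
    return total_reward

def meta_objective(step_size, envs, rng):
    """Outer loop score: average return of agents trained with this rule."""
    return sum(train_agent(step_size, probs, rng) for probs in envs) / len(envs)

rng = random.Random(0)
envs = [(0.9, 0.1), (0.3, 0.7), (0.6, 0.4)]  # three toy bandit environments
candidates = [rng.random() for _ in range(50)]
best = max(candidates, key=lambda s: meta_objective(s, envs, rng))
print(f"discovered step size: {best:.3f}")
```

In the actual paper, the update rule is a neural network trained with gradients rather than random search, and it outputs both what the agent should predict and how the policy should change; the toy above preserves only the two-level structure of the search.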

The authors conclude: "This paper made the first attempt to meta-learn a full RL update rule by jointly discovering both 'what to predict' and 'how to bootstrap', replacing existing RL concepts such as value function and TD-learning. The results from a small set of toy environments showed that the discovered LPG (Learned Policy Gradient) maintains rich information in the prediction, which was crucial for efficient bootstrapping. We believe this is just the beginning of the fully data-driven discovery of RL algorithms; there are many promising directions to extend our work, from procedural generation of environments, to new advanced architectures and alternative ways to generate experience. The radical generalisation from the toy domains to Atari games shows that it may be feasible to discover an efficient RL algorithm from interactions with environments, which would potentially lead to entirely new approaches to RL."

Link to the research article: https://arxiv.org/pdf/2007.08794.pdf