Emergence of In-Context Reinforcement Learning from Noise Distillation
- URL: http://arxiv.org/abs/2312.12275v3
- Date: Wed, 12 Jun 2024 11:30:19 GMT
- Title: Emergence of In-Context Reinforcement Learning from Noise Distillation
- Authors: Ilya Zisman, Vladislav Kurenkov, Alexander Nikulin, Viacheslav Sinii, Sergey Kolesnikov
- Abstract summary: We propose a new data acquisition approach that enables in-context Reinforcement Learning from a noise-induced curriculum.
We show that it is viable to construct a synthetic noise-injection curriculum that yields the required learning histories.
We experimentally demonstrate that this alleviates the need for data generated by optimal policies, with in-context RL still able to outperform the best suboptimal policy in the learning dataset by a 2x margin.
- Score: 46.29510499540939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, extensive studies in Reinforcement Learning have examined the ability of transformers to adapt in-context to various environments and tasks. Current in-context RL methods are limited by their strict data requirements: the data must be generated by RL agents or labeled with actions from an optimal policy. To address this prevalent problem, we propose AD$^\varepsilon$, a new data acquisition approach that enables in-context Reinforcement Learning from a noise-induced curriculum. We show that it is viable to construct a synthetic noise-injection curriculum that helps to obtain learning histories. Moreover, we experimentally demonstrate that it is possible to alleviate the need for generation by optimal policies, with in-context RL still able to outperform the best suboptimal policy in the learning dataset by a 2x margin.
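The mechanism the abstract describes can be pictured concretely: instead of logging a real RL agent's training run, a single demonstrator policy is corrupted with action noise that is annealed away, so early episodes look like an untrained agent and late ones approach the demonstrator. A minimal sketch of that idea (the toy environment, the linear schedule, and all names here are illustrative assumptions, not the paper's exact procedure):

```python
import random

class ToyBandit:
    """Tiny stand-in environment (illustrative only): 4 arms, one pays off."""
    n_arms, best_arm = 4, 2

    def reset(self):
        return 0                      # single dummy observation

    def sample_action(self):
        return random.randrange(self.n_arms)

    def step(self, action):
        reward = 1.0 if action == self.best_arm else 0.0
        return 0, reward, True        # one-step episodes

def noisy_rollout(env, demo_policy, eps, max_steps=100):
    """Roll out demo_policy, replacing each action with a random one w.p. eps."""
    history, obs = [], env.reset()
    for _ in range(max_steps):
        action = demo_policy(obs)
        if random.random() < eps:     # noise injection
            action = env.sample_action()
        next_obs, reward, done = env.step(action)
        history.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return history

def synthetic_learning_history(env, demo_policy, num_episodes=50):
    """Concatenate rollouts while annealing noise from 1.0 down to 0.0,
    mimicking the improvement curve of a real RL training run."""
    episodes = []
    for i in range(num_episodes):
        eps = 1.0 - i / (num_episodes - 1)   # assumed linear schedule
        episodes.extend(noisy_rollout(env, demo_policy, eps))
    return episodes

env = ToyBandit()
history = synthetic_learning_history(env, demo_policy=lambda obs: env.best_arm)
```

Training an in-context learner on many such synthetic histories then amounts, roughly, to Algorithm Distillation without ever running a source RL algorithm.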
Related papers
- N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs [39.84233556353338]
In-context learning allows models like transformers to adapt to new tasks without updating their weights.
In this work, we integrate n-gram induction heads into transformers for in-context RL.
Our approach matches, and in some cases surpasses, the performance of Algorithm Distillation (AD) in both grid-world and pixel-based environments.
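The summary does not spell out the mechanism, but an induction head is, informally, attention that finds an earlier occurrence of the current context and copies what came next; the n-gram variant matches on the last n tokens. A hard (non-attention) sketch of that matching rule, purely as an illustration and not the paper's implementation:

```python
def ngram_induction_predict(tokens, n=2):
    """Return the token that followed the most recent earlier occurrence of
    the current n-gram suffix, or None if the context never appeared before."""
    if len(tokens) <= n:
        return None
    suffix = tuple(tokens[-n:])
    # scan history right to left for the same n-gram with a known successor
    for i in range(len(tokens) - n - 1, -1, -1):
        if tuple(tokens[i:i + n]) == suffix:
            return tokens[i + n]
    return None

assert ngram_induction_predict([1, 2, 3, 9, 1, 2], n=2) == 3
```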
arXiv Detail & Related papers (2024-11-04T10:31:03Z)
- Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training [39.21885486667879]
Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes.
Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges.
We propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT).
arXiv Detail & Related papers (2024-05-31T16:24:53Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
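The summary says the LLM enters as a regularization factor in value-based RL. One standard way to realize that (a sketch under my own assumptions, not necessarily LINVIT's exact formulation) is a KL penalty keeping the learned action distribution close to the LLM's suggestions, which has a closed-form softmax solution:

```python
import numpy as np

def kl_regularized_step(q_values, llm_probs, lam=1.0):
    """Pick pi maximizing  E_pi[Q] - lam * KL(pi || pi_LLM).
    Closed form: pi(a) proportional to pi_LLM(a) * exp(Q(a) / lam), with
    value V = lam * log( sum_a pi_LLM(a) * exp(Q(a) / lam) )."""
    logits = np.log(np.asarray(llm_probs) + 1e-12) + np.asarray(q_values) / lam
    m = logits.max()                       # log-sum-exp for stability
    v = lam * (m + np.log(np.exp(logits - m).sum()))
    pi = np.exp(logits - m)
    pi /= pi.sum()
    return pi, v

# Large lam defers to the LLM prior; lam -> 0 recovers the greedy max over Q.
pi, v = kl_regularized_step(q_values=[0.0, 1.0], llm_probs=[0.8, 0.2], lam=0.5)
```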
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation [17.165083095799712]
We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
arXiv Detail & Related papers (2022-11-20T03:55:09Z)
- In-context Reinforcement Learning with Algorithm Distillation [35.72003039559626]
We propose a method for distilling reinforcement learning algorithms into neural networks by modeling their training histories with a causal sequence model.
A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context.
We demonstrate that AD can learn in-context in a variety of environments with sparse rewards, task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
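The training setup behind this is compact: serialize each learning history into a token sequence and train a causal model with cross-entropy on the action tokens only, so that predicting the next action well requires inferring the policy-improvement pattern from context. A toy sketch (the model, token layout, and sizes are assumptions; AD's actual architecture is a larger causal transformer over observation, action, and reward tokens):

```python
import torch
import torch.nn as nn

class TinyCausalModel(nn.Module):
    """Stand-in for AD's causal transformer (sizes are illustrative)."""
    def __init__(self, vocab=64, dim=32, n_actions=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_actions)

    def forward(self, tokens):                      # tokens: (batch, seq)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.shape[1])
        return self.head(self.encoder(self.embed(tokens), mask=mask))

def ad_loss(model, tokens, action_targets, action_positions):
    """Cross-entropy on action tokens only, each conditioned on its past."""
    logits = model(tokens)                          # (batch, seq, n_actions)
    batch_idx = torch.arange(tokens.shape[0]).unsqueeze(1)
    picked = logits[batch_idx, action_positions]    # (batch, k, n_actions)
    return nn.functional.cross_entropy(
        picked.reshape(-1, picked.shape[-1]), action_targets.reshape(-1))
```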
arXiv Detail & Related papers (2022-10-25T17:57:49Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
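The two-policy scheme is simple to state: the guide policy controls the first part of each episode and the learner takes over for the rest, with the hand-off point moved earlier as the learner improves. A minimal sketch of one such rollout, assuming a generic env with reset/step (the interface and names are mine, not JSRL's code):

```python
def jump_start_rollout(env, guide_policy, explore_policy, guide_steps,
                       max_steps=200):
    """Roll in with the guide for guide_steps, then hand control to the
    learner. Shrinking guide_steps across iterations forms the curriculum."""
    obs = env.reset()
    trajectory = []
    for t in range(max_steps):
        policy = guide_policy if t < guide_steps else explore_policy
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward, t < guide_steps))
        obs = next_obs
        if done:
            break
    return trajectory
```

One natural schedule decreases guide_steps whenever the learner's evaluation return clears a threshold, so the learner is always asked to finish a task the guide has nearly solved.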
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy for imitating from adaptive neighboring policies with a higher return.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids merely learning a mediocre behavior on mixed datasets but is even competitive with state-of-the-art offline RL methods.
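The "experience picking strategy" suggests a concrete selection rule: imitate only trajectories whose behavior is close to the current policy (neighboring) yet achieves a higher return. A rough sketch of one such rule (the dataset layout and the likelihood-based ranking are my assumptions, not COIL's exact criterion):

```python
def pick_curriculum_batch(dataset, policy_log_prob, current_return, top_k=10):
    """Keep trajectories that beat the current return, ranked by how likely
    their actions are under the current policy (a proxy for 'neighboring').
    dataset: list of (states, actions, trajectory_return) tuples."""
    better = [traj for traj in dataset if traj[2] > current_return]
    better.sort(
        key=lambda traj: sum(policy_log_prob(s, a)
                             for s, a in zip(traj[0], traj[1]))
                         / max(len(traj[1]), 1),
        reverse=True,
    )
    return better[:top_k]   # imitate these, re-evaluate, then pick again
```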
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures a certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)