Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals
- URL: http://arxiv.org/abs/2302.04449v4
- Date: Sat, 20 Jul 2024 20:16:44 GMT
- Title: Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals
- Authors: Yue Wu, Yewen Fan, Paul Pu Liang, Amos Azaria, Yuanzhi Li, Tom M. Mitchell,
- Abstract summary: Read and Reward speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers.
Various RL algorithms obtain significant improvement in performance and training speed when assisted by our design.
- Score: 69.76245723797368
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High sample complexity has long been a challenge for RL. On the other hand, humans learn to perform tasks not only from interaction or demonstrations, but also by reading unstructured text documents, e.g., instruction manuals. Instruction manuals and wiki pages are among the most abundant data that could inform agents of valuable features and policies or task-specific environmental dynamics and reward structures. Therefore, we hypothesize that the ability to utilize human-written instruction manuals to assist learning policies for specific tasks should lead to a more efficient and better-performing agent. We propose the Read and Reward framework. Read and Reward speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers. Our framework consists of a QA Extraction module that extracts and summarizes relevant information from the manual and a Reasoning module that evaluates object-agent interactions based on information from the manual. An auxiliary reward is then provided to a standard A2C RL agent, when interaction is detected. Experimentally, various RL algorithms obtain significant improvement in performance and training speed when assisted by our design.
Related papers
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [75.78948575957081]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials in an instant manner.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
arXiv Detail & Related papers (2024-11-22T08:21:03Z) - KIWI: A Dataset of Knowledge-Intensive Writing Instructions for
Answering Research Questions [63.307317584926146]
Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents.
In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer.
We construct KIWI, a dataset of knowledge-intensive writing instructions in the scientific domain.
arXiv Detail & Related papers (2024-03-06T17:16:44Z) - Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
Key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL)
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z) - Learning and reusing primitive behaviours to improve Hindsight
Experience Replay sample efficiency [7.806014635635933]
We propose a method that uses primitive behaviours that have been previously learned to solve simple tasks.
This guidance is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed.
We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Learning to Shape Rewards using a Game of Switching Controls [21.456451774045465]
We introduce an automated RS framework in which the shaping-reward function is constructed in a novel game between two agents.
We prove theoretically that our framework, which easily adopts existing RL algorithms, learns to construct a shaping-reward function that is tailored to the task.
We demonstrate the superior performance of our method against state-of-the-art RS algorithms in Cartpole and the challenging console games Gravitar, Solaris and Super Mario.
arXiv Detail & Related papers (2021-03-16T15:56:57Z) - Balancing Reinforcement Learning Training Experiences in Interactive
Information Retrieval [19.723551683930776]
Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent who learns while interacting.
To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents.
Our paper addresses this issue by using domain randomization to synthesize more relevant documents for the training.
arXiv Detail & Related papers (2020-06-05T00:38:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.