LiFT: Unsupervised Reinforcement Learning with Foundation Models as
Teachers
- URL: http://arxiv.org/abs/2312.08958v1
- Date: Thu, 14 Dec 2023 14:07:41 GMT
- Title: LiFT: Unsupervised Reinforcement Learning with Foundation Models as
Teachers
- Authors: Taewook Nam, Juyong Lee, Jesse Zhang, Sung Ju Hwang, Joseph J. Lim,
Karl Pertsch
- Abstract summary: We propose a framework that guides a reinforcement learning agent to acquire semantically meaningful behavior without human feedback.
In our framework, the agent receives task instructions grounded in a training environment from large language models.
We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment.
- Score: 59.69716962256727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a framework that leverages foundation models as teachers, guiding
a reinforcement learning agent to acquire semantically meaningful behavior
without human feedback. In our framework, the agent receives task instructions
grounded in a training environment from large language models. Then, a
vision-language model guides the agent in learning the multi-task
language-conditioned policy by providing reward feedback. We demonstrate that
our method can learn semantically meaningful skills in a challenging open-ended
MineDojo environment while prior unsupervised skill discovery methods struggle.
Additionally, we discuss observed challenges of using off-the-shelf foundation
models as teachers and our efforts to address them.
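The abstract describes the framework only at a high level; the sketch below shows the training loop it implies. The names llm_propose_tasks, vlm_alignment_score, rollout, and the stub bodies are hypothetical placeholders for the LLM teacher, the VLM reward model, and a MineDojo-style environment, not the authors' implementation.

```python
# Minimal sketch of a LiFT-style loop, assuming an LLM that grounds instructions
# and a VLM that scores trajectory/instruction alignment; all stubs are hypothetical.
import random

def llm_propose_tasks(env_description: str) -> list[str]:
    """Hypothetical: an LLM proposes task instructions grounded in the environment."""
    return ["harvest a log", "collect water with a bucket", "shear a sheep"]

def vlm_alignment_score(frames: list, instruction: str) -> float:
    """Hypothetical: a VLM scores how well the rollout matches the instruction."""
    return random.random()  # stands in for a video-text similarity score

def rollout(policy, instruction: str, horizon: int = 32) -> list:
    """Collect observations while the language-conditioned policy acts."""
    return [policy(obs=None, instruction=instruction) for _ in range(horizon)]

def train_lift(policy, policy_update, env_description: str, iterations: int = 100):
    tasks = llm_propose_tasks(env_description)             # teacher 1: LLM instructions
    for _ in range(iterations):
        instruction = random.choice(tasks)                  # multi-task conditioning
        frames = rollout(policy, instruction)
        reward = vlm_alignment_score(frames, instruction)   # teacher 2: VLM reward
        policy_update(policy, instruction, frames, reward)  # any RL update, e.g. PPO
```

Any standard RL update could take the place of policy_update; the point the abstract makes is only that instructions come from the LLM and reward feedback from the VLM rather than from human supervision.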
Related papers
- The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples.
We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts.
We extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly limits the range of tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z) - CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives [2.4392539322920763]
Grounding the instruction in the environment is a key step in solving language-guided goal-reaching reinforcement learning problems.
We propose CAREL as a new framework to solve this problem using auxiliary loss functions inspired by video-text retrieval literature.
Results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems.
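The summary does not spell out CAREL's auxiliary objectives; one common video-text-retrieval-style choice is a symmetric InfoNCE loss between trajectory and instruction embeddings. The sketch below illustrates that generic loss only and is not the paper's exact objective.

```python
# Hedged sketch of a cross-modal contrastive auxiliary loss (video-text retrieval
# style); an assumption about the kind of objective CAREL's abstract alludes to.
import torch
import torch.nn.functional as F

def cross_modal_auxiliary_loss(traj_emb: torch.Tensor,
                               instr_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch: each trajectory should retrieve its own instruction."""
    traj_emb = F.normalize(traj_emb, dim=-1)
    instr_emb = F.normalize(instr_emb, dim=-1)
    logits = traj_emb @ instr_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(traj_emb.size(0))                # matching pairs on the diagonal
    # symmetric retrieval: trajectory -> instruction and instruction -> trajectory
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```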
arXiv Detail & Related papers (2024-11-29T15:49:06Z) - Revealing the Inherent Instructability of Pre-Trained Language Models [9.504992236994697]
We study Response Tuning (RT), which removes the instruction and its corresponding mapping to the response from instruction tuning.
Our experiments demonstrate that RT, trained only on responses, can effectively respond to a wide range of instructions and exhibit helpfulness approaching that of their instruction-tuned counterparts.
arXiv Detail & Related papers (2024-10-03T13:15:19Z) - Solving Dialogue Grounding Embodied Task in a Simulated Environment
using Further Masked Language Modeling [0.0]
Our proposed method uses further masked language modeling with state-of-the-art (SOTA) language models to enhance task understanding.
Our experimental results provide compelling evidence of the superiority of our proposed method.
arXiv Detail & Related papers (2023-06-21T17:17:09Z) - Overcoming Referential Ambiguity in Language-Guided Goal-Conditioned
Reinforcement Learning [8.715518445626826]
The learner can misunderstand the teacher's intentions if the instruction ambiguously refers to features of the object.
We study how two concepts derived from cognitive sciences can help resolve those referential ambiguities.
We apply those ideas to a teacher/learner setup with two artificial agents on a simulated robotic task.
arXiv Detail & Related papers (2022-09-26T15:07:59Z) - Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z) - Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
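As a point of reference for the "entropy-regularized policy gradient formulation" the summary names, a minimal version of that loss is sketched below; the adversarial skill-embedding setup itself is not reproduced, and the tensor names are illustrative.

```python
# Minimal sketch of an entropy-regularized policy-gradient loss; a generic
# building block, not the paper's full adversarial training regime.
import torch

def entropy_regularized_pg_loss(log_probs: torch.Tensor,
                                advantages: torch.Tensor,
                                entropy: torch.Tensor,
                                beta: float = 0.01) -> torch.Tensor:
    """REINFORCE-style objective plus an entropy bonus that encourages exploration."""
    return -(log_probs * advantages.detach()).mean() - beta * entropy.mean()
```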
arXiv Detail & Related papers (2022-01-27T19:51:09Z) - Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals.
We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
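A hedged sketch of the AMIGo-style teacher reward: the teacher is rewarded when the student reaches the proposed goal, but only if doing so takes longer than a difficulty threshold t_star, so proposed goals track the frontier of the student's ability. The constants and the exact threshold handling below are illustrative, not the paper's values.

```python
# Illustrative AMIGo-style teacher reward; alpha, beta, and the threshold rule
# are assumptions for the sketch, not the paper's exact formulation.
def teacher_reward(goal_reached: bool, steps_taken: int, t_star: int,
                   alpha: float = 0.7, beta: float = 0.3) -> float:
    if not goal_reached:
        return -beta   # goal too hard: the student never reached it
    if steps_taken < t_star:
        return -beta   # goal too easy: reached too quickly
    return alpha       # reached, but only with effort: keep proposing at this frontier
```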
arXiv Detail & Related papers (2020-06-22T10:22:08Z) - Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging deep reinforcement learning tasks.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment and explore it from different perspectives.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
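The summary leaves open how "beneficial knowledge from the peer learner" is identified; one plausible reading, sketched below, is to distill toward the peer policy only on states where the peer's value estimate is higher. This is an illustrative interpretation, not the paper's exact loss.

```python
# Hedged sketch of a student-student distillation term; the value-based mask is an
# assumed way to select "beneficial" peer knowledge, not DPD's exact criterion.
import torch
import torch.nn.functional as F

def dpd_distill_loss(student_logits: torch.Tensor,
                     peer_logits: torch.Tensor,
                     student_values: torch.Tensor,
                     peer_values: torch.Tensor) -> torch.Tensor:
    """KL toward the peer policy, masked to states where the peer looks better."""
    mask = (peer_values > student_values).float()                      # (B,)
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(peer_logits, dim=-1),
                  reduction="none").sum(dim=-1)                        # per-sample KL, (B,)
    return (mask * kl).sum() / mask.sum().clamp(min=1.0)
```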
arXiv Detail & Related papers (2020-06-07T06:49:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.