Related papers: LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers

URL: http://arxiv.org/abs/2312.08958v1
Date: Thu, 14 Dec 2023 14:07:41 GMT
Title: LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers
Authors: Taewook Nam, Juyong Lee, Jesse Zhang, Sung Ju Hwang, Joseph J. Lim, Karl Pertsch
Abstract summary: We propose a framework that guides a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment.
Score: 59.69716962256727
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. Then, a vision-language model guides the agent in learning the multi-task language-conditioned policy by providing reward feedback. We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment while prior unsupervised skill discovery methods struggle. Additionally, we discuss observed challenges of using off-the-shelf foundation models as teachers and our efforts to address them.
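Illustrative sketch (not the authors' implementation): the abstract describes a loop in which a language model proposes task instructions and a vision-language model scores the agent's behavior against each instruction. The Python below shows one plausible shape of that loop, assuming the reward is a cosine similarity between an instruction embedding and an observation embedding. The abstract does not state how the reward is computed, and every name here (toy_llm, toy_embed, vlm_reward, rollout_rewards) is a hypothetical stand-in, with text captions substituting for image observations.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

# Hypothetical toy vocabulary used by the stand-in embedding below.
VOCAB = ["tree", "cow", "chop", "milk"]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_llm(context: str) -> List[str]:
    """Stand-in for a language model proposing grounded task instructions."""
    return ["chop a tree", "milk a cow"]


def toy_embed(text: str) -> List[float]:
    """Stand-in for a VLM encoder: counts vocabulary words in a caption."""
    words = text.split()
    return [float(words.count(w)) for w in VOCAB]


def vlm_reward(
    embed_text: Callable[[str], Vector],
    embed_obs: Callable[[str], Vector],
    instruction: str,
    observation: str,
) -> float:
    """Assumed reward: similarity between instruction and observation."""
    return cosine_similarity(embed_text(instruction), embed_obs(observation))


def rollout_rewards(instruction: str, observations: List[str]) -> List[float]:
    """Score each (captioned) observation of a rollout against one instruction."""
    return [vlm_reward(toy_embed, toy_embed, instruction, o) for o in observations]


if __name__ == "__main__":
    for task in toy_llm("a forest with animals"):
        print(task, rollout_rewards(task, ["agent chop tree", "agent milk cow"]))
```

In a real system the toy functions would be replaced by actual foundation-model calls, and the per-step rewards would feed a multi-task language-conditioned policy update, which this sketch omits.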

Related papers

Acquiring Grounded Representations of Words with Situated Interactive Instruction [4.049850026698638]
We present an approach for acquiring grounded representations of words from mixed-initiative, situated interactions with a human instructor. The work focuses on the acquisition of diverse types of knowledge including perceptual, semantic, and procedural knowledge along with learning grounded meanings.
arXiv Detail & Related papers (2025-02-28T06:04:52Z)
The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples. We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts. Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z)
Revealing the Inherent Instructability of Pre-Trained Language Models [9.504992236994697]
We propose Response Tuning (RT), which removes the instruction and its corresponding mapping to the response from instruction tuning. Our experiments demonstrate that models trained with RT, using only responses, can effectively respond to a wide range of instructions and exhibit helpfulness approaching that of their instruction-tuned counterparts.
arXiv Detail & Related papers (2024-10-03T13:15:19Z)
Solving Dialogue Grounding Embodied Task in a Simulated Environment using Further Masked Language Modeling [0.0]
Our proposed method employs language modeling to enhance task understanding through state-of-the-art (SOTA) methods using language models. Our experimental results provide compelling evidence of the superiority of our proposed method.
arXiv Detail & Related papers (2023-06-21T17:17:09Z)
Overcoming Referential Ambiguity in Language-Guided Goal-Conditioned Reinforcement Learning [8.715518445626826]
The learner can misunderstand the teacher's intentions if the instruction ambiguously refers to features of the object. We study how two concepts derived from cognitive sciences can help resolve those referential ambiguities. We apply those ideas to a teacher/learner setup with two artificial agents on a simulated robotic task.
arXiv Detail & Related papers (2022-09-26T15:07:59Z)
Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher. We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z)
Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space. The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning [32.82030512053361]
We propose the use of step-by-step human demonstrations in the form of natural language instructions and action trajectories. We find that human demonstrations help solve the most complex tasks. We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting.
arXiv Detail & Related papers (2020-11-01T14:39:46Z)
Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions. We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals. We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
arXiv Detail & Related papers (2020-06-22T10:22:08Z)
Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning. In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment. The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.