Foundation Models for Semantic Novelty in Reinforcement Learning
- URL: http://arxiv.org/abs/2211.04878v1
- Date: Wed, 9 Nov 2022 13:34:45 GMT
- Title: Foundation Models for Semantic Novelty in Reinforcement Learning
- Authors: Tarun Gupta, Peter Karkus, Tong Che, Danfei Xu, Marco Pavone
- Abstract summary: Our intrinsic reward is defined based on pre-trained CLIP embeddings without any fine-tuning or learning on the target RL task.
We demonstrate that CLIP-based intrinsic rewards can drive exploration towards semantically meaningful states and outperform state-of-the-art methods in challenging sparse-reward procedurally-generated environments.
- Score: 32.707788771181676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effectively exploring the environment is a key challenge in reinforcement
learning (RL). We address this challenge by defining a novel intrinsic reward
based on a foundation model, such as contrastive language image pretraining
(CLIP), which can encode a wealth of domain-independent semantic
visual-language knowledge about the world. Specifically, our intrinsic reward
is defined based on pre-trained CLIP embeddings without any fine-tuning or
learning on the target RL task. We demonstrate that CLIP-based intrinsic
rewards can drive exploration towards semantically meaningful states and
outperform state-of-the-art methods in challenging sparse-reward
procedurally-generated environments.
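The abstract does not spell out the exact form of the CLIP-based intrinsic reward. As a rough, hedged sketch of the general idea, the snippet below turns frozen CLIP image embeddings into an episodic novelty bonus (one minus the maximum cosine similarity to states already visited in the episode); the class name ClipNoveltyBonus, the reward definition, and the use of the Hugging Face transformers CLIP interface are illustrative assumptions, not the paper's actual formulation.

```python
# Sketch only: episodic novelty bonus from frozen CLIP image embeddings.
# The reward definition below is an assumption, not the paper's method.
import torch
from transformers import CLIPModel, CLIPProcessor


class ClipNoveltyBonus:
    def __init__(self, model_name="openai/clip-vit-base-patch32"):
        self.model = CLIPModel.from_pretrained(model_name).eval()  # frozen, no fine-tuning
        self.processor = CLIPProcessor.from_pretrained(model_name)
        self.memory = []  # unit-norm embeddings of states visited this episode

    def reset(self):
        self.memory = []  # clear at the start of each episode

    @torch.no_grad()
    def __call__(self, frame):
        # frame: H x W x 3 uint8 image observation (e.g. a rendered env frame)
        inputs = self.processor(images=frame, return_tensors="pt")
        z = self.model.get_image_features(**inputs)
        z = z / z.norm(dim=-1, keepdim=True)        # unit-normalise embedding
        if not self.memory:
            self.memory.append(z)
            return 1.0                              # first state is maximally novel
        sims = torch.cat(self.memory) @ z.T         # cosine similarities to memory
        bonus = float(1.0 - sims.max().clamp(min=0.0))
        self.memory.append(z)
        return bonus                                # low similarity -> high novelty
```

In practice such a bonus would typically be scaled by a coefficient and added to the sparse extrinsic task reward when training the RL agent.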
Related papers
- Exploring the Precise Dynamics of Single-Layer GAN Models: Leveraging Multi-Feature Discriminators for High-Dimensional Subspace Learning [0.0]
We study the training dynamics of a single-layer GAN model from the perspective of subspace learning.
By bridging our analysis to the realm of subspace learning, we systematically compare the efficacy of GAN-based methods against conventional approaches.
arXiv Detail & Related papers (2024-11-01T10:21:12Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores than other approaches across all tasks.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Text-Aware Diffusion for Policy Learning [8.32790576855495]
We propose Text-Aware Diffusion for Policy Learning (TADPoLe), which uses a pretrained, frozen text-conditioned diffusion model to compute dense zero-shot reward signals for text-aligned policy learning.
We show that TADPoLe is able to learn policies for novel goal-achievement and continuous locomotion behaviors specified by natural language, in both Humanoid and Dog environments.
arXiv Detail & Related papers (2024-07-02T03:08:20Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- A Bayesian Unification of Self-Supervised Clustering and Energy-Based Models [11.007541337967027]
We perform a Bayesian analysis of state-of-the-art self-supervised learning objectives.
We show that our objective function allows us to outperform existing self-supervised learning strategies.
We also demonstrate that GEDI can be integrated into a neuro-symbolic framework.
arXiv Detail & Related papers (2023-12-30T04:46:16Z)
- Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly decomposes a task into several local subgoals to guide agent imitation.
arXiv Detail & Related papers (2023-06-14T04:06:41Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected returns on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Representation Learning for Resource-Constrained Keyphrase Generation [78.02577815973764]
We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives.
We show the effectiveness of the proposed approach for low-resource and zero-shot keyphrase generation.
arXiv Detail & Related papers (2022-03-15T17:48:04Z)
- SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments [7.5606260987453116]
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments.
Existing end-to-end learning-based methods struggle at this task as they focus mostly on raw visual observations.
We present a hybrid transformer-recurrence model which focuses on combining classical semantic mapping techniques with a learning-based method.
arXiv Detail & Related papers (2021-08-26T17:57:02Z)
- Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals.
We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
arXiv Detail & Related papers (2020-06-22T10:22:08Z)
- Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video [128.08590291947544]
Temporal language grounding in untrimmed videos is a newly emerging task in video understanding.
Inspired by humans' coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning framework.
arXiv Detail & Related papers (2020-01-18T15:08:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.