Semantic Exploration from Language Abstractions and Pretrained
Representations
- URL: http://arxiv.org/abs/2204.05080v3
- Date: Wed, 26 Apr 2023 22:21:46 GMT
- Title: Semantic Exploration from Language Abstractions and Pretrained
Representations
- Authors: Allison C. Tam, Neil C. Rabinowitz, Andrew K. Lampinen, Nicholas A.
Roy, Stephanie C. Y. Chan, DJ Strouse, Jane X. Wang, Andrea Banino, Felix
Hill
- Abstract summary: Effective exploration is a challenge in reinforcement learning (RL).
We define novelty using semantically meaningful state abstractions.
We evaluate vision-language representations, pretrained on natural image captioning datasets.
- Score: 23.02024937564099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective exploration is a challenge in reinforcement learning (RL).
Novelty-based exploration methods can suffer in high-dimensional state spaces,
such as continuous partially-observable 3D environments. We address this
challenge by defining novelty using semantically meaningful state abstractions,
which can be found in learned representations shaped by natural language. In
particular, we evaluate vision-language representations, pretrained on natural
image captioning datasets. We show that these pretrained representations drive
meaningful, task-relevant exploration and improve performance on 3D simulated
environments. We also characterize why and how language provides useful
abstractions for exploration by considering the impacts of using
representations from a pretrained model, a language oracle, and several
ablations. We demonstrate the benefits of our approach in two very different
task domains -- one that stresses the identification and manipulation of
everyday objects, and one that requires navigational exploration in an
expansive world. Our results suggest that using language-shaped representations
could improve exploration for various algorithms and agents in challenging
environments.
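To make the approach concrete, below is a minimal, illustrative sketch of one way embedding-space novelty can be instantiated (not the authors' implementation): observations are embedded by a frozen, pretrained vision-language image encoder (the `encode` argument is a placeholder for any such model), and an episodic k-nearest-neighbour bonus in that embedding space is added to the task reward.

```python
import numpy as np

class SemanticNoveltyBonus:
    """Episodic novelty bonus computed in a pretrained embedding space.

    `encode` is any frozen image encoder mapping an observation to a
    1-D feature vector; the paper uses vision-language models pretrained
    on image captioning, but any embedding function fits this sketch.
    """

    def __init__(self, encode, k=10, scale=0.1):
        self.encode = encode        # frozen, pretrained; never updated
        self.k = k                  # number of nearest neighbours
        self.scale = scale          # weight of the intrinsic reward
        self.memory = []            # embeddings seen this episode

    def reset(self):
        self.memory = []            # novelty is episodic: clear per episode

    def bonus(self, observation):
        z = np.asarray(self.encode(observation), dtype=np.float64)
        z = z / (np.linalg.norm(z) + 1e-8)           # unit-normalise
        if not self.memory:
            self.memory.append(z)
            return self.scale                         # everything is novel at first
        dists = np.linalg.norm(np.stack(self.memory) - z, axis=1)
        knn = np.sort(dists)[: self.k]                # distances to k nearest memories
        self.memory.append(z)
        return self.scale * float(np.mean(knn))       # far from memory => large bonus

# Usage: r_total = r_task + novelty.bonus(obs), with novelty.reset() each episode.
```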
Related papers
- Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM [8.46789360111679]
Humans excel at forming mental maps of their surroundings.
Having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks.
We extend this instance-level approach to 3D while increasing the pipeline's robustness.
arXiv Detail & Related papers (2024-04-27T14:20:46Z)
- Learning with Language-Guided State Abstractions [58.199148890064826]
Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations.
Our method, LGA, uses a combination of natural language supervision and background knowledge from language models to automatically build state representations tailored to unseen tasks.
Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time.
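As a rough illustration of the idea described above (a hypothetical stand-in, not LGA's actual pipeline), the abstraction step can be thought of as querying a language model for the raw state features relevant to a task description and masking out the rest:

```python
from typing import Callable, Dict, List

def abstract_state(state: Dict[str, float],
                   task: str,
                   relevant_features: Callable[[str, List[str]], List[str]]):
    """Keep only the features a language model deems relevant to the task.

    `relevant_features` stands in for an LM query such as
    "which of these features matter for: <task>?"; it is a placeholder,
    not part of the LGA codebase.
    """
    keep = set(relevant_features(task, list(state)))
    return {name: value for name, value in state.items() if name in keep}

# Example with a hard-coded stand-in for the LM:
state = {"cup_x": 0.4, "cup_y": 0.1, "lamp_on": 1.0, "door_angle": 0.7}
lm = lambda task, names: [n for n in names if "cup" in n]  # toy "LM"
print(abstract_state(state, "bring me the cup", lm))  # {'cup_x': 0.4, 'cup_y': 0.1}
```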
arXiv Detail & Related papers (2024-02-28T23:57:04Z)
- LanGWM: Language Grounded World Model [24.86620763902546]
We focus on learning language-grounded visual features to enhance world model learning.
Our proposed technique of explicit language-grounded visual representation learning has the potential to improve models for human-robot interaction.
arXiv Detail & Related papers (2023-11-29T12:41:55Z)
- AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph [62.685920585838616]
Abstraction ability is essential to human intelligence, yet it remains under-explored in language models.
We present AbsPyramid, a unified entailment graph of 221K textual descriptions of abstraction knowledge.
arXiv Detail & Related papers (2023-11-15T18:11:23Z)
- Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition [29.237357850947433]
We provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain.
We show that these representations also capture the relative frequency of state visitations, thereby providing an estimate for pseudo-counts for free.
With experiments on multi-task settings with partially observable domains, we show that the proposed method can not only learn useful representation on DM-Lab-30 environments, but it can also be effective at hard exploration tasks in DM-Hard-8 environments.
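A toy tabular sketch of the claim above (illustrative only, and not the paper's exact estimator): factor the matrix of observed transition counts with an SVD, note that the norms of the resulting state representations track visitation frequency, and derive a count-based exploration bonus from them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical transition counts N[s, s'] from a random walk on a small chain MDP.
n_states = 8
counts = np.zeros((n_states, n_states))
s = 0
for _ in range(5000):
    s_next = int(np.clip(s + rng.choice([-1, 1]), 0, n_states - 1))
    counts[s, s_next] += 1
    s = s_next

# SVD of the count matrix; rows of U @ diag(S) are state representations.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
reps = U * S                                   # shape: (n_states, n_states)

# Representation norms track how often each state was visited...
visits = counts.sum(axis=1)
pseudo_counts = np.linalg.norm(reps, axis=1)   # monotone in visitation on this toy chain

# ...so an exploration bonus follows "for free", e.g. the classic 1/sqrt(count).
bonus = 1.0 / np.sqrt(pseudo_counts + 1e-8)
print(np.corrcoef(visits, pseudo_counts)[0, 1])  # close to 1 here
```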
arXiv Detail & Related papers (2023-05-01T04:26:03Z)
- ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes [72.83187997344406]
ARNOLD is a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes.
ARNOLD comprises 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals.
arXiv Detail & Related papers (2023-04-09T21:42:57Z)
- Improving Policy Learning via Language Dynamics Distillation [87.27583619910338]
We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions.
We show that language descriptions in demonstrations improve sample-efficiency and generalization across environments.
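A hedged PyTorch-style sketch of the pretraining objective (all names are illustrative, not from the LDD code): a model conditioned on a language description of a demonstration is trained to predict the next observation, and its encoder is later reused to initialise the policy.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predict the next observation from (observation, language) pairs."""

    def __init__(self, obs_dim=64, lang_dim=32, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(            # later reused by the policy
            nn.Linear(obs_dim + lang_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, obs_dim)   # next-observation predictor

    def forward(self, obs, lang):
        return self.head(self.encoder(torch.cat([obs, lang], dim=-1)))

model = DynamicsModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One pretraining step on a batch of demonstration transitions with captions.
obs = torch.randn(16, 64)        # stand-in for demo observations
lang = torch.randn(16, 32)       # stand-in for embedded language descriptions
next_obs = torch.randn(16, 64)   # stand-in for the observations that followed

loss = nn.functional.mse_loss(model(obs, lang), next_obs)
loss.backward()
opt.step()
# After pretraining, model.encoder initialises the RL policy network.
```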
arXiv Detail & Related papers (2022-09-30T19:56:04Z)
- CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations [98.30038910061894]
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions.
We propose CLEAR: Cross-Lingual and Environment-Agnostic Representations.
Our language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation tasks.
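One plausible reading of the cross-lingual representation objective, sketched below as an InfoNCE-style contrastive loss over paired instructions (an assumption for illustration, not CLEAR's published training code):

```python
import torch
import torch.nn.functional as F

def cross_lingual_alignment_loss(z_en, z_xx, temperature=0.1):
    """InfoNCE over a batch of paired instruction embeddings.

    z_en[i] and z_xx[i] embed the same instruction in two languages;
    the loss pulls matched pairs together and pushes mismatches apart.
    """
    z_en = F.normalize(z_en, dim=-1)
    z_xx = F.normalize(z_xx, dim=-1)
    logits = z_en @ z_xx.T / temperature          # pairwise similarities
    targets = torch.arange(len(z_en))             # the i-th pair is the positive
    return F.cross_entropy(logits, targets)

loss = cross_lingual_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```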
arXiv Detail & Related papers (2022-07-05T17:38:59Z)
- A Visuospatial Dataset for Naturalistic Verb Learning [18.654373173232205]
We introduce a new dataset for training and evaluating grounded language models.
Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access.
We use the collected data to compare several distributional semantics models for verb learning.
arXiv Detail & Related papers (2020-10-28T20:47:13Z)
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present an approach for constructing analogy datasets at the word, phrase, and sentence levels.
We empirically verify that well-pretrained Transformer models, combined with appropriate training settings, can effectively yield universal representations.
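The analogy evaluation mentioned above can be made concrete with a small sketch: given an embedding table over any linguistic units (words, phrases, or sentences), an analogy a : b :: c : ? is solved by nearest-neighbour search over offset vectors.

```python
import numpy as np

def solve_analogy(emb, a, b, c):
    """Return the unit whose embedding is closest to emb[b] - emb[a] + emb[c].

    `emb` maps any linguistic unit (word, phrase, or sentence) to a vector,
    so the same test applies at every level of the uniform space.
    """
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for unit, vec in emb.items():
        if unit in (a, b, c):
            continue                               # exclude the query terms
        sim = vec @ target / np.linalg.norm(vec)   # cosine similarity
        if sim > best_sim:
            best, best_sim = unit, sim
    return best

# Toy 2-D example: "king" - "man" + "woman" ~ "queen".
emb = {"man": np.array([1.0, 0.0]), "woman": np.array([0.0, 1.0]),
       "king": np.array([1.0, 1.0]), "queen": np.array([0.0, 2.0])}
print(solve_analogy(emb, "man", "king", "woman"))  # queen
```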
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.