Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement
Learning
- URL: http://arxiv.org/abs/2306.08400v1
- Date: Wed, 14 Jun 2023 09:48:48 GMT
- Title: Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement
Learning
- Authors: Evan Zheran Liu, Sahaana Suri, Tong Mu, Allan Zhou, Chelsea Finn
- Abstract summary: We ask: can embodied reinforcement learning (RL) agents indirectly learn language from non-language tasks?
We design an office navigation environment, where the agent's goal is to find a particular office, and office locations differ in different buildings (i.e., tasks).
We find RL agents indeed are able to indirectly learn language. Agents trained with current meta-RL algorithms successfully generalize to reading floor plans with held-out layouts and language phrases.
- Score: 56.07190845063208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Whereas machine learning models typically learn language by directly training
on language tasks (e.g., next-word prediction), language emerges in human
children as a byproduct of solving non-language tasks (e.g., acquiring food).
Motivated by this observation, we ask: can embodied reinforcement learning (RL)
agents also indirectly learn language from non-language tasks? Learning to
associate language with its meaning requires a dynamic environment with varied
language. Therefore, we investigate this question in a multi-task environment
with language that varies across the different tasks. Specifically, we design
an office navigation environment, where the agent's goal is to find a
particular office, and office locations differ in different buildings (i.e.,
tasks). Each building includes a floor plan with a simple language description
of the goal office's location, which can be visually read as an RGB image when
visited. We find RL agents indeed are able to indirectly learn language. Agents
trained with current meta-RL algorithms successfully generalize to reading
floor plans with held-out layouts and language phrases, and quickly navigate to
the correct office, despite receiving no direct language supervision.
Related papers
- Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use [16.425032085699698]
It is desirable for embodied agents to have the ability to leverage human language to gain explicit or implicit knowledge for learning tasks.
It's not clear how to incorporate rich language use to facilitate task learning.
This paper studies different types of language inputs in facilitating reinforcement learning.
arXiv Detail & Related papers (2024-10-31T17:59:52Z)
- LangNav: Language as a Perceptual Representation for Navigation [63.90602960822604]
We explore the use of language as a perceptual representation for vision-and-language navigation (VLN).
Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent's egocentric panoramic view at each time step into natural language descriptions.
arXiv Detail & Related papers (2023-10-11T20:52:30Z)
- Learning to Model the World with Language [100.76069091703505]
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world.
Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future.
We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations.
arXiv Detail & Related papers (2023-07-31T17:57:49Z)
- Accessible Instruction-Following Agent [0.0]
We introduce UVLN, a novel machine-translation instructional augmented framework for cross-lingual vision-language navigation.
We extend the standard VLN training objectives to a multilingual setting via a cross-lingual language encoder.
Experiments on the Room-Across-Room dataset demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-08T23:57:26Z)
- Inner Monologue: Embodied Reasoning through Planning with Language Models [81.07216635735571]
Large Language Models (LLMs) can be applied to domains beyond natural language processing.
LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them.
We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios.
arXiv Detail & Related papers (2022-07-12T15:20:48Z)
- CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations [98.30038910061894]
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions.
We propose CLEAR: Cross-Lingual and Environment-Agnostic Representations.
Our language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation tasks.
arXiv Detail & Related papers (2022-07-05T17:38:59Z)
- CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks [30.936692970187416]
General-purpose robots must learn to relate human language to their perceptions and actions.
We present CALVIN, an open-source simulated benchmark to learn long-horizon language-conditioned tasks.
arXiv Detail & Related papers (2021-12-06T18:37:33Z)
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
"vokenization" is trained on relatively small image captioning datasets and we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
arXiv Detail & Related papers (2020-10-14T02:11:51Z)
- Language-Conditioned Goal Generation: a New Approach to Language Grounding for RL [23.327749767424567]
In the real world, linguistic agents are also embodied agents: they perceive and act in the physical world.
This paper proposes using language to condition goal generators. Given any goal-conditioned policy, one could train a language-conditioned goal generator to generate language-agnostic goals for the agent.
arXiv Detail & Related papers (2020-06-12T09:54:38Z)