Learning to Act with Affordance-Aware Multimodal Neural SLAM
- URL: http://arxiv.org/abs/2201.09862v1
- Date: Mon, 24 Jan 2022 18:39:10 GMT
- Title: Learning to Act with Affordance-Aware Multimodal Neural SLAM
- Authors: Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai,
Gaurav Sukhatme
- Abstract summary: We propose a Neural SLAM approach that utilizes several modalities for exploration, predicts an affordance-aware semantic map, and plans over it at the same time.
We obtain more than $40\%$ improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of $23.48\%$ on the test unseen scenes.
- Score: 11.695095006311176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed an emerging paradigm shift toward embodied
artificial intelligence, in which an agent must learn to solve challenging
tasks by interacting with its environment. There are several challenges in
solving embodied multimodal tasks, including long-horizon planning,
vision-and-language grounding, and efficient exploration. We focus on a
critical bottleneck, namely the performance of planning and navigation. To
tackle this challenge, we propose a Neural SLAM approach that, for the first
time, utilizes several modalities for exploration, predicts an affordance-aware
semantic map, and plans over it at the same time. This significantly improves
exploration efficiency, leads to robust long-horizon planning, and enables
effective vision-and-language grounding. With the proposed Affordance-aware
Multimodal Neural SLAM (AMSLAM) approach, we obtain more than $40\%$
improvement over prior published work on the ALFRED benchmark and set a new
state-of-the-art generalization performance at a success rate of $23.48\%$ on
the test unseen scenes.
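To make the approach above concrete, the sketch below shows what "planning over an affordance-aware semantic map" can look like once the map has been predicted: each grid cell carries an occupancy flag and per-action affordance flags, and a standard search finds a path to the nearest cell where the requested action applies. The grid representation, the `plan_to_affordance` helper, and the BFS planner are illustrative assumptions, not the AMSLAM implementation, in which the map itself is predicted by a learned model from the agent's multimodal observations.

```python
# Minimal sketch (not the authors' code): plan over an affordance-aware
# semantic grid map.  The cell encoding, function name, and BFS planner
# are illustrative assumptions.
from collections import deque

import numpy as np

FREE, BLOCKED = 0, 1

def plan_to_affordance(occupancy, affordance, start, action_id):
    """occupancy:  (H, W) int array, 0 = free, 1 = blocked
    affordance: (H, W, A) bool array, True where action `action_id` applies
    start:      (row, col) starting cell
    Returns a list of cells from start to the goal, or None."""
    H, W = occupancy.shape
    queue, parent = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if affordance[cell[0], cell[1], action_id]:
            path = []                      # goal found: walk parents back
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if (0 <= nxt[0] < H and 0 <= nxt[1] < W
                    and occupancy[nxt] == FREE and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None                            # no reachable cell affords the action
```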
Related papers
- World Models with Hints of Large Language Models for Goal Achieving [56.91610333715712]
Reinforcement learning struggles in the face of long-horizon tasks and sparse goals.
Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). DLLM integrates the proposed hinting subgoals into the model rollouts to encourage goal discovery and reaching in challenging tasks.
arXiv Detail & Related papers (2024-06-11T15:49:08Z)
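As a rough sketch of the hinting idea summarized above, the snippet below grants an intrinsic bonus whenever an imagined rollout state lands close to an LLM-proposed subgoal in a shared embedding space. The embedding interface, similarity threshold, and bonus scale are assumptions for illustration, not the DLLM paper's exact mechanism.

```python
# Illustrative sketch: reward imagined rollout states that match
# language-hinted subgoals (embeddings are assumed to be L2-normalized).
import numpy as np

def subgoal_bonus(rollout_embs, hint_embs, threshold=0.9, scale=1.0):
    """rollout_embs: (T, D) embeddings of imagined states
    hint_embs:    (K, D) embeddings of LLM-proposed subgoals
    Returns a (T,) vector of intrinsic bonuses."""
    sims = rollout_embs @ hint_embs.T        # (T, K) cosine similarities
    reached = sims.max(axis=1) > threshold   # closest hint per rollout step
    return scale * reached.astype(np.float32)
```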
- Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models [0.0]
We formalize a planning-based approach to perform multi-step problem solving with language models.
We demonstrate a superior success rate of 89.4% on the Game of 24 task as compared to existing approaches.
arXiv Detail & Related papers (2024-04-29T18:51:17Z)
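The entry above describes heuristic-guided, planning-based problem solving with a language model. The sketch below shows only the generic pattern: best-first search over partial solutions ranked by a value heuristic, with the `expand` and `score` callables standing in for LM proposal and self-evaluation. These names and the search budget are assumptions, not the paper's API.

```python
# Generic best-first search over partial solutions, scored by a heuristic.
# In a Plan-of-Thoughts-style system, `expand` and `score` would be backed
# by a language model; here they are plain callables supplied by the caller.
import heapq
import itertools

def heuristic_search(initial_state, expand, score, is_goal, max_expansions=1000):
    """expand(state)  -> iterable of successor states
    score(state)   -> higher is more promising (e.g., an LM value estimate)
    is_goal(state) -> bool"""
    counter = itertools.count()          # tie-breaker for equal scores
    frontier = [(-score(initial_state), next(counter), initial_state)]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        for nxt in expand(state):
            heapq.heappush(frontier, (-score(nxt), next(counter), nxt))
    return None                          # no solution found within the budget
```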
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
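As a loose illustration of the retrieval step described above, the snippet below selects the stored task-specific parameters whose task embedding is most similar to the current input's embedding. The cosine-similarity retrieval and the parameter store are stand-ins for illustration, not the JARe/DTKR method itself.

```python
# Toy retrieval of task-related parameters by embedding similarity
# (an illustrative stand-in for dynamic task-related knowledge retrieval).
import numpy as np

def retrieve_task_params(query_emb, task_embs, task_params):
    """query_emb:   (D,) embedding of the current input
    task_embs:   (K, D) stored task embeddings
    task_params: list of K parameter sets (e.g., adapter weights)
    Returns the parameters of the most similar stored task."""
    sims = task_embs @ query_emb / (
        np.linalg.norm(task_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    return task_params[int(np.argmax(sims))]
```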
- Diffused Task-Agnostic Milestone Planner [13.042155799536657]
We propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space.
The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control.
arXiv Detail & Related papers (2023-12-06T10:09:22Z)
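The milestone planner above produces a sequence of latent milestones that a low-level controller then tracks. The sketch below shows only that tracking loop, with the milestone generator, encoder, and controller left as placeholders; the gym-style environment interface and the distance threshold are likewise assumptions, not the paper's architecture.

```python
# Follow a sequence of latent milestones with a simple tracking loop.
# `encode`, `low_level_action`, and `env` are caller-supplied placeholders;
# in the paper the milestones come from a diffusion-based sequence model.
import numpy as np

def track_milestones(env, obs, milestones, encode, low_level_action,
                     reach_threshold=0.5, max_steps=500):
    idx = 0
    for _ in range(max_steps):
        if idx >= len(milestones):
            break                                  # all milestones reached
        latent, target = encode(obs), milestones[idx]
        if np.linalg.norm(latent - target) < reach_threshold:
            idx += 1                               # milestone reached, move on
            continue
        # gym-style step (assumption): observation, reward, done, info
        obs, _, done, _ = env.step(low_level_action(latent, target))
        if done:
            break
    return obs, idx                                # final obs, milestones hit
```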
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new suite of benchmarks aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles [73.15950858151594]
This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
arXiv Detail & Related papers (2020-10-27T22:06:57Z)
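A rough sketch of the optimism described above: each candidate action is scored by the ensemble's mean predicted return plus a bonus proportional to the ensemble's disagreement, and the agent acts greedily on that optimistic score. The array interface and the optimism coefficient are assumptions, not LOVE's exact objective.

```python
# Optimistic action selection from an ensemble of return estimates
# (an illustrative sketch of optimism in the face of uncertainty).
import numpy as np

def optimistic_action(return_ensemble, optimism=1.0):
    """return_ensemble: (E, A) array of E ensemble members' predicted returns
    for each of A candidate actions.  Returns the index of the action with
    the highest mean-plus-uncertainty score."""
    mean = return_ensemble.mean(axis=0)            # expected return per action
    std = return_ensemble.std(axis=0)              # disagreement = uncertainty
    return int(np.argmax(mean + optimism * std))   # upper-confidence choice
```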
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
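Plan2Explore explores by seeking out states where an ensemble of one-step latent predictors disagrees. The snippet below computes that style of intrinsic reward as the variance across ensemble predictions; the predictor interface is a placeholder, so treat this as a sketch of the idea rather than the paper's implementation.

```python
# Ensemble-disagreement intrinsic reward: the more the one-step predictors
# disagree about the next latent state, the more novel (and rewarding) it is.
import numpy as np

def disagreement_reward(predictions):
    """predictions: (E, D) array of E ensemble members' predicted next latents.
    Returns a scalar intrinsic reward (mean per-dimension variance)."""
    return float(predictions.var(axis=0).mean())
```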
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results w.r.t. performance, computations and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
- Learning Context-aware Task Reasoning for Efficient Meta-reinforcement Learning [29.125234093368732]
We propose a novel meta-RL strategy to achieve human-level efficiency in learning novel tasks.
We decompose the meta-RL problem into three sub-tasks, task-exploration, task-inference and task-fulfillment.
Our algorithm effectively performs exploration for task inference, improves sample efficiency during both training and testing, and mitigates the meta-overfitting problem.
arXiv Detail & Related papers (2020-03-03T07:38:53Z)
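The last entry splits meta-RL into task exploration, task inference, and task fulfillment. The sketch below covers only the inference-and-conditioning step: context transitions are encoded and mean-pooled into a latent task code on which the policy is conditioned. The encoder, pooling choice, and policy interface are assumptions for illustration, not the paper's components.

```python
# Condition a policy on an inferred task code, a minimal sketch of the
# task-inference / task-fulfillment split.  `encode_transition` and
# `policy` are caller-supplied placeholders, not the paper's components.
import numpy as np

def infer_task(context_transitions, encode_transition):
    """context_transitions: list of (state, action, reward, next_state) tuples
    collected while exploring the current task."""
    embs = np.stack([encode_transition(t) for t in context_transitions])
    return embs.mean(axis=0)                 # permutation-invariant task code

def act(state, task_code, policy):
    return policy(np.concatenate([state, task_code]))
```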
This list is automatically generated from the titles and abstracts of the papers in this site.