ManipulaTHOR: A Framework for Visual Object Manipulation
- URL: http://arxiv.org/abs/2104.11213v1
- Date: Thu, 22 Apr 2021 17:49:04 GMT
- Title: ManipulaTHOR: A Framework for Visual Object Manipulation
- Authors: Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs,
Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi
- Abstract summary: We propose a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework.
This task extends the popular point navigation task to object manipulation and offers new challenges including 3D obstacle avoidance.
- Score: 27.17908758246059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The domain of Embodied AI has recently witnessed substantial progress,
particularly in navigating agents within their environments. These early
successes have laid the groundwork for the community to tackle tasks that
require agents to actively interact with objects in their environment. Object
manipulation is an established research domain within the robotics community
and poses several challenges including manipulator motion, grasping and
long-horizon planning, particularly when dealing with oft-overlooked practical
setups involving visually rich and complex scenes, manipulation using mobile
agents (as opposed to tabletop manipulation), and generalization to unseen
environments and objects. We propose a framework for object manipulation built
upon the physics-enabled, visually rich AI2-THOR framework and present a new
challenge to the Embodied AI community known as ArmPointNav. This task extends
the popular point navigation task to object manipulation and offers new
challenges including 3D obstacle avoidance, manipulating objects in the
presence of occlusion, and multi-object manipulation that necessitates long
term planning. Popular learning paradigms that are successful on PointNav
challenges show promise, but leave substantial room for improvement.
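As a starting point for experimentation, a minimal interaction loop with AI2-THOR's arm agent might look like the sketch below. The action names follow the publicly documented ai2thor Python package, but the scene id and target coordinates are illustrative assumptions; verify both against the current AI2-THOR documentation.

```python
# A minimal sketch of driving AI2-THOR's mobile manipulator
# (the agent used by ManipulaTHOR / ArmPointNav).
# Assumes: pip install ai2thor; action names per the AI2-THOR docs.
from ai2thor.controller import Controller

controller = Controller(
    agentMode="arm",        # mobile manipulator with an articulated arm
    scene="FloorPlan1",     # hypothetical choice; any AI2-THOR scene id works
    gridSize=0.25,
    renderDepthImage=True,  # ArmPointNav agents typically consume RGB-D
)

# Navigate the base toward the object (3D obstacle avoidance happens here).
controller.step(action="MoveAgent", ahead=0.25)
controller.step(action="RotateAgent", degrees=30)

# Move the wrist to a hypothetical target point relative to the arm base.
controller.step(
    action="MoveArm",
    position=dict(x=0.0, y=0.5, z=0.4),
    coordinateSpace="armBase",
)

# The arm's magnet-sphere gripper picks up any object it is touching.
event = controller.step(action="PickupObject")
if event.metadata["lastActionSuccess"]:
    print("Grasped:", event.metadata["arm"]["heldObjects"])
```

The magnet-sphere gripper abstracts away grasp planning, which is what lets ArmPointNav focus on visual reasoning, 3D obstacle avoidance, and long-horizon planning rather than contact mechanics.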
Related papers
- Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework comprising a high-level task planner and a low-level exploration controller.
The task planner generates feasible step-by-step plans for accomplishing the human's goal, conditioned on the task completion process and the visual clues observed so far.
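As a concrete illustration of this hierarchical decomposition, a plan-then-execute loop could be organized as in the sketch below; the planner and controller interfaces are hypothetical stand-ins, not the paper's API.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical interfaces: a planner maps (instruction, observation) to a
# list of step descriptions; an executor runs one step and reports success
# plus the new observation gathered by the low-level controller.
Planner = Callable[[str, Any], List[str]]
Executor = Callable[[str], Tuple[bool, Any]]

def follow_instruction(instruction: str, plan: Planner, execute: Executor,
                       observe: Callable[[], Any], max_replans: int = 5) -> bool:
    obs = observe()
    for _ in range(max_replans):
        for step in plan(instruction, obs):   # high-level task planner
            ok, obs = execute(step)           # low-level exploration controller
            if not ok:
                break                         # step failed: replan with new clues
        else:
            return True                       # every planned step succeeded
    return False
```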
arXiv Detail & Related papers (2024-06-17T17:55:40Z)
- UniTeam: Open Vocabulary Mobile Manipulation Challenge [4.523096223190858]
This report introduces our UniTeam agent - an improved baseline for the "HomeRobot: Open Vocabulary Mobile Manipulation" challenge.
The challenge poses problems of navigation in unfamiliar environments, manipulation of novel objects, and recognition of open-vocabulary object classes.
This challenge aims to facilitate cross-cutting research in embodied AI using recent advances in machine learning, computer vision, natural language, and robotics.
arXiv Detail & Related papers (2023-12-14T02:24:29Z)
- Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation [10.21450780640562]
We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
arXiv Detail & Related papers (2023-07-12T12:25:33Z)
- How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
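To make the idea of semantically informed frontiers concrete, the sketch below ranks geometric frontiers with a language-prior co-occurrence table; the weights and co-occurrence values are illustrative assumptions, not figures from the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical language priors: how likely a goal object is to appear near
# an observed landmark category (illustrative values only).
CO_OCCURRENCE: Dict[Tuple[str, str], float] = {
    ("mug", "counter"): 0.8,
    ("mug", "sofa"): 0.1,
    ("pillow", "sofa"): 0.9,
}

def frontier_score(goal: str, nearby_labels: List[str],
                   unexplored_area: float, semantic_weight: float = 2.0) -> float:
    """Combine geometric information gain with a semantic prior."""
    semantic = max((CO_OCCURRENCE.get((goal, lbl), 0.0) for lbl in nearby_labels),
                   default=0.0)
    return math.log1p(unexplored_area) + semantic_weight * semantic

# Pick the frontier with the best combined score: (labels near frontier, area).
frontiers = [(["sofa"], 4.0), (["counter"], 2.5)]
best = max(frontiers, key=lambda f: frontier_score("mug", f[0], f[1]))
```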
arXiv Detail & Related papers (2023-05-26T13:38:33Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Although Director operates in latent space, its decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
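The manager/worker split behind such hierarchical latent planners can be sketched as follows; random vectors stand in for Director's learned world model and policies, and the cosine-similarity reward is one common choice, not necessarily the paper's exact objective.

```python
import numpy as np

# Illustrative-only sketch: every K steps a high-level manager picks an
# abstract goal vector, and the low-level worker is rewarded for moving
# the latent state toward it.
rng = np.random.default_rng(0)
K, latent_dim = 8, 16

def manager_policy(state: np.ndarray) -> np.ndarray:
    return rng.normal(size=latent_dim)       # placeholder for a learned policy

def worker_reward(state: np.ndarray, goal: np.ndarray) -> float:
    # Cosine similarity between latent state and goal.
    denom = np.linalg.norm(state) * np.linalg.norm(goal) + 1e-8
    return float(state @ goal / denom)

state, goal = rng.normal(size=latent_dim), None
for t in range(32):
    if t % K == 0:
        goal = manager_policy(state)          # new abstract goal every K steps
    state = state + 0.1 * (goal - state)      # stand-in for latent dynamics
    r = worker_reward(state, goal)            # reward shaping for the worker
```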
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
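The underlying trick, persisting a world-frame estimate of the target so it survives occlusion, can be sketched as below; the pose conventions and the synthetic two-step episode are illustrative assumptions.

```python
import numpy as np

def to_world(p_cam, R_wc, t_wc):
    """Camera-frame point -> world frame, given camera pose (R_wc, t_wc)."""
    return R_wc @ p_cam + t_wc

def to_camera(p_world, R_wc, t_wc):
    """World-frame point -> current camera frame (inverse transform)."""
    return R_wc.T @ (p_world - t_wc)

# Synthetic episode: target detected once, then occluded while the agent
# rotates 90 degrees about the vertical axis.
R0, t0 = np.eye(3), np.zeros(3)
R1 = np.array([[0., 0., 1.], [0., 1., 0.], [-1., 0., 0.]])   # 90-deg yaw
steps = [(np.array([0.0, 0.0, 2.0]), (R0, t0)),  # detection in camera frame
         (None, (R1, t0))]                       # target occluded

target_world = None
for detection, (R_wc, t_wc) in steps:
    if detection is not None:              # visible: refresh world-frame estimate
        target_world = to_world(detection, R_wc, t_wc)
    if target_world is not None:           # occluded: predict from agent pose
        print(to_camera(target_world, R_wc, t_wc))
```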
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Articulated Object Interaction in Unknown Scenes with Whole-Body Mobile Manipulation [16.79185733369416]
We propose a two-stage architecture for autonomous interaction with large articulated objects in unknown environments.
The first stage uses a learned model to estimate the articulated model of a target object from an RGB-D input and predicts an action-conditional sequence of states for interaction.
The second stage comprises a whole-body motion controller that manipulates the object along the generated kinematic plan.
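To illustrate the interface between the two stages, the sketch below turns an estimated revolute-joint model (hinge point plus axis) into handle waypoints that a whole-body controller could track; the door geometry is hypothetical.

```python
import numpy as np

# Illustrative sketch of the two-stage idea: (1) an estimated articulation
# model yields a sequence of handle waypoints; (2) a whole-body controller
# would track them.
def revolute_waypoints(hinge, axis, handle, angles):
    """Rotate `handle` about the line (hinge, axis) by each angle (Rodrigues)."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0., -axis[2], axis[1]],
                  [axis[2], 0., -axis[0]],
                  [-axis[1], axis[0], 0.]])
    out = []
    for a in angles:
        R = np.eye(3) + np.sin(a) * K + (1 - np.cos(a)) * (K @ K)
        out.append(hinge + R @ (handle - hinge))
    return out

# Hypothetical door: hinge at the origin, vertical axis, handle 0.8 m away;
# open it by 60 degrees in five steps.
wps = revolute_waypoints(np.zeros(3), np.array([0., 1., 0.]),
                         np.array([0.8, 1.0, 0.0]), np.linspace(0, np.pi / 3, 5))
```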
arXiv Detail & Related papers (2021-03-18T21:32:18Z)
- A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects [25.428781562909606]
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects.
Our method plans in the space of object subgoals and frees the planner from reasoning about robot-object interaction dynamics.
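A minimal version of planning in subgoal space, with robot-object interaction dynamics deferred to per-transition skill checks, might look like this; the subgoal graph and the feasibility check are illustrative stand-ins.

```python
from collections import deque
from typing import Dict, List

# Hypothetical reachability graph over object subgoals (named object states).
SUBGOAL_EDGES: Dict[str, List[str]] = {
    "on_floor": ["grasped"],
    "grasped": ["on_table", "in_bin"],
    "on_table": ["grasped"],
}

def skill_feasible(a: str, b: str) -> bool:
    return True   # stand-in for a learned or model-based skill feasibility check

def plan_subgoals(start: str, goal: str) -> List[str]:
    """Breadth-first search in subgoal space; returns a subgoal sequence."""
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in SUBGOAL_EDGES.get(node, []):
            if nxt not in parent and skill_feasible(node, nxt):
                parent[nxt] = node
                queue.append(nxt)
    return []

print(plan_subgoals("on_floor", "in_bin"))  # ['on_floor', 'grasped', 'in_bin']
```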
arXiv Detail & Related papers (2020-11-16T18:59:33Z)
- Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these subtask symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long horizon tasks: (1) block stacking of puzzle pieces being executed by humans, and (2) a robot manipulation task involving pick and place of objects and sliding a cabinet door with a 7-DoF robot arm.
arXiv Detail & Related papers (2020-06-08T18:07:18Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines on both SR (success rate) and SPL (success weighted by path length) metrics.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)