Learning Affordance Landscapes for Interaction Exploration in 3D
Environments
- URL: http://arxiv.org/abs/2008.09241v2
- Date: Mon, 19 Oct 2020 02:44:08 GMT
- Title: Learning Affordance Landscapes for Interaction Exploration in 3D
Environments
- Authors: Tushar Nagarajan and Kristen Grauman
- Abstract summary: Embodied agents must be able to master how their environment works.
We introduce a reinforcement learning approach for exploration for interaction.
We demonstrate our idea with AI2-iTHOR.
- Score: 101.90004767771897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied agents operating in human spaces must be able to master how their
environment works: what objects can the agent use, and how can it use them? We
introduce a reinforcement learning approach for exploration for interaction,
whereby an embodied agent autonomously discovers the affordance landscape of a
new unmapped 3D environment (such as an unfamiliar kitchen). Given an
egocentric RGB-D camera and a high-level action space, the agent is rewarded
for maximizing successful interactions while simultaneously training an
image-based affordance segmentation model. The former yields a policy for
acting efficiently in new environments to prepare for downstream interaction
tasks, while the latter yields a convolutional neural network that maps image
regions to the likelihood they permit each action, densifying the rewards for
exploration. We demonstrate our idea with AI2-iTHOR. The results show agents
can learn how to use new home environments intelligently and that this exploration prepares
them to rapidly address various downstream tasks like "find a knife and put it
in the drawer." Project page:
http://vision.cs.utexas.edu/projects/interaction-exploration/
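
To make the two coupled components in the abstract concrete, here is a minimal, illustrative Python/PyTorch sketch: a sparse bonus for new successful interactions plus a dense shaping term from an online-trained affordance segmentation model. This is not the authors' implementation; the tiny network, the assumed action set, the reward weighting beta, and names such as AffordanceSegmenter and exploration_reward are assumptions made for illustration.

```python
# Minimal illustrative sketch of interaction-exploration reward shaping
# (not the authors' code): a policy is rewarded for novel successful
# interactions, and an affordance-segmentation CNN densifies the reward.
# All names, the action set, and the weighting below are assumptions.
import torch
import torch.nn as nn

NUM_ACTIONS = 5  # e.g. take, put, open, close, toggle (assumed action set)

class AffordanceSegmenter(nn.Module):
    """Tiny stand-in for the per-pixel, per-action affordance CNN."""
    def __init__(self, num_actions=NUM_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),   # RGB-D input
            nn.Conv2d(16, num_actions, 1),               # per-action logits
        )

    def forward(self, rgbd):                  # rgbd: (B, 4, H, W)
        return torch.sigmoid(self.net(rgbd))  # (B, A, H, W) affordance probs

def exploration_reward(success, is_new_interaction, affordance_probs,
                       action, beta=0.5):
    """Sparse bonus for a *new* successful interaction, plus a dense shaping
    term from the predicted affordance of the attempted action at the image
    center (a simplifying assumption for this sketch)."""
    sparse = 1.0 if (success and is_new_interaction) else 0.0
    _, _, h, w = affordance_probs.shape
    dense = affordance_probs[0, action, h // 2, w // 2].item()
    return sparse + beta * dense

# Usage: one (fake) step of the exploration loop.
segmenter = AffordanceSegmenter()
frame = torch.rand(1, 4, 64, 64)   # egocentric RGB-D frame
probs = segmenter(frame)
r = exploration_reward(success=True, is_new_interaction=True,
                       affordance_probs=probs, action=2)
print(f"shaped exploration reward: {r:.3f}")
```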
Related papers
- Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z)
- Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation [10.21450780640562]
We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
arXiv Detail & Related papers (2023-07-12T12:25:33Z)
- Grounding 3D Object Affordance from 2D Interactions in Images [128.6316708679246]
Grounding 3D object affordance seeks to locate the "action possibilities" regions of objects in 3D space.
Humans possess the ability to perceive object affordances in the physical world through demonstration images or videos.
We devise an Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region features of objects from different sources.
arXiv Detail & Related papers (2023-03-18T15:37:35Z)
- Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning [20.02604302565522]
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language.
Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment.
We show that imitation learning of human-human interactions in a simulated world, in conjunction with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time.
arXiv Detail & Related papers (2021-12-07T15:17:27Z)
- Indoor Semantic Scene Understanding using Multi-modality Fusion [0.0]
We present a semantic scene understanding pipeline that fuses 2D and 3D detection branches to generate a semantic map of the environment.
Unlike previous works that were evaluated on collected datasets, we test our pipeline in an active photo-realistic robotic environment.
Our novelty includes rectification of 3D proposals using projected 2D detections and modality fusion based on object size.
arXiv Detail & Related papers (2021-08-17T13:30:02Z)
- Out of the Box: Embodied Navigation in the Real World [45.97756658635314]
We show how to transfer knowledge acquired in simulation into the real world.
We deploy our models on a LoCoBot equipped with a single Intel RealSense camera.
Our experiments indicate that it is possible to achieve satisfying results when deploying the obtained model in the real world.
arXiv Detail & Related papers (2021-05-12T18:00:14Z)
- Environment Predictive Coding for Embodied Agents [92.31905063609082]
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
arXiv Detail & Related papers (2021-02-03T23:43:16Z)
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
We show that iGibson features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
- Learning Object Placements For Relational Instructions by Hallucinating Scene Representations [26.897316325189205]
We present a convolutional neural network for estimating pixelwise object placement probabilities for a set of spatial relations from a single input image.
Our method does not require ground truth data for the pixelwise relational probabilities or 3D models of the objects.
Results obtained using real-world data and human-robot experiments demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-01-23T12:58:50Z)
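
The last entry above describes a CNN that estimates pixelwise object-placement probabilities for a set of spatial relations from a single image. Below is a minimal, hypothetical sketch of that kind of model, not the paper's architecture: the relation set, the embedding-broadcast conditioning, and names such as PlacementNet are assumptions made for illustration.

```python
# Minimal sketch of a pixelwise placement-probability model in the spirit of
# the last entry above (not the paper's implementation): a fully-convolutional
# network conditioned on a spatial-relation index outputs one probability per
# pixel. The architecture, relation set, and all names are assumptions.
import torch
import torch.nn as nn

RELATIONS = ["left of", "right of", "in front of", "behind", "on top of"]  # assumed

class PlacementNet(nn.Module):
    def __init__(self, num_relations=len(RELATIONS), embed_dim=8):
        super().__init__()
        self.relation_embed = nn.Embedding(num_relations, embed_dim)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        )
        # Image features concatenated with a broadcast relation embedding.
        self.head = nn.Conv2d(16 + embed_dim, 1, 1)

    def forward(self, image, relation_idx):
        # image: (B, 3, H, W), relation_idx: (B,)
        feats = self.encoder(image)
        b, _, h, w = feats.shape
        rel = self.relation_embed(relation_idx)             # (B, embed_dim)
        rel = rel[:, :, None, None].expand(b, -1, h, w)     # broadcast over pixels
        logits = self.head(torch.cat([feats, rel], dim=1))  # (B, 1, H, W)
        return torch.sigmoid(logits).squeeze(1)             # per-pixel placement prob

# Usage with a dummy image batch and the relation "on top of".
net = PlacementNet()
img = torch.rand(2, 3, 64, 64)
rel = torch.tensor([4, 4])
prob_map = net(img, rel)
print(prob_map.shape)  # torch.Size([2, 64, 64])
```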