Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a
First-person Simulated 3D Environment
- URL: http://arxiv.org/abs/2010.15195v2
- Date: Thu, 20 May 2021 18:23:57 GMT
- Title: Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a
First-person Simulated 3D Environment
- Authors: Wilka Carvalho, Anthony Liang, Kimin Lee, Sungryull Sohn, Honglak Lee,
Richard L. Lewis, Satinder Singh
- Abstract summary: First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as the AI2Thor pose significant sample-efficiency challenges for reinforcement learning agents.
We show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task.
- Score: 73.9469267445146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: First-person object-interaction tasks in high-fidelity, 3D, simulated
environments such as the AI2Thor virtual home-environment pose significant
sample-efficiency challenges for reinforcement learning (RL) agents learning
from sparse task rewards. To alleviate these challenges, prior work has
provided extensive supervision via a combination of reward-shaping,
ground-truth object-information, and expert demonstrations. In this work, we
show that one can learn object-interaction tasks from scratch without
supervision by learning an attentive object-model as an auxiliary task during
task learning with an object-centric relational RL agent. Our key insight is
that learning an object-model that incorporates object-attention into forward
prediction provides a dense learning signal for unsupervised representation
learning of both objects and their relationships. This, in turn, enables faster
policy learning for an object-centric relational RL agent. We demonstrate our
agent by introducing a set of challenging object-interaction tasks in the
AI2Thor environment where learning with our attentive object-model is key to
strong performance. Specifically, we compare our agent and relational RL agents
with alternative auxiliary tasks to a relational RL agent equipped with
ground-truth object-information, and show that learning with our object-model
best closes the performance gap in terms of both learning speed and maximum
success rate. Additionally, we find that incorporating object-attention into an
object-model's forward predictions is key to learning representations which
capture object-category and object-state.
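To make the mechanism concrete, below is a minimal sketch of an attentive object-model trained as an auxiliary forward-prediction task. The shapes, module names, and use of standard multi-head attention are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of an attentive object-model auxiliary task (assumed
# shapes and modules, not the paper's exact architecture).
import torch
import torch.nn as nn

class AttentiveObjectModel(nn.Module):
    """Predicts next-step object embeddings from current objects and action."""
    def __init__(self, obj_dim=128, act_dim=16, n_heads=4):
        super().__init__()
        self.act_proj = nn.Linear(act_dim, obj_dim)
        # Object-object attention: each object's forecast can depend on the
        # others, so relations shape the prediction.
        self.attn = nn.MultiheadAttention(obj_dim, n_heads, batch_first=True)
        self.predict = nn.Sequential(
            nn.Linear(2 * obj_dim, obj_dim), nn.ReLU(),
            nn.Linear(obj_dim, obj_dim),
        )

    def forward(self, objs, action):
        # objs: (B, N, D) object embeddings; action: (B, A) action encoding.
        ctx, _ = self.attn(objs, objs, objs)
        ctx = ctx + self.act_proj(action).unsqueeze(1)  # condition on action
        return self.predict(torch.cat([objs, ctx], dim=-1))

def object_model_loss(model, objs_t, action_t, objs_tp1):
    """Dense auxiliary signal: forward-prediction error on object embeddings."""
    return nn.functional.mse_loss(model(objs_t, action_t), objs_tp1.detach())
```

In training, the prediction error from object_model_loss would be added to the RL objective, giving the agent a dense learning signal even when task reward is sparse.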
Related papers
- Visual Grounding for Object-Level Generalization in Reinforcement Learning [35.39214541324909]
Generalization is a pivotal challenge for agents following natural language instructions.
We leverage a vision-language model (VLM) for visual grounding and transfer its vision-language knowledge into reinforcement learning.
We show that our intrinsic reward significantly improves performance on challenging skill-learning tasks.
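One plausible realization of such a VLM-based intrinsic reward is sketched below with an off-the-shelf CLIP model; the paper's actual VLM, prompting, and reward shaping may differ.

```python
# Sketch of a CLIP-based intrinsic reward (assumed model and shaping;
# the paper's VLM and prompts may differ).
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def intrinsic_reward(frame, instruction):
    """Cosine similarity between the current frame and the instruction text."""
    inputs = processor(text=[instruction], images=frame,
                       return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1).item()
```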
arXiv Detail & Related papers (2024-08-04T06:34:24Z)
- TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection [23.73648235283315]
Task-oriented object detection aims to find objects suitable for accomplishing specific tasks.
Recent solutions are mainly all-in-one models.
We propose TaskCLIP, a more natural two-stage design composed of general object detection and task-guided object selection.
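A rough sketch of this two-stage pattern follows; `detector` and `vlm_score` are placeholder callables, not TaskCLIP's actual components.

```python
# Sketch of a two-stage detect-then-select pipeline; `detector` and
# `vlm_score` are placeholder callables, not TaskCLIP's components.
def crop(image, box):
    """image: HxWxC array; box: (x0, y0, x1, y1) in pixel coordinates."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]

def select_task_objects(image, task_text, detector, vlm_score, k=3):
    boxes = detector(image)                     # stage 1: generic detection
    scored = sorted(((vlm_score(crop(image, b), task_text), b) for b in boxes),
                    key=lambda sb: sb[0], reverse=True)
    return [b for _, b in scored[:k]]           # stage 2: task-guided pick
```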
arXiv Detail & Related papers (2024-03-12T22:33:02Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase-grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We apply Principal Component Analysis (PCA) to localize object regions.
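A minimal sketch of PCA-based localization over dense self-supervised features, assuming ViT-style patch features; the paper's backbone and post-processing are not reproduced here.

```python
# Sketch of PCA-based object localization on dense features (assumed
# ViT-style patch features; post-processing omitted).
import numpy as np

def pca_object_mask(patch_feats, grid_hw):
    """patch_feats: (N, D) per-patch features; grid_hw: patch grid (H, W)."""
    feats = patch_feats - patch_feats.mean(axis=0, keepdims=True)
    # First principal component: the direction of largest feature variance
    # often separates the foreground object from the background.
    _, _, vt = np.linalg.svd(feats, full_matrices=False)
    scores = feats @ vt[0]
    mask = scores > 0
    if mask.mean() > 0.5:       # heuristic: treat the smaller side as object
        mask = ~mask
    return mask.reshape(grid_hw)
```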
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
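The locate-then-remember idea might look roughly like the following pinhole-camera sketch; the camera model, frame conventions, and caching policy are assumptions rather than the paper's pipeline.

```python
# Sketch of locating a target in 3D and remembering it when out of view
# (assumed pinhole camera and frame conventions).
import numpy as np

def unproject(u, v, depth, K, cam_to_world):
    """Pixel (u, v) with depth -> 3D point in the world frame."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return (cam_to_world @ np.array([x, y, depth, 1.0]))[:3]

class TargetLocator:
    def __init__(self):
        self.last_world_xyz = None          # persists while target is hidden

    def update(self, detection, depth_img, K, cam_to_world):
        if detection is not None:           # (u, v) pixel of the target
            u, v = detection
            self.last_world_xyz = unproject(u, v, depth_img[v, u],
                                            K, cam_to_world)
        return self.last_world_xyz          # best current 3D estimate
```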
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all monocular 3D object detectors on the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
- Visuomotor Mechanical Search: Learning to Retrieve Target Objects in Clutter [43.668395529368354]
We present a novel deep RL procedure that combines i) teacher-aided exploration, ii) a critic with privileged information, and iii) mid-level representations.
Our approach trains faster and converges to more efficient uncovering solutions than baselines and ablations, and our uncovering policies lead to an average improvement in the graspability of the target object.
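The "critic with privileged information" ingredient follows the generic asymmetric actor-critic pattern, sketched below with assumed network sizes; the paper's exact networks differ.

```python
# Sketch of an asymmetric actor-critic: the critic consumes privileged
# simulator state during training, the actor does not (assumed sizes).
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim))

    def forward(self, obs):                 # obs: image features only
        return torch.tanh(self.net(obs))

class PrivilegedCritic(nn.Module):
    def __init__(self, obs_dim, priv_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + priv_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, obs, priv, act):      # priv: sim-only state (poses etc.)
        return self.net(torch.cat([obs, priv, act], dim=-1))
```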
arXiv Detail & Related papers (2020-08-13T18:23:00Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
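A minimal sketch of the triplet objective that couples tracking and metric learning; the margin value and the shared appearance encoder are assumptions.

```python
# Sketch of a triplet objective over a shared appearance encoder
# (margin value assumed).
import torch.nn.functional as F

def triplet_affinity_loss(f_anchor, f_pos, f_neg, margin=0.3):
    """f_*: (B, D) embeddings; same-identity pairs pulled together."""
    d_pos = F.pairwise_distance(f_anchor, f_pos)
    d_neg = F.pairwise_distance(f_anchor, f_neg)
    return F.relu(d_pos - d_neg + margin).mean()
```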
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
- Relevance-Guided Modeling of Object Dynamics for Reinforcement Learning [0.0951828574518325]
Current deep reinforcement learning (RL) approaches incorporate minimal prior knowledge about the environment.
We propose a framework for reasoning about object dynamics and behavior to rapidly determine minimal and task-specific object representations.
We also highlight the potential of this framework on several Atari games, using our object representation and standard RL and planning algorithms to learn dramatically faster than existing deep RL algorithms.
arXiv Detail & Related papers (2020-03-03T08:18:49Z)