Related papers: Relevance-Guided Modeling of Object Dynamics for Reinforcement Learning

URL: http://arxiv.org/abs/2003.01384v3
Date: Thu, 3 Jun 2021 19:38:32 GMT
Title: Relevance-Guided Modeling of Object Dynamics for Reinforcement Learning
Authors: William Agnew and Pedro Domingos
Abstract summary: Current deep reinforcement learning (RL) approaches incorporate minimal prior knowledge about the environment. We propose a framework for reasoning about object dynamics and behavior to rapidly determine minimal and task-specific object representations. We also highlight the potential of this framework on several Atari games, using our object representation and standard RL and planning algorithms to learn dramatically faster than existing deep RL algorithms.
Score: 0.0951828574518325
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current deep reinforcement learning (RL) approaches incorporate minimal prior knowledge about the environment, limiting computational and sample efficiency. Objects provide a succinct and causal description of the world, and many recent works have proposed unsupervised object representation learning using priors and losses over static object properties like visual consistency. However, object dynamics and interactions are also critical cues for objectness. In this paper we propose a framework for reasoning about object dynamics and behavior to rapidly determine minimal and task-specific object representations. To demonstrate the need to reason over object behavior and dynamics, we introduce a suite of RGBD MuJoCo object collection and avoidance tasks that, while intuitive and visually simple, confound state-of-the-art unsupervised object representation learning algorithms. We also highlight the potential of this framework on several Atari games, using our object representation and standard RL and planning algorithms to learn dramatically faster than existing deep RL algorithms.

Related papers

Oh-A-DINO: Understanding and Enhancing Attribute-Level Information in Self-Supervised Object-Centric Representations [9.949149600332836]
Self-supervised vision models and slot-based representations excel at identifying edge-derived geometry but fail to preserve non-geometric surface-level cues. We show that learning an auxiliary latent space over segmented patches, where VAE regularisation enforces compact, disentangled object-centric representations, recovers these missing attributes.
arXiv Detail & Related papers (2025-03-12T21:57:41Z)
Efficient Object-centric Representation Learning with Pre-trained Geometric Prior [1.9685736810241874]
We propose a weakly-supervised framework that emphasises geometric understanding and leverages pre-trained vision models to enhance object discovery. Our method introduces an efficient slot decoder specifically designed for object-centric learning, enabling effective representation of multi-object scenes without requiring explicit depth information.
arXiv Detail & Related papers (2024-12-16T20:01:35Z)
Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization. We introduce a benchmark comprising eight different synthetic and real-world datasets. We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
Top-Down Guidance for Learning Object-Centric Representations [30.06924788022504]
Top-Down Guided Network (TDGNet) is a top-down pathway to improve object-centric representations. We show that TDGNet outperforms current object-centric models on multiple datasets of varying complexity.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches. We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment. Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot. By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance. Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
arXiv Detail & Related papers (2023-06-03T21:49:06Z)
Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder. We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets. We also show that its unsupervised object discovery performance is competitive with a SlotAttention model on two datasets, and that it manages to disentangle objects in a third dataset where SlotAttention fails, all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels. Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions. We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
Object Pursuit: Building a Space of Objects via Discriminative Weight Generation [23.85039747700698]
We propose a framework to continuously learn object-centric representations for visual learning and understanding. We leverage interactions to sample diverse variations of an object and the corresponding training signals while learning the object-centric representations. We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations.
arXiv Detail & Related papers (2021-12-15T08:25:30Z)
Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment [73.9469267445146]
First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as AI2Thor pose significant sample-efficiency challenges for reinforcement learning agents. We show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task.
arXiv Detail & Related papers (2020-10-28T19:27:26Z)
Visuomotor Mechanical Search: Learning to Retrieve Target Objects in Clutter [43.668395529368354]
We present a novel Deep RL procedure that combines i) teacher-aided exploration, ii) a critic with privileged information, and iii) mid-level representations. Our approach trains faster and converges to more efficient uncovering solutions than baselines and ablations, and our uncovering policies lead to an average improvement in the graspability of the target object.
arXiv Detail & Related papers (2020-08-13T18:23:00Z)
Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions. We show the recognition backbone can be substantially enhanced for more robust representation learning. Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.