Deep Reinforcement Learning via Object-Centric Attention
- URL: http://arxiv.org/abs/2504.03024v1
- Date: Thu, 03 Apr 2025 20:48:27 GMT
- Title: Deep Reinforcement Learning via Object-Centric Attention
- Authors: Jannis Blüml, Cedric Derstroff, Bjarne Gregori, Elisabeth Dillies, Quentin Delfosse, Kristian Kersting,
- Abstract summary: We introduce Object-Centric Attention via Masking (OCCAM), which selectively preserves task-relevant entities while filtering out irrelevant visual information.<n>OCCAM significantly improves to novel perturbations and reduces sample complexity while showing similar or improved performance compared to conventional pixel-based RL.
- Score: 17.623937562865617
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep reinforcement learning agents, trained on raw pixel inputs, often fail to generalize beyond their training environments, relying on spurious correlations and irrelevant background details. To address this issue, object-centric agents have recently emerged. However, they require different representations tailored to the task specifications. Contrary to deep agents, no single object-centric architecture can be applied to any environment. Inspired by principles of cognitive science and Occam's Razor, we introduce Object-Centric Attention via Masking (OCCAM), which selectively preserves task-relevant entities while filtering out irrelevant visual information. Specifically, OCCAM takes advantage of the object-centric inductive bias. Empirical evaluations on Atari benchmarks demonstrate that OCCAM significantly improves robustness to novel perturbations and reduces sample complexity while showing similar or improved performance compared to conventional pixel-based RL. These results suggest that structured abstraction can enhance generalization without requiring explicit symbolic representations or domain-specific object extraction pipelines.
Related papers
- Are We Done with Object-Centric Learning? [65.67948794110212]
Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene.
With recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently.
We address the OOD generalization challenge caused by spurious background cues through the lens of OCL.
arXiv Detail & Related papers (2025-04-09T17:59:05Z) - Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - Selective Visual Representations Improve Convergence and Generalization
for Embodied AI [44.33711781750707]
Embodied AI models often employ off the shelf vision backbones like CLIP to encode their visual observations.
This introduces noise within the learning process and distracts the agent's focus from task-relevant visual cues.
Inspired by selective attention in humans-the process through which people filter their perception based on their experiences, knowledge, and the task at hand-we introduce a parameter-efficient approach to filter visual stimuli for embodied AI.
arXiv Detail & Related papers (2023-11-07T18:34:02Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments [20.034972354302788]
We extend the Atari Learning Environments, the most-used evaluation framework for deep RL approaches, by introducing OCAtari.
Our framework allows for object discovery, object representation learning, as well as object-centric RL.
arXiv Detail & Related papers (2023-06-14T17:28:46Z) - Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimize the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
arXiv Detail & Related papers (2023-06-03T21:49:06Z) - Learning Common Rationale to Improve Self-Supervised Representation for
Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - Towards Self-Supervised Learning of Global and Object-Centric
Representations [4.36572039512405]
We discuss key aspects of learning structured object-centric representations with self-supervision.
We validate our insights through several experiments on the CLEVR dataset.
arXiv Detail & Related papers (2022-03-11T15:18:47Z) - Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.