OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments
- URL: http://arxiv.org/abs/2306.08649v2
- Date: Tue, 27 Feb 2024 17:34:43 GMT
- Title: OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments
- Authors: Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian
Sztwiertnia, Kristian Kersting
- Abstract summary: We extend the Atari Learning Environments, the most-used evaluation framework for deep RL approaches, by introducing OCAtari.
Our framework allows for object discovery, object representation learning, as well as object-centric RL.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cognitive science and psychology suggest that object-centric
representations of complex scenes are a promising step towards enabling
efficient abstract reasoning from low-level perceptual features. Yet, most deep
reinforcement learning approaches rely only on pixel-based representations that
do not capture the compositional properties of natural scenes. To make progress,
we need environments and datasets that allow us to work with and evaluate
object-centric approaches. In our work, we extend the Atari Learning
Environments, the most-used evaluation framework for deep RL approaches, by
introducing OCAtari, which performs resource-efficient extraction of the
object-centric states of these games. Our framework allows for object
discovery, object representation learning, as well as object-centric RL. We
evaluate OCAtari's detection capabilities and resource efficiency. Our source
code is available at github.com/k4ntz/OC_Atari.
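The object-centric states described in the abstract are, at their core, per-frame lists of objects with positions and sizes instead of raw pixels. The following toy sketch illustrates that general idea only; it is not OCAtari's actual implementation, and all names in it are hypothetical. It groups same-colored connected pixels of a tiny frame into objects via flood fill and returns their bounding boxes:

```python
from collections import deque

def extract_objects(frame, background=0):
    """Group same-colored, 4-connected non-background pixels into objects.

    Returns a list of dicts with each object's color and bounding box,
    a toy stand-in for the object-centric state of one frame.
    """
    h, w = len(frame), len(frame[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or frame[y][x] == background:
                continue
            color = frame[y][x]
            # BFS flood fill over neighboring pixels of the same color.
            queue = deque([(y, x)])
            seen[y][x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and frame[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append({
                "color": color,
                "x": min(xs), "y": min(ys),
                "w": max(xs) - min(xs) + 1,
                "h": max(ys) - min(ys) + 1,
            })
    return objects

# A 5x8 toy frame: one 2x2 "player" (color 1) and a 1x1 "ball" (color 2).
frame = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
state = extract_objects(frame)  # two objects with their bounding boxes
```

An RL agent consuming `state` instead of `frame` receives a compact, compositional description of the scene, which is what makes object discovery and object-centric RL benchmarks like the ones above possible.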
Related papers
- Zero-Shot Object-Centric Representation Learning (arXiv, 2024-08-17)
  We study current object-centric methods through the lens of zero-shot
  generalization. We introduce a benchmark comprising eight different synthetic
  and real-world datasets. We find that training on diverse real-world images
  improves transferability to unseen scenarios.
- Graphical Object-Centric Actor-Critic (arXiv, 2023-10-26)
  We propose a novel object-centric reinforcement learning algorithm combining
  actor-critic and model-based approaches. We use a transformer encoder to
  extract object representations and graph neural networks to approximate the
  dynamics of an environment. Our algorithm outperforms the state-of-the-art
  model-free actor-critic algorithm in a visually complex 3D robotic
  environment and in a 2D environment with compositional structure.
- Localizing Active Objects from Egocentric Vision with Symbolic World
  Knowledge (arXiv, 2023-10-23)
  The ability to actively ground task instructions from an egocentric view is
  crucial for AI agents to accomplish tasks or assist humans virtually. We
  propose to improve phrase grounding models' ability to localize active
  objects by learning the role of objects undergoing change and extracting
  them accurately from the instructions. We evaluate our framework on the
  Ego4D and Epic-Kitchens datasets.
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery
  (arXiv, 2023-07-07)
  Unsupervised object discovery is promising due to its ability to discover
  objects in a generic manner. We design a semantic-guided self-supervised
  learning model to extract high-level semantic features from images. We
  introduce Principal Component Analysis (PCA) to localize object regions.
- OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for
  Object-Centric Learning (arXiv, 2023-06-16)
  We propose OCTScenes, a versatile real-world dataset of tabletop scenes for
  object-centric learning. It contains 5000 tabletop scenes with a total of 15
  objects and is meticulously designed to serve as a benchmark for comparing,
  evaluating, and analyzing object-centric learning methods.
- Cycle Consistency Driven Object Discovery (arXiv, 2023-06-03)
  We introduce a method that explicitly optimizes the constraint that each
  object in a scene should be associated with a distinct slot. By integrating
  these consistency objectives into various existing slot-based object-centric
  methods, we show substantial improvements in object-discovery performance.
  Our results suggest that the proposed approach not only improves object
  discovery, but also provides richer features for downstream tasks.
- Discovering Objects that Can Move (arXiv, 2022-03-18)
  We study the problem of object discovery: separating objects from the
  background without manual labels. Existing approaches use appearance cues
  such as color, texture, and location to group pixels into object-like
  regions. We instead focus on dynamic objects: entities that can move
  independently in the world.
- Object-to-Scene: Learning to Transfer Object Knowledge to Indoor Scene
  Recognition (arXiv, 2021-08-01)
  We propose an Object-to-Scene (OTS) method, which extracts object features
  and learns object relations to recognize indoor scenes. OTS outperforms
  state-of-the-art methods by more than 2% on indoor scene recognition without
  using any additional streams.
- Relevance-Guided Modeling of Object Dynamics for Reinforcement Learning
  (arXiv, 2020-03-03)
  Current deep reinforcement learning (RL) approaches incorporate minimal
  prior knowledge about the environment. We propose a framework for reasoning
  about object dynamics and behavior to rapidly determine minimal and
  task-specific object representations. We also highlight the potential of
  this framework on several Atari games, using our object representation with
  standard RL and planning algorithms to learn dramatically faster than
  existing deep RL algorithms.
- Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping
  in Clutter by State Representation Learning Based on Disentanglement of a
  Raw Input Image (arXiv, 2020-02-27)
  Actor-critic deep reinforcement learning (RL) methods typically perform
  poorly when grasping diverse objects. We employ state representation
  learning (SRL), encoding essential information first for subsequent use in
  RL. We find that preprocessing based on the disentanglement of a raw input
  image is the key to effectively capturing a compact representation.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.