OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for
Object-Centric Learning
- URL: http://arxiv.org/abs/2306.09682v3
- Date: Wed, 6 Sep 2023 06:53:43 GMT
- Title: OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for
Object-Centric Learning
- Authors: Yinxuan Huang, Tonglin Chen, Zhimeng Shen, Jinghao Huang, Bin Li,
Xiangyang Xue
- Abstract summary: We propose a versatile real-world dataset of tabletop scenes for object-centric learning called OCTScenes.
OCTScenes contains 5000 tabletop scenes with a total of 15 objects.
It is meticulously designed to serve as a benchmark for comparing, evaluating, and analyzing object-centric learning methods.
- Score: 41.09407455527254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans possess the cognitive ability to comprehend scenes in a compositional
manner. To empower AI systems with similar capabilities, object-centric
learning aims to acquire representations of individual objects from visual
scenes without any supervision. Although recent advances in object-centric
learning have made remarkable progress on complex synthetic datasets, applying
these methods to complex real-world scenes remains a major challenge. One
essential reason is the scarcity of real-world datasets specifically tailored
to object-centric learning. To address this problem, we propose a versatile
real-world dataset of tabletop scenes for object-centric learning called
OCTScenes, which is meticulously designed to serve as a benchmark for
comparing, evaluating, and analyzing object-centric learning methods. OCTScenes
contains 5000 tabletop scenes with a total of 15 objects. Each scene is
captured in 60 frames covering a 360-degree perspective. Consequently,
OCTScenes is a versatile benchmark dataset that simultaneously supports the
evaluation of object-centric learning methods based on single images, videos,
and multiple views. Extensive experiments with representative object-centric
learning methods are conducted on OCTScenes. The results reveal the
shortcomings of state-of-the-art methods in learning meaningful representations
from real-world data, despite their impressive performance on complex synthetic
datasets. Furthermore, OCTScenes can serve as a catalyst for the advancement of
existing methods, inspiring them to adapt to real-world scenes. Dataset and
code are available at https://huggingface.co/datasets/Yinxuan/OCTScenes.
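Since each of the 5000 scenes is captured as a 60-frame, 360-degree sweep and the dataset is meant to support single-image, video, and multi-view evaluation, the following is a minimal sketch of how one might fetch the data from the Hugging Face Hub and group frames per scene. The `huggingface_hub.snapshot_download` call is a standard Hub API; the directory layout, file extension, and frame-grouping logic below are assumptions for illustration only, not the dataset's documented structure (see the dataset card for the actual format).
```python
# Minimal sketch: download OCTScenes and group frames by scene for
# single-image, video, or multi-view evaluation.
# ASSUMPTION: frames are stored as per-scene folders of PNG images;
# check the dataset card for the real layout before relying on this.
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Download (or reuse a cached copy of) the dataset repository.
local_dir = snapshot_download(repo_id="Yinxuan/OCTScenes", repo_type="dataset")

# Group the 60 frames of each 360-degree capture by scene (hypothetical layout).
scenes = {}
for frame_path in sorted(Path(local_dir).rglob("*.png")):
    scenes.setdefault(frame_path.parent.name, []).append(frame_path)

for scene_id, frames in scenes.items():
    single_image = frames[0]   # single-image setting: one view per scene
    video_clip = frames        # video setting: the ordered frame sweep
    multi_view = frames[::10]  # multi-view setting: a sparse subset of views
    print(scene_id, len(frames))
```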
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Variational Inference for Scalable 3D Object-centric Learning [19.445804699433353]
We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes.
Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes.
We propose to learn view-invariant 3D object representations in localized object coordinate systems.
arXiv Detail & Related papers (2023-09-25T10:23:40Z) - OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments [20.034972354302788]
We extend the Atari Learning Environments, the most-used evaluation framework for deep RL approaches, by introducing OCAtari.
Our framework allows for object discovery, object representation learning, as well as object-centric RL.
arXiv Detail & Related papers (2023-06-14T17:28:46Z) - Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z) - SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric
Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z) - ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and
Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z) - RELATE: Physically Plausible Multi-Object Scene Synthesis Using
Structured Latent Spaces [77.07767833443256]
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
arXiv Detail & Related papers (2020-07-02T17:27:27Z)