Bridging the Gap to Real-World Object-Centric Learning
- URL: http://arxiv.org/abs/2209.14860v1
- Date: Thu, 29 Sep 2022 15:24:47 GMT
- Title: Bridging the Gap to Real-World Object-Centric Learning
- Authors: Maximilian Seitzer, Max Horn, Andrii Zadaianchuk, Dominik Zietlow,
Tianjun Xiao, Carl-Johann Simon-Gabriel, Tong He, Zheng Zhang, Bernhard
Sch\"olkopf, Thomas Brox, Francesco Locatello
- Abstract summary: We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
- Score: 66.55867830853803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans naturally decompose their environment into entities at the appropriate
level of abstraction to act in the world. Allowing machine learning algorithms
to derive this decomposition in an unsupervised way has become an important
line of research. However, current methods are restricted to simulated data or
require additional information in the form of motion or depth in order to
successfully discover objects. In this work, we overcome this limitation by
showing that reconstructing features from models trained in a self-supervised
manner is a sufficient training signal for object-centric representations to
arise in a fully unsupervised way. Our approach, DINOSAUR, significantly
outperforms existing object-centric learning models on simulated data and is
the first unsupervised object-centric model that scales to real-world datasets
such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows
competitive performance compared to more involved pipelines from the computer
vision literature.
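As a minimal sketch of this training signal (not the paper's full architecture), the PyTorch snippet below decodes a set of slots into patch features and regresses them onto the features of a frozen self-supervised encoder; all shapes, module sizes, and the random stand-in tensors are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Feature reconstruction as a training signal: a slot-based decoder
# predicts the patch features of a frozen self-supervised encoder
# (e.g. a DINO ViT) instead of pixels. Sizes below are assumptions.
N_PATCHES, FEAT_DIM, N_SLOTS, SLOT_DIM = 196, 768, 7, 128

class SlotDecoder(nn.Module):
    """Broadcasts each slot over patch positions and predicts
    per-patch features plus an alpha (mixing) logit per slot."""
    def __init__(self):
        super().__init__()
        self.pos = nn.Parameter(torch.randn(N_PATCHES, SLOT_DIM))
        self.mlp = nn.Sequential(
            nn.Linear(SLOT_DIM, 256), nn.ReLU(),
            nn.Linear(256, FEAT_DIM + 1),  # features + alpha logit
        )

    def forward(self, slots):                      # (B, K, SLOT_DIM)
        x = slots.unsqueeze(2) + self.pos          # (B, K, P, SLOT_DIM)
        out = self.mlp(x)
        feats, alpha = out[..., :-1], out[..., -1:]
        weights = alpha.softmax(dim=1)             # mix slots per patch
        return (weights * feats).sum(dim=1)        # (B, P, FEAT_DIM)

decoder = SlotDecoder()
slots = torch.randn(4, N_SLOTS, SLOT_DIM)     # stand-in for Slot Attention output
target = torch.randn(4, N_PATCHES, FEAT_DIM)  # stand-in for frozen DINO features
loss = nn.functional.mse_loss(decoder(slots), target)
loss.backward()
```

The key point of the objective is that the regression target is a semantic feature map rather than pixels, which is what the abstract credits for making object structure emerge on real images.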
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- URLOST: Unsupervised Representation Learning without Stationarity or Topology [26.17135629579595]
We introduce a novel framework that learns from high-dimensional data lacking stationarity and topology.
Our model combines a learnable self-organizing layer, density-adjusted spectral clustering, and masked autoencoders.
We evaluate its effectiveness on simulated biological vision data, neural recordings from the primary visual cortex, and gene expression datasets.
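Of the listed components, the masked-autoencoder objective is the most generic; a hedged sketch under the paper's "no stationarity or topology" setting might mask arbitrary input coordinates rather than spatial patches (all sizes and the masking ratio are assumptions):

```python
import torch
import torch.nn as nn

# Generic masked-autoencoder objective: hide a random subset of input
# coordinates and train the model to reconstruct them. No spatial
# topology is assumed: masking is over arbitrary dimensions.
D = 256
model = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, D))

x = torch.randn(32, D)
mask = torch.rand(32, D) < 0.6                 # hide ~60% of coordinates
loss = ((model(x * ~mask) - x)[mask]).pow(2).mean()
loss.backward()
```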
arXiv Detail & Related papers (2023-10-06T18:00:02Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising because it can localize objects in a generic, annotation-free manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
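One common way to realize this PCA-based localization (a sketch under assumptions, not necessarily the paper's exact procedure) is to project patch features onto their first principal component and threshold:

```python
import numpy as np

# PCA-based object localization: project ViT patch features onto their
# first principal component and threshold to get a coarse mask.
# The random features below are stand-ins for real encoder outputs.
H, W, D = 14, 14, 768
feats = np.random.randn(H * W, D)

centered = feats - feats.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[0]                      # projection onto PC1
mask = (scores > scores.mean()).reshape(H, W)  # coarse object region
# The sign of PC1 is arbitrary; in practice one typically calls the
# side with less overlap with the image border "foreground".
```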
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
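The inverse-dynamics part of that objective is standard; a minimal sketch, assuming a discrete action space and illustrative sizes, predicts the action taken between two consecutive observations from their learned representations:

```python
import torch
import torch.nn as nn

# Inverse-dynamics objective: classify which action was taken between
# observation t and t+1, which shapes the encoder to be action-aware.
# Encoder and action sizes are illustrative assumptions, not ALP's.
OBS_DIM, REP_DIM, N_ACTIONS = 64, 128, 6

encoder = nn.Sequential(nn.Linear(OBS_DIM, REP_DIM), nn.ReLU())
inv_head = nn.Linear(2 * REP_DIM, N_ACTIONS)

obs_t = torch.randn(32, OBS_DIM)
obs_t1 = torch.randn(32, OBS_DIM)
actions = torch.randint(0, N_ACTIONS, (32,))

z = torch.cat([encoder(obs_t), encoder(obs_t1)], dim=-1)
loss = nn.functional.cross_entropy(inv_head(z), actions)
loss.backward()  # gradients flow into the shared encoder
```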
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model substantially outperforms naive combinations of existing continual learning and visual RL algorithms on the DeepMind Control and Meta-World continual visual control benchmarks.
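A hedged sketch of the mixture-of-Gaussians idea (dimensions and component count are assumptions, and this omits the task-specific gating and anti-forgetting strategy): a dynamics head predicts a Gaussian mixture over the next latent state and is trained by negative log-likelihood:

```python
import torch
import torch.nn as nn

# Mixture-of-Gaussians dynamics prior: predict K Gaussian components
# over the next latent and minimize negative log-likelihood.
Z, K = 32, 4

head = nn.Linear(Z, K * (1 + 2 * Z))  # logits, means, log-stds per component

z_t = torch.randn(16, Z)
z_next = torch.randn(16, Z)

out = head(z_t)
logits = out[:, :K]
means = out[:, K:K + K * Z].reshape(-1, K, Z)
log_std = out[:, K + K * Z:].reshape(-1, K, Z)

comp = torch.distributions.Independent(
    torch.distributions.Normal(means, log_std.exp()), 1)
mix = torch.distributions.MixtureSameFamily(
    torch.distributions.Categorical(logits=logits), comp)
loss = -mix.log_prob(z_next).mean()
loss.backward()
```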
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
We propose a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that it can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
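A generic sketch of the energy-based recipe (not the paper's specific model): object latents are refined by Langevin-style gradient steps on a learned energy function; sizes, step count, and step size are all assumptions:

```python
import torch
import torch.nn as nn

# Generic energy-based refinement: repeatedly step the latents downhill
# on a learned energy, with injected noise (Langevin dynamics). Training
# the energy network itself is omitted here.
energy = nn.Sequential(nn.Linear(64, 128), nn.SiLU(), nn.Linear(128, 1))

z = torch.randn(8, 64, requires_grad=True)   # initial object latents
for _ in range(10):                          # refinement steps
    e = energy(z).sum()
    grad, = torch.autograd.grad(e, z)
    z = (z - 0.1 * grad + 0.01 * torch.randn_like(z)).detach().requires_grad_()
```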
arXiv Detail & Related papers (2022-10-11T15:11:15Z)
- Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation [19.840186443344]
We propose to use structured world models to incorporate inductive biases in the control loop to achieve sample-efficient exploration.
Our method generates free-play behavior that starts to interact with objects early on and develops more complex behavior over time.
arXiv Detail & Related papers (2022-06-22T22:08:50Z)
- Conditional Object-Centric Learning from Video [34.012087337046005]
We introduce a sequential extension to Slot Attention trained to predict optical flow for realistic-looking synthetic scenes.
We show that conditioning the initial state of this model on a small set of hints, such as the center of mass of objects in the first frame, is sufficient to significantly improve instance segmentation.
These benefits generalize beyond the training distribution to novel objects, novel backgrounds, and to longer video sequences.
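A minimal sketch of the conditioning idea, assuming an illustrative hint encoder: per-object hints such as first-frame centers of mass are embedded and used as the initial slot states in place of a random initialization:

```python
import torch
import torch.nn as nn

# Conditional slot initialization: encode per-object hints (here, 2D
# centers of mass) into initial slot states. The encoder is a stand-in.
N_SLOTS, SLOT_DIM = 5, 64

hint_encoder = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, SLOT_DIM))

centers = torch.rand(1, N_SLOTS, 2)   # (x, y) hint per object
init_slots = hint_encoder(centers)    # (1, N_SLOTS, SLOT_DIM)
# init_slots would then seed Slot Attention's iterative updates in
# place of the usual learned Gaussian initialization.
```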
arXiv Detail & Related papers (2021-11-24T16:10:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.