Robust and Controllable Object-Centric Learning through Energy-based Models
- URL: http://arxiv.org/abs/2210.05519v1
- Date: Tue, 11 Oct 2022 15:11:15 GMT
- Title: Robust and Controllable Object-Centric Learning through Energy-based Models
- Authors: Ruixiang Zhang, Tong Che, Boris Ivanovic, Renhao Wang, Marco Pavone, Yoshua Bengio, Liam Paull
- Abstract summary: Our method is a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that it can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
- Score: 95.68748828339059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans are remarkably good at understanding and reasoning about complex
visual scenes. The capability to decompose low-level observations into discrete
objects allows us to build a grounded abstract representation and identify the
compositional structure of the world. Accordingly, it is a crucial step for
machine learning models to be capable of inferring objects and their properties
from visual scenes without explicit supervision. However, existing works on
object-centric representation learning either rely on tailor-made neural
network modules or strong probabilistic assumptions in the underlying
generative and inference processes. In this work, we present our method, a
conceptually simple and general approach to learning object-centric
representations through an energy-based model. By forming a
permutation-invariant energy function using vanilla attention blocks readily
available in Transformers, we can infer object-centric latent variables via
gradient-based MCMC methods where permutation equivariance is automatically
guaranteed. We show that our method can be easily integrated into existing
architectures and can effectively extract high-quality object-centric
representations, leading to better segmentation accuracy and competitive
downstream task performance. Further, empirical evaluations show that our method's
learned representations are robust against distribution shift. Finally, we
demonstrate the effectiveness of our method in systematic compositional
generalization, by re-composing learned energy functions for novel scene
generation and manipulation.
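The core mechanism described in the abstract — a slot-permutation-invariant energy function minimized with gradient-based MCMC — can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the actual model uses Transformer attention blocks and automatic differentiation, whereas here a toy dot-product-attention energy and finite-difference gradients stand in, and all function names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_energy(slots, feats):
    """Toy permutation-invariant energy over object slots.

    Each slot attends to the input features; the energy is the total
    reconstruction error between the slots and their attention readouts.
    Summing over slots makes E(slots, feats) invariant to slot ordering.
    """
    d = slots.shape[1]
    logits = slots @ feats.T / np.sqrt(d)        # (K, N) attention logits
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over features
    recon = attn @ feats                         # (K, D) attention readout
    return float(np.sum((slots - recon) ** 2))

def numerical_grad(f, z, eps=1e-4):
    """Central-difference gradient (a stand-in for autograd)."""
    g = np.zeros_like(z)
    for idx in np.ndindex(*z.shape):
        zp, zm = z.copy(), z.copy()
        zp[idx] += eps
        zm[idx] -= eps
        g[idx] = (f(zp) - f(zm)) / (2 * eps)
    return g

def langevin_infer(feats, n_slots=3, dim=4, steps=50, step_size=0.1, noise=0.01):
    """Unadjusted Langevin dynamics over the slot latents.

    Because the energy is invariant to slot permutations and the noise is
    i.i.d. per slot, the sampler is permutation-equivariant by construction.
    """
    z = rng.normal(size=(n_slots, dim))
    for _ in range(steps):
        g = numerical_grad(lambda s: attention_energy(s, feats), z)
        z = z - 0.5 * step_size * g \
            + np.sqrt(step_size) * noise * rng.normal(size=z.shape)
    return z

feats = rng.normal(size=(8, 4))   # N=8 input feature vectors of dimension D=4
slots = langevin_infer(feats)     # (3, 4) inferred object-centric latents
```

Note how permutation invariance is obtained for free: the energy sums over slots, so reordering the slot rows leaves its value unchanged, and the Langevin update therefore commutes with slot permutations — the property the abstract attributes to the attention-based energy.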
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
- Rotating Features for Object Discovery [74.1465486264609]
We present Rotating Features, a generalization of complex-valued features to higher dimensions, and a new evaluation procedure for extracting objects from distributed representations.
Together, these advancements enable us to scale distributed object-centric representations from simple toy to real-world data.
arXiv Detail & Related papers (2023-06-01T12:16:26Z)
- Provably Learning Object-Centric Representations [25.152680199034215]
We analyze when object-centric representations can provably be learned without supervision.
We prove that the ground-truth object representations can be identified by an invertible and compositional inference model.
We provide evidence that our theory holds predictive power for existing object-centric models.
arXiv Detail & Related papers (2023-05-23T16:44:49Z)
- Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z)
- Object Pursuit: Building a Space of Objects via Discriminative Weight Generation [23.85039747700698]
We propose a framework to continuously learn object-centric representations for visual learning and understanding.
We leverage interactions to sample diverse variations of an object and the corresponding training signals while learning the object-centric representations.
We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations.
arXiv Detail & Related papers (2021-12-15T08:25:30Z)
- Plug and Play, Model-Based Reinforcement Learning [60.813074750879615]
We introduce an object-based representation that allows zero-shot integration of new objects from known object classes.
This is achieved by representing the global transition dynamics as a union of local transition functions.
Experiments show that our representation can achieve sample-efficiency in a variety of set-ups.
arXiv Detail & Related papers (2021-08-20T01:20:15Z)
- Generalization and Robustness Implications in Object-Centric Learning [23.021791024676986]
In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets.
From our experimental study, we find object-centric representations to be generally useful for downstream tasks.
arXiv Detail & Related papers (2021-07-01T17:51:11Z)
- Structure-Regularized Attention for Deformable Object Representation [17.120035855774344]
Capturing contextual dependencies has proven useful to improve the representational power of deep neural networks.
Recent approaches that focus on modeling global context, such as self-attention and non-local operation, achieve this goal by enabling unconstrained pairwise interactions between elements.
We consider learning representations for deformable objects which can benefit from context exploitation by modeling the structural dependencies that the data intrinsically possesses.
arXiv Detail & Related papers (2021-06-12T03:10:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.