Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints
- URL: http://arxiv.org/abs/2401.01922v1
- Date: Wed, 3 Jan 2024 15:09:25 GMT
- Title: Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints
- Authors: Jinyang Yuan, Tonglin Chen, Zhimeng Shen, Bin Li, Xiangyang Xue
- Abstract summary: We consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision.
We propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem.
Experiments on several specifically designed synthetic datasets have shown that the proposed method can effectively learn from multiple unspecified viewpoints.
- Score: 45.88397367354284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual scenes are extremely diverse, not only because there are infinite possible combinations of objects and backgrounds but also because observations of the same scene may vary greatly as the viewpoint changes. When observing a multi-object visual scene from multiple viewpoints, humans can perceive the scene compositionally from each viewpoint while achieving so-called "object constancy" across different viewpoints, even though the exact viewpoints are not given. This ability is essential for humans to identify the same object while moving and to learn from vision efficiently. It is intriguing to design models with a similar ability. In this paper, we consider a novel problem of learning compositional scene representations from multiple unspecified (i.e., unknown and unrelated) viewpoints without using any supervision, and we propose a deep generative model that solves this problem by separating latent representations into a viewpoint-independent part and a viewpoint-dependent part. During inference, latent representations are randomly initialized and iteratively updated by integrating information from different viewpoints with neural networks. Experiments on several specifically designed synthetic datasets show that the proposed method can effectively learn from multiple unspecified viewpoints.
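
The abstract pins down two concrete mechanisms: latents split into a viewpoint-independent part (shared by all views of a scene) and a viewpoint-dependent part (one per view), and inference that starts from randomly initialized latents and iteratively refines them by integrating information across viewpoints with neural networks. The sketch below illustrates that structure in PyTorch; it is not the authors' implementation, and every module choice, dimension, and the additive refinement rule are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the factorization described in the
# abstract: each of K object slots gets a viewpoint-independent latent shared
# across views, while each of V viewpoints gets its own viewpoint-dependent
# latent. Inference starts from random latents and refines them for a few
# steps by pooling information over viewpoints.
import torch
import torch.nn as nn

class MultiViewpointModel(nn.Module):
    def __init__(self, num_slots=4, obj_dim=16, view_dim=8, feat_dim=32):
        super().__init__()
        self.num_slots, self.obj_dim, self.view_dim = num_slots, obj_dim, view_dim
        # Encoder that summarizes one image into a flat feature (assumed form).
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        # Networks that propose updates to the two latent groups.
        self.obj_update = nn.Linear(feat_dim + obj_dim, obj_dim)
        self.view_update = nn.Linear(feat_dim + view_dim, view_dim)
        # Decoder maps (object latent, viewpoint latent) to a per-slot image.
        self.decoder = nn.Linear(obj_dim + view_dim, 3 * 8 * 8)

    def infer(self, images, steps=5):
        """images: (V, C, H, W) observations of one scene from V viewpoints."""
        V = images.shape[0]
        feats = self.encoder(images)                         # (V, feat_dim)
        # Random initialization of both latent groups, as in the abstract.
        z_obj = torch.randn(self.num_slots, self.obj_dim)    # shared across views
        z_view = torch.randn(V, self.view_dim)               # one per viewpoint
        for _ in range(steps):
            # Viewpoint-independent latents integrate information from ALL
            # viewpoints (here: a simple mean over per-view features).
            pooled = feats.mean(dim=0, keepdim=True).expand(self.num_slots, -1)
            z_obj = z_obj + self.obj_update(torch.cat([pooled, z_obj], dim=-1))
            # Viewpoint-dependent latents only see their own view's features.
            z_view = z_view + self.view_update(torch.cat([feats, z_view], dim=-1))
        return z_obj, z_view

    def decode(self, z_obj, z_view):
        # Render each (slot, viewpoint) pair and sum slots into one image.
        K, V = z_obj.shape[0], z_view.shape[0]
        pairs = torch.cat([z_obj[:, None].expand(K, V, -1),
                           z_view[None].expand(K, V, -1)], dim=-1)
        per_slot = self.decoder(pairs).view(K, V, 3, 8, 8)
        return per_slot.sum(dim=0)                           # (V, 3, 8, 8)

# Usage: four views of one scene, refine latents, reconstruct all views.
model = MultiViewpointModel()
views = torch.rand(4, 3, 8, 8)
z_obj, z_view = model.infer(views)
recon = model.decode(z_obj, z_view)
print(recon.shape)  # torch.Size([4, 3, 8, 8])
```

The design point the sketch tries to capture: because z_obj is updated only from information pooled over all views, the same object latents must explain every viewpoint ("object constancy"), while z_view absorbs the per-view variation.
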
Related papers
- Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection [41.419853273742746]
We propose a novel active viewpoint selection strategy for object-centric learning.
For each scene, it predicts images from unknown viewpoints based on information from the observed images.
Our method can accurately predict images from unknown viewpoints.
arXiv Detail & Related papers (2024-11-01T07:01:44Z) - Learning Global Object-Centric Representations via Disentangled Slot Attention [38.78205074748021]
This paper introduces a novel object-centric learning method that learns a set of global object-centric representations, empowering AI systems with human-like capabilities to identify objects across scenes and to generate diverse scenes containing specific objects.
Experimental results substantiate the efficacy of the proposed method, demonstrating strong performance in global object-centric representation learning, object identification, scene generation with specific objects, and scene decomposition.
arXiv Detail & Related papers (2024-10-24T14:57:00Z) - AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z) - A Computational Account Of Self-Supervised Visual Learning From
Egocentric Object Play [3.486683381782259]
We study how learning signals that equate different viewpoints can support robust visual learning.
We find that representations learned by equating different physical viewpoints of an object benefit downstream image classification accuracy.
arXiv Detail & Related papers (2023-05-30T22:42:03Z) - Neural Groundplans: Persistent Neural Scene Representations from a
Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z) - Unsupervised Learning of Compositional Scene Representations from
Multiple Unspecified Viewpoints [41.07379505694274]
We consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision.
We propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem.
Experiments on several specifically designed synthetic datasets have shown that the proposed method is able to effectively learn from multiple unspecified viewpoints.
arXiv Detail & Related papers (2021-12-07T08:45:21Z) - Learning Object-Centric Representations of Multi-Object Scenes from
Multiple Views [9.556376932449187]
Multi-View and Multi-Object Network (MulMON) is a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views.
We show that MulMON resolves spatial ambiguities better than single-view methods.
arXiv Detail & Related papers (2021-11-13T13:54:28Z) - Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data.
We introduce a new dataset for unseen view recognition and show the approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z) - Exploit Clues from Views: Self-Supervised and Regularized Learning for
Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed by pursuing an "object-invariant" representation.
Experiments show that the recognition and retrieval results using view-invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.