Related papers: Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints

URL: http://arxiv.org/abs/2401.01922v1
Date: Wed, 3 Jan 2024 15:09:25 GMT
Title: Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints
Authors: Jinyang Yuan, Tonglin Chen, Zhimeng Shen, Bin Li, Xiangyang Xue
Abstract summary: We consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision. We propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem. Experiments on several specifically designed synthetic datasets have shown that the proposed method can effectively learn from multiple unspecified viewpoints.
Score: 45.88397367354284
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual scenes are extremely diverse, not only because there are infinite possible combinations of objects and backgrounds but also because the observations of the same scene may vary greatly with the change of viewpoints. When observing a multi-object visual scene from multiple viewpoints, humans can perceive the scene compositionally from each viewpoint while achieving the so-called "object constancy" across different viewpoints, even though the exact viewpoints are untold. This ability is essential for humans to identify the same object while moving and to learn from vision efficiently. It is intriguing to design models that have a similar ability. In this paper, we consider a novel problem of learning compositional scene representations from multiple unspecified (i.e., unknown and unrelated) viewpoints without using any supervision and propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem. During the inference, latent representations are randomly initialized and iteratively updated by integrating the information in different viewpoints with neural networks. Experiments on several specifically designed synthetic datasets have shown that the proposed method can effectively learn from multiple unspecified viewpoints.
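The abstract describes splitting the latent representation into a viewpoint-independent part (one per object, shared by every view) and a viewpoint-dependent part (one per view), both randomly initialized and then iteratively updated by integrating information across views. The toy sketch below illustrates only that structure, under strong assumptions that are not from the paper: scalar latents, an additive "decoder" (observation ≈ object latent + view latent), object correspondence across views given in advance, and closed-form least-squares updates in place of the paper's neural networks. The functions `infer` and `reconstruct` are hypothetical names; this is not the authors' model.

```python
import random


def infer(obs, num_iters=50, seed=0):
    """Toy block-coordinate inference for obs[v][k] ~ obj[k] + view[v].

    Hypothetical illustration only (not the paper's model):
      obj[k]  - viewpoint-independent latent of object k, shared by all views
      view[v] - viewpoint-dependent latent of view v, shared by all objects
    Assumes object k is the same object in every view (known correspondence).
    """
    rng = random.Random(seed)
    num_views, num_objects = len(obs), len(obs[0])
    # Random initialization of both latent parts.
    obj = [rng.gauss(0.0, 1.0) for _ in range(num_objects)]
    view = [rng.gauss(0.0, 1.0) for _ in range(num_views)]
    for _ in range(num_iters):
        # Viewpoint-dependent update: each view uses only its own observations
        # (least-squares minimizer with obj held fixed).
        view = [sum(obs[v][k] - obj[k] for k in range(num_objects)) / num_objects
                for v in range(num_views)]
        # Viewpoint-independent update: each object integrates all views
        # (least-squares minimizer with view held fixed).
        obj = [sum(obs[v][k] - view[v] for v in range(num_views)) / num_views
               for k in range(num_objects)]
    return obj, view


def reconstruct(obj, view):
    """Toy additive decoder: one row of per-object values per view."""
    return [[o + w for o in obj] for w in view]


if __name__ == "__main__":
    true_obj = [1.0, 2.0, 5.0]          # hypothetical object attributes
    true_view = [0.0, -1.5, 3.0, 0.5]   # hypothetical per-view offsets
    obs = [[o + w for o in true_obj] for w in true_view]
    obj, view = infer(obs)
    rec = reconstruct(obj, view)
    err = max(abs(r - x) for rr, xx in zip(rec, obs) for r, x in zip(rr, xx))
    print("max reconstruction error:", err)
```

Because only sums are observed in this toy, the two latent parts are identifiable only up to a shared constant shift; the differences between object latents are what stay consistent across views, a loose analogue of the "object constancy" the abstract mentions.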

Related papers

Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection [41.419853273742746]
We propose a novel active viewpoint selection strategy for object-centric learning. It predicts images from unknown viewpoints based on information from observation images for each scene. Our method can accurately predict images from unknown viewpoints.
arXiv Detail & Related papers (2024-11-01T07:01:44Z)
Learning Global Object-Centric Representations via Disentangled Slot Attention [38.78205074748021]
This paper introduces a novel object-centric learning method to empower AI systems with human-like capabilities to identify objects across scenes and generate diverse scenes containing specific objects by learning a set of global object-centric representations. Experimental results substantiate the efficacy of the proposed method, demonstrating remarkable proficiency in global object-centric representation learning, object identification, scene generation with specific objects and scene decomposition.
arXiv Detail & Related papers (2024-10-24T14:57:00Z)
AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations. Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
A Computational Account Of Self-Supervised Visual Learning From Egocentric Object Play [3.486683381782259]
We study how learning signals that equate different viewpoints can support robust visual learning. We find that representations learned by equating different physical viewpoints of an object benefit downstream image classification accuracy.
arXiv Detail & Related papers (2023-05-30T22:42:03Z)
Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation. We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
Unsupervised Learning of Compositional Scene Representations from Multiple Unspecified Viewpoints [41.07379505694274]
We consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision. We propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem. Experiments on several specifically designed synthetic datasets have shown that the proposed method is able to effectively learn from multiple unspecified viewpoints.
arXiv Detail & Related papers (2021-12-07T08:45:21Z)
Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views [9.556376932449187]
Multi-View and Multi-Object Network (MulMON) is a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views. We show that MulMON better resolves spatial ambiguities than single-view methods.
arXiv Detail & Related papers (2021-11-13T13:54:28Z)
Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data. We introduce a new dataset for unseen view recognition and show the approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL). A novel surrogate task for self-supervised learning is proposed by pursuing an "object invariant" representation. Experiments show that the recognition and retrieval results using view invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.