Disassembling Object Representations without Labels
- URL: http://arxiv.org/abs/2004.01426v1
- Date: Fri, 3 Apr 2020 08:23:09 GMT
- Title: Disassembling Object Representations without Labels
- Authors: Zunlei Feng, Xinchao Wang, Yongming He, Yike Yuan, Xin Gao, Mingli
Song
- Abstract summary: We study a new representation-learning task, which we term disassembling object representations.
Disassembling enables category-specific modularity in the learned representations.
We propose an unsupervised approach to disassembling, named Unsupervised Disassembling Object Representation (UDOR).
- Score: 75.2215716328001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study a new representation-learning task, which we
term disassembling object representations. Given an image featuring multiple
objects, the goal of disassembling is to acquire a latent representation of
which each part corresponds to one category of objects. Disassembling thus
finds application in a wide range of domains, such as image editing and few- or
zero-shot learning, as it enables category-specific modularity in the learned
representations. To this end, we propose an unsupervised approach to
disassembling, named Unsupervised Disassembling Object Representation (UDOR).
UDOR follows a double auto-encoder architecture, in which a fuzzy
classification and an object-removing operation are imposed. The fuzzy
classification constrains each part of the latent representation to encode
features of up to one object category, while the object-removing operation,
combined with a generative adversarial network, enforces the modularity of the
representations and the integrity of the reconstructed image. Furthermore, we
devise two metrics to respectively measure the modularity of disassembled
representations and the visual integrity of reconstructed images. Experimental
results demonstrate that the proposed UDOR, despite being unsupervised,
achieves results on par with those of supervised methods.
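The core mechanism described above, a latent vector partitioned into category-specific parts plus an object-removing operation that zeroes one part before decoding, can be sketched as follows. This is a minimal illustration with toy linear maps standing in for the paper's networks; all names and sizes are assumptions, and the fuzzy-classification and adversarial losses are omitted.

```python
import numpy as np

# Illustrative sketch (not the authors' code): the latent vector is split into
# K category-specific parts; "object removal" zeroes one part before decoding,
# so the decoded image should lack that object category.
K, PART = 3, 4            # hypothetical: 3 object categories, 4 dims per part

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(8, K * PART))   # toy linear encoder
W_dec = rng.normal(size=(K * PART, 8))   # toy linear decoder

def encode(x):
    return x @ W_enc

def decode(z):
    return z @ W_dec

def remove_object(z, k):
    """Object-removing operation: zero the k-th latent part so the decoder
    reconstructs the scene without that category."""
    z = z.copy()
    z[..., k * PART:(k + 1) * PART] = 0.0
    return z

x = rng.normal(size=(1, 8))
z = encode(x)
x_full = decode(z)                    # first pass: reconstruct everything
x_wo_1 = decode(remove_object(z, 1))  # second pass: category 1 removed
```

In the paper's double auto-encoder, the second (object-removed) reconstruction is what the GAN discriminator judges for visual integrity, which pressures each latent part to carry only one category's features.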
Related papers
- Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)
- Understanding Self-Supervised Pretraining with Part-Aware Representation Learning [88.45460880824376]
We study the capability that self-supervised representation pretraining methods learn part-aware representations.
Results show that the fully-supervised model outperforms self-supervised models for object-level recognition.
arXiv Detail & Related papers (2023-01-27T18:58:42Z)
- Image Segmentation-based Unsupervised Multiple Objects Discovery [1.7674345486888503]
Unsupervised object discovery aims to localize objects in images.
We propose a fully unsupervised, bottom-up approach for multiple-object discovery.
We provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.
arXiv Detail & Related papers (2022-12-20T09:48:24Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- Compositional Scene Modeling with Global Object-Centric Representations [44.43366905943199]
Humans can easily identify the same object, even if occlusions exist, by completing the occluded parts based on its canonical image in the memory.
This paper proposes a compositional scene modeling method to infer global representations of canonical images of objects without any supervision.
arXiv Detail & Related papers (2022-11-21T14:36:36Z)
- On the robustness of self-supervised representations for multi-view object classification [0.0]
We show that self-supervised representations based on the instance discrimination objective lead to better representations of objects that are more robust to changes in the viewpoint and perspective of the object.
We find that self-supervised representations are more robust to object viewpoint and appear to encode more pertinent information about objects that facilitate the recognition of objects from novel views.
arXiv Detail & Related papers (2022-07-27T17:24:55Z)
- Self-Supervised Learning of Object Parts for Semantic Segmentation [7.99536002595393]
We argue that self-supervised learning of object parts is a solution to this issue.
Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by 17%-3%.
arXiv Detail & Related papers (2022-04-27T17:55:17Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Disentangling What and Where for 3D Object-Centric Representations Through Active Inference [4.088019409160893]
We propose an active inference agent that can learn novel object categories over time.
We show that our agent is able to learn representations for many object categories in an unsupervised way.
We validate our system in an end-to-end fashion where the agent is able to search for an object at a given pose from a pixel-based rendering.
arXiv Detail & Related papers (2021-08-26T12:49:07Z)
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.