Look-into-Object: Self-supervised Structure Modeling for Object
Recognition
- URL: http://arxiv.org/abs/2003.14142v1
- Date: Tue, 31 Mar 2020 12:22:51 GMT
- Title: Look-into-Object: Self-supervised Structure Modeling for Object
Recognition
- Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei
- Abstract summary: We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
- Score: 71.68524003173219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most object recognition approaches predominantly focus on learning
discriminative visual patterns while overlooking the holistic object structure.
Though important, structure modeling usually requires significant manual
annotations and therefore is labor-intensive. In this paper, we propose to
"look into object" (explicitly yet intrinsically model the object structure)
through incorporating self-supervisions into the traditional framework. We show
the recognition backbone can be substantially enhanced for more robust
representation learning, at no cost in extra annotation or inference speed.
Specifically, we first propose an object-extent learning module for
localizing the object according to the visual patterns shared among the
instances in the same category. We then design a spatial context learning
module for modeling the internal structures of the object, through predicting
the relative positions within the extent. These two modules can be easily
plugged into any backbone networks during training and detached at inference
time. Extensive experiments show that our look-into-object approach (LIO)
achieves large performance gains on a number of benchmarks, including generic
object recognition (ImageNet) and fine-grained object recognition tasks (CUB,
Cars, Aircraft). We also show that this learning paradigm is highly
generalizable to other tasks such as object detection and segmentation (MS
COCO). Project page: https://github.com/JDAI-CV/LIO.
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Understanding Self-Supervised Pretraining with Part-Aware Representation Learning [88.45460880824376]
We study the capability that self-supervised representation pretraining methods learn part-aware representations.
Results show that the fully-supervised model outperforms self-supervised models for object-level recognition.
arXiv Detail & Related papers (2023-01-27T18:58:42Z)
- Self-Supervised Learning of Object Parts for Semantic Segmentation [7.99536002595393]
We argue that self-supervised learning of object parts is a solution to this issue.
Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by margins of 3%-17%.
arXiv Detail & Related papers (2022-04-27T17:55:17Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
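The idea behind the Complex AutoEncoder can be illustrated compactly: each unit's complex-valued activation carries a magnitude (feature strength) and a phase, and units belonging to the same object converge to similar phases, so objects can be separated by grouping phases. The toy numpy sketch below shows only this grouping principle (the phases, magnitudes, and threshold are made up for illustration; it is not the paper's architecture):

```python
import numpy as np

# Two toy "objects": magnitudes encode feature strength, phases
# encode which object a unit belongs to.
rng = np.random.default_rng(0)
phase_a, phase_b = 0.5, 2.5                 # one distinct phase per object
mags = rng.uniform(0.5, 1.0, size=10)
phases = np.array([phase_a] * 5 + [phase_b] * 5)
z = mags * np.exp(1j * phases)              # complex-valued activations

# Unsupervised grouping: units with nearby phases are assigned to
# the same object (a simple threshold suffices for two phases).
labels = (np.angle(z) > 1.5).astype(int)
print(labels)   # -> [0 0 0 0 0 1 1 1 1 1]
```

Because the object assignment lives in the phase dimension rather than in per-slot vectors, no iterative slot competition is needed, which is consistent with the speedup the entry reports over SlotAttention.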
arXiv Detail & Related papers (2022-04-05T09:25:28Z) - Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model removes the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.