Semantic-Aware Generation for Self-Supervised Visual Representation
Learning
- URL: http://arxiv.org/abs/2111.13163v1
- Date: Thu, 25 Nov 2021 16:46:13 GMT
- Title: Semantic-Aware Generation for Self-Supervised Visual Representation
Learning
- Authors: Yunjie Tian, Lingxi Xie, Xiaopeng Zhang, Jiemin Fang, Haohang Xu, Wei
Huang, Jianbin Jiao, Qi Tian, Qixiang Ye
- Abstract summary: We advocate for Semantic-aware Generation (SaGe) to facilitate richer semantics rather than details to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition.
- Score: 116.5814634936371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a self-supervised visual representation learning
approach which involves both generative and discriminative proxies, where we
focus on the former part by requiring the target network to recover the
original image based on the mid-level features. Different from prior work that
mostly focuses on pixel-level similarity between the original and generated
images, we advocate for Semantic-aware Generation (SaGe) to facilitate richer
semantics rather than details to be preserved in the generated image. The core
idea of implementing SaGe is to use an evaluator, a deep network that is
pre-trained without labels, for extracting semantic-aware features. SaGe
complements the target network with view-specific features and thus alleviates
the semantic degradation brought by intensive data augmentations. We execute
SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream
tasks including nearest neighbor test, linear classification, and fine-scaled
image recognition, demonstrating its ability to learn stronger visual
representations.
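A toy illustration of the distinction the abstract draws between pixel-level similarity and semantic-aware similarity. This is only a sketch under strong assumptions: `toy_evaluator` is a hypothetical stand-in (simple block averaging) for the pre-trained evaluator network, and the function names and loss forms are illustrative choices, not taken from the paper.

```python
def toy_evaluator(image, block=2):
    """Hypothetical stand-in for a frozen, pre-trained evaluator.

    It averages non-overlapping blocks of a flattened image, so it keeps
    coarse structure and discards fine detail. A real evaluator would be
    a deep network pre-trained without labels.
    """
    return [sum(image[i:i + block]) / block for i in range(0, len(image), block)]


def pixel_loss(original, generated):
    """Mean squared error measured directly on pixel values."""
    return sum((o - g) ** 2 for o, g in zip(original, generated)) / len(original)


def semantic_loss(evaluator, original, generated):
    """Mean squared error measured on evaluator features instead of pixels."""
    f_orig = evaluator(original)
    f_gen = evaluator(generated)
    return sum((a - b) ** 2 for a, b in zip(f_orig, f_gen)) / len(f_orig)


# A flattened 4-pixel "image" and a reconstruction that differs only in fine detail.
original = [1.0, 3.0, 5.0, 7.0]
generated = [2.0, 2.0, 6.0, 6.0]

print("pixel loss:   ", pixel_loss(original, generated))
print("semantic loss:", semantic_loss(toy_evaluator, original, generated))
```

In this toy case the reconstruction is penalized by the pixel-level loss but not by the feature-level loss, because the stand-in evaluator ignores the detail that differs. With a learned evaluator the features, and therefore the behavior of the loss, would depend on what that network has learned.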
Related papers
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
- Self-Correlation and Cross-Correlation Learning for Few-Shot Remote
Sensing Image Semantic Segmentation [27.59330408178435]
Few-shot remote sensing semantic segmentation aims at learning to segment target objects from a query image.
We propose a Self-Correlation and Cross-Correlation Learning Network for the few-shot remote sensing image semantic segmentation.
Our model enhances the generalization by considering both self-correlation and cross-correlation between support and query images.
arXiv Detail & Related papers (2023-09-11T21:53:34Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Learning Hierarchical Image Segmentation For Recognition and By Recognition [39.712584686731574]
We propose to integrate a hierarchical segmenter into the recognition process, train and adapt the entire model solely on image-level recognition objectives.
We learn hierarchical segmentation for free alongside recognition, automatically uncovering part-to-whole relationships that not only underpin but also enhance recognition.
Notably, our model (trained on unlabeled 1M ImageNet images) outperforms SAM (trained on 11M images masks) by absolute 8% in mIoU on PartImageNet object segmentation.
arXiv Detail & Related papers (2022-10-01T16:31:44Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of
Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even under drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Exploiting the relationship between visual and textual features in
social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part.
Considering the associated texts to the images can help to improve the accuracy depending on the goal.
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning targets to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.