Self-supervised Feature Enhancement: Applying Internal Pretext Task to
Supervised Learning
- URL: http://arxiv.org/abs/2106.04921v1
- Date: Wed, 9 Jun 2021 08:59:35 GMT
- Title: Self-supervised Feature Enhancement: Applying Internal Pretext Task to
Supervised Learning
- Authors: Yuhang Yang, Zilin Ding, Xuan Cheng, Xiaomin Wang, Ming Liu
- Abstract summary: We show that feature transformations within CNNs can also be regarded as supervisory signals to construct the self-supervised task.
Specifically, we first transform the internal feature maps by discarding different channels, and then define an additional internal pretext task to identify the discarded channels.
CNNs are trained to predict the joint labels generated by the combination of self-supervised labels and original labels.
- Score: 6.508466234920147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional self-supervised learning requires CNNs using external pretext
tasks (i.e., image- or video-based tasks) to encode high-level semantic visual
representations. In this paper, we show that feature transformations within
CNNs can also be regarded as supervisory signals to construct the
self-supervised task, which we call the \emph{internal pretext task}. Such a task can
be applied to enhance supervised learning. Specifically, we first
transform the internal feature maps by discarding different channels, and then
define an additional internal pretext task to identify the discarded channels.
CNNs are trained to predict the joint labels generated by the combination of
self-supervised labels and the original labels. By doing so, we let the CNN
know which channels are missing while it classifies, in the hope of mining
richer feature information. Extensive experiments show that our approach is
effective across various models and datasets while incurring only negligible
computational overhead. Furthermore, our approach is compatible with other
methods and can be combined with them for better results.
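The channel-discarding pretext task and the joint-label construction described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the tiny backbone, the contiguous-group discarding scheme, and all names (`discard_channels`, `joint_label`, `NUM_TRANSFORMS`) are assumptions made for clarity.

```python
# Hypothetical sketch of the internal pretext task: zero out one group of
# internal feature channels, then classify into joint (class, transform) labels.
import torch
import torch.nn as nn

NUM_CLASSES = 10      # original classification labels
NUM_TRANSFORMS = 4    # number of channel-discarding patterns (pretext labels)

def discard_channels(features: torch.Tensor, transform_id: int) -> torch.Tensor:
    """Zero out one contiguous group of channels, selected by transform_id."""
    n, c, h, w = features.shape
    group = c // NUM_TRANSFORMS
    out = features.clone()
    out[:, transform_id * group:(transform_id + 1) * group] = 0.0
    return out

class InternalPretextNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # One joint class per (original label, pretext label) pair.
        self.head = nn.Linear(16, NUM_CLASSES * NUM_TRANSFORMS)

    def forward(self, x: torch.Tensor, transform_id: int) -> torch.Tensor:
        f = self.backbone[0:2](x)            # internal feature maps
        f = discard_channels(f, transform_id)  # apply the internal transformation
        f = self.backbone[2](f).flatten(1)
        return self.head(f)

def joint_label(original: int, transform_id: int) -> int:
    """Combine the original label with the self-supervised label."""
    return original * NUM_TRANSFORMS + transform_id
```

Training would then draw a random `transform_id` per sample and use standard cross-entropy on the joint labels, so the network must identify the discarded channels in addition to the object class.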
Related papers
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
- SERE: Exploring Feature Self-relation for Self-supervised Transformer [79.5769147071757]
Vision transformers (ViT) have strong representation ability with spatial self-attention and channel-level feedforward networks.
Recent works reveal that self-supervised learning helps unleash the great potential of ViT.
We observe that relational modeling on spatial and channel dimensions distinguishes ViT from other networks.
arXiv Detail & Related papers (2022-06-10T15:25:00Z)
- Self-supervision of Feature Transformation for Further Improving Supervised Learning [6.508466234920147]
We find that features in CNNs can be also used for self-supervision.
In our task, we discard particular regions of the features and then train the model to distinguish the resulting features.
Original labels will be expanded to joint labels via self-supervision of feature transformations.
arXiv Detail & Related papers (2021-06-09T09:06:33Z)
- Wider Vision: Enriching Convolutional Neural Networks via Alignment to External Knowledge Bases [0.3867363075280543]
We aim to explain and expand CNN models by mirroring or aligning the CNN to an external knowledge base.
This will allow us to give a semantic context or label for each visual feature.
Our results show that in the aligned embedding space, nodes from the knowledge graph are close to the CNN feature nodes that have similar meanings.
arXiv Detail & Related papers (2021-02-22T16:00:03Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- A CNN-based Feature Space for Semi-supervised Incremental Learning in Assisted Living Applications [2.1485350418225244]
We propose using the feature space that results from the training dataset to automatically label problematic images.
The resulting semi-supervised incremental learning process allows improving the classification accuracy of new instances by 40%.
arXiv Detail & Related papers (2020-11-11T12:31:48Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that a linear layer yields small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- Decoding CNN based Object Classifier Using Visualization [6.666597301197889]
We visualize what types of features are extracted in different convolutional layers of a CNN.
Visualizing heat maps of activations helps us understand how a CNN classifies and localizes different objects in an image.
arXiv Detail & Related papers (2020-07-15T05:01:27Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
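The feature-smoothing curriculum described in the last entry could look roughly like the sketch below: depthwise Gaussian (low-pass) filtering of feature maps, with the filter strength annealed toward (near) identity as training progresses. The kernel size, schedule, and function names are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch of curriculum-by-smoothing on CNN feature maps.
import torch
import torch.nn.functional as F

def gaussian_kernel(sigma: float, size: int = 5) -> torch.Tensor:
    """Normalized 2-D Gaussian kernel built from a separable 1-D Gaussian."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)

def smooth_features(features: torch.Tensor, sigma: float) -> torch.Tensor:
    """Low-pass filter each channel independently (depthwise convolution)."""
    c = features.shape[1]
    k = gaussian_kernel(sigma).expand(c, 1, -1, -1).contiguous()
    return F.conv2d(features, k, padding=2, groups=c)

def sigma_schedule(epoch: int, total_epochs: int,
                   start: float = 1.0, end: float = 0.01) -> float:
    """Linearly anneal sigma so early epochs see heavily smoothed features
    and later epochs see progressively sharper ones."""
    t = epoch / max(total_epochs - 1, 1)
    return start + t * (end - start)
```

In training, `smooth_features(f, sigma_schedule(epoch, total_epochs))` would be applied after selected convolutional blocks, so the amount of information in the feature maps increases over the course of training.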
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.