Why Do Better Loss Functions Lead to Less Transferable Features?
- URL: http://arxiv.org/abs/2010.16402v2
- Date: Wed, 3 Nov 2021 18:32:53 GMT
- Title: Why Do Better Loss Functions Lead to Less Transferable Features?
- Authors: Simon Kornblith, Ting Chen, Honglak Lee, Mohammad Norouzi
- Abstract summary: This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet.
We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks.
- Score: 93.47297944685114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous work has proposed many new loss functions and regularizers that
improve test accuracy on image classification tasks. However, it is not clear
whether these loss functions learn better representations for downstream tasks.
This paper studies how the choice of training objective affects the
transferability of the hidden representations of convolutional neural networks
trained on ImageNet. We show that many objectives lead to statistically
significant improvements in ImageNet accuracy over vanilla softmax
cross-entropy, but the resulting fixed feature extractors transfer
substantially worse to downstream tasks, and the choice of loss has little
effect when networks are fully fine-tuned on the new tasks. Using centered
kernel alignment to measure similarity between hidden representations of
networks, we find that differences among loss functions are apparent only in
the last few layers of the network. We delve deeper into representations of the
penultimate layer, finding that different objectives and hyperparameter
combinations lead to dramatically different levels of class separation.
Representations with higher class separation obtain higher accuracy on the
original task, but their features are less useful for downstream tasks. Our
results suggest there exists a trade-off between learning invariant features
for the original task and features relevant for transfer tasks.
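The CKA measurement mentioned in the abstract can be sketched with linear CKA on two activation matrices. The function below is a minimal NumPy illustration of linear CKA, not the authors' code; the paper also considers kernel variants.

```python
import numpy as np

def linear_cka(x, y):
    """Linear centered kernel alignment between two representation
    matrices of shape (n_examples, n_features)."""
    # Center each feature dimension across examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F.
    dot = np.linalg.norm(y.T @ x) ** 2
    norm_x = np.linalg.norm(x.T @ x)
    norm_y = np.linalg.norm(y.T @ y)
    return dot / (norm_x * norm_y)

rng = np.random.default_rng(0)
a = rng.normal(size=(100, 64))
print(linear_cka(a, a))  # identical representations -> 1.0
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which makes it suitable for comparing layers of different networks.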
Related papers
- Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks
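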
Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data.
We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce.
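Linear probing as described here amounts to fitting a linear classifier on frozen pretrained features. Below is a minimal ridge-regression probe in NumPy, a hedged stand-in for the usual setup; practical probes often use logistic regression instead.

```python
import numpy as np

def linear_probe(train_feats, train_labels, test_feats, l2=1e-3):
    """Fit a ridge-regression linear probe on frozen features and
    predict labels for test features (illustrative sketch only)."""
    n, d = train_feats.shape
    classes = np.unique(train_labels)
    # One-hot targets for the ridge regression.
    onehot = (train_labels[:, None] == classes[None, :]).astype(float)
    # Closed-form ridge solution: W = (F^T F + l2 I)^-1 F^T Y.
    w = np.linalg.solve(train_feats.T @ train_feats + l2 * np.eye(d),
                        train_feats.T @ onehot)
    return classes[np.argmax(test_feats @ w, axis=1)]
```

The pretrained backbone never changes here; only the linear map `w` is fit, which is why redundancy in the frozen features matters so much when target data is scarce.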
arXiv Detail & Related papers (2023-10-05T19:00:49Z)
- FourierLoss: Shape-Aware Loss Function with Fourier Descriptors
This work introduces a new shape-aware loss function, which we name FourierLoss.
It quantifies the shape dissimilarity between the ground truth and the predicted segmentation maps through Fourier descriptors calculated on their objects, and penalizes this dissimilarity during network training.
Experiments revealed that the proposed shape-aware loss function led to statistically significantly better results for liver segmentation, compared to its counterparts.
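A shape-dissimilarity penalty of this flavor can be sketched as follows. The descriptor construction below is a generic Fourier-descriptor recipe over boundary points, not the paper's exact FourierLoss formulation.

```python
import numpy as np

def fourier_descriptors(contour, k=8):
    """Low-frequency Fourier descriptor magnitudes of a closed 2-D
    contour given as an (n, 2) array of boundary points."""
    z = contour[:, 0] + 1j * contour[:, 1]   # complex boundary signal
    coeffs = np.fft.fft(z)
    # Keep k positive and k negative harmonics; drop the DC term
    # (translation) and normalize by the first harmonic (scale).
    low = np.concatenate([coeffs[1:k + 1], coeffs[-k:]])
    return np.abs(low) / np.abs(coeffs[1])

def shape_loss(contour_pred, contour_true, k=8):
    """Squared-error penalty between the Fourier descriptors of a
    predicted and a ground-truth contour."""
    d_pred = fourier_descriptors(contour_pred, k)
    d_true = fourier_descriptors(contour_true, k)
    return float(np.sum((d_pred - d_true) ** 2))
```

Because the DC term is dropped and magnitudes are normalized by the first harmonic, this penalty is invariant to translation and uniform scaling of the contour, so it targets shape rather than position or size.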
arXiv Detail & Related papers (2023-09-21T14:23:10Z)
- Multi-stage feature decorrelation constraints for improving CNN classification performance
This article proposes a multi-stage feature decorrelation loss (MFD Loss) for CNN.
MFD Loss refines effective features and eliminates information redundancy by constraining the correlation of features at all stages.
Experiments on several commonly used datasets and several typical CNNs show that supervision with Softmax Loss + MFD Loss yields significantly better classification performance than supervision with Softmax Loss alone.
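A single-stage version of such a decorrelation constraint can be sketched as a penalty on off-diagonal feature correlations over a batch; MFD Loss applies this idea at multiple stages of the network, which the sketch below omits.

```python
import numpy as np

def decorrelation_loss(feats, eps=1e-8):
    """Mean squared off-diagonal correlation between feature
    dimensions in a batch (simplified single-stage stand-in for a
    feature-decorrelation constraint)."""
    # Standardize each feature dimension across the batch.
    f = feats - feats.mean(axis=0, keepdims=True)
    f = f / (f.std(axis=0, keepdims=True) + eps)
    n, d = f.shape
    corr = (f.T @ f) / n                      # (d, d) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))  # zero out the diagonal
    return float(np.sum(off_diag ** 2) / (d * (d - 1)))
```

Minimizing this term alongside the classification loss pushes features toward pairwise decorrelation, reducing the information redundancy the paper targets.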
arXiv Detail & Related papers (2023-08-24T16:00:01Z)
- Diffused Redundancy in Pre-trained Representations
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
- SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification
We propose two loss functions that weigh the importance of the embedding vectors by considering the intra-class and inter-class distances over the few available samples.
Our results show a significant improvement in accuracy on the miniImageNet benchmark, outperforming other metric-based few-shot learning methods by a margin of 2%.
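A distance-based loss of this general kind can be sketched as a contrastive-style margin penalty over embedding pairs. The function below is a generic stand-in, not the paper's two proposed losses.

```python
import numpy as np

def separability_loss(embeddings, labels, margin=1.0):
    """Pull same-class embeddings together (intra-class distance) and
    push different-class embeddings at least `margin` apart
    (inter-class distance). Generic contrastive-style sketch."""
    n = len(labels)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(embeddings[i] - embeddings[j])
            if labels[i] == labels[j]:
                loss += d ** 2                       # intra-class: minimize
            else:
                loss += max(0.0, margin - d) ** 2    # inter-class: hinge
            pairs += 1
    return loss / pairs
```

Well-separated, tight clusters drive both terms toward zero, which is exactly the class separability such losses enforce.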
arXiv Detail & Related papers (2023-05-15T23:12:09Z)
- Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks
This paper shows that training speed and final accuracy of neural networks can significantly depend on the loss function used to train neural networks.
Two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks are proposed.
arXiv Detail & Related papers (2023-03-17T12:52:06Z)
- A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions
Deep perceptual loss is a type of loss function for images that computes the error between two images as the distance between deep features extracted from a neural network.
This work evaluates the effect of different pretrained loss networks on four different application areas.
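The definition above can be sketched directly: a deep perceptual loss is just a distance between features of the two images under some feature extractor. In the sketch below, `toy_features` is a hypothetical stand-in for a pretrained loss network (in practice an intermediate layer of an ImageNet-trained CNN).

```python
import numpy as np

def perceptual_loss(img_a, img_b, feature_fn):
    """Error between two images measured as the mean squared distance
    between their deep features."""
    fa, fb = feature_fn(img_a), feature_fn(img_b)
    return float(np.mean((fa - fb) ** 2))

# Stand-in "loss network": a fixed random projection plus a ReLU.
# A real setup would extract features from a pretrained CNN layer.
rng = np.random.default_rng(0)
proj = rng.normal(size=(64 * 64, 128)) / np.sqrt(64 * 64)

def toy_features(img):
    return np.maximum(img.reshape(-1) @ proj, 0.0)
```

The work summarized here varies which pretrained network and layer play the role of `feature_fn`, which is exactly the choice this loss is sensitive to.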
arXiv Detail & Related papers (2023-02-08T13:08:51Z)
- Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks
Self-supervised learning is a powerful paradigm for representation learning on unlabelled images.
We show that different tasks in computer vision require features to encode different (in)variances.
arXiv Detail & Related papers (2021-11-22T18:16:35Z)
- CrossTransformers: spatially-aware few-shot transfer
Given new tasks with very little data, modern vision systems degrade remarkably quickly.
We show how the neural network representations which underpin modern vision systems are subject to supervision collapse.
We propose self-supervised learning to encourage general-purpose features that transfer better.
arXiv Detail & Related papers (2020-07-22T15:37:08Z)
- Adversarial Training Reduces Information and Improves Transferability
Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility.
We show that adversarial training can improve linear transferability to new tasks, from which arises a new trade-off between transferability of representations and accuracy on the source task.
arXiv Detail & Related papers (2020-07-22T08:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.