A Softmax-free Loss Function Based on Predefined Optimal-distribution of
Latent Features for CNN Classifier
- URL: http://arxiv.org/abs/2111.15449v1
- Date: Thu, 25 Nov 2021 06:01:53 GMT
- Title: A Softmax-free Loss Function Based on Predefined Optimal-distribution of
Latent Features for CNN Classifier
- Authors: Qiuyu Zhu, Xuewen Zu
- Abstract summary: This article proposes a Softmax-free loss function (POD Loss) based on predefined optimal-distribution of latent features.
The loss function only restricts the latent features of the samples, including the cosine distance between the latent feature vector of each sample and the predefined evenly-distributed class centroids, as well as the correlation between the latent features of the samples.
Compared with the commonly used Softmax Loss and the typical Softmax-related AM-Softmax Loss, COT-Loss and PEDCC-Loss, experiments on several commonly used datasets with a typical network show that POD Loss consistently performs better and converges more easily.
- Score: 4.7210697296108926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of pattern classification, the training of convolutional neural
network classifiers is mostly end-to-end learning, and the loss function is the
constraint on the final output (posterior probability) of the network, so the
existence of Softmax is essential. In the case of end-to-end learning, there is
usually no effective loss function that completely relies on the features of
the middle layer to constrain learning, so the distribution of sample latent
features is not optimal and there is still room for improvement in
classification accuracy. Based on the concept of Predefined Evenly-Distributed
Class Centroids (PEDCC), this article proposes a Softmax-free loss function
(POD Loss) based on predefined optimal-distribution of latent features. The
loss function only restricts the latent features of the samples, including the
cosine distance between the latent feature vector of each sample and the
predefined evenly-distributed class centroids, and the correlation between the
latent features of the samples. Finally, cosine distance is used for
classification. Compared with the commonly used Softmax Loss and the typical
Softmax-related AM-Softmax Loss, COT-Loss and PEDCC-Loss, experiments on
several commonly used datasets with a typical network show that POD Loss
consistently achieves better classification performance and converges more
easily. Code is available at https://github.com/TianYuZu/POD-Loss.
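To make the two constraints above more concrete, here is a minimal PyTorch sketch of a POD-style objective: a cosine-distance term that pulls each latent feature toward its predefined evenly-distributed class center, a decorrelation-style term on the latent features, and classification by cosine distance. This is an illustrative reading of the abstract, not the authors' implementation; the function names, the corr_weight parameter, and the exact form of the correlation term are assumptions, and the repository linked above is authoritative.

```python
import torch
import torch.nn.functional as F

def pod_style_loss(features, labels, centers, corr_weight=0.1):
    """features: (B, D) latent features from the penultimate layer.
    labels:   (B,) integer class ids.
    centers:  (C, D) predefined evenly-distributed, unit-norm class centers (PEDCC).
    corr_weight is a hypothetical balancing factor."""
    f = F.normalize(features, dim=1)                  # work on the unit hypersphere
    # Term 1: cosine distance between each feature and its own class center.
    cos_to_center = (f * centers[labels]).sum(dim=1)  # (B,)
    distance_term = (1.0 - cos_to_center).mean()
    # Term 2: one plausible reading of the "correlation between latent features"
    # constraint -- penalize off-diagonal entries of the feature covariance.
    fc = f - f.mean(dim=0, keepdim=True)
    cov = fc.t() @ fc / max(f.size(0) - 1, 1)         # (D, D)
    off_diag = cov - torch.diag(torch.diag(cov))
    corr_term = off_diag.pow(2).mean()
    return distance_term + corr_weight * corr_term

def classify_by_cosine(features, centers):
    """Prediction = class whose predefined center has the largest cosine similarity."""
    f = F.normalize(features, dim=1)
    return (f @ centers.t()).argmax(dim=1)
```

In practice, features would come from the penultimate layer of the CNN, while centers would be generated once (the PEDCC construction) and kept fixed throughout training, so no Softmax layer or learnable classifier is needed.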
Related papers
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax
Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- A two-head loss function for deep Average-K classification [8.189630642296416]
We propose a new loss function based on a multi-label classification in addition to the classical softmax.
We show that this approach allows the model to better capture ambiguities between classes and, as a result, to return more consistent sets of possible classes.
arXiv Detail & Related papers (2023-03-31T15:04:53Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Spectral Aware Softmax for Visible-Infrared Person Re-Identification [123.69049942659285]
Visible-infrared person re-identification (VI-ReID) aims to match specific pedestrian images from different modalities.
Existing methods still follow the softmax loss training paradigm, which is widely used in single-modality classification tasks.
We propose the spectral-aware softmax (SA-Softmax) loss, which can fully explore the embedding space with the modality information.
arXiv Detail & Related papers (2023-02-03T02:57:18Z)
- Maximally Compact and Separated Features with Regular Polytope Networks [22.376196701232388]
We show how to obtain CNN features with the properties of maximum inter-class separability and maximum intra-class compactness.
We obtain features similar to those obtained with the well-known center loss (Wen et al., 2016) and other similar approaches.
arXiv Detail & Related papers (2023-01-15T15:20:57Z)
- Distinction Maximization Loss: Efficiently Improving Classification Accuracy, Uncertainty Estimation, and Out-of-Distribution Detection Simply Replacing the Loss and Calibrating [2.262407399039118]
We propose training deterministic deep neural networks using our DisMax loss.
DisMax usually outperforms all current approaches simultaneously in classification accuracy, uncertainty estimation, inference efficiency, and out-of-distribution detection.
arXiv Detail & Related papers (2022-05-12T04:37:35Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as a simplex equiangular tight frame (ETF) and fixed during training (a construction sketch of such an ETF is included after this list).
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection [89.43987367139724]
Face forgery detection is raising ever-increasing interest in computer vision.
Recent works have made sound progress, but significant problems remain.
A novel frequency-aware discriminative feature learning framework is proposed in this paper.
arXiv Detail & Related papers (2021-03-16T14:17:17Z) - Exploring the Uncertainty Properties of Neural Networks' Implicit Priors
in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
- $\sigma^2$R Loss: a Weighted Loss by Multiplicative Factors using Sigmoidal Functions [0.9569316316728905]
We introduce a new loss function called squared reduction loss ($\sigma^2$R loss), which is regulated by a sigmoid function to inflate/deflate the error per instance.
Our loss has a clear intuition and geometric interpretation, and we demonstrate its effectiveness through experiments.
arXiv Detail & Related papers (2020-09-18T12:34:40Z)
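As a companion to the "Do We Really Need a Learnable Classifier at the End of Deep Neural Network?" entry above, the sketch below shows one standard way (assumed here, not taken from that paper's code) to construct a fixed, maximally separated classifier as a simplex equiangular tight frame: unit-norm class vectors whose pairwise cosine similarity is exactly -1/(C-1). The idea is close in spirit to the predefined evenly-distributed class centroids used by POD Loss.

```python
import torch

def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> torch.Tensor:
    """Return a (num_classes, feat_dim) matrix of unit-norm class vectors whose
    pairwise cosine similarity is exactly -1 / (num_classes - 1)."""
    assert feat_dim >= num_classes, "this construction assumes feat_dim >= num_classes"
    g = torch.Generator().manual_seed(seed)
    a = torch.randn(feat_dim, num_classes, generator=g)
    u, _ = torch.linalg.qr(a)                     # orthonormal columns, (feat_dim, num_classes)
    c = num_classes
    center = torch.eye(c) - torch.ones(c, c) / c  # removes the common mean direction
    m = (c / (c - 1)) ** 0.5 * u @ center         # (feat_dim, num_classes)
    return m.t()                                  # rows are the fixed class vectors

W = simplex_etf(num_classes=10, feat_dim=128)
gram = W @ W.t()
print(round(gram[0, 0].item(), 4))                # ~1.0: each class vector has unit norm
print(round(gram[0, 1].item(), 4))                # ~-0.1111 = -1/9: equiangular separation
```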