von Mises-Fisher Loss: An Exploration of Embedding Geometries for
Supervised Learning
- URL: http://arxiv.org/abs/2103.15718v2
- Date: Wed, 31 Mar 2021 15:10:23 GMT
- Title: von Mises-Fisher Loss: An Exploration of Embedding Geometries for
Supervised Learning
- Authors: Tyler R. Scott and Andrew C. Gallagher and Michael C. Mozer
- Abstract summary: Recent work has argued that classification losses utilizing softmax cross-entropy are superior not only for fixed-set classification tasks, but also for open-set tasks.
We conduct an empirical investigation of embedding geometry on softmax losses for a variety of fixed-set classification and image retrieval tasks.
- Score: 12.37528281037283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has argued that classification losses utilizing softmax
cross-entropy are superior not only for fixed-set classification tasks but
also for open-set tasks, including few-shot learning and retrieval, where
they outperform losses developed specifically for those settings. Softmax
classifiers have been
studied using different embedding geometries -- Euclidean, hyperbolic, and
spherical -- and claims have been made about the superiority of one or another,
but they have not been systematically compared with careful controls. We
conduct an empirical investigation of embedding geometry on softmax losses for
a variety of fixed-set classification and image retrieval tasks. An interesting
property observed for the spherical losses led us to propose a probabilistic
classifier based on the von Mises-Fisher distribution, and we show that it is
competitive with state-of-the-art methods while producing improved
out-of-the-box calibration. We provide guidance regarding the trade-offs
between losses and how to choose among them.
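To make the spherical setup concrete, below is a minimal sketch of a von Mises-Fisher (vMF) style classifier head. It assumes a single shared concentration kappa, unit-norm class mean directions, and uniform class priors, under which the vMF normalizing constants cancel and the class posterior reduces to a scaled cosine softmax. The names (VMFClassifier, kappa) and hyperparameter values are illustrative assumptions, not taken from the paper's implementation; the paper's full loss additionally models the embedding itself as vMF-distributed.

```python
# Minimal sketch of a vMF-style classifier head (illustrative, not the
# paper's code). Assumptions: shared concentration kappa, unit-norm class
# means, uniform class priors, so the vMF normalizers C_d(kappa) cancel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VMFClassifier(nn.Module):
    def __init__(self, embed_dim: int, num_classes: int, kappa: float = 16.0):
        super().__init__()
        # One mean direction per class; re-normalized to the sphere each call.
        self.class_means = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.kappa = kappa  # concentration of each class-conditional vMF

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Project embeddings and class means onto the unit hypersphere.
        z = F.normalize(z, dim=-1)
        mu = F.normalize(self.class_means, dim=-1)
        # vMF log-likelihood up to a shared constant: kappa * <mu_j, z>.
        # With a shared kappa, the class posterior is a softmax over these
        # scaled cosine similarities.
        return self.kappa * z @ mu.t()  # logits for cross-entropy

# Usage: train with standard cross-entropy on the vMF logits.
model = VMFClassifier(embed_dim=128, num_classes=10)
z = torch.randn(32, 128)               # batch of embeddings from a backbone
labels = torch.randint(0, 10, (32,))
loss = F.cross_entropy(model(z), labels)
```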
Related papers
- Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss [91.61796429377041]
The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks.
We investigate whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors.
Our results establish a principled foundation and offer practical guidance for loss selection in large-class machine learning applications.
arXiv Detail & Related papers (2026-01-30T09:24:52Z) - The Multiclass Score-Oriented Loss (MultiSOL) on the Simplex [4.014524824655106]
In supervised binary classification, score-oriented losses have been introduced with the aim of directly optimizing a chosen performance metric during the training phase.
In this paper, we use a recently introduced multidimensional threshold-based classification framework to extend such score-oriented losses to multiclass classification.
As demonstrated by several classification experiments, the proposed family of losses preserves the main advantages observed in the binary setting.
arXiv Detail & Related papers (2025-11-27T16:20:55Z) - Semi-Supervised Contrastive Learning with Orthonormal Prototypes [1.478364697333309]
Dimensional collapse, where embeddings converge into a lower-dimensional space, poses a significant challenge.
We propose CLOP, a novel semi-supervised loss function designed to prevent dimensional collapse by promoting the formation of linear subspaces among class embeddings.
We show that CLOP improves performance in image classification and object detection tasks while also exhibiting greater stability across different learning rates and batch sizes.
arXiv Detail & Related papers (2025-11-27T13:26:59Z) - NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics.
We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method.
Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
arXiv Detail & Related papers (2025-06-11T06:59:17Z) - Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z) - Center Contrastive Loss for Metric Learning [8.433000039153407]
We propose a novel metric learning function called Center Contrastive Loss.
It maintains a class-wise center bank and compares the category centers with the query data points using a contrastive loss.
The proposed loss combines the advantages of both contrastive and classification methods.
arXiv Detail & Related papers (2023-08-01T11:22:51Z) - Learning Towards the Largest Margins [83.7763875464011]
A loss function should promote the largest possible margins for both classes and samples.
Not only does this principled framework offer new perspectives to understand and interpret existing margin-based losses, but it can guide the design of new tools.
arXiv Detail & Related papers (2022-06-23T10:03:03Z) - Revisiting lp-constrained Softmax Loss: A Comprehensive Study [2.570570340104555]
We investigate the performance of lp-constrained softmax loss classifiers across different norm orders, magnitudes, and data dimensions.
Experimental results collectively suggest that lp-constrained softmax loss classifiers can achieve more accurate classification results.
We recommend lp normalization as a data representation practice for image classification in terms of both performance and convergence (a minimal sketch of such a classifier appears after this list).
arXiv Detail & Related papers (2022-06-20T08:03:12Z) - Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose a novel algorithm called SAROS that takes both kinds of feedback into account for learning over the sequence of interactions.
The proposed idea of taking neighbouring lines into account shows statistically significant improvements over the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z) - A Decidability-Based Loss Function [2.5919311269669003]
Biometric problems often use deep learning models to extract feature vectors, known as embeddings, from images.
In this work, a loss function based on the decidability index is proposed to improve the quality of embeddings for the verification routine.
The proposed approach is compared against the Softmax (cross-entropy), Triplets Soft-Hard, and the Multi Similarity losses in four different benchmarks.
arXiv Detail & Related papers (2021-09-12T14:26:27Z) - Adversarial Robustness via Fisher-Rao Regularization [33.134075068748984]
Adversarial robustness has become a topic of growing interest in machine learning.
FIRE is a new Fisher-Rao regularization for the categorical cross-entropy loss.
arXiv Detail & Related papers (2021-06-12T04:12:58Z) - Shaping Deep Feature Space towards Gaussian Mixture for Visual
Classification [74.48695037007306]
We propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution.
The proposed model can be implemented easily and efficiently without using extra trainable parameters.
arXiv Detail & Related papers (2020-11-18T03:32:27Z) - Robustifying Binary Classification to Adversarial Perturbation [45.347651499585055]
In this paper, we consider the problem of binary classification with adversarial perturbations.
We introduce a generalization of the max-margin classifier which takes into account the power of the adversary in manipulating the data.
Under some mild assumptions on the loss function, we theoretically show that gradient descent converges in direction to the robust max-margin (RM) classifier.
arXiv Detail & Related papers (2020-10-29T07:20:37Z) - Rethinking preventing class-collapsing in metric learning with
margin-based losses [81.22825616879936]
Metric learning seeks embeddings where visually similar instances are close and dissimilar instances are apart.
However, margin-based losses tend to project all samples of a class onto a single point in the embedding space.
We propose a simple modification to the embedding losses such that each sample selects its nearest same-class counterpart in a batch.
arXiv Detail & Related papers (2020-06-09T09:59:25Z) - Provable tradeoffs in adversarially robust classification [96.48180210364893]
We develop and leverage new tools, including recent breakthroughs from probability theory on robust isoperimetry.
Our results reveal fundamental tradeoffs between standard and robust accuracy that grow when data is imbalanced.
arXiv Detail & Related papers (2020-06-09T09:58:19Z)
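As a companion to the "Revisiting lp-constrained Softmax Loss" entry above, here is a minimal sketch of an lp-constrained softmax classifier: embeddings are rescaled to a fixed lp norm before the final linear layer. The norm order p and magnitude alpha are the hyperparameters that paper studies; the class name and default values here are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of an lp-constrained softmax classifier (illustrative).
# Each embedding is rescaled so that ||z||_p == alpha before the softmax
# layer; p and alpha are hyperparameters (defaults here are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LpConstrainedSoftmax(nn.Module):
    def __init__(self, embed_dim: int, num_classes: int,
                 p: float = 2.0, alpha: float = 10.0):
        super().__init__()
        self.fc = nn.Linear(embed_dim, num_classes, bias=False)
        self.p, self.alpha = p, alpha

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Project onto the lp sphere of radius alpha, then classify.
        z = self.alpha * F.normalize(z, p=self.p, dim=-1)
        return self.fc(z)  # logits for standard cross-entropy

# Usage: identical training loop to a plain softmax classifier.
head = LpConstrainedSoftmax(embed_dim=128, num_classes=10)
logits = head(torch.randn(32, 128))
loss = F.cross_entropy(logits, torch.randint(0, 10, (32,)))
```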
This list is automatically generated from the titles and abstracts of the papers on this site.