A surrogate loss function for optimization of $F_\beta$ score in binary
classification with imbalanced data
- URL: http://arxiv.org/abs/2104.01459v1
- Date: Sat, 3 Apr 2021 18:36:23 GMT
- Title: A surrogate loss function for optimization of $F_\beta$ score in binary
classification with imbalanced data
- Authors: Namgil Lee, Heejung Yang, Hojin Yoo
- Abstract summary: The gradient paths of the proposed surrogate $F_\beta$ loss function approximate the gradient paths of the large sample limit of the $F_\beta$ score.
It is demonstrated that the proposed surrogate $F_\beta$ loss function is effective for optimizing $F_\beta$ scores under class imbalances.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The $F_\beta$ score is a commonly used measure of classification performance,
which plays crucial roles in classification tasks with imbalanced data sets.
However, the $F_\beta$ score cannot be used as a loss function by
gradient-based learning algorithms for optimizing neural network parameters due
to its non-differentiability. On the other hand, commonly used loss functions
such as the binary cross-entropy (BCE) loss are not directly related to
performance measures such as the $F_\beta$ score, so that neural networks
optimized by using the loss functions may not yield optimal performance
measures. In this study, we investigate a relationship between classification
performance measures and loss functions in terms of the gradients with respect
to the model parameters. Then, we propose a differentiable surrogate loss
function for the optimization of the $F_\beta$ score. We show that the gradient
paths of the proposed surrogate $F_\beta$ loss function approximate the
gradient paths of the large sample limit of the $F_\beta$ score. Through
numerical experiments using ResNets and benchmark image data sets, it is
demonstrated that the proposed surrogate $F_\beta$ loss function is effective
for optimizing $F_\beta$ scores under class imbalances in binary classification
tasks compared with other loss functions.
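The abstract does not reproduce the surrogate itself, but the general construction it points to can be illustrated with a "soft" $F_\beta$ loss: replace the hard true-positive, false-positive, and false-negative counts in the $F_\beta$ formula with sums of predicted probabilities, so the score becomes differentiable and $1 - F_\beta^{\text{soft}}$ can be minimized by gradient descent. The sketch below shows this in PyTorch; it is a minimal, generic construction, not necessarily the exact surrogate analyzed in the paper, and the class name SoftFBetaLoss and the smoothing constant eps are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SoftFBetaLoss(nn.Module):
    """Differentiable 'soft' F_beta surrogate (illustrative sketch).

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)

    The hard counts TP, FP, FN are replaced by sums of predicted
    probabilities, which makes the expression differentiable with
    respect to the model parameters. This is a generic construction,
    not necessarily the exact surrogate proposed in the paper.
    """

    def __init__(self, beta: float = 1.0, eps: float = 1e-7):
        super().__init__()
        self.beta2 = beta ** 2
        self.eps = eps  # guards against division by zero in batches with no positives

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(logits)          # predicted P(y = 1)
        targets = targets.float()
        tp = (probs * targets).sum()           # soft true positives
        fp = (probs * (1.0 - targets)).sum()   # soft false positives
        fn = ((1.0 - probs) * targets).sum()   # soft false negatives
        soft_fbeta = ((1.0 + self.beta2) * tp + self.eps) / (
            (1.0 + self.beta2) * tp + self.beta2 * fn + fp + self.eps
        )
        # Minimizing 1 - soft F_beta drives the (soft) F_beta score toward 1.
        return 1.0 - soft_fbeta
```

In a training loop this drops in where a BCE-style loss would normally be used, e.g. loss = SoftFBetaLoss(beta=2.0)(logits, labels) followed by loss.backward().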
Related papers
- Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms [80.37846867546517]
We show how to train eight different neural networks with custom objectives.
We exploit their second-order information via their empirical Fisher and Hessian matrices.
We apply Newton Losses to achieve significant improvements for less differentiable algorithms.
arXiv Detail & Related papers (2024-10-24T18:02:11Z) - $α$-Divergence Loss Function for Neural Density Ratio Estimation [0.0]
Density ratio estimation (DRE) is a fundamental machine learning technique for capturing relationships between two probability distributions.
Existing methods face optimization challenges, such as overfitting due to lower-unbounded loss functions, biased mini-batch gradients, vanishing training loss gradients, and high sample requirements for Kullback-Leibler (KL) divergence loss functions.
We propose a novel loss function for DRE, the $\alpha$-divergence loss function ($\alpha$-Div), which is concise but offers stable and effective optimization for DRE.
arXiv Detail & Related papers (2024-02-03T05:33:01Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks [6.452225158891343]
This paper shows that the training speed and final accuracy of neural networks can significantly depend on the loss function used for training.
Two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks are proposed.
arXiv Detail & Related papers (2023-03-17T12:52:06Z) - Xtreme Margin: A Tunable Loss Function for Binary Classification
Problems [0.0]
We provide an overview of a novel loss function, the Xtreme Margin loss function.
Unlike the binary cross-entropy and the hinge loss functions, this loss function provides researchers and practitioners flexibility with their training process.
arXiv Detail & Related papers (2022-10-31T22:39:32Z) - Reformulating van Rijsbergen's $F_{\beta}$ metric for weighted binary
cross-entropy [0.0]
This paper investigates incorporating a performance metric alongside differentiable loss functions to inform training outcomes.
The focus is on van Rijsbergen's $F_\beta$ metric -- a popular choice for gauging classification performance.
arXiv Detail & Related papers (2022-10-29T01:21:42Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - Neural Greedy Pursuit for Feature Selection [72.4121881681861]
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem.
We use neural networks as predictors in the algorithm to compute the loss.
arXiv Detail & Related papers (2022-07-19T16:39:16Z) - Binarizing by Classification: Is soft function really necessary? [4.329951775163721]
We propose to tackle network binarization as a binary classification problem.
We also take binarization as a lightweighting approach for pose estimation models.
The proposed method enables binary networks to achieve a mAP of up to $60.6$ for the first time.
arXiv Detail & Related papers (2022-05-16T02:47:41Z) - Do Lessons from Metric Learning Generalize to Image-Caption Retrieval? [67.45267657995748]
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
arXiv Detail & Related papers (2022-02-14T15:18:00Z) - Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation [56.343646789922545]
We propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric.
Experiments on PASCAL VOC and Cityscapes demonstrate that the searched surrogate losses outperform the manually designed loss functions consistently.
arXiv Detail & Related papers (2020-10-15T17:59:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.