A Log-linear Gradient Descent Algorithm for Unbalanced Binary
Classification using the All Pairs Squared Hinge Loss
- URL: http://arxiv.org/abs/2302.11062v1
- Date: Tue, 21 Feb 2023 23:35:00 GMT
- Title: A Log-linear Gradient Descent Algorithm for Unbalanced Binary
Classification using the All Pairs Squared Hinge Loss
- Authors: Kyle R. Rust and Toby D. Hocking
- Abstract summary: We propose a new functional representation of the square loss and squared hinge loss, which results in algorithms that compute the gradient in either linear or log-linear time.
Our new algorithm can achieve higher test AUC values on imbalanced data sets than previous algorithms, and make use of larger batch sizes than were previously feasible.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Receiver Operating Characteristic (ROC) curves are plots of true positive
rate versus false positive rate which are used to evaluate binary
classification algorithms. Because the Area Under the Curve (AUC) is a piecewise
constant function of the predicted values, learning algorithms instead optimize convex
relaxations which involve a sum over all pairs of labeled positive and negative
examples. Naive learning algorithms compute the gradient in quadratic time,
which is too slow for learning using large batch sizes. We propose a new
functional representation of the square loss and squared hinge loss, which
results in algorithms that compute the gradient in either linear or log-linear
time, and makes it possible to use gradient descent learning with large batch
sizes. In our empirical study of supervised binary classification problems, we
show that our new algorithm can achieve higher test AUC values on imbalanced
data sets than previous algorithms, and make use of larger batch sizes than
were previously feasible.
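The quadratic-to-log-linear claim above is concrete enough to illustrate. Below is a minimal sketch, not the authors' implementation: the function names, the margin parameter, and the NumPy formulation are assumptions. It shows why the all-pairs squared hinge gradient collapses to sorting plus cumulative sums: for each example, the sum over margin-violating pairs depends only on a count of violators and the sum of their scores, both recoverable by binary search into a sorted array.

```python
import numpy as np

def naive_gradient(pos, neg, margin=1.0):
    """O(n_pos * n_neg) reference gradient of the all-pairs squared hinge
    loss, i.e. the sum over (i, j) of max(0, margin - (pos[i] - neg[j]))**2."""
    g_pos, g_neg = np.zeros_like(pos), np.zeros_like(neg)
    for i, fi in enumerate(pos):
        for j, fj in enumerate(neg):
            slack = margin - (fi - fj)
            if slack > 0:                 # pair violates the margin
                g_pos[i] -= 2.0 * slack   # d(slack**2) / d fi
                g_neg[j] += 2.0 * slack   # d(slack**2) / d fj
    return g_pos, g_neg

def fast_gradient(pos, neg, margin=1.0):
    """O(n log n) gradient: sort once, then each example's pair-sum reduces
    to a violator count and a cumulative sum located by binary search."""
    pos_sorted, neg_sorted = np.sort(pos), np.sort(neg)
    # suffix sums of sorted negative scores, prefix sums of sorted positives
    neg_suffix = np.append(np.cumsum(neg_sorted[::-1])[::-1], 0.0)
    pos_prefix = np.append(0.0, np.cumsum(pos_sorted))
    # positive i: violators are negatives with fj > fi - margin, so
    # grad_i = 2 * count * (fi - margin) - 2 * (sum of violating fj)
    k = np.searchsorted(neg_sorted, pos - margin, side="right")
    g_pos = 2.0 * (len(neg) - k) * (pos - margin) - 2.0 * neg_suffix[k]
    # negative j: violators are positives with fi < fj + margin
    k = np.searchsorted(pos_sorted, neg + margin, side="left")
    g_neg = 2.0 * k * (neg + margin) - 2.0 * pos_prefix[k]
    return g_pos, g_neg

rng = np.random.default_rng(0)
pos, neg = rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 800)
for a, b in zip(naive_gradient(pos, neg), fast_gradient(pos, neg)):
    assert np.allclose(a, b)
```

Because the fast version touches each example only O(log n) times after sorting, computing the gradient of a large batch costs about as much as sorting it, which is what makes the large batch sizes mentioned in the abstract feasible.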
Related papers
- Efficient line search for optimizing Area Under the ROC Curve in gradient descent [2.094821665776961]
We study the piecewise linear/constant nature of the Area Under Min (AUM) of false positive and false negative rates.
We propose new efficient path-following algorithms for choosing the learning rate which is optimal for each step of descent.
arXiv Detail & Related papers (2024-10-11T08:59:06Z)
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
- Online Learning Under A Separable Stochastic Approximation Framework [20.26530917721778]
We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework.
We show that the proposed algorithm produces more robust and better test performance when compared to other popular learning algorithms.
arXiv Detail & Related papers (2023-05-12T13:53:03Z)
- Optimizing ROC Curves with a Sort-Based Surrogate Loss Function for Binary Classification and Changepoint Detection [1.332560004325655]
We propose a new convex loss function called the AUM, short for Area Under Min(FP, FN); a minimal sketch of this quantity appears after this list.
We show that our new AUM learning results in improved AUC and comparable speed relative to previous baselines.
arXiv Detail & Related papers (2021-07-02T21:21:19Z)
- Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE) based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z)
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex optimization that finds an $\epsilon$-critical point in the optimal number of stochastic gradient and Hessian-vector product evaluations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
- Learning Frequency Domain Approximation for Binary Neural Networks [68.79904499480025]
We propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs.
The experiments on several benchmark datasets and neural architectures illustrate that the binary network learned using our method achieves the state-of-the-art accuracy.
arXiv Detail & Related papers (2021-03-01T08:25:26Z)
- Reparametrizing gradient descent [0.0]
We propose an optimization algorithm which we call norm-adapted gradient descent.
Our algorithm can also be compared to quasi-Newton methods, but we seek roots rather than stationary points.
arXiv Detail & Related papers (2020-10-09T20:22:29Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for non-concave min-max problems.
Our experiments show that the faster convergence of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
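Two entries in the list above concern the AUM, short for Area Under Min(FP, FN). As referenced there, here is a minimal sketch of that quantity under one plausible reading of its definition (thresholding predicted scores; not the cited papers' code): between consecutive sorted predicted values, the total false positive and false negative counts are constant, so the area under their pointwise minimum reduces to a finite sum computable in O(n log n).

```python
import numpy as np

def aum(scores, labels):
    """Area Under Min(FP, FN) over all thresholds, in O(n log n).

    For a threshold c, FP(c) counts negatives with score > c and FN(c)
    counts positives with score <= c; both are step functions that change
    only at the sorted predicted values, so the integral is a finite sum.
    """
    order = np.argsort(scores)
    s, y = scores[order], labels[order]       # scores ascending
    fn = np.cumsum(y == 1)[:-1]               # positives at or below s[t]
    fp = np.cumsum((y == 0)[::-1])[::-1][1:]  # negatives strictly above s[t]
    return float(np.sum(np.minimum(fp, fn) * np.diff(s)))

scores = np.array([-1.0, 0.5, 1.5, 2.0])
labels = np.array([0, 1, 0, 1])
print(aum(scores, labels))  # 1.0: FP = FN = 1 on the interval (0.5, 1.5)
```

With distinct scores, the AUM is zero exactly when some threshold separates the classes perfectly; shrinking the intervals where both FP and FN are positive is what the sort-based surrogate optimizes.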