A termination criterion for stochastic gradient descent for binary
classification
- URL: http://arxiv.org/abs/2003.10312v1
- Date: Mon, 23 Mar 2020 15:00:21 GMT
- Title: A termination criterion for stochastic gradient descent for binary
classification
- Authors: Sina Baghal, Courtney Paquette, Stephen A. Vavasis
- Abstract summary: We propose a new, simple, and inexpensive termination test for constant step-size stochastic gradient descent.
We show that our test terminates in a finite number of iterations and when the noise in the data is not too large, the expected classifier at termination nearly minimizes the probability of misclassification.
- Score: 3.216356957608319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new, simple, and computationally inexpensive termination test
for constant step-size stochastic gradient descent (SGD) applied to binary
classification on the logistic and hinge loss with homogeneous linear
predictors. Our theoretical results support the effectiveness of our stopping
criterion when the data is Gaussian distributed. The presence of noise allows
for the possibility of non-separable data. We show that our test terminates in
a finite number of iterations and when the noise in the data is not too large,
the expected classifier at termination nearly minimizes the probability of
misclassification. Finally, numerical experiments indicate for both real and
synthetic data sets that our termination test exhibits a good degree of
predictability on accuracy and running time.
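A minimal sketch of the setting, for orientation: constant step-size SGD on the logistic loss with a homogeneous linear predictor (no bias term), plus a placeholder stopping check. The specific termination test proposed in the paper is not reproduced here; the `stall_tol`/`check_window` rule and the synthetic Gaussian data below are illustrative assumptions only.

```python
import numpy as np

def constant_step_sgd_logistic(X, y, step_size=0.1, max_iters=10_000,
                               check_window=200, stall_tol=1e-2, seed=0):
    """Constant step-size SGD on the logistic loss with a homogeneous linear
    predictor w (no bias).  The stopping rule below is a naive placeholder
    (stop once recent updates are uniformly small), NOT the inexpensive
    termination test proposed in the paper; on noisy data it may simply run
    until max_iters."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    moves = []
    for t in range(max_iters):
        i = rng.integers(n)                              # draw one sample uniformly
        margin = y[i] * (X[i] @ w)
        grad = -y[i] * X[i] / (1.0 + np.exp(margin))     # logistic-loss gradient
        w = w - step_size * grad
        moves.append(step_size * np.linalg.norm(grad))   # size of this update
        if len(moves) >= check_window and max(moves[-check_window:]) < stall_tol:
            return w, t + 1
    return w, max_iters

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 2000, 10
    w_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))                          # Gaussian features, as in the analysis
    y = np.sign(X @ w_star + rng.normal(size=n))         # label noise: data may be non-separable
    w_hat, iters = constant_step_sgd_logistic(X, y)
    accuracy = np.mean(np.sign(X @ w_hat) == y)
    print(f"stopped after {iters} iterations, training accuracy {accuracy:.3f}")
```

Replacing the logistic-loss gradient with a hinge subgradient covers the other loss treated in the paper.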
Related papers
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Sampling from Gaussian Process Posteriors using Stochastic Gradient
Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method for approximately solving linear systems.
We show that stochastic gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
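As a rough illustration of how a stochastic gradient method can approximately solve the linear system behind a GP posterior mean, the sketch below runs minibatch SGD on the least-squares residual of (K + sigma^2 I) a = y. The RBF kernel, batch size, and step size are assumptions for illustration, not the sampling scheme analyzed in that paper; the point of interest is that the predictions K a can be reasonable even before the weights a have converged.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def sgd_posterior_weights(K, y, noise=0.1, step=0.01, batch=32,
                          iters=50_000, seed=0):
    """Minibatch SGD on f(a) = ||(K + noise*I) a - y||^2 / (2n); its minimizer
    a = (K + noise*I)^{-1} y gives the GP posterior-mean weights."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    A = K + noise * np.eye(n)
    a = np.zeros(n)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)   # random rows of the system
        residual = A[idx] @ a - y[idx]
        a -= step * (A[idx].T @ residual) / batch        # unbiased estimate of grad f(a)
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
    K = rbf_kernel(X, X)
    a_sgd = sgd_posterior_weights(K, y)
    a_exact = np.linalg.solve(K + 0.1 * np.eye(200), y)
    # The prediction error is often much smaller than the weight error.
    print("weight rel. error:    ",
          np.linalg.norm(a_sgd - a_exact) / np.linalg.norm(a_exact))
    print("prediction rel. error:",
          np.linalg.norm(K @ (a_sgd - a_exact)) / np.linalg.norm(K @ a_exact))
```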
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Theoretical characterization of uncertainty in high-dimensional linear
classification [24.073221004661427]
We show that uncertainty for learning from a limited number of samples of high-dimensional input data and labels can be obtained by the approximate message passing algorithm.
We discuss how over-confidence can be mitigated by appropriately regularising, and show that cross-validating with respect to the loss leads to better calibration than with the 0/1 error.
arXiv Detail & Related papers (2022-02-07T15:32:07Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
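For context, a random features regression model trained with constant step-size SGD can be set up in a few lines. The random Fourier feature map, dimensions, and step size below are illustrative assumptions, not the precise regime of the paper's analysis.

```python
import numpy as np

def random_fourier_features(X, n_features, lengthscale=1.0, seed=0):
    """Random Fourier feature map approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def sgd_random_features_regression(Phi, y, step=0.05, epochs=20, seed=0):
    """Constant step-size SGD on the squared loss over the feature matrix Phi."""
    rng = np.random.default_rng(seed)
    n, p = Phi.shape
    theta = np.zeros(p)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = (Phi[i] @ theta - y[i]) * Phi[i]      # per-sample squared-loss gradient
            theta -= step * grad
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 5))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
    Phi = random_fourier_features(X, n_features=1000)    # overparameterized: p > n
    theta = sgd_random_features_regression(Phi, y)
    print("train MSE:", np.mean((Phi @ theta - y) ** 2))
```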
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
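A small sketch of the clipping idea (not that paper's exact algorithm or analysis): streaming mean estimation where each incoming heavy-tailed sample contributes one clipped stochastic gradient step. The clipping radius, step size, and Student-t data are illustrative assumptions.

```python
import numpy as np

def clipped_sgd_mean(stream, clip_radius=5.0, step=0.05):
    """Streaming mean estimation: one clipped SGD step per incoming sample.
    The per-sample gradient of 0.5*||theta - x||^2 is (theta - x); clipping
    its norm limits the influence of heavy-tailed outliers."""
    theta = None
    for x in stream:
        x = np.asarray(x, dtype=float)
        if theta is None:
            theta = np.zeros_like(x)
        grad = theta - x
        norm = np.linalg.norm(grad)
        if norm > clip_radius:
            grad *= clip_radius / norm            # project the gradient onto a ball
        theta -= step * grad
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed samples: shifted Student-t with 2.5 degrees of freedom.
    true_mean = np.array([1.0, -2.0, 0.5])
    samples = true_mean + rng.standard_t(df=2.5, size=(5000, 3))
    print("clipped-SGD estimate:", clipped_sgd_mean(iter(samples)))
    print("plain sample mean:   ", samples.mean(axis=0))
```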
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points with high probability at the best-known iteration complexity for smooth losses.
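One way such a combination can look in code is sketched below; the exact algorithm, clipping rule, and parameter choices in that paper may differ, and the toy objective and heavy-tailed gradient noise are assumptions for illustration.

```python
import numpy as np

def clipped_momentum_normalized_sgd(grad_fn, x0, steps=2000, lr=0.01,
                                    beta=0.9, clip_radius=1.0, seed=0):
    """One possible combination of gradient clipping, momentum, and a
    normalized update; the analyzed method's details may differ."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x, rng)                           # noisy, possibly heavy-tailed gradient
        norm_g = np.linalg.norm(g)
        if norm_g > clip_radius:                      # clip the stochastic gradient
            g = g * clip_radius / norm_g
        m = beta * m + (1.0 - beta) * g               # momentum averaging
        x -= lr * m / (np.linalg.norm(m) + 1e-12)     # normalized step
    return x

if __name__ == "__main__":
    # Toy non-convex objective f(x) = sum(x_i^2 + cos(x_i)), with heavy-tailed
    # noise added to every gradient query.
    def grad_fn(x, rng):
        return 2.0 * x - np.sin(x) + rng.standard_t(df=2.5, size=x.shape)

    x_final = clipped_momentum_normalized_sgd(grad_fn, x0=np.ones(10) * 3.0)
    print("final iterate norm:", np.linalg.norm(x_final))
```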
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Estimation and Applications of Quantiles in Deep Binary Classification [0.0]
Quantile regression, based on check loss, is a widely used inferential paradigm in Statistics.
We consider the analogue of check loss in the binary classification setting.
We develop individualized confidence scores that can be used to decide whether a prediction is reliable.
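For reference, the standard check (pinball) loss from quantile regression is easy to state in code; the binary-classification analogue developed in that paper is not reproduced here, and the grid-search demo below is purely illustrative.

```python
import numpy as np

def check_loss(residual, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}),
    minimized in expectation by the tau-th quantile."""
    residual = np.asarray(residual, dtype=float)
    return residual * (tau - (residual < 0).astype(float))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.normal(size=100_000)
    # The constant c minimizing the mean check loss approximates the tau-quantile.
    tau = 0.9
    grid = np.linspace(-3, 3, 601)
    losses = [check_loss(samples - c, tau).mean() for c in grid]
    c_star = grid[int(np.argmin(losses))]
    print(f"estimated {tau}-quantile: {c_star:.2f}, "
          f"empirical quantile: {np.quantile(samples, tau):.2f}")
```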
arXiv Detail & Related papers (2021-02-09T07:07:42Z) - Finite-sample Analysis of Interpolating Linear Classifiers in the
Overparameterized Regime [16.1176305285103]
We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification.
We analyze this algorithm applied to random data including misclassification noise.
arXiv Detail & Related papers (2020-04-25T00:06:18Z) - Double Trouble in Double Descent : Bias and Variance(s) in the Lazy
Regime [32.65347128465841]
Deep neural networks can achieve remarkable performance while interpolating the training data perfectly.
Rather than the U-curve of the bias-variance trade-off, their test error often follows a "double descent" curve.
We develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks.
arXiv Detail & Related papers (2020-03-02T17:39:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.