Multinomial Logistic Regression Algorithms via Quadratic Gradient
- URL: http://arxiv.org/abs/2208.06828v2
- Date: Wed, 29 Mar 2023 12:10:09 GMT
- Title: Multinomial Logistic Regression Algorithms via Quadratic Gradient
- Authors: John Chiang
- Abstract summary: We propose an enhanced Adaptive Gradient Algorithm (Adagrad) that can accelerate the original Adagrad method.
We test the enhanced NAG method and the enhanced Adagrad method on some multiclass-problem datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multinomial logistic regression, also known by other names such as multiclass
logistic regression and softmax regression, is a fundamental classification
method that generalizes binary logistic regression to multiclass problems. A
recent work proposed a faster gradient called $\texttt{quadratic gradient}$
that can accelerate binary logistic regression training, and presented an
enhanced Nesterov's accelerated gradient (NAG) method for binary logistic
regression.
In this paper, we extend this work to multiclass logistic regression and
propose an enhanced Adaptive Gradient Algorithm (Adagrad) that can accelerate
the original Adagrad method. We test the enhanced NAG method and the enhanced
Adagrad method on some multiclass-problem datasets. Experimental results show
that both enhanced methods converge faster than their original counterparts,
respectively.
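To make the construction concrete, below is a minimal sketch (assumptions, not the paper's code) of softmax regression trained with plain Adagrad and with an enhanced variant that replaces the gradient $g$ by a quadratic gradient $G = \bar{B} g$, where $\bar{B}$ is a fixed diagonal preconditioner built from a curvature bound on the data. The specific bound ($0.25\,X^\top X$ row sums), the learning rate, and all function names are illustrative choices rather than the paper's exact construction.

```python
# Hypothetical sketch: softmax regression with plain Adagrad vs. an "enhanced"
# Adagrad that rescales the gradient by a fixed diagonal preconditioner
# (one reading of the quadratic-gradient idea; constants and names are assumed).
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)          # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def grad(W, X, Y):
    """Gradient of the average cross-entropy loss for softmax regression."""
    P = softmax(X @ W)                            # (n, k) class probabilities
    return X.T @ (P - Y) / X.shape[0]             # (d, k)

def quadratic_preconditioner(X, eps=1e-8):
    """Diagonal scaling from a fixed curvature bound (assumed form: 0.25 * X^T X)."""
    B = 0.25 * (X.T @ X) / X.shape[0]
    return 1.0 / (eps + np.abs(B).sum(axis=1, keepdims=True))   # (d, 1)

def adagrad(X, Y, iters=100, lr=0.5, eps=1e-8, enhanced=False):
    d, k = X.shape[1], Y.shape[1]
    W, G2 = np.zeros((d, k)), np.zeros((d, k))
    B_bar = quadratic_preconditioner(X) if enhanced else None
    for _ in range(iters):
        g = grad(W, X, Y)
        if enhanced:
            g = B_bar * g                         # quadratic gradient: fixed row-wise scaling
        G2 += g * g                               # Adagrad accumulator
        W -= lr * g / (np.sqrt(G2) + eps)
    return W

# Toy usage on random 3-class data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.eye(3)[rng.integers(0, 3, size=200)]
W_plain = adagrad(X, Y)
W_enhanced = adagrad(X, Y, enhanced=True)
```

Because $\bar{B}$ is computed once from the data, the extra per-iteration cost over plain Adagrad is a single elementwise scaling, which fits the privacy-preserving setting in which the quadratic gradient was first proposed (see the related paper below).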
Related papers
- Optimal lower bounds for logistic log-likelihoods [1.3124513975412255]
The logit transform is arguably the most widely-employed link function beyond linear settings.
It is still unclear whether tangent lower bounds sharper than quadratic ones can be derived without undermining the tractability of the resulting minorizer.
This article addresses such a challenging question through the design and study of a novel piece-wise quadratic lower bound.
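For context (this is standard background, not the paper's new bound): the classical quadratic tangent minorizer of the logistic log-likelihood, usually attributed to Jaakkola and Jordan, bounds $\log \sigma(x)$ from below by a quadratic that touches it at a chosen point; the piece-wise quadratic bound studied in this article is designed to be sharper than bounds of this form.
$$\log \sigma(x) \;\ge\; \log \sigma(\xi) + \frac{x-\xi}{2} - \lambda(\xi)\bigl(x^{2}-\xi^{2}\bigr),
\qquad
\lambda(\xi) = \frac{1}{2\xi}\Bigl(\sigma(\xi)-\tfrac{1}{2}\Bigr),$$
with equality at $x = \pm\xi$.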
arXiv Detail & Related papers (2024-10-14T09:09:33Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
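A minimal sketch of the underlying idea, under the assumption that "stochastic dual descent" means mini-batched gradient steps on the dual objective $\tfrac12 \alpha^\top (K+\sigma^2 I)\alpha - \alpha^\top y$ of GP (kernel ridge) regression; the actual algorithm's momentum, iterate averaging, and step-size choices are omitted here.

```python
# Hedged sketch: stochastic (mini-batched) gradient descent on the dual of
# GP / kernel ridge regression. Hyperparameters and structure are assumptions.
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def stochastic_dual_descent(K, y, noise_var=0.1, lr=0.05, batch=32, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    a = np.zeros(n)                                   # representer weights alpha
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        g = K[idx] @ a + noise_var * a[idx] - y[idx]  # batch rows of (K + s2*I) a - y
        a[idx] -= lr * g                              # update only the touched coordinates
    return a

# Usage: posterior-mean style prediction f(x*) = k(x*, X) @ alpha on toy 1-D data.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
alpha = stochastic_dual_descent(rbf_kernel(X, X), y)
Xs = np.linspace(-3, 3, 50)[:, None]
f_star = rbf_kernel(Xs, X) @ alpha
```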
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Gradient Descent Converges Linearly for Logistic Regression on Separable Data [17.60502131429094]
We show that running gradient descent with a variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^{\star}) + \epsilon$ for the logistic regression objective $f$.
We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.
arXiv Detail & Related papers (2023-06-26T02:15:26Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant [0.0]
We introduce an efficient gradient, called $\texttt{quadratic gradient}$, for privacy-preserving logistic regression training.
Experimental results demonstrate that the enhanced algorithms achieve significantly improved convergence speed.
There is a good chance that the quadratic gradient approach could integrate first-order gradient descent/ascent algorithms with the second-order Newton-Raphson method.
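Since this is the work the main paper extends, a hedged sketch of its enhanced NAG idea for binary logistic regression may be useful: standard Nesterov acceleration on the average log-likelihood, optionally replacing the gradient with a quadratic gradient built from the same kind of fixed curvature bound as in the multinomial sketch above. The momentum schedule, learning rate, and names are assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch: Nesterov-accelerated gradient ascent on the binary logistic
# log-likelihood, with an optional quadratic-gradient rescaling (assumed form).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loglik_grad(w, X, y):
    """Gradient of the average log-likelihood; labels y are in {0, 1}."""
    return X.T @ (y - sigmoid(X @ w)) / X.shape[0]

def nag_ascent(X, y, iters=100, lr=1.0, eps=1e-8, enhanced=False):
    n, d = X.shape
    B_bar = 1.0 / (eps + np.abs(0.25 * X.T @ X / n).sum(axis=1)) if enhanced else None
    w, v, t_prev = np.zeros(d), np.zeros(d), 1.0
    for _ in range(iters):
        g = loglik_grad(v, X, y)
        if enhanced:
            g = B_bar * g                                    # quadratic gradient
        w_next = v + lr * g                                  # ascent step from lookahead point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev**2))
        v = w_next + (t_prev - 1.0) / t_next * (w_next - w)  # Nesterov extrapolation
        w, t_prev = w_next, t_next
    return w

# Toy usage.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = (rng.uniform(size=150) < sigmoid(X @ np.ones(4))).astype(float)
w_enhanced = nag_ascent(X, y, enhanced=True)
```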
arXiv Detail & Related papers (2022-01-26T09:44:13Z)
- Human Pose Regression with Residual Log-likelihood Estimation [48.30425850653223]
We propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process.
Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead.
arXiv Detail & Related papers (2021-07-23T15:06:31Z)
- Fast Margin Maximization via Dual Acceleration [52.62944011696364]
We present and analyze a momentum-based method for training linear classifiers with an exponentially-tailed loss.
This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual.
arXiv Detail & Related papers (2021-07-01T16:36:39Z)
- Robust Regression Revisited: Acceleration and Improved Estimation Rates [25.54653340884806]
We study fast algorithms for statistical regression problems under the strong contamination model.
The goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples.
We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees.
arXiv Detail & Related papers (2021-06-22T17:21:56Z)
- Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z)
- PIANO: A Fast Parallel Iterative Algorithm for Multinomial and Sparse Multinomial Logistic Regression [0.0]
We show that PIANO can be easily extended to solve the Sparse Multinomial Logistic Regression problem.
We also prove that PIANO converges to a stationary point of the Multinomial and the Sparse Multinomial Logistic Regression problems.
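For reference on the sparse problem this entry targets (this is not PIANO's update rule), L1-regularized multinomial logistic regression can be fit with a plain proximal-gradient (ISTA-style) loop in which sparsity comes from soft-thresholding after each gradient step:

```python
# Baseline sketch (not PIANO): proximal gradient descent for L1-regularized
# (sparse) multinomial logistic regression via soft-thresholding.
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def soft_threshold(W, t):
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def sparse_multinomial_lr(X, Y, lam=0.01, lr=0.5, iters=300):
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    for _ in range(iters):
        g = X.T @ (softmax(X @ W) - Y) / n        # gradient of the smooth loss term
        W = soft_threshold(W - lr * g, lr * lam)  # proximal step for the L1 penalty
    return W

# Toy usage on random 4-class data.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
Y = np.eye(4)[rng.integers(0, 4, size=200)]
W = sparse_multinomial_lr(X, Y)
```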
arXiv Detail & Related papers (2020-02-21T05:15:48Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)