From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes
- URL: http://arxiv.org/abs/2412.08424v1
- Date: Wed, 11 Dec 2024 14:43:39 GMT
- Title: From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes
- Authors: Alexander Tyurin
- Abstract summary: We focus on the classification problem with a separable dataset.
Recent studies have observed that LR+GD can find a solution with arbitrarily large step sizes.
- Score: 57.93371273485736
- License:
- Abstract: We focus on the classification problem with a separable dataset, one of the most important and classical problems from machine learning. The standard approach to this task is logistic regression with gradient descent (LR+GD). Recent studies have observed that LR+GD can find a solution with arbitrarily large step sizes, defying conventional optimization theory. Our work investigates this phenomenon and makes three interconnected key observations about LR+GD with large step sizes. First, we find a remarkably simple explanation of why LR+GD with large step sizes solves the classification problem: LR+GD reduces to a batch version of the celebrated perceptron algorithm when the step size $\gamma \to \infty.$ Second, we observe that larger step sizes lead LR+GD to higher logistic losses when it tends to the perceptron algorithm, but larger step sizes also lead to faster convergence to a solution for the classification problem, meaning that logistic loss is an unreliable metric of the proximity to a solution. Surprisingly, high loss values can actually indicate faster convergence. Third, since the convergence rate in terms of loss function values of LR+GD is unreliable, we examine the iteration complexity required by LR+GD with large step sizes to solve the classification problem and prove that this complexity is suboptimal. To address this, we propose a new method, Normalized LR+GD - based on the connection between LR+GD and the perceptron algorithm - with much better theoretical guarantees.
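The first observation, that LR+GD reduces to a batch perceptron as $\gamma \to \infty$, can be sanity-checked numerically. Below is a minimal sketch, assuming a toy separable dataset, labels in $\{-1, +1\}$, and a linear model $w$; the function names (`logistic_grad`, `batch_perceptron_direction`) and the saturation argument in the comments are illustrative, not the paper's derivation. The idea: once the iterate's norm is large, as it is after a single step with a huge $\gamma$, the sigmoid weights in the logistic gradient saturate to an indicator of misclassification, so the LR+GD step direction approaches the batch perceptron direction.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# Toy linearly separable data (an assumption for illustration, not from the paper).
n, d = 50, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)  # labels in {-1, +1}

def logistic_grad(w):
    """Gradient of the mean logistic loss (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))."""
    s = expit(-y * (X @ w))                       # sigmoid of the negative margin
    return -(X * (y * s)[:, None]).mean(axis=0)

def batch_perceptron_direction(w):
    """Average of y_i x_i over currently misclassified points (y_i <w, x_i> <= 0)."""
    mistakes = y * (X @ w) <= 0
    return (X[mistakes] * y[mistakes, None]).sum(axis=0) / n

# As the iterate's norm grows, the sigmoid saturates to 1 on misclassified points
# and 0 on correctly classified ones, so the negative logistic gradient approaches
# the batch perceptron direction and the gap below shrinks toward zero.
w = rng.normal(size=d)
for scale in (1.0, 10.0, 100.0, 1000.0):
    gap = np.linalg.norm(logistic_grad(scale * w) + batch_perceptron_direction(scale * w))
    print(f"||w|| scale {scale:>7.1f}: gap to perceptron direction = {gap:.2e}")
```

The shrinking gap mirrors the abstract's claim that large-step LR+GD behaves like a batch perceptron; the exact update of the proposed Normalized LR+GD is not sketched here, since the abstract does not specify it.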
Related papers
- Where Do Large Learning Rates Lead Us? [5.305784285588872]
We show that only a narrow range of initial LRs leads to optimal results after fine-tuning with a small LR or weight averaging.
We show that these initial LRs result in a sparse set of learned features, with a clear focus on those most relevant for the task.
In contrast, starting training with LRs that are too small leads to unstable minima and attempts to learn all features simultaneously, resulting in poor generalization.
arXiv Detail & Related papers (2024-10-29T15:14:37Z) - ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer [68.72454974431749]
We present ClearSR, a new method that can better take advantage of latent low-resolution image (LR) embeddings for diffusion-based real-world image super-resolution (Real-ISR).
Our model can achieve better performance across multiple metrics on several test sets and generate more consistent SR results with LR images than existing methods.
arXiv Detail & Related papers (2024-10-18T08:35:57Z) - Scaling Optimal LR Across Token Horizons [81.29631219839311]
We show how optimal learning rate depends on token horizon in LLM training.
We also provide evidence that LLaMA-1 used too high an LR, and estimate the performance hit from this.
arXiv Detail & Related papers (2024-09-30T03:32:02Z) - Stable Phase Retrieval with Mirror Descent [0.5312303275762104]
We show that mirror descent converges to a critical point of the phase retrieval problem.
We provide global convergence guarantees that ensure that with high probability, mirror descent converges to a global minimizer near the true vector.
arXiv Detail & Related papers (2024-05-17T13:07:14Z) - Label Ranking through Nonparametric Regression [5.994412766684843]
Label Ranking corresponds to the problem of learning a hypothesis that maps features to rankings over a finite set of labels.
We introduce a generative model for Label Ranking, in noiseless and noisy nonparametric regression settings.
We complement our theoretical contributions with experiments, aiming to understand how the input regression noise affects the observed output.
arXiv Detail & Related papers (2021-11-04T10:59:46Z) - A Deep Residual Star Generative Adversarial Network for multi-domain Image Super-Resolution [21.39772242119127]
Super-Resolution Residual StarGAN (SR2*GAN) is a novel and scalable approach that super-resolves LR images from multiple LR domains using only a single model.
We demonstrate the effectiveness of our proposed approach in quantitative and qualitative experiments compared to other state-of-the-art methods.
arXiv Detail & Related papers (2021-07-07T11:15:17Z) - A Wasserstein Minimax Framework for Mixed Linear Regression [69.40394595795544]
Multi-modal distributions are commonly used to model clustered data in learning tasks.
We propose an optimal transport-based framework for Mixed Linear Regression problems.
arXiv Detail & Related papers (2021-06-14T16:03:51Z) - LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-Resolution and Beyond [75.37541439447314]
Single image super-resolution (SISR) deals with a fundamental problem of upsampling a low-resolution (LR) image to its high-resolution (HR) version.
This paper proposes a linearly-assembled pixel-adaptive regression network (LAPAR) to strike a sweet spot of deep model complexity and resulting SISR quality.
arXiv Detail & Related papers (2021-05-21T15:47:18Z) - Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution [73.86924594746884]
Deep neural networks have exhibited promising performance in image super-resolution.
These networks learn a nonlinear mapping function from low-resolution (LR) images to high-resolution (HR) images.
We propose a dual regression scheme by introducing an additional constraint on LR data to reduce the space of the possible functions.
arXiv Detail & Related papers (2020-03-16T04:23:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.