Stochastic Gradient Descent Meets Distribution Regression
- URL: http://arxiv.org/abs/2010.12842v2
- Date: Fri, 5 Mar 2021 16:33:42 GMT
- Title: Stochastic Gradient Descent Meets Distribution Regression
- Authors: Nicole Mücke
- Abstract summary: Stochastic gradient descent (SGD) provides a simple and efficient way to solve a broad range of machine learning problems.
We focus on distribution regression (DR), involving two stages of sampling: firstly, we regress from probability measures to real-valued responses; secondly, we sample bags from these distributions to solve the overall regression problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent (SGD) provides a simple and efficient way to
solve a broad range of machine learning problems. Here, we focus on
distribution regression (DR), involving two stages of sampling: Firstly, we
regress from probability measures to real-valued responses. Secondly, we sample
bags from these distributions for utilizing them to solve the overall
regression problem. Recently, DR has been tackled by applying kernel ridge
regression and the learning properties of this approach are well understood.
However, nothing is known about the learning properties of SGD for two stage
sampling problems. We fill this gap and provide theoretical guarantees for the
performance of SGD for DR. Our bounds are optimal in a mini-max sense under
standard assumptions.
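The two-stage sampling scheme can be sketched end to end. The following is a minimal illustration, not the paper's algorithm: each input measure is a Gaussian whose observed bag is embedded via random Fourier features (a finite-dimensional stand-in for a kernel mean embedding), and plain SGD with a decaying step size is run on the squared loss. The target $y = \mu^2$, the bag size, and the step schedule are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-stage sampling: each input is a probability measure (here N(mu, 1)),
# but we only observe a finite bag of samples drawn from it.
n_bags, bag_size, n_feat = 200, 50, 100

# Random Fourier features approximate a Gaussian-kernel mean embedding.
W = rng.normal(size=n_feat)
b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)

def embed(bag):
    """Empirical mean embedding of a bag in random-feature space."""
    return np.sqrt(2.0 / n_feat) * np.cos(np.outer(bag, W) + b).mean(axis=0)

mus = rng.uniform(-1.0, 1.0, size=n_bags)
X = np.stack([embed(rng.normal(mu, 1.0, size=bag_size)) for mu in mus])
y = mus ** 2                              # hypothetical real-valued response

# Plain SGD with a decaying step size on the squared loss, several passes.
theta = np.zeros(n_feat)
t = 0
for _ in range(5):                        # epochs
    for i in rng.permutation(n_bags):
        t += 1
        grad = (X[i] @ theta - y[i]) * X[i]
        theta -= (0.5 / np.sqrt(t)) * grad

mse = float(np.mean((X @ theta - y) ** 2))
print(f"train MSE: {mse:.3f}")
```

The embedding step is what makes this a distribution-regression problem rather than ordinary regression: the learner never sees the measures, only their bags.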
Related papers
- Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution [6.144680854063938]
We consider a variant of stochastic gradient descent (SGD) with a random learning rate to reveal its convergence properties.
We demonstrate that a distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions.
arXiv Detail & Related papers (2024-06-23T06:52:33Z) - Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD [29.162265194920522]
This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under stochastic gradient descent (SGD).
We show that overparametrization can only enhance convergence by a constant factor within this problem class.
Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of stochasticity may be minimal in this scenario.
arXiv Detail & Related papers (2023-05-29T14:40:56Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
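The paper's contribution is a sketch-based streaming data structure; as a rough illustration of the underlying idea of importance-sampled SGD, here is a classical single-machine instance for a consistent least-squares system, sampling rows with probability proportional to their squared norm (the randomized Kaczmarz update, which is equivalent to SGD with a per-row step size). The problem sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Consistent least-squares system A x = b (toy stand-in).
n, d = 500, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true

# Importance sampling: pick row i with probability proportional to ||a_i||^2.
row_sq = np.einsum("ij,ij->i", A, A)
p = row_sq / row_sq.sum()

x = np.zeros(d)
for _ in range(3000):
    i = rng.choice(n, p=p)
    # SGD step with step size 1/||a_i||^2 (the randomized Kaczmarz update).
    x -= ((A[i] @ x - b[i]) / row_sq[i]) * A[i]

print(float(np.linalg.norm(x - x_true)))
```

Sampling high-norm rows more often concentrates updates on the most informative equations and yields linear convergence on consistent systems.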
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
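The double descent curve can be reproduced in a few lines. As an illustration only: instead of running SGD, we use the closed-form minimum-norm least-squares solution (a common proxy for the limit of unregularized SGD on least squares) on ReLU random features, sweeping the number of features across the interpolation threshold; all sizes and the noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n_train, n_test, d = 100, 200, 5
sigma = 0.5                              # label noise

w = rng.normal(size=d)
Xtr = rng.normal(size=(n_train, d))
Xte = rng.normal(size=(n_test, d))
ytr = Xtr @ w + sigma * rng.normal(size=n_train)
yte = Xte @ w

def rf_test_error(p, trials=5):
    """Min-norm least squares on p random ReLU features, averaged over draws."""
    out = []
    for _ in range(trials):
        V = rng.normal(size=(d, p)) / np.sqrt(d)
        Ftr = np.maximum(Xtr @ V, 0.0)
        Fte = np.maximum(Xte @ V, 0.0)
        theta = np.linalg.pinv(Ftr) @ ytr    # minimum-norm solution
        out.append(np.mean((Fte @ theta - yte) ** 2))
    return float(np.mean(out))

ps = [20, 50, 100, 200, 400]             # p = 100 is the interpolation threshold
errs = {p: rf_test_error(p) for p in ps}
print(errs)
```

The test error typically peaks near p = n_train, where the feature matrix is square and ill-conditioned, and descends again in the overparametrized regime.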
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - The Benefits of Implicit Regularization from SGD in Least Squares Problems [116.85246178212616]
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice.
We compare the implicit regularization afforded by (unregularized) averaged SGD with the explicit regularization of ridge regression.
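Such a comparison can be set up directly on a toy problem. A minimal sketch, not the paper's construction: one pass of constant-step SGD with tail averaging versus closed-form ridge on the same least-squares data; the step size and ridge parameter are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

n, d = 2000, 20
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = X @ w + 0.3 * rng.normal(size=n)

# One pass of constant-step SGD with tail (iterate) averaging.
eta = 0.01
wt = np.zeros(d)
avg = np.zeros(d)
kept = 0
for i in range(n):
    wt -= eta * (X[i] @ wt - y[i]) * X[i]
    if i >= n // 2:                      # average the second half of the pass
        avg += wt
        kept += 1
avg /= kept

# Explicit ridge regression for comparison.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

err_sgd = float(np.linalg.norm(avg - w))
err_ridge = float(np.linalg.norm(w_ridge - w))
print(err_sgd, err_ridge)
```

Averaging damps the stationary noise of the constant-step iterates, which is the mechanism behind the implicit-regularization effect the paper quantifies.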
arXiv Detail & Related papers (2021-08-10T09:56:47Z) - Robust Kernel-based Distribution Regression [13.426195476348955]
We study distribution regression (DR), which involves two stages of sampling and aims at regressing from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS).
By introducing a robust loss function $l_\sigma$ for two-stage sampling problems, we present a novel robust distribution regression (RDR) scheme.
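To show what a bounded robust loss buys, here is a hypothetical stand-in (not the paper's $l_\sigma$ or its DR setting): a Welsch-type loss $\sigma^2(1 - e^{-r^2/\sigma^2})$, whose gradient downweights large residuals, fit by gradient descent on a 1-D regression with one-sided gross outliers. All constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# 1-D linear regression, true slope 2, with one-sided gross outliers.
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 * x + 0.1 * rng.normal(size=n)
out = np.where(x > 0.3)[0][:20]
y[out] += 30.0                           # contamination

def fit(sigma=None, steps=3000, lr=0.05):
    """Gradient descent on a Welsch-type loss (least squares if sigma is None)."""
    w = 0.0
    for _ in range(steps):
        r = w * x - y
        if sigma is None:
            g = np.mean(r * x)                             # squared-loss gradient
        else:
            g = np.mean(np.exp(-r**2 / sigma**2) * r * x)  # robust gradient
        w -= lr * g
    return w

w_ls = fit()
w_robust = fit(sigma=1.0)
print(w_ls, w_robust)
```

The exponential weight sends the outliers' gradient contributions to essentially zero, so the robust fit stays near the true slope while least squares is dragged away.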
arXiv Detail & Related papers (2021-04-21T17:03:46Z) - Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvement by 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay (a popular reinforcement learning technique) that achieves a significantly better error rate.
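The replay idea can be sketched in a few lines. This is an illustration, not the paper's algorithm or its rates: covariates follow an AR(1) Markov chain, and each SGD update replays a uniformly sampled past pair from a growing buffer, which decorrelates consecutive steps; the chain parameter, step size, and horizon are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

# Least squares with Markovian covariates: x_t follows an AR(1) chain,
# so consecutive samples are strongly correlated.
d, T = 5, 20000
w_true = rng.normal(size=d)
rho = 0.95

xs = np.zeros((T, d))
xs[0] = rng.normal(size=d)
for t in range(1, T):
    xs[t] = rho * xs[t - 1] + np.sqrt(1 - rho**2) * rng.normal(size=d)
ys = xs @ w_true + 0.1 * rng.normal(size=T)

# Experience replay: keep a buffer of past pairs and update on a uniformly
# sampled element rather than the freshly arrived (correlated) one.
buffer_x, buffer_y = [], []
w = np.zeros(d)
eta = 0.05
for t in range(T):
    buffer_x.append(xs[t])
    buffer_y.append(ys[t])
    j = rng.integers(len(buffer_x))      # replay a random past sample
    w -= eta * (buffer_x[j] @ w - buffer_y[j]) * buffer_x[j]

print(float(np.linalg.norm(w - w_true)))
```

Uniform sampling from the buffer makes the effective sample stream look closer to i.i.d. draws from the stationary distribution, which is the intuition behind the improved error rate.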
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - Choosing the Sample with Lowest Loss makes SGD Robust [19.08973384659313]
We propose a simple variant of stochastic gradient descent (SGD): in each step, the update is performed with the sample that currently has the smallest loss.
This seemingly vanilla modification yields a new algorithm that effectively minimizes a non-uniformly weighted sum of the losses.
Our theoretical analysis of this idea for ML problems is backed up with small-scale neural network experiments.
arXiv Detail & Related papers (2020-01-10T05:39:17Z)
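The min-loss selection rule is easy to sketch. A minimal illustration, with hypothetical sizes and a candidate-set size k chosen for the demo (the paper's exact procedure and analysis may differ): at each step, draw k candidate samples and perform the SGD update with the one whose current loss is smallest, so grossly corrupted samples (which have large loss) are rarely selected.

```python
import numpy as np

rng = np.random.default_rng(6)

# Linear regression with 10% grossly corrupted labels.
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
bad = rng.choice(n, size=100, replace=False)
y[bad] = 50.0                            # label corruption

def sgd(k, steps=20000, eta=0.01):
    """At each step draw k candidates; update with the lowest-loss one."""
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=k, replace=False)
        losses = (X[idx] @ w - y[idx]) ** 2
        i = idx[np.argmin(losses)]       # min-loss sample among the k
        w -= eta * (X[i] @ w - y[i]) * X[i]
    return w

w_vanilla = sgd(k=1)                     # ordinary SGD
w_minloss = sgd(k=8)
print(float(np.linalg.norm(w_vanilla - w_true)),
      float(np.linalg.norm(w_minloss - w_true)))
```

With k = 1 the rule reduces to vanilla SGD, which is dragged around by the corrupted samples; with k = 8 a corrupted sample is essentially never the loss minimizer, so the updates behave as if trained on clean data.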
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.