SGD Through the Lens of Kolmogorov Complexity
- URL: http://arxiv.org/abs/2111.05478v1
- Date: Wed, 10 Nov 2021 01:32:38 GMT
- Title: SGD Through the Lens of Kolmogorov Complexity
- Authors: Gregory Schwartzman
- Abstract summary: Stochastic gradient descent (SGD) finds a solution that achieves $(1-\epsilon)$ classification accuracy on the entire dataset.
This is the first result which is completely \emph{model agnostic} - we don't require the model to have any specific architecture or activation function.
- Score: 0.15229257192293197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove that stochastic gradient descent (SGD) finds a solution that
achieves $(1-\epsilon)$ classification accuracy on the entire dataset. We do so
under two main assumptions: (1. Local progress) There is consistent improvement
of the model accuracy over batches. (2. Models compute simple functions) The
function computed by the model is simple (has low Kolmogorov complexity).
Intuitively, the above means that \emph{local progress} of SGD implies
\emph{global progress}. Assumption 2 trivially holds for underparameterized
models, hence, our work gives the first convergence guarantee for general,
\emph{underparameterized models}. Furthermore, this is the first result which
is completely \emph{model agnostic} - we don't require the model to have any
specific architecture or activation function, it may not even be a neural
network. Our analysis makes use of the entropy compression method, which was
first introduced by Moser and Tardos in the context of the Lov\'asz local
lemma.
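The two assumptions above suggest a concrete picture of what SGD is assumed to do batch by batch. Below is a minimal sketch, not the paper's construction, of an SGD loop that monitors the "local progress" assumption: after each step, accuracy on the current batch should improve by some margin. `model`, `loss_fn`, `batches`, and `delta` are placeholder names, and PyTorch is used purely for illustration.

```python
import torch

def sgd_with_local_progress(model, loss_fn, batches, lr=0.1, delta=0.01):
    """Minimal sketch: run SGD and check the "local progress" assumption,
    i.e. that accuracy on each batch improves by at least `delta` after
    the update. This only monitors the assumption; the paper treats it as
    a hypothesis about training, not something SGD is forced to satisfy."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in batches:
        with torch.no_grad():
            acc_before = (model(x).argmax(dim=1) == y).float().mean().item()
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        with torch.no_grad():
            acc_after = (model(x).argmax(dim=1) == y).float().mean().item()
        if acc_after < acc_before + delta:
            # Assumption 1 would be violated for this batch.
            print(f"local progress violated: {acc_before:.3f} -> {acc_after:.3f}")
    return model
```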
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - NGD converges to less degenerate solutions than SGD [0.5249805590164902]
The number of free parameters, or dimension, of a model is a straightforward way to measure its complexity.
But this is not an accurate measure of complexity: models capable of memorizing their training data often generalize well despite their high dimension.
Effective dimension aims to more directly capture the complexity of a model by counting only the number of parameters required to represent the functionality of the model.
arXiv Detail & Related papers (2024-09-07T21:27:49Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC)
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - A Unified Approach to Learning Ising Models: Beyond Independence and
Bounded Width [7.605563562103568]
We revisit the problem of efficiently learning the underlying parameters of Ising models from data.
We show that a simple existing approach based on node-wise logistic regression provably succeeds at recovering the underlying model in several new settings (a generic node-wise regression sketch appears after this list).
arXiv Detail & Related papers (2023-11-15T18:41:19Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise (a generic sketch of this objective appears after this list).
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Adaptive Sparse Gaussian Process [0.0]
We propose the first adaptive sparse Gaussian Process (GP) able to address all these issues.
We first reformulate a variational sparse GP algorithm to make it adaptive through a forgetting factor.
We then propose updating a single inducing point of the sparse GP model together with the remaining model parameters every time a new sample arrives.
arXiv Detail & Related papers (2023-02-20T21:34:36Z) - KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
arXiv Detail & Related papers (2022-05-27T19:39:24Z) - uGLAD: Sparse graph recovery by optimizing deep unrolled networks [11.48281545083889]
We present a novel technique to perform sparse graph recovery by optimizing deep unrolled networks.
Our model, uGLAD, builds upon and extends the state-of-the-art model GLAD to the unsupervised setting.
We evaluate model results on synthetic Gaussian data, non-Gaussian data generated from Gene Regulatory Networks, and present a case study in anaerobic digestion.
arXiv Detail & Related papers (2022-05-23T20:20:27Z) - Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve
Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL)
arXiv Detail & Related papers (2021-02-08T12:41:56Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor perfectly fits the inputs and outputs, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator)
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
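The "node-wise logistic regression" approach mentioned in the Ising-model entry above ("A Unified Approach to Learning Ising Models") can be sketched generically: each spin is regressed on all the others, and the learned logistic weights estimate the couplings. This is the standard construction, not necessarily the cited paper's exact estimator; `samples`, the regularization strength `C`, and the default ℓ2 penalty are placeholder choices (classic analyses often use ℓ1).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_ising_nodewise(samples, C=1.0):
    """Generic node-wise logistic regression for Ising couplings.
    `samples` is an (n, p) array of +/-1 spins."""
    n, p = samples.shape
    J = np.zeros((p, p))
    for i in range(p):
        X = np.delete(samples, i, axis=1)        # all spins except spin i
        y = (samples[:, i] > 0).astype(int)      # spin i as a binary label
        clf = LogisticRegression(C=C).fit(X, y)
        w = clf.coef_.ravel() / 2.0              # logistic weights ~ 2 * J_ij
        J[i, np.arange(p) != i] = w
    return (J + J.T) / 2.0                       # symmetrize the estimate
```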
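For the noise-contrastive estimation entry above ("Learning Unnormalized Statistical Models via Compositional Optimization"), the logistic-loss formulation it refers to can be written out in a few lines. This is the textbook NCE objective under stated assumptions, not the compositional method the paper studies; `log_unnorm_model` and `log_noise_density` are placeholder callables, and the model's normalizing constant is assumed to be folded into `log_unnorm_model` (in practice it is often an extra learnable parameter).

```python
import numpy as np

def log_sigmoid(z):
    # naive log-sigmoid; numerically fine enough for a sketch
    return -np.log1p(np.exp(-z))

def nce_loss(log_unnorm_model, log_noise_density, data, noise):
    """Generic NCE objective: a logistic loss that classifies real samples
    against artificial noise samples drawn from a known density."""
    nu = len(noise) / len(data)  # noise-to-data ratio
    # classifier logit: log p_model(x) - log(nu * p_noise(x))
    g_data = log_unnorm_model(data) - log_noise_density(data) - np.log(nu)
    g_noise = log_unnorm_model(noise) - log_noise_density(noise) - np.log(nu)
    # minimize the negative log-likelihood of the "real vs noise" labels
    return -(np.mean(log_sigmoid(g_data)) + nu * np.mean(log_sigmoid(-g_noise)))
```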
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.