Extended convexity and smoothness and their applications in deep learning
- URL: http://arxiv.org/abs/2410.05807v1
- Date: Tue, 8 Oct 2024 08:40:07 GMT
- Title: Extended convexity and smoothness and their applications in deep learning
- Authors: Binchuan Qi,
- Abstract summary: In this paper, we introduce the $mathcal$H$smoothness algorithm for non-completely understood gradient and strong convexity.
The effectiveness of the proposed methodology is validated through experiments.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The underlying mechanism by which simple gradient-based iterative algorithms can effectively handle the non-convex problem of deep model training remains incompletely understood within the traditional convex and non-convex analysis frameworks, which often require the Lipschitz smoothness of the gradient and strong convexity. In this paper, we introduce $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness, which broaden the existing concepts of smoothness and convexity, and delineate their fundamental properties. Building on these concepts, we introduce the high-order gradient descent and high-order stochastic gradient descent methods, which serve as extensions to the traditional gradient descent and stochastic gradient descent methods, respectively. Furthermore, we establish descent lemmas for the $\mathcal{H}(\phi)$-convex and $\mathcal{H}(\Phi)$-smooth objective functions when utilizing these four methods. On the basis of these findings, we develop the gradient structure control algorithm to address non-convex optimization objectives, encompassing both the functions represented by machine learning models and common loss functions in deep learning. The effectiveness of the proposed methodology is empirically validated through experiments.
Related papers
- Towards Understanding the Optimization Mechanisms in Deep Learning [5.281849820329249]
In this paper, we adopt a distribution estimation perspective to explore the mechanisms of supervised classification using deep neural networks.
For the latter, we provide theoretical insights into mechanisms such as over- and probability randomization.
arXiv Detail & Related papers (2025-03-29T08:46:13Z) - Early-Stopped Mirror Descent for Linear Regression over Convex Bodies [14.30754799752932]
We study the setting of high-dimensional linear regression under additive Gaussian noise.
We show that the worst-case risk of unconstrained early-stopped mirror descent with an appropriate potential is at most that of the least squares estimator constrained to the convex body.
arXiv Detail & Related papers (2025-03-05T11:59:31Z) - Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula [17.036996839737828]
We show how an algorithm can be used to prove the optimality of a non-dimensional Gaussian regularization model.
We also show how we can use the Tukey loss to prove the optimality of a negative regularization model.
arXiv Detail & Related papers (2025-02-27T11:29:43Z) - Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization [52.61737731453222]
Non-Machine Learning problems typically do not adhere to the standard smoothness assumption.
We propose and analyze new methods with local steps, partial participation of clients, and Random Random Reshuffling.
Our theory is consistent with the known results for standard smooth problems.
arXiv Detail & Related papers (2024-12-03T19:20:56Z) - Towards Sharper Risk Bounds for Minimax Problems [23.380477456114118]
Minimax problems have achieved success in machine learning such as adversarial, robust optimization, reinforcement learning.
For theoretical analysis, current optimal excess risk bounds are composed by generalization error and present 1/n-rates in strongly-strongly-concave (SC-SC)
We analyze some popular algorithms such as empirical saddle point (GDA), gradient descent (DA) and gradient descent (SG)
We derive n times faster than results in minimax problems.
arXiv Detail & Related papers (2024-10-11T03:50:23Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when emphdone right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a emphstochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Gradient is All You Need? [0.0]
In this paper we provide a novel analytical perspective on the theoretical understanding of learning algorithms by interpreting consensus-based gradient-based optimization (CBO)
Our results prove the intrinsic power of CBO to alleviate the complexities of the nonlocal landscape function.
arXiv Detail & Related papers (2023-06-16T11:30:55Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
An analysis of convex and non- optimization methods often requires the Lipsitzness gradient, which limits the analysis by this trajectorys.
Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and gradient descent (SGD) to train SNNs, for both of which we develop consistent excess bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Stability and Generalization of Stochastic Optimization with Nonconvex
and Nonsmooth Problems [34.68590236021379]
This paper introduces systematic algorithmic stability measures and the gap between the quantitative gradients and the population.
We show how we apply these algorithms to achieve an implicit regular iteration step sizes and the adaptive gradient descent.
arXiv Detail & Related papers (2022-06-14T18:14:30Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to emphNon-uniform Smoothness (NS) and emphNon-uniform Lojasiewicz inequality (NL)
New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $Omega (1/t2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-perfect solutions to non-optimal training problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Learning Fast Approximations of Sparse Nonlinear Regression [50.00693981886832]
In this work, we bridge the gap by introducing the Threshold Learned Iterative Shrinkage Algorithming (NLISTA)
Experiments on synthetic data corroborate our theoretical results and show our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-26T11:31:08Z) - Improved Analysis of Clipping Algorithms for Non-convex Optimization [19.507750439784605]
Recently, citetzhang 2019gradient show that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD.
Experiments confirm the superiority of clipping-based methods in deep learning tasks.
arXiv Detail & Related papers (2020-10-05T14:36:59Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.