Related papers: Extended convexity and smoothness and their applications in deep learning

Extended convexity and smoothness and their applications in deep learning

URL: http://arxiv.org/abs/2410.05807v2
Date: Wed, 15 Jan 2025 09:53:49 GMT
Title: Extended convexity and smoothness and their applications in deep learning
Authors: Binchuan Qi, Wei Gong, Li Li,
Abstract summary: This paper introduces an optimization framework aimed at providing a theoretical foundation for a class of composite optimization problems, particularly those in deep learning.<n>We analyze the smoothness of Lipschitz's concepts of Lipschitz's descent and descent methods for objective functions that are $mathcalH(Phi)$-smoothness.
Score: 5.281849820329249
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces an optimization framework aimed at providing a theoretical foundation for a class of composite optimization problems, particularly those encountered in deep learning. In this framework, we introduce $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness to generalize the existing concepts of Lipschitz smoothness and strong convexity. Furthermore, we analyze and establish the convergence of both gradient descent and stochastic gradient descent methods for objective functions that are $\mathcal{H}(\Phi)$-smooth. We prove that the optimal convergence rates of these methods depend solely on the homogeneous degree of $\Phi$. Based on these findings, we construct two types of non-convex and non-smooth optimization problems: deterministic composite and stochastic composite optimization problems, which encompass the majority of optimization problems in deep learning. To address these problems, we develop the gradient structure control algorithm and prove that it can locate approximate global optima. This marks a significant departure from traditional non-convex analysis framework, which typically settle for stationary points. Therefore, with the introduction of $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness, along with the GSC algorithm, the non-convex optimization mechanisms in deep learning can be theoretically explained and supported. Finally, the effectiveness of the proposed framework is substantiated through empirical experimentation.

Related papers

Towards Understanding the Optimization Mechanisms in Deep Learning [5.281849820329249]
In this paper, we adopt a distribution estimation perspective to explore the mechanisms of supervised classification using deep neural networks. For the latter, we provide theoretical insights into mechanisms such as over- and probability randomization.
arXiv Detail & Related papers (2025-03-29T08:46:13Z)
Early-Stopped Mirror Descent for Linear Regression over Convex Bodies [14.30754799752932]
We study the setting of high-dimensional linear regression under additive Gaussian noise. We show that the worst-case risk of unconstrained early-stopped mirror descent with an appropriate potential is at most that of the least squares estimator constrained to the convex body.
arXiv Detail & Related papers (2025-03-05T11:59:31Z)
Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula [17.036996839737828]
We show how an algorithm can be used to prove the optimality of a non-dimensional Gaussian regularization model. We also show how we can use the Tukey loss to prove the optimality of a negative regularization model.
arXiv Detail & Related papers (2025-02-27T11:29:43Z)
Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization [52.61737731453222]
Non-Machine Learning problems typically do not adhere to the standard smoothness assumption. We propose and analyze new methods with local steps, partial participation of clients, and Random Random Reshuffling. Our theory is consistent with the known results for standard smooth problems.
arXiv Detail & Related papers (2024-12-03T19:20:56Z)
Towards Sharper Risk Bounds for Minimax Problems [23.380477456114118]
Minimax problems have achieved success in machine learning such as adversarial, robust optimization, reinforcement learning. For theoretical analysis, current optimal excess risk bounds are composed by generalization error and present 1/n-rates in strongly-strongly-concave (SC-SC) We analyze some popular algorithms such as empirical saddle point (GDA), gradient descent (DA) and gradient descent (SG) We derive n times faster than results in minimax problems.
arXiv Detail & Related papers (2024-10-11T03:50:23Z)
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics. The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when emphdone right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective. We introduce a emphstochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices. Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
Gradient is All You Need? [0.0]
In this paper we provide a novel analytical perspective on the theoretical understanding of learning algorithms by interpreting consensus-based gradient-based optimization (CBO) Our results prove the intrinsic power of CBO to alleviate the complexities of the nonlocal landscape function.
arXiv Detail & Related papers (2023-06-16T11:30:55Z)
Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
An analysis of convex and non- optimization methods often requires the Lipsitzness gradient, which limits the analysis by this trajectorys. Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z)
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and gradient descent (SGD) to train SNNs, for both of which we develop consistent excess bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems [34.68590236021379]
This paper introduces systematic algorithmic stability measures and the gap between the quantitative gradients and the population. We show how we apply these algorithms to achieve an implicit regular iteration step sizes and the adaptive gradient descent.
arXiv Detail & Related papers (2022-06-14T18:14:30Z)
Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to emphNon-uniform Smoothness (NS) and emphNon-uniform Lojasiewicz inequality (NL) New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $Omega (1/t2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z)
Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-perfect solutions to non-optimal training problems. We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
Learning Fast Approximations of Sparse Nonlinear Regression [50.00693981886832]
In this work, we bridge the gap by introducing the Threshold Learned Iterative Shrinkage Algorithming (NLISTA) Experiments on synthetic data corroborate our theoretical results and show our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-26T11:31:08Z)
Improved Analysis of Clipping Algorithms for Non-convex Optimization [19.507750439784605]
Recently, citetzhang 2019gradient show that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD. Experiments confirm the superiority of clipping-based methods in deep learning tasks.
arXiv Detail & Related papers (2020-10-05T14:36:59Z)
Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem. We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent. Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems. Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.