Designing a Linearized Potential Function in Neural Network Optimization Using Csiszár Type of Tsallis Entropy
- URL: http://arxiv.org/abs/2411.03611v1
- Date: Wed, 06 Nov 2024 02:12:41 GMT
- Title: Designing a Linearized Potential Function in Neural Network Optimization Using Csiszár Type of Tsallis Entropy
- Authors: Keito Akiyama
- Abstract summary: In this paper, we establish a framework that utilizes a linearized potential function via the Csiszár type of Tsallis entropy.
We show that our new framework enables us to derive an exponential convergence result.
- Score: 0.0
- License:
- Abstract: In recent years, the learning of neural networks has come to be viewed as optimization in the space of probability measures. To obtain exponential convergence to the optimizer, a regularizing term based on Shannon entropy plays an important role. Even though the choice of entropy function heavily affects convergence results, there are almost no results on its generalization, because of two technical difficulties: one is the lack of a sufficient condition for a generalized logarithmic Sobolev inequality, and the other is the distributional dependence of the potential function within the gradient flow equation. In this paper, we establish a framework that utilizes a linearized potential function via the Csiszár type of Tsallis entropy, which is one of the generalized entropies. We also show that our new framework enables us to derive an exponential convergence result.
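For orientation, here is a hedged sketch (not taken from the paper) of the kind of entropy-regularized objective the abstract refers to: a loss $L(\mu)$ over probability measures $\mu$ with density $\rho$, regularized by an entropy written in Csiszár (f-divergence) form. The symbols $\lambda$, $q$, and $f_q$ below are illustrative, not the paper's notation.
\[
  \mathcal{F}_\lambda(\mu) \;=\; L(\mu) \;+\; \lambda \int_{\mathbb{R}^d} f_q\bigl(\rho(x)\bigr)\,\mathrm{d}x,
  \qquad
  f_q(r) \;=\; \frac{r^{q}-r}{q-1}, \quad q>0,\ q\neq 1,
\]
with $f_q(r) \to r\log r$ as $q \to 1$, recovering the Shannon-entropy regularization for which exponential convergence classically follows from a logarithmic Sobolev inequality.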
Related papers
- GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent [12.409030267572243]
We make a batch of neural network outputs satisfy bounded and general linear constraints.
This is the first general linear satisfiability layer in which all the operations are differentiable and matrix-factorization-free.
arXiv Detail & Related papers (2024-09-26T03:12:53Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Optimization-Induced Graph Implicit Nonlinear Diffusion [64.39772634635273]
We propose a new kind of graph convolution variant, called Graph Implicit Nonlinear Diffusion (GIND).
GIND implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing.
We show that the learned representation can be formalized as the minimizer of an explicit convex optimization objective.
arXiv Detail & Related papers (2022-06-29T06:26:42Z) - Stochastic Langevin Differential Inclusions with Applications to Machine Learning [5.274477003588407]
We show some foundational results regarding the flow and properties of Langevin-type Differential Inclusions.
In particular, we show strong existence of the solution, as well as asymptotic minimization of the canonical free-energy functional.
arXiv Detail & Related papers (2022-06-23T08:29:17Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - Universal scaling laws in the gradient descent training of neural networks [10.508187462682308]
We show that the learning trajectory can be characterized by explicit bounds at large training times.
Our results are based on spectral analysis of the evolution of a large network trained on the expected loss.
arXiv Detail & Related papers (2021-05-02T16:46:38Z) - Towards a theory of machine learning [0.0]
We define a neural network as a septuple consisting of (1) a state vector, (2) an input projection, (3) an output projection, (4) a weight matrix, (5) a bias vector, (6) an activation map and (7) a loss function.
arXiv Detail & Related papers (2020-04-15T00:41:46Z)
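As a concrete, purely illustrative reading of the septuple definition summarized above, the Python sketch below packages the seven components into a small class; the class name, shapes, and one-step dynamics are assumptions made for illustration, not taken from that paper.

import numpy as np

# Hypothetical sketch of a neural network as a septuple:
# (state, input projection, output projection, weights, bias, activation, loss).
class NeuralSeptuple:
    def __init__(self, n_state: int, n_in: int, n_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.x = np.zeros(n_state)                              # (1) state vector
        self.P_in = rng.standard_normal((n_state, n_in))        # (2) input projection
        self.P_out = rng.standard_normal((n_out, n_state))      # (3) output projection
        self.W = 0.1 * rng.standard_normal((n_state, n_state))  # (4) weight matrix
        self.b = np.zeros(n_state)                              # (5) bias vector
        self.activation = np.tanh                               # (6) activation map
        self.loss = lambda y, t: 0.5 * np.sum((y - t) ** 2)     # (7) loss function

    def forward(self, u: np.ndarray) -> np.ndarray:
        # Project the input into the state, apply one step of the
        # state dynamics, and read out the output.
        self.x = self.activation(self.W @ (self.x + self.P_in @ u) + self.b)
        return self.P_out @ self.x

# Usage (illustrative): one forward pass and a loss evaluation.
net = NeuralSeptuple(n_state=8, n_in=3, n_out=2)
y = net.forward(np.ones(3))
print(net.loss(y, np.zeros(2)))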
This list is automatically generated from the titles and abstracts of the papers on this site.