A stochastic optimization approach to train non-linear neural networks
with a higher-order variation regularization
- URL: http://arxiv.org/abs/2308.02293v2
- Date: Mon, 14 Aug 2023 05:06:09 GMT
- Title: A stochastic optimization approach to train non-linear neural networks
with a higher-order variation regularization
- Authors: Akifumi Okuno
- Abstract summary: This study considers a $(k,q)$th order variation regularization ($(k,q)$-VR)
$(k,q)$-VR is defined as the $q$th-powered integral of the absolute $k$th order derivative of the parametric models to be trained.
Our numerical experiments demonstrate that the neural networks trained with the $(k,q)$-VR terms are more resilient'' than those with the conventional parameter regularization.
- Score: 3.0277213703725767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While highly expressive parametric models including deep neural networks have
an advantage to model complicated concepts, training such highly non-linear
models is known to yield a high risk of notorious overfitting. To address this
issue, this study considers a $(k,q)$th order variation regularization
($(k,q)$-VR), which is defined as the $q$th-powered integral of the absolute
$k$th order derivative of the parametric models to be trained; penalizing the
$(k,q)$-VR is expected to yield a smoother function, which is expected to avoid
overfitting. Particularly, $(k,q)$-VR encompasses the conventional
(general-order) total variation with $q=1$. While the $(k,q)$-VR terms applied
to general parametric models are computationally intractable due to the
integration, this study provides a stochastic optimization algorithm, that can
efficiently train general models with the $(k,q)$-VR without conducting
explicit numerical integration. The proposed approach can be applied to the
training of even deep neural networks whose structure is arbitrary, as it can
be implemented by only a simple stochastic gradient descent algorithm and
automatic differentiation. Our numerical experiments demonstrate that the
neural networks trained with the $(k,q)$-VR terms are more ``resilient'' than
those with the conventional parameter regularization. The proposed algorithm
also can be extended to the physics-informed training of neural networks
(PINNs).
Related papers
- Pruning is Optimal for Learning Sparse Features in High-Dimensions [15.967123173054535]
We show that a class of statistical models can be optimally learned using pruned neural networks trained with gradient descent.
We show that pruning neural networks proportional to the sparsity level of $boldsymbolV$ improves their sample complexity compared to unpruned networks.
arXiv Detail & Related papers (2024-06-12T21:43:12Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
We also show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new and a subset of binary neural networks, as a small subset of search trees, where each corresponds to a subset of search trees (Ds)
We believe this view would have further applications in analysis analysis of deep networks (Ds)
arXiv Detail & Related papers (2022-08-09T02:29:42Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $ell_infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Fast variable selection makes scalable Gaussian process BSS-ANOVA a
speedy and accurate choice for tabular and time series regression [0.0]
Gaussian processes (GPs) are non-parametric regression engines with a long history.
One of a number of scalable GP approaches is the Karhunen-Lo'eve (KL) decomposed kernel BSS-ANOVA, developed in 2009.
A new method of forward variable selection, quickly and effectively limits the number of terms, yielding a method with competitive accuracies.
arXiv Detail & Related papers (2022-05-26T23:41:43Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice requires expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z) - An efficient projection neural network for $\ell_1$-regularized logistic
regression [10.517079029721257]
This paper presents a simple projection neural network for $ell_$-regularized logistics regression.
The proposed neural network does not require any extra auxiliary variable nor any smooth approximation.
We also investigate the convergence of the proposed neural network by using the Lyapunov theory and show that it converges to a solution of the problem with any arbitrary initial value.
arXiv Detail & Related papers (2021-05-12T06:13:44Z) - Large-time asymptotics in deep learning [0.0]
We consider the impact of the final time $T$ (which may indicate the depth of a corresponding ResNet) in training.
For the classical $L2$--regularized empirical risk minimization problem, we show that the training error is at most of the order $mathcalOleft(frac1Tright)$.
In the setting of $ellp$--distance losses, we prove that both the training error and the optimal parameters are at most of the order $mathcalOleft(e-mu
arXiv Detail & Related papers (2020-08-06T07:33:17Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.