Non-Convex Optimization with Spectral Radius Regularization
- URL: http://arxiv.org/abs/2102.11210v1
- Date: Mon, 22 Feb 2021 17:39:05 GMT
- Title: Non-Convex Optimization with Spectral Radius Regularization
- Authors: Adam Sandler, Diego Klabjan and Yuan Luo
- Abstract summary: We develop a regularization method which finds flat minima during the training of deep neural networks and other machine learning models.
These minima generalize better than sharp minima, allowing models to better generalize to real-world test data.
- Score: 17.629499015699704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a regularization method which finds flat minima during the
training of deep neural networks and other machine learning models. These
minima generalize better than sharp minima, allowing models to better
generalize to real-world test data, which may be distributed differently from
the training data. Specifically, we propose a method of regularized
optimization to reduce the spectral radius of the Hessian of the loss function.
Additionally, we derive algorithms to efficiently perform this optimization on
neural networks and prove convergence results for these algorithms.
Furthermore, we demonstrate that our algorithm works effectively on multiple
real world applications in multiple domains including healthcare. In order to
show our models generalize well, we introduce different methods of testing
generalizability.
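As a rough, hedged sketch of the core idea (the paper derives more efficient algorithms; the names below, such as `flat_hvp` and `estimate_spectral_radius`, and the penalty weight `lam` are illustrative assumptions, not the authors' code), the spectral radius of the loss Hessian can be estimated by power iteration on Hessian-vector products and added as a differentiable penalty:

```python
import torch

def flat_hvp(loss, params, vec, create_graph=False):
    """Hessian-vector product via double backprop (Pearlmutter's trick)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat @ vec, params,
                             retain_graph=True, create_graph=create_graph)
    return torch.cat([h.reshape(-1) for h in hv])

def estimate_spectral_radius(loss, params, n_iters=10):
    """Power iteration on the Hessian: returns a differentiable estimate of
    the magnitude of its dominant eigenvalue (the spectral radius)."""
    n = sum(p.numel() for p in params)
    v = torch.randn(n, device=params[0].device)
    v = v / v.norm()
    for _ in range(n_iters - 1):
        v = flat_hvp(loss, params, v)
        v = (v / (v.norm() + 1e-12)).detach()
    # Keep the final Rayleigh quotient on the graph so the penalty has gradients
    hv = flat_hvp(loss, params, v, create_graph=True)
    return (v @ hv).abs()

def regularized_loss(model, criterion, x, y, lam=0.01):
    """Regularized objective: loss + lam * rho(Hessian of loss)."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = criterion(model(x), y)
    return loss + lam * estimate_spectral_radius(loss, params)
```

Backpropagating through this penalty involves third-order derivatives, which is why efficient dedicated algorithms, such as those the paper derives, matter in practice.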
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Training Deep Neural Network (DNN) models is a non-convex optimization problem.
In this paper, we examine the use of convex Lasso-based recovery models for neural networks.
We show that the stationary points of the non-convex objective can be characterized as global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling the differentiable and non-differentiable components separately, linearizing only the smooth parts (see the sketch below).
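As a generic, hypothetical illustration of that linearization pattern (a classic Frank-Wolfe step, not necessarily this paper's algorithm; `grad_f`, `radius`, and the l1-ball feasible set are assumptions), only the smooth part f is linearized and the resulting linear subproblem is solved over a convex compact set:

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, steps=100):
    """Conditional-gradient sketch: linearize only the smooth part f and
    solve the linear subproblem over the compact, convex l1 ball, whose
    minimizer is always a signed vertex."""
    x = np.asarray(x0, dtype=float).copy()
    for t in range(steps):
        g = grad_f(x)                       # gradient of the smooth part only
        i = np.argmax(np.abs(g))            # linear minimization oracle
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)             # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x
```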
arXiv Detail & Related papers (2023-02-24T18:41:48Z)
- Variational Sparse Coding with Learned Thresholding [6.737133300781134]
We propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples.
We first evaluate and analyze our method by training a linear generator, showing superior performance, statistical efficiency, and gradient estimation (a thresholding sketch follows below).
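To make the thresholding idea concrete, here is a minimal, hypothetical sketch (assuming PyTorch; the function and parameter names are illustrative, not the paper's API): a reparameterized Gaussian sample is pushed through a learned soft threshold so that many coordinates become exactly zero:

```python
import torch

def sample_sparse_code(mu, log_sigma, log_lambda):
    """Draw a reparameterized Gaussian sample and soft-threshold it with a
    learned threshold lambda, yielding exact zeros (a sparse sample)."""
    eps = torch.randn_like(mu)
    z = mu + torch.exp(log_sigma) * eps   # Gaussian reparameterization
    lam = torch.exp(log_lambda)           # learned positive threshold
    # Soft threshold: sign(z) * max(|z| - lam, 0); gradients flow where |z| > lam
    return torch.sign(z) * torch.clamp(z.abs() - lam, min=0.0)
```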
arXiv Detail & Related papers (2022-05-07T14:49:50Z)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, converges to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
- A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can characterize the convergence behavior of overparameterized neural networks.
We believe our approach can also be extended to other optimization algorithms and network architectures.
arXiv Detail & Related papers (2020-10-25T17:10:22Z)
- Neural Model-based Optimization with Right-Censored Observations [42.530925002607376]
Neural networks (NNs) have been demonstrated to work well at the core of model-based optimization procedures.
We show that our trained regression models achieve a better predictive quality than several baselines.
arXiv Detail & Related papers (2020-09-29T07:32:30Z)
- Exponentially improved detection and correction of errors in experimental systems using neural networks [0.0]
We introduce the use of two machine learning algorithms to create an empirical model of an experimental apparatus.
This is able to reduce the number of measurements necessary for generic optimisation tasks exponentially.
We demonstrate both algorithms on the example of detecting and compensating for stray electric fields in an ion trap.
arXiv Detail & Related papers (2020-05-18T22:42:11Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining theoretical convergence guarantees.
Our experiments on several datasets show the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.