Non-Convex Optimization with Spectral Radius Regularization
- URL: http://arxiv.org/abs/2102.11210v1
- Date: Mon, 22 Feb 2021 17:39:05 GMT
- Title: Non-Convex Optimization with Spectral Radius Regularization
- Authors: Adam Sandler, Diego Klabjan and Yuan Luo
- Abstract summary: We develop a regularization method which finds flat minima during the training of deep neural networks and other machine learning models.
These minima generalize better than sharp minima, allowing models to better generalize to real-world test data.
- Score: 17.629499015699704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a regularization method which finds flat minima during the
training of deep neural networks and other machine learning models. These
minima generalize better than sharp minima, allowing models to better
generalize to real-world test data, which may be distributed differently from
the training data. Specifically, we propose a method of regularized
optimization to reduce the spectral radius of the Hessian of the loss function.
Additionally, we derive algorithms to efficiently perform this optimization on
neural networks and prove convergence results for these algorithms.
Furthermore, we demonstrate that our algorithm works effectively on multiple
real world applications in multiple domains including healthcare. In order to
show our models generalize well, we introduce different methods of testing
generalizability.
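The quantity being regularized, the spectral radius of the loss Hessian, can be estimated matrix-free via power iteration on Hessian-vector products. The sketch below is illustrative only, not the authors' algorithm: the finite-difference `hvp`, the iteration count, and the quadratic example are all assumptions made for a self-contained demo.

```python
import numpy as np

def hvp(grad, x, v, eps=1e-5):
    # Finite-difference Hessian-vector product: H v ≈ (∇f(x+εv) − ∇f(x−εv)) / (2ε).
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

def spectral_radius(grad, x, iters=100, seed=0):
    # Power iteration using only Hessian-vector products; no Hessian is formed.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad, x, v)
        lam = float(v @ hv)                       # Rayleigh quotient estimate
        v = hv / (np.linalg.norm(hv) + 1e-12)     # renormalize for next step
    return abs(lam)

# Illustrative check: for f(x) = 0.5 x^T A x the Hessian is A,
# so the spectral radius is the largest |eigenvalue| of A (here 4.0).
A = np.diag([4.0, 1.0, 0.5])
grad = lambda x: A @ x
rho = spectral_radius(grad, np.zeros(3))
```

In a training loop, an estimate like `rho` could be added to the loss as a penalty term; in an autodiff framework the finite-difference `hvp` would typically be replaced by an exact Hessian-vector product.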
Related papers
- Zeroth-Order Optimization Finds Flat Minima [51.41529512093436]
We show that zeroth-order optimization with the standard two-point estimator favors solutions with a small trace of the Hessian. We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions.
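The standard two-point estimator referenced above can be sketched as follows; the test function `f`, smoothing radius, and sample count are illustrative assumptions, not details from the paper.

```python
import numpy as np

def two_point_grad(f, x, mu=1e-4, rng=None):
    # Two-point zeroth-order gradient estimator:
    #   g_hat = d * (f(x + μu) − f(x − μu)) / (2μ) * u,
    # with u drawn uniformly from the unit sphere; unbiased for ∇f as μ → 0.
    rng = rng or np.random.default_rng(0)
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    d = x.size
    return d * (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

# Averaging many single-direction estimates recovers the true gradient:
f = lambda x: float(x @ x)          # ∇f(x) = 2x
x = np.array([1.0, -2.0, 3.0])
rng = np.random.default_rng(0)
g = np.mean([two_point_grad(f, x, rng=rng) for _ in range(20000)], axis=0)
```

Each call uses only two function evaluations, which is what makes such estimators attractive when gradients are unavailable.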
arXiv Detail & Related papers (2025-06-05T17:59:09Z) - Efficient compression of neural networks and datasets [0.0]
We compare, improve, and contribute methods that substantially decrease the number of parameters of neural networks. When applying our methods to minimize description length, we obtain very effective data compression algorithms. We empirically verify the prediction that regularized models can exhibit more sample-efficient convergence.
arXiv Detail & Related papers (2025-05-23T04:50:33Z) - Tensor Network Estimation of Distribution Algorithms [0.0]
Methods integrating tensor networks into evolutionary optimization algorithms have appeared in the recent literature. We find that the optimization performance of these methods is not related to the power of the generative model in a straightforward way. In light of this, we find that adding an explicit mutation operator to the output of the generative model often improves optimization performance.
arXiv Detail & Related papers (2024-12-27T18:22:47Z) - Diffusion Models as Network Optimizers: Explorations and Analysis [71.69869025878856]
Generative diffusion models (GDMs) have emerged as a promising new approach to network optimization. In this study, we first explore the intrinsic characteristics of generative models. We provide a concise theoretical and intuitive demonstration of the advantages of generative models over discriminative models in network optimization.
arXiv Detail & Related papers (2024-11-01T09:05:47Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep neural network (DNN) models are widely used in practice.
In this paper, we examine convex neural network recovery models.
We show that all stationary points of the non-convex training objective can be characterized via the global optima of subsampled convex (Lasso-type) programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling its differentiable and non-differentiable components separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Variational Sparse Coding with Learned Thresholding [6.737133300781134]
We propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples.
We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation.
arXiv Detail & Related papers (2022-05-07T14:49:50Z) - Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, converges to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models [21.56208997475512]
We present a scalable post-processing algorithm for debiasing trained models, including deep neural networks (DNNs).
We prove the algorithm is near-optimal by bounding its excess Bayes risk.
We empirically validate its advantages on standard benchmark datasets.
arXiv Detail & Related papers (2021-06-06T09:45:37Z) - Meta-learning One-class Classifiers with Eigenvalue Solvers for
Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can characterize the convergence behavior of overparameterized neural networks.
We believe our approach can also be extended to other optimization algorithms and network architectures.
arXiv Detail & Related papers (2020-10-25T17:10:22Z) - Neural Model-based Optimization with Right-Censored Observations [42.530925002607376]
Neural networks (NNs) have been demonstrated to work well at the core of model-based optimization procedures.
We show that our trained regression models achieve a better predictive quality than several baselines.
arXiv Detail & Related papers (2020-09-29T07:32:30Z) - Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z) - Exponentially improved detection and correction of errors in
experimental systems using neural networks [0.0]
We introduce the use of two machine learning algorithms to create an empirical model of an experimental apparatus.
This is able to reduce the number of measurements necessary for generic optimisation tasks exponentially.
We demonstrate both algorithms at the example of detecting and compensating stray electric fields in an ion trap.
arXiv Detail & Related papers (2020-05-18T22:42:11Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds than existing approaches while retaining theoretical guarantees.
Our experiments on several datasets demonstrate the effectiveness of our method and corroborate our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.