(Almost) Smooth Sailing: Towards Numerical Stability of Neural Networks Through Differentiable Regularization of the Condition Number
- URL: http://arxiv.org/abs/2410.00169v1
- Date: Mon, 30 Sep 2024 19:18:15 GMT
- Title: (Almost) Smooth Sailing: Towards Numerical Stability of Neural Networks Through Differentiable Regularization of the Condition Number
- Authors: Rossen Nenov, Daniel Haider, Peter Balazs
- Abstract summary: We introduce a novel regularizer that is provably differentiable almost everywhere.
We show the advantages of this approach for noisy classification and denoising of MNIST images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maintaining numerical stability in machine learning models is crucial for their reliability and performance. One approach to maintain stability of a network layer is to integrate the condition number of the weight matrix as a regularizing term into the optimization algorithm. However, due to its discontinuous nature and lack of differentiability, the condition number is not suitable for a gradient descent approach. This paper introduces a novel regularizer that is provably differentiable almost everywhere and promotes matrices with low condition numbers. In particular, we derive a formula for the gradient of this regularizer which can be easily implemented and integrated into existing optimization algorithms. We show the advantages of this approach for noisy classification and denoising of MNIST images.
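The abstract does not reproduce the regularizer or its gradient formula, so the following is only a minimal PyTorch sketch of the general recipe it describes: add a term that promotes a low condition number of a layer's weight matrix to the training loss and let automatic differentiation supply the gradient. The surrogate used here (the ratio of the largest to the smallest singular value via torch.linalg.svdvals, which is differentiable almost everywhere), the helper cond_penalty, the weight lam, and the toy MLP are illustrative assumptions, not the authors' construction.

```python
import torch
import torch.nn as nn

def cond_penalty(weight: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Differentiable-a.e. surrogate of the 2-norm condition number:
    ratio of the largest to the smallest singular value (not the paper's regularizer)."""
    s = torch.linalg.svdvals(weight)  # singular values in descending order
    return s[0] / (s[-1] + eps)

# Toy MLP and data stand-ins (shapes chosen to mimic flattened MNIST images).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
lam = 1e-3  # regularization strength (hypothetical value)

x = torch.randn(32, 784)           # stand-in for a batch of images
y = torch.randint(0, 10, (32,))    # stand-in labels

for _ in range(10):
    opt.zero_grad()
    loss = criterion(model(x), y)
    # Penalize ill-conditioned weight matrices alongside the task loss.
    for layer in model:
        if isinstance(layer, nn.Linear):
            loss = loss + lam * cond_penalty(layer.weight)
    loss.backward()
    opt.step()
```

In a real experiment the same kind of penalty (or the paper's own regularizer and its closed-form gradient) would be applied to the layers whose conditioning matters, as in the MNIST classification and denoising setups mentioned in the abstract.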
Related papers
- From exponential to finite/fixed-time stability: Applications to optimization [0.0]
Given an exponentially stable optimization algorithm, can it be modified to obtain a finite/fixed-time stable algorithm?
We provide an affirmative answer and demonstrate how the solution can be computed on a finite-time interval via a simple scaling of the right-hand side of the original dynamics.
We certify the desired properties of the modified algorithm using the Lyapunov function that proves exponential stability of the original system.
arXiv Detail & Related papers (2024-09-18T05:43:22Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Efficient Sampling for Data-Driven Frequency Stability Constraint via Forward-Mode Automatic Differentiation [5.603382086370097]
We propose a gradient-based data generation method via forward-mode automatic differentiation.
In this method, the original dynamical system is augmented with new states that represent the dynamics of the sensitivities of the original states.
We demonstrate the superior performance of the proposed sampling algorithm compared with unrolling differentiation and finite differences.
arXiv Detail & Related papers (2024-07-21T03:50:11Z) - Hybrid algorithm simulating non-equilibrium steady states of an open quantum system [10.752869788647802]
Non-equilibrium steady states are a focal point of research in the study of open quantum systems.
Previous variational algorithms for searching these steady states have suffered from resource-intensive implementations.
We present a novel variational quantum algorithm that efficiently searches for non-equilibrium steady states by simulating the operator-sum form of the Lindblad equation.
arXiv Detail & Related papers (2023-09-13T01:57:27Z) - High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees [57.67528738886731]
We study the numerical stability of scalable sparse approximations based on inducing points.
For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions.
arXiv Detail & Related papers (2022-10-14T15:20:17Z) - Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models.
This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization.
We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
arXiv Detail & Related papers (2022-10-01T15:45:35Z) - Breaking the Convergence Barrier: Optimization via Fixed-Time Convergent Flows [4.817429789586127]
We introduce a Poly-based optimization framework for achieving acceleration, based on the notion of fixed-time stability of dynamical systems.
We validate the accelerated convergence properties of the proposed schemes on a range of numerical examples against the state-of-the-art optimization algorithms.
arXiv Detail & Related papers (2021-12-02T16:04:40Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random-order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Convergence to Second-Order Stationarity for Non-negative Matrix Factorization: Provably and Concurrently [18.89597524771988]
Non-negative matrix factorization (NMF) is a fundamental non-convex optimization problem with numerous applications in Machine Learning.
This paper defines a multiplicative weight update type dynamics (Lee-Seung algorithm) that runs concurrently and provably avoids saddle points.
An important advantage is the use of concurrent implementations in parallel computing environments.
arXiv Detail & Related papers (2020-02-26T06:40:23Z)