Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach
- URL: http://arxiv.org/abs/2310.16647v1
- Date: Wed, 25 Oct 2023 13:55:35 GMT
- Title: Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach
- Authors: Diogo Lavado, Cláudia Soares and Alessandra Micheletti
- Abstract summary: Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
- Score: 49.1574468325115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Regularizing Deep Neural Networks (DNNs) is essential for improving
generalizability and preventing overfitting. Fixed penalty methods, though
common, lack adaptability and suffer from hyperparameter sensitivity. In this
paper, we propose a novel approach to DNN regularization by framing the
training process as a constrained optimization problem, where the data fidelity
term is the minimization objective and the regularization terms serve as
constraints. We then employ the Stochastic Augmented Lagrangian (SAL) method
to achieve a more flexible and efficient regularization mechanism. Our approach
extends beyond black-box regularization, demonstrating significant improvements
in white-box models, where weights are often subject to hard constraints to
ensure interpretability. Experimental results on image-based classification on
MNIST, CIFAR10, and CIFAR100 datasets validate the effectiveness of our
approach. SAL consistently achieves higher accuracy while also attaining better
constraint satisfaction, showcasing its potential for optimizing DNNs
under constrained settings.
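To make the framing concrete, the following is a minimal PyTorch-style sketch of a stochastic augmented Lagrangian training loop, assuming a single inequality constraint on the L1 norm of the weights as a stand-in for the regularization term; the paper's exact constraints, multiplier schedule, and hyperparameters may differ.

```python
# Minimal sketch of a stochastic augmented Lagrangian (AL) training loop.
# Assumptions (not taken from the paper): one inequality constraint
# g(theta) = ||theta||_1 - budget <= 0 standing in for the regularization
# term, and illustrative values for rho and the multiplier update period.
import torch
import torch.nn as nn

def l1_constraint(model: nn.Module, budget: float) -> torch.Tensor:
    """g(theta) = ||theta||_1 - budget; the constraint is g(theta) <= 0."""
    l1 = sum(p.abs().sum() for p in model.parameters())
    return l1 - budget

def train(model, loader, epochs=10, lr=1e-3, rho=1.0, budget=100.0):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    lam = torch.tensor(0.0)  # Lagrange multiplier, kept non-negative

    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            g = l1_constraint(model, budget)
            # Augmented Lagrangian term for an inequality constraint:
            # (rho/2) * max(0, g + lam/rho)^2 - lam^2 / (2*rho)
            aug = (rho / 2) * torch.clamp(g + lam / rho, min=0.0) ** 2 \
                  - lam ** 2 / (2 * rho)
            loss = loss_fn(model(x), y) + aug
            loss.backward()
            opt.step()

        # Dual ascent on the multiplier once per epoch (projected to >= 0).
        with torch.no_grad():
            g = l1_constraint(model, budget)
            lam = torch.clamp(lam + rho * g, min=0.0)
    return model
```

The multiplier is updated by projected dual ascent at the end of each epoch, so the effective penalty adapts to how strongly the constraint is violated rather than acting as a fixed regularization weight.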
Related papers
- A constrained optimization approach to improve robustness of neural networks [1.2338729811609355]
We present a novel nonlinear programming-based approach to fine-tune pre-trained neural networks to improve robustness against adversarial attacks while maintaining accuracy on clean data.
arXiv Detail & Related papers (2024-09-18T18:37:14Z)
- HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks [14.139047596566485]
HERTA is a high-efficiency and rigorous training algorithm for Unfolded GNNs.
HERTA converges to the optimum of the original model, thus preserving the interpretability of Unfolded GNNs.
As a byproduct of HERTA, we propose a new spectral sparsification method applicable to normalized and regularized graph Laplacians.
arXiv Detail & Related papers (2024-03-26T23:03:06Z)
- Enhancing Reliability of Neural Networks at the Edge: Inverted Normalization with Stochastic Affine Transformations [0.22499166814992438]
We propose a method to inherently enhance the robustness and inference accuracy of Bayesian neural networks (BayNNs) deployed in in-memory computing architectures.
Empirical results show a graceful degradation in inference accuracy, along with an improvement of up to 58.11%.
arXiv Detail & Related papers (2024-01-23T00:27:31Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy even with zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Better Training using Weight-Constrained Stochastic Dynamics [0.0]
We employ constraints to control the parameter space of deep neural networks throughout training.
The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradient problem.
We provide a general approach to efficiently incorporate constraints into a gradient Langevin framework (a generic sketch of such a constrained Langevin update appears after this list).
arXiv Detail & Related papers (2021-06-20T14:41:06Z)
- FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer [44.65622657676026]
We take constraints as Lyapunov functions and impose new linear constraints on the policy parameters' updating dynamics.
Because the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable.
arXiv Detail & Related papers (2020-06-19T21:58:42Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
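For the weight-constrained stochastic dynamics entry above, here is a generic sketch of one constrained stochastic gradient Langevin (SGLD) update that enforces a weight-norm constraint by Euclidean projection. This illustrates the general idea only, not the cited paper's algorithm; the step size, temperature, and radius are placeholder values.

```python
# Generic sketch of one constrained SGLD step: a noisy gradient step
# followed by projection onto a weight-norm ball. Illustrative only;
# the cited paper's constraint handling and discretization may differ.
import math
import torch

def constrained_sgld_step(params, grads, lr=1e-3, temperature=1e-4, radius=10.0):
    """theta <- Proj_C( theta - lr * grad + sqrt(2 * lr * T) * noise )."""
    noise_scale = math.sqrt(2.0 * lr * temperature)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(-lr * g + noise_scale * torch.randn_like(p))
            # Project onto the constraint set C = { w : ||w||_2 <= radius }.
            norm = p.norm()
            if norm > radius:
                p.mul_(radius / norm)
```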
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.