Constraint-Based Regularization of Neural Networks
- URL: http://arxiv.org/abs/2006.10114v2
- Date: Sun, 20 Jun 2021 21:33:27 GMT
- Title: Constraint-Based Regularization of Neural Networks
- Authors: Benedict Leimkuhler, Timothée Pouchon, Tiffany Vlaar and Amos Storkey
- Abstract summary: We propose a method for efficiently incorporating constraints into a gradient Langevin framework for the training of deep neural networks.
Appropriately designed, they reduce the vanishing/exploding gradient problem, control weight magnitudes and stabilize deep neural networks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method for efficiently incorporating constraints into a
stochastic gradient Langevin framework for the training of deep neural
networks. Constraints allow direct control of the parameter space of the model.
Appropriately designed, they reduce the vanishing/exploding gradient problem,
control weight magnitudes and stabilize deep neural networks and thus improve
the robustness of training algorithms and the generalization capabilities of
the trained neural network. We present examples of constrained training methods
motivated by orthogonality preservation for weight matrices and explicit weight
normalizations. We describe the methods in the overdamped formulation of
Langevin dynamics and the underdamped form, in which momenta help to improve
sampling efficiency. The methods are explored in test examples in image
classification and natural language processing.
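As a rough sketch of the idea (illustrative only, not the paper's algorithm), the code below applies one overdamped Langevin (SGLD) update to a weight matrix and then enforces two example constraints of the kind mentioned above: a fixed Frobenius norm and orthogonality via a polar-decomposition projection. The step size, temperature, constraint radius, and choice of projections are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def constrained_sgld_step(W, grad, lr=1e-3, temperature=1e-4, radius=1.0):
    """One overdamped Langevin (SGLD) update on a weight matrix, followed by
    projection back onto a sphere of fixed Frobenius norm. The constraint and
    projection are illustrative choices, not the paper's formulation."""
    noise = rng.normal(size=W.shape)
    W_new = W - lr * grad + np.sqrt(2.0 * lr * temperature) * noise
    W_new *= radius / (np.linalg.norm(W_new) + 1e-12)  # enforce ||W||_F = radius
    return W_new

def project_orthogonal(W):
    """Map a square W to the nearest orthogonal matrix (polar decomposition),
    one standard way to impose an orthogonality constraint on weights."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

# Toy usage: one update of a 4x4 weight matrix with a random "gradient".
W = rng.normal(size=(4, 4))
g = rng.normal(size=(4, 4))
W = constrained_sgld_step(W, g)
print(np.allclose(np.linalg.norm(W), 1.0))        # norm constraint holds
W_orth = project_orthogonal(W)
print(np.allclose(W_orth @ W_orth.T, np.eye(4)))  # orthogonality holds
```

In the underdamped form mentioned in the abstract, auxiliary momentum variables carry the noise and the constraint must also be respected by the momentum update; that refinement is omitted here.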
Related papers
- Robust Training of Neural Networks at Arbitrary Precision and Sparsity [11.177990498697845]
The discontinuous operations inherent in quantization and sparsification introduce obstacles to backpropagation.
This is particularly challenging when training deep neural networks in ultra-low precision and sparse regimes.
We propose a novel, robust, and universal solution: a denoising affine transform.
arXiv Detail & Related papers (2024-09-14T00:57:32Z)
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations (see the sketch after this list).
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
However, convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- Neural Fields with Hard Constraints of Arbitrary Differential Order [61.49418682745144]
We develop a series of approaches for enforcing hard constraints on neural fields.
The constraints can be specified as a linear operator applied to the neural field and its derivatives.
Our approaches are demonstrated in a wide range of real-world applications.
arXiv Detail & Related papers (2023-06-15T08:33:52Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Langevin algorithms for Markovian Neural Networks and Deep Stochastic control [0.0]
Stochastic Gradient Langevin Dynamics (SGLD) algorithms are known to improve the training of neural networks in some cases where the neural network is very deep.
We numerically show that Langevin algorithms improve the training on various control problems like hedging and resource management, and for different choices of gradient descent methods.
arXiv Detail & Related papers (2022-12-22T20:00:11Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Better Training using Weight-Constrained Stochastic Dynamics [0.0]
We employ constraints to control the parameter space of deep neural networks throughout training.
The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradient problem.
We provide a general approach to efficiently incorporate constraints into a gradient Langevin framework.
arXiv Detail & Related papers (2021-06-20T14:41:06Z)
- Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks [77.34726150561087]
We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized neural networks.
We provide theoretical guarantees of algorithm performance as well as encouraging numerical results.
arXiv Detail & Related papers (2020-08-25T15:48:15Z)
- Volumization as a Natural Generalization of Weight Decay [25.076488081589403]
Inspired by physics, we define a physical volume for the weight parameters in neural networks.
We show that this method is an effective way of regularizing neural networks.
arXiv Detail & Related papers (2020-03-25T07:13:55Z)
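As referenced in the optimum-shifting entry above, here is a minimal sketch (an illustration of the quoted observation only, not that paper's method): when a linear layer's input and output are held fixed, its weights solve an under-determined linear system, so a minimum-norm solution can be chosen without changing the layer's output. The layer sizes and random data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical single linear layer: more inputs than samples, so the system
# W_new @ X = Y is under-determined in the entries of W_new.
d_in, d_out, n_samples = 8, 4, 3
X = rng.normal(size=(d_in, n_samples))   # fixed layer inputs
W = rng.normal(size=(d_out, d_in))       # current weights
Y = W @ X                                # fixed layer outputs to preserve

# Minimum-Frobenius-norm weights reproducing the same outputs,
# obtained from the pseudoinverse of X.
W_new = Y @ np.linalg.pinv(X)

print(np.allclose(W_new @ X, Y))                           # outputs unchanged
print(np.linalg.norm(W_new) <= np.linalg.norm(W) + 1e-9)   # weight norm never grows
```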