Better Training using Weight-Constrained Stochastic Dynamics
- URL: http://arxiv.org/abs/2106.10704v1
- Date: Sun, 20 Jun 2021 14:41:06 GMT
- Title: Better Training using Weight-Constrained Stochastic Dynamics
- Authors: Benedict Leimkuhler, Tiffany Vlaar, Timothée Pouchon and Amos Storkey
- Abstract summary: We employ constraints to control the parameter space of deep neural networks throughout training.
The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem.
We provide a general approach to efficiently incorporate constraints into a stochastic gradient Langevin framework.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We employ constraints to control the parameter space of deep neural networks
throughout training. The use of customized, appropriately designed constraints
can reduce the vanishing/exploding gradients problem, improve smoothness of
classification boundaries, control weight magnitudes and stabilize deep neural
networks, and thus enhance the robustness of training algorithms and the
generalization capabilities of neural networks. We provide a general approach
to efficiently incorporate constraints into a stochastic gradient Langevin
framework, allowing enhanced exploration of the loss landscape. We also present
specific examples of constrained training methods motivated by orthogonality
preservation for weight matrices and explicit weight normalizations.
Discretization schemes are provided both for the overdamped formulation of
Langevin dynamics and the underdamped form, in which momenta further improve
sampling efficiency. These optimization schemes can be used directly, without
needing to adapt neural network architecture design choices or to modify the
objective with regularization terms, and see performance improvements in
classification tasks.
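For intuition, here is a minimal sketch of a constrained overdamped Langevin update in PyTorch. The function names (`constrained_langevin_step`, `project_to_sphere`), the step size `h`, the temperature `tau`, and the choice of a norm-preserving (spherical) constraint enforced by projection after each step are illustrative assumptions; the paper's own overdamped and underdamped discretizations, and its orthogonality-preserving constraints, differ in detail.

```python
import torch

def project_to_sphere(w, radius):
    # Map a weight tensor back onto the sphere ||w|| = radius,
    # enforcing the (assumed) fixed-norm constraint.
    return radius * w / w.norm()

def constrained_langevin_step(w, grad, h=1e-3, tau=1e-4):
    # One overdamped Langevin update: gradient step plus Gaussian noise
    # scaled by sqrt(2 * h * tau), followed by projection back onto the
    # constraint manifold. This "step then project" scheme is a sketch,
    # not the paper's exact constrained integrator.
    radius = w.norm().item()                 # preserve the current weight norm
    noise = torch.randn_like(w)
    w_new = w - h * grad + (2.0 * h * tau) ** 0.5 * noise
    return project_to_sphere(w_new, radius)

# Usage sketch: apply the constrained update to one layer's weights.
layer = torch.nn.Linear(10, 10)
x, y = torch.randn(32, 10), torch.randn(32, 10)
loss = torch.nn.functional.mse_loss(layer(x), y)
grad, = torch.autograd.grad(loss, [layer.weight])
with torch.no_grad():
    layer.weight.copy_(constrained_langevin_step(layer.weight, grad))
```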
Related papers
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations (a null-space sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-05-23T02:31:55Z) - Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
Convergence guarantees and generalizability of the unrolled networks remain open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z) - Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
arXiv Detail & Related papers (2023-10-25T13:55:35Z) - Neural Fields with Hard Constraints of Arbitrary Differential Order [61.49418682745144]
We develop a series of approaches for enforcing hard constraints on neural fields.
The constraints can be specified as a linear operator applied to the neural field and its derivatives.
Our approaches are demonstrated in a wide range of real-world applications.
arXiv Detail & Related papers (2023-06-15T08:33:52Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Training multi-objective/multi-task collocation physics-informed neural
network with student/teachers transfer learnings [0.0]
This paper presents a PINN training framework that employs pre-training steps and a net-to-net knowledge transfer algorithm.
A multi-objective optimization algorithm may improve the performance of a physics-informed neural network with competing constraints.
arXiv Detail & Related papers (2021-07-24T00:43:17Z) - Self-Adaptive Physics-Informed Neural Networks using a Soft Attention Mechanism [1.6114012813668932]
Physics-Informed Neural Networks (PINNs) have emerged as a promising application of deep neural networks to the numerical solution of nonlinear partial differential equations (PDEs).
We propose a fundamentally new way to train PINNs adaptively, where the adaptation weights are fully trainable and applied to each training point individually.
In numerical experiments with several linear and nonlinear benchmark problems, the SA-PINN outperformed other state-of-the-art PINN algorithms in L2 error.
arXiv Detail & Related papers (2020-09-07T04:07:52Z) - Improve Generalization and Robustness of Neural Networks via Weight
Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions (a numerical check of this scale-shift invariance appears after this list).
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z) - Constraint-Based Regularization of Neural Networks [0.0]
We propose a method for efficiently incorporating constraints into a gradient Langevin framework for the training of deep neural networks.
Appropriately designed, they reduce the vanishing/exploding gradient problem, control weight magnitudes and stabilize deep neural networks.
arXiv Detail & Related papers (2020-06-17T19:28:41Z)
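To make the under-determined-linear-system view in the optimum-shifting entry above concrete, here is a hedged sketch: when a layer sees fewer samples than it has input features, its weights can be shifted along the null space of the activation matrix without changing the layer's output on those samples. The variable names, shapes, and step size below are assumptions for illustration, not that paper's actual algorithm or code.

```python
import torch

torch.manual_seed(0)
n, d_in, d_out = 8, 16, 4            # fewer samples than input features
A = torch.randn(n, d_in)             # fixed layer inputs (activations)
W = torch.randn(d_in, d_out)         # current layer weights

# Null space of A: directions in which W can move without changing A @ W.
# Since n < d_in, A (almost surely full rank) has a nontrivial null space.
_, S, Vh = torch.linalg.svd(A, full_matrices=True)
null_basis = Vh[S.shape[0]:]          # (d_in - n) x d_in, rows span null(A)

# Shift W along a random null-space direction; the layer output is preserved.
delta = null_basis.T @ torch.randn(null_basis.shape[0], d_out)
W_shifted = W + 0.5 * delta

print(torch.allclose(A @ W, A @ W_shifted, atol=1e-4))  # True (up to roundoff)
```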
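Similarly, the weight-scale-shifting entry above rests on the fact that rescaling adjacent layers of a ReLU network by (c, 1/c) leaves the network function unchanged while changing the standard L2 penalty, so plain weight decay does not pin down the intrinsic norm. The short check below illustrates this; the shapes and the factor c are arbitrary choices for the illustration.

```python
import torch

torch.manual_seed(0)
W1 = torch.randn(32, 10)
W2 = torch.randn(1, 32)
x = torch.randn(64, 10)

def net(W1, W2, x):
    # f(x) = W2 relu(W1 x); ReLU is positively homogeneous.
    return torch.relu(x @ W1.T) @ W2.T

c = 3.0
out_a = net(W1, W2, x)
out_b = net(c * W1, W2 / c, x)                        # same function ...
print(torch.allclose(out_a, out_b, atol=1e-5))        # True
l2 = lambda a, b: (a ** 2).sum() + (b ** 2).sum()
print(l2(W1, W2).item(), l2(c * W1, W2 / c).item())   # ... different L2 penalty
```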
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.