Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study
- URL: http://arxiv.org/abs/2011.06702v1
- Date: Fri, 13 Nov 2020 00:26:43 GMT
- Title: Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study
- Authors: Cheng Chen, Junjie Yang, Yi Zhou
- Abstract summary: Modern deep neural network (DNN) trainings utilize various training techniques, e.g., nonlinear activation functions, batch normalization, skip-connections, etc.
We show that successful DNNs consistently obey a certain regularity principle that regularizes the model update direction to be aligned with the trajectory direction.
Empirically, we find that DNN trainings that apply the training techniques achieve fast convergence and obey the regularity principle with a large regularization parameter, implying that the model updates are well aligned with the trajectory.
- Score: 17.9739959287894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep neural network (DNN) trainings utilize various training
techniques, e.g., nonlinear activation functions, batch normalization,
skip-connections, etc. Despite their effectiveness, it is still mysterious how
they help accelerate DNN trainings in practice. In this paper, we provide an
empirical study of the regularization effect of these training techniques on
DNN optimization. Specifically, we find that the optimization trajectories of
successful DNN trainings consistently obey a certain regularity principle that
regularizes the model update direction to be aligned with the trajectory
direction. Theoretically, we show that such a regularity principle leads to a
convergence guarantee in nonconvex optimization and the convergence rate
depends on a regularization parameter. Empirically, we find that DNN trainings
that apply the training techniques achieve fast convergence and obey the
regularity principle with a large regularization parameter, implying that the
model updates are well aligned with the trajectory. On the other hand, DNN
trainings without the training techniques have slow convergence and obey the
regularity principle with a small regularization parameter, implying that the
model updates are not well aligned with the trajectory. Therefore, different
training techniques regularize the model update direction via the regularity
principle to facilitate the convergence.
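The regularity principle is stated here only informally, so the snippet below is a rough illustration: it tracks the cosine alignment between each model update and the overall trajectory direction during a toy SGD run. This is a sketch under the assumption that cosine similarity is an acceptable proxy for the alignment the paper measures; the paper's precise principle and its regularization parameter may be defined differently.

```python
# Minimal sketch (not the authors' code): monitor how well each SGD update
# aligns with the overall optimization trajectory. Cosine similarity is used
# here only as a rough proxy for the alignment discussed in the abstract.
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 20)
y = (x[:, 0] > 0).long()                      # toy binary labels

theta_0 = parameters_to_vector(model.parameters()).detach().clone()
theta_prev = theta_0.clone()

for step in range(1, 201):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

    theta = parameters_to_vector(model.parameters()).detach()
    update = theta - theta_prev               # model update direction at this step
    trajectory = theta - theta_0              # overall trajectory direction so far
    cos = torch.dot(update, trajectory) / (update.norm() * trajectory.norm() + 1e-12)
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}  cos(update, trajectory) {cos.item():.3f}")
    theta_prev = theta
```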
Related papers
- Alternate Training of Shared and Task-Specific Parameters for Multi-Task Neural Networks [49.1574468325115]
This paper introduces novel alternate training procedures for hard-parameter sharing Multi-Task Neural Networks (MTNNs).
The proposed alternate training method updates shared and task-specific weights alternately, exploiting the multi-head architecture of the model.
Empirical experiments demonstrate delayed overfitting, improved prediction, and reduced computational demands.
arXiv Detail & Related papers (2023-12-26T21:33:03Z)
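A minimal sketch of the alternating idea described above, assuming a standard hard-parameter-sharing setup (one shared trunk, two task heads) rather than the paper's exact architecture or schedule:

```python
# Hypothetical sketch of alternating updates for a hard-parameter-sharing
# multi-task network: even steps update only the shared trunk, odd steps
# update only the task-specific heads. Not the paper's exact procedure.
import torch
import torch.nn as nn

torch.manual_seed(0)
trunk = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
head_a = nn.Linear(32, 1)                      # task A: regression
head_b = nn.Linear(32, 3)                      # task B: 3-way classification

opt_shared = torch.optim.Adam(trunk.parameters(), lr=1e-3)
opt_tasks = torch.optim.Adam(list(head_a.parameters()) + list(head_b.parameters()), lr=1e-3)

x = torch.randn(128, 10)
y_a = x.sum(dim=1, keepdim=True)               # toy regression target
y_b = torch.randint(0, 3, (128,))              # toy classification target

mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

for step in range(200):
    z = trunk(x)
    loss = mse(head_a(z), y_a) + ce(head_b(z), y_b)

    # Alternate which parameter group receives this step's update.
    opt = opt_shared if step % 2 == 0 else opt_tasks
    opt_shared.zero_grad()
    opt_tasks.zero_grad()
    loss.backward()
    opt.step()
```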
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than those obtained with the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
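The specific multiplicative rules of the framework above are not given here, so the sketch below uses one classical multiplicative scheme, exponentiated gradient, on a toy positive-weight least-squares problem purely to contrast with the usual additive step; it is not necessarily the rule proposed in that paper.

```python
# Illustration only: a classical multiplicative update (exponentiated gradient)
# versus the usual additive SGD step on a toy least-squares problem with
# positive weights. The referenced framework may use different rules.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
w_true = np.array([0.5, 1.0, 0.2, 2.0, 0.8])           # positive ground truth
b = A @ w_true + 0.01 * rng.normal(size=100)

def grad(w):                                            # gradient of 0.5*||Aw - b||^2 / n
    return A.T @ (A @ w - b) / len(b)

w_add = np.ones(5)                                      # additive (plain gradient descent)
w_mul = np.ones(5)                                      # multiplicative (exponentiated gradient)
lr = 0.05
for _ in range(500):
    w_add = w_add - lr * grad(w_add)                    # w <- w - lr * g
    w_mul = w_mul * np.exp(-lr * grad(w_mul))           # w <- w * exp(-lr * g)

print("additive      :", np.round(w_add, 3))
print("multiplicative:", np.round(w_mul, 3))
```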
- Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics [97.38308257547186]
Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and material models.
We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned.
We introduce a new framework termed "Neural Constitutive Laws" (NCLaw) which utilizes a network architecture that strictly guarantees standard priors.
arXiv Detail & Related papers (2023-04-27T17:42:24Z)
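A rough sketch of the "enforce the known governing equation, learn only the constitutive law" idea behind NCLaw, using a 1D mass-spring chain with unit masses. The chain, the linear ground-truth law, and the one-step training objective are illustrative assumptions, not the paper's setup or architecture.

```python
# Hypothetical sketch: the explicit time step below hard-codes the known
# dynamics (Newton's second law for a chain of unit masses); the network only
# maps strain -> force. Not the NCLaw architecture from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, dt, k_true = 16, 0.02, 4.0

def step(pos, vel, force_fn):
    """One explicit step of the known dynamics: dv/dt = net spring force."""
    strain = pos[1:] - pos[:-1] - 1.0                    # deviation from rest length 1
    f = force_fn(strain.unsqueeze(-1)).squeeze(-1)       # per-spring force from the constitutive law
    zero = f.new_zeros(1)
    acc = torch.cat([f, zero]) - torch.cat([zero, f])    # force balance on each mass
    vel = vel + dt * acc
    return pos + dt * vel, vel

true_law = lambda s: k_true * s                          # ground-truth linear law (treated as unknown)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

# Observed motion generated with the true constitutive law.
pos = torch.arange(n, dtype=torch.float32) + 0.3 * torch.randn(n)
vel = torch.zeros(n)
obs = []
with torch.no_grad():
    for _ in range(60):
        pos, vel = step(pos, vel, true_law)
        obs.append((pos.clone(), vel.clone()))

# Fit the network so that the hard-coded dynamics reproduce consecutive observations.
for epoch in range(200):
    loss = torch.tensor(0.0)
    for (p0, v0), (p1, v1) in zip(obs[:-1], obs[1:]):
        p_pred, v_pred = step(p0, v0, net)
        loss = loss + ((p_pred - p1) ** 2).mean() + ((v_pred - v1) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```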
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
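A loose, forward-in-time illustration in the spirit of the online updates described above, using an eligibility-trace-style approximation for a single spiking layer. It ignores the dependence introduced by the spike reset and is not the paper's exact OTTT derivation.

```python
# Rough sketch: the gradient of the instantaneous loss w.r.t. the weights is
# approximated by (error signal at time t) x (decayed trace of past inputs),
# so no backpropagation through time is needed. Not the exact OTTT algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, T = 30, 4, 100
lam, thresh, lr = 0.9, 1.0, 0.05
target = 2                                            # class this input pattern belongs to
x_spikes = (rng.random((T, n_in)) < 0.3).astype(float)   # fixed input spike train

W = 0.1 * rng.normal(size=(n_out, n_in))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

for epoch in range(30):
    u = np.zeros(n_out)                               # membrane potentials
    trace = np.zeros(n_in)                            # decayed presynaptic trace a_t
    spike_count = np.zeros(n_out)
    loss = 0.0
    for t in range(T):
        x = x_spikes[t]
        u = lam * u + W @ x                           # leaky integration
        s = (u >= thresh).astype(float)               # output spikes
        trace = lam * trace + x                       # a_t = lam * a_{t-1} + x_t
        p = softmax(u)                                # instantaneous readout on membranes
        loss += -np.log(p[target] + 1e-12)
        err = p.copy()
        err[target] -= 1.0                            # dL_t/du_t for cross-entropy
        W -= lr * np.outer(err, trace)                # online, forward-in-time update
        u = u * (1.0 - s)                             # reset neurons that spiked
        spike_count += s
    if epoch % 10 == 0 or epoch == 29:
        print(f"epoch {epoch:2d}  loss {loss / T:.3f}  spikes per neuron {spike_count}")
```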
- Convolutional Dictionary Learning by End-To-End Training of Iterative Neural Networks [3.6280929178575994]
In this work, we construct an INN which can be used as a supervised and physics-informed online convolutional dictionary learning algorithm.
We show that the proposed INN improves over two conventional model-agnostic training methods and also yields competitive results compared to a deep INN.
arXiv Detail & Related papers (2022-06-09T12:15:38Z)
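A generic sketch of an iterative neural network obtained by unrolling ISTA for convolutional sparse coding and training it end to end. The supervised, physics-informed INN in the paper is built for a different setting, so the dictionary size, toy signals, and unrolling depth here are illustrative assumptions.

```python
# Generic sketch (not the paper's INN): unroll a few ISTA iterations for
# convolutional sparse coding into a network whose dictionary, step size and
# soft threshold are trained end to end to reconstruct the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class UnrolledConvISTA(nn.Module):
    def __init__(self, n_atoms=8, kernel=9, n_iters=5):
        super().__init__()
        self.dict = nn.Parameter(0.1 * torch.randn(n_atoms, 1, kernel))  # convolutional dictionary
        self.step = nn.Parameter(torch.tensor(0.5))                      # learned step size
        self.thresh = nn.Parameter(torch.tensor(0.01))                   # learned soft threshold
        self.n_iters = n_iters
        self.pad = kernel // 2

    def synthesize(self, z):                       # x_hat = D * z
        return F.conv_transpose1d(z, self.dict, padding=self.pad)

    def analyze(self, x):                          # D^T x
        return F.conv1d(x, self.dict, padding=self.pad)

    def forward(self, x):
        z = torch.zeros(x.shape[0], self.dict.shape[0], x.shape[-1], device=x.device)
        for _ in range(self.n_iters):              # unrolled ISTA iterations
            residual = self.synthesize(z) - x
            z = z - self.step * self.analyze(residual)
            z = torch.sign(z) * torch.relu(z.abs() - self.thresh)   # soft thresholding
        return self.synthesize(z)

def make_batch(batch=16, length=64):
    # Toy training signals: random sparse spikes smoothed by a fixed kernel.
    spikes = (torch.rand(batch, 1, length) < 0.05).float() * torch.randn(batch, 1, length)
    smooth = torch.ones(1, 1, 5) / 5.0
    return F.conv1d(spikes, smooth, padding=2)

model = UnrolledConvISTA()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for it in range(300):
    x = make_batch()
    loss = F.mse_loss(model(x), x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```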
- TO-FLOW: Efficient Continuous Normalizing Flows with Temporal Optimization adjoint with Moving Speed [12.168241245313164]
Continuous normalizing flows (CNFs) construct invertible mappings between an arbitrary complex distribution and an isotropic Gaussian distribution.
CNF training has not been tractable on large datasets due to the incremental complexity of neural ODE training.
In this paper, a temporal optimization is proposed that optimizes the evolution time used for the forward propagation of the neural ODE.
arXiv Detail & Related papers (2022-03-19T14:56:41Z)
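The core idea of treating the integration (evolution) time as an optimization variable can be sketched with a hand-rolled RK4 integrator and a trainable end time. TO-FLOW's CNF objective, adjoint handling, and moving-speed scheme are not reproduced; the rotation task and time penalty below are illustrative assumptions.

```python
# Sketch of optimizing the integration (evolution) time of a neural ODE jointly
# with its parameters, using a hand-rolled fixed-step RK4 solver. The end time
# T is a trainable parameter penalized so shorter trajectories are preferred.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
field = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))   # dz/dt = field(z)
raw_T = nn.Parameter(torch.tensor(0.0))                                # end time, via softplus
opt = torch.optim.Adam(list(field.parameters()) + [raw_T], lr=1e-2)

def integrate(z, T, n_steps=8):
    """Classic RK4 with step h = T / n_steps for an autonomous vector field."""
    h = T / n_steps
    for _ in range(n_steps):
        k1 = field(z)
        k2 = field(z + 0.5 * h * k1)
        k3 = field(z + 0.5 * h * k2)
        k4 = field(z + h * k3)
        z = z + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

x = torch.randn(256, 2)
y = x @ torch.tensor([[0.0, -1.0], [1.0, 0.0]])        # toy target: rotate inputs by 90 degrees

for it in range(500):
    T = F.softplus(raw_T)                              # keep the end time positive
    loss = F.mse_loss(integrate(x, T), y) + 0.05 * T   # fit + penalty on integration time
    opt.zero_grad()
    loss.backward()
    opt.step()
print("learned end time:", F.softplus(raw_T).item())
```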
- On feedforward control using physics-guided neural networks: Training cost regularization and optimized initialization [0.0]
Performance of model-based feedforward controllers is typically limited by the accuracy of the inverse system dynamics model.
This paper proposes a regularization method via identified physical parameters.
It is validated on a real-life industrial linear motor, where it delivers better tracking accuracy and extrapolation.
arXiv Detail & Related papers (2022-01-28T12:51:25Z)
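One way to read "regularization via identified physical parameters" is sketched below: a feedforward model combines a physical term with trainable coefficients and a small network, and the coefficients are penalized for drifting away from previously identified values. The hypothetical motor-style model, the identified values m_id and c_id, and the weighting are illustrative, not taken from the paper.

```python
# Hedged sketch of regularizing a physics-guided feedforward model toward
# previously identified physical parameters. The toy "motor" model and its
# identified values are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Physical feedforward part: u = m*acc + c*vel, with trainable m and c.
m = nn.Parameter(torch.tensor(1.0))
c = nn.Parameter(torch.tensor(0.1))
m_id, c_id = 1.2, 0.15                          # values from a prior identification experiment
nn_part = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))   # learns the rest

opt = torch.optim.Adam([m, c] + list(nn_part.parameters()), lr=1e-2)
lam = 10.0                                      # strength of the parameter regularizer

# Toy data: reference velocity/acceleration and the input that would give zero tracking error.
vel = torch.randn(512, 1)
acc = torch.randn(512, 1)
u_star = 1.2 * acc + 0.15 * vel + 0.05 * torch.tanh(3 * vel)   # "true" required input

for it in range(500):
    u_ff = m * acc + c * vel + nn_part(torch.cat([vel, acc], dim=1))
    tracking = ((u_ff - u_star) ** 2).mean()
    reg = (m - m_id) ** 2 + (c - c_id) ** 2     # keep physical parameters near identified values
    loss = tracking + lam * reg
    opt.zero_grad()
    loss.backward()
    opt.step()
```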
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
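The continuous-time view can be illustrated on a bilinear min-max toy game (standing in for a full GAN), where simultaneous gradient descent-ascent corresponds to the explicit Euler discretization of a rotation ODE and a higher-order solver such as RK4 follows the dynamics far more faithfully. This toy game is a common illustration, not the paper's experimental setup.

```python
# Toy illustration of the continuous-time view of adversarial training: for the
# bilinear game min_theta max_psi (theta * psi), simultaneous gradient
# descent-ascent follows the ODE d(theta)/dt = -psi, d(psi)/dt = theta, whose
# exact solutions are circles. The explicit Euler step accumulates integration
# error and spirals outward, while an RK4 step tracks the circle closely.
import numpy as np

def v(state):                       # training vector field (continuous-time dynamics)
    theta, psi = state
    return np.array([-psi, theta])

def euler_step(state, h):
    return state + h * v(state)

def rk4_step(state, h):
    k1 = v(state)
    k2 = v(state + 0.5 * h * k1)
    k3 = v(state + 0.5 * h * k2)
    k4 = v(state + h * k3)
    return state + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

h, steps = 0.1, 500
s_euler = np.array([1.0, 0.0])
s_rk4 = np.array([1.0, 0.0])
for _ in range(steps):
    s_euler = euler_step(s_euler, h)
    s_rk4 = rk4_step(s_rk4, h)

# The exact dynamics conserve theta^2 + psi^2 = 1.
print("Euler radius:", np.hypot(*s_euler))   # grows well above 1 (diverging oscillation)
print("RK4   radius:", np.hypot(*s_rk4))     # stays close to 1
```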
- A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
- On Connections between Regularizations for Improving DNN Robustness [67.28077776415724]
This paper analyzes regularization terms proposed recently for improving the adversarial robustness of deep neural networks (DNNs).
We study possible connections between several effective methods, including input-gradient regularization, Jacobian regularization, curvature regularization, and a cross-Lipschitz functional.
arXiv Detail & Related papers (2020-07-04T23:43:32Z)
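Of the regularizers compared above, input-gradient regularization is simple to sketch: penalize the norm of the loss gradient with respect to the input, computed with a differentiable second gradient pass. The architecture and weighting below are illustrative.

```python
# Minimal sketch of input-gradient regularization, one of the robustness
# regularizers the survey compares: add the squared norm of d(loss)/d(input)
# to the training objective. Architecture and lambda are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
lam = 0.1

x_all = torch.randn(512, 20)
y_all = x_all[:, :5].argmax(dim=1)            # toy 5-class labels

for step in range(200):
    idx = torch.randint(0, 512, (64,))
    x = x_all[idx].clone().requires_grad_(True)
    y = y_all[idx]

    loss = F.cross_entropy(model(x), y)
    # Gradient of the loss w.r.t. the *input*, kept in the graph so the penalty
    # itself can be backpropagated into the model parameters.
    (grad_x,) = torch.autograd.grad(loss, x, create_graph=True)
    total = loss + lam * grad_x.pow(2).sum(dim=1).mean()

    opt.zero_grad()
    total.backward()
    opt.step()
```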
This list is automatically generated from the titles and abstracts of the papers on this site.