Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study
- URL: http://arxiv.org/abs/2011.06702v1
- Date: Fri, 13 Nov 2020 00:26:43 GMT
- Title: Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study
- Authors: Cheng Chen, Junjie Yang, Yi Zhou
- Abstract summary: Modern deep neural network (DNN) trainings utilize various training techniques, e.g., nonlinear activation functions, batch normalization, skip-connections, etc.
We show that successful DNNs consistently obey a certain regularity principle that regularizes the model update direction to be aligned with the trajectory direction.
Empirically, we find that DNN trainings that apply the training techniques achieve fast convergence and obey the regularity principle with a large regularization parameter, implying that the model updates are well aligned with the trajectory.
- Score: 17.9739959287894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep neural network (DNN) trainings utilize various training
techniques, e.g., nonlinear activation functions, batch normalization,
skip-connections, etc. Despite their effectiveness, it is still mysterious how
they help accelerate DNN trainings in practice. In this paper, we provide an
empirical study of the regularization effect of these training techniques on
DNN optimization. Specifically, we find that the optimization trajectories of
successful DNN trainings consistently obey a certain regularity principle that
regularizes the model update direction to be aligned with the trajectory
direction. Theoretically, we show that such a regularity principle leads to a
convergence guarantee in nonconvex optimization and the convergence rate
depends on a regularization parameter. Empirically, we find that DNN trainings
that apply the training techniques achieve fast convergence and obey the
regularity principle with a large regularization parameter, implying that the
model updates are well aligned with the trajectory. On the other hand, DNN
trainings without the training techniques have slow convergence and obey the
regularity principle with a small regularization parameter, implying that the
model updates are not well aligned with the trajectory. Therefore, different
training techniques regularize the model update direction via the regularity
principle to facilitate the convergence.
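The regularity principle is stated here only informally, so the snippet below is a rough illustration: it tracks the cosine alignment between each model update and the overall trajectory direction during a toy SGD run. This is a sketch under the assumption that cosine similarity is an acceptable proxy for the alignment the paper measures; the paper's precise principle and its regularization parameter may be defined differently.

```python
# Minimal sketch (not the authors' code): monitor how well each SGD update
# aligns with the overall optimization trajectory. Cosine similarity is used
# here only as a rough proxy for the alignment discussed in the abstract.
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 20)
y = (x[:, 0] > 0).long()                      # toy binary labels

theta_0 = parameters_to_vector(model.parameters()).detach().clone()
theta_prev = theta_0.clone()

for step in range(1, 201):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

    theta = parameters_to_vector(model.parameters()).detach()
    update = theta - theta_prev               # model update direction at this step
    trajectory = theta - theta_0              # overall trajectory direction so far
    cos = torch.dot(update, trajectory) / (update.norm() * trajectory.norm() + 1e-12)
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}  cos(update, trajectory) {cos.item():.3f}")
    theta_prev = theta
```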
Related papers
- Alternate Training of Shared and Task-Specific Parameters for Multi-Task Neural Networks [49.1574468325115]
This paper introduces novel alternate training procedures for hard-parameter sharing Multi-Task Neural Networks (MTNNs).
The proposed alternate training method updates shared and task-specific weights alternately, exploiting the multi-head architecture of the model.
Empirical experiments demonstrate delayed overfitting, improved prediction, and reduced computational demands.
arXiv Detail & Related papers (2023-12-26T21:33:03Z)
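A minimal sketch of the alternating idea described above, assuming a standard hard-parameter-sharing setup (one shared trunk, two task heads) rather than the paper's exact architecture or schedule:

```python
# Hypothetical sketch of alternating updates for a hard-parameter-sharing
# multi-task network: even steps update only the shared trunk, odd steps
# update only the task-specific heads. Not the paper's exact procedure.
import torch
import torch.nn as nn

torch.manual_seed(0)
trunk = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
head_a = nn.Linear(32, 1)                      # task A: regression
head_b = nn.Linear(32, 3)                      # task B: 3-way classification

opt_shared = torch.optim.Adam(trunk.parameters(), lr=1e-3)
opt_tasks = torch.optim.Adam(list(head_a.parameters()) + list(head_b.parameters()), lr=1e-3)

x = torch.randn(128, 10)
y_a = x.sum(dim=1, keepdim=True)               # toy regression target
y_b = torch.randint(0, 3, (128,))              # toy classification target

mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

for step in range(200):
    z = trunk(x)
    loss = mse(head_a(z), y_a) + ce(head_b(z), y_b)

    # Alternate which parameter group receives this step's update.
    opt = opt_shared if step % 2 == 0 else opt_tasks
    opt_shared.zero_grad()
    opt_tasks.zero_grad()
    loss.backward()
    opt.step()
```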
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than those obtained with the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
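The specific multiplicative rules of the framework above are not given here, so the sketch below uses one classical multiplicative scheme, exponentiated gradient, on a toy positive-weight least-squares problem purely to contrast with the usual additive step; it is not necessarily the rule proposed in that paper.

```python
# Illustration only: a classical multiplicative update (exponentiated gradient)
# versus the usual additive SGD step on a toy least-squares problem with
# positive weights. The referenced framework may use different rules.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
w_true = np.array([0.5, 1.0, 0.2, 2.0, 0.8])           # positive ground truth
b = A @ w_true + 0.01 * rng.normal(size=100)

def grad(w):                                            # gradient of 0.5*||Aw - b||^2 / n
    return A.T @ (A @ w - b) / len(b)

w_add = np.ones(5)                                      # additive (plain gradient descent)
w_mul = np.ones(5)                                      # multiplicative (exponentiated gradient)
lr = 0.05
for _ in range(500):
    w_add = w_add - lr * grad(w_add)                    # w <- w - lr * g
    w_mul = w_mul * np.exp(-lr * grad(w_mul))           # w <- w * exp(-lr * g)

print("additive      :", np.round(w_add, 3))
print("multiplicative:", np.round(w_mul, 3))
```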
- Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics [97.38308257547186]
Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and material models.
We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned.
We introduce a new framework termed "Neural Constitutive Laws" (NCLaw) which utilizes a network architecture that strictly guarantees standard priors.
arXiv Detail & Related papers (2023-04-27T17:42:24Z)
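A rough sketch of the "enforce the known governing equation, learn only the constitutive law" idea behind NCLaw, using a 1D mass-spring chain with unit masses. The chain, the linear ground-truth law, and the one-step training objective are illustrative assumptions, not the paper's setup or architecture.

```python
# Hypothetical sketch: the explicit time step below hard-codes the known
# dynamics (Newton's second law for a chain of unit masses); the network only
# maps strain -> force. Not the NCLaw architecture from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, dt, k_true = 16, 0.02, 4.0

def step(pos, vel, force_fn):
    """One explicit step of the known dynamics: dv/dt = net spring force."""
    strain = pos[1:] - pos[:-1] - 1.0                    # deviation from rest length 1
    f = force_fn(strain.unsqueeze(-1)).squeeze(-1)       # per-spring force from the constitutive law
    zero = f.new_zeros(1)
    acc = torch.cat([f, zero]) - torch.cat([zero, f])    # force balance on each mass
    vel = vel + dt * acc
    return pos + dt * vel, vel

true_law = lambda s: k_true * s                          # ground-truth linear law (treated as unknown)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

# Observed motion generated with the true constitutive law.
pos = torch.arange(n, dtype=torch.float32) + 0.3 * torch.randn(n)
vel = torch.zeros(n)
obs = []
with torch.no_grad():
    for _ in range(60):
        pos, vel = step(pos, vel, true_law)
        obs.append((pos.clone(), vel.clone()))

# Fit the network so that the hard-coded dynamics reproduce consecutive observations.
for epoch in range(200):
    loss = torch.tensor(0.0)
    for (p0, v0), (p1, v1) in zip(obs[:-1], obs[1:]):
        p_pred, v_pred = step(p0, v0, net)
        loss = loss + ((p_pred - p1) ** 2).mean() + ((v_pred - v1) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```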
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
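A loose, forward-in-time illustration in the spirit of the online updates described above, using an eligibility-trace-style approximation for a single spiking layer. It ignores the dependence introduced by the spike reset and is not the paper's exact OTTT derivation.

```python
# Rough sketch: the gradient of the instantaneous loss w.r.t. the weights is
# approximated by (error signal at time t) x (decayed trace of past inputs),
# so no backpropagation through time is needed. Not the exact OTTT algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, T = 30, 4, 100
lam, thresh, lr = 0.9, 1.0, 0.05
target = 2                                            # class this input pattern belongs to
x_spikes = (rng.random((T, n_in)) < 0.3).astype(float)   # fixed input spike train

W = 0.1 * rng.normal(size=(n_out, n_in))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

for epoch in range(30):
    u = np.zeros(n_out)                               # membrane potentials
    trace = np.zeros(n_in)                            # decayed presynaptic trace a_t
    spike_count = np.zeros(n_out)
    loss = 0.0
    for t in range(T):
        x = x_spikes[t]
        u = lam * u + W @ x                           # leaky integration
        s = (u >= thresh).astype(float)               # output spikes
        trace = lam * trace + x                       # a_t = lam * a_{t-1} + x_t
        p = softmax(u)                                # instantaneous readout on membranes
        loss += -np.log(p[target] + 1e-12)
        err = p.copy()
        err[target] -= 1.0                            # dL_t/du_t for cross-entropy
        W -= lr * np.outer(err, trace)                # online, forward-in-time update
        u = u * (1.0 - s)                             # reset neurons that spiked
        spike_count += s
    if epoch % 10 == 0 or epoch == 29:
        print(f"epoch {epoch:2d}  loss {loss / T:.3f}  spikes per neuron {spike_count}")
```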
- Convolutional Dictionary Learning by End-To-End Training of Iterative Neural Networks [3.6280929178575994]
In this work, we construct an INN which can be used as a supervised and physics-informed online convolutional dictionary learning algorithm.
We show that the proposed INN improves over two conventional model-agnostic training methods and also yields competitive results compared to a deep INN.
arXiv Detail & Related papers (2022-06-09T12:15:38Z)
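A generic sketch of an iterative neural network obtained by unrolling ISTA for convolutional sparse coding and training it end to end. The supervised, physics-informed INN in the paper is built for a different setting, so the dictionary size, toy signals, and unrolling depth here are illustrative assumptions.

```python
# Generic sketch (not the paper's INN): unroll a few ISTA iterations for
# convolutional sparse coding into a network whose dictionary, step size and
# soft threshold are trained end to end to reconstruct the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class UnrolledConvISTA(nn.Module):
    def __init__(self, n_atoms=8, kernel=9, n_iters=5):
        super().__init__()
        self.dict = nn.Parameter(0.1 * torch.randn(n_atoms, 1, kernel))  # convolutional dictionary
        self.step = nn.Parameter(torch.tensor(0.5))                      # learned step size
        self.thresh = nn.Parameter(torch.tensor(0.01))                   # learned soft threshold
        self.n_iters = n_iters
        self.pad = kernel // 2

    def synthesize(self, z):                       # x_hat = D * z
        return F.conv_transpose1d(z, self.dict, padding=self.pad)

    def analyze(self, x):                          # D^T x
        return F.conv1d(x, self.dict, padding=self.pad)

    def forward(self, x):
        z = torch.zeros(x.shape[0], self.dict.shape[0], x.shape[-1], device=x.device)
        for _ in range(self.n_iters):              # unrolled ISTA iterations
            residual = self.synthesize(z) - x
            z = z - self.step * self.analyze(residual)
            z = torch.sign(z) * torch.relu(z.abs() - self.thresh)   # soft thresholding
        return self.synthesize(z)

def make_batch(batch=16, length=64):
    # Toy training signals: random sparse spikes smoothed by a fixed kernel.
    spikes = (torch.rand(batch, 1, length) < 0.05).float() * torch.randn(batch, 1, length)
    smooth = torch.ones(1, 1, 5) / 5.0
    return F.conv1d(spikes, smooth, padding=2)

model = UnrolledConvISTA()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for it in range(300):
    x = make_batch()
    loss = F.mse_loss(model(x), x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```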
- TO-FLOW: Efficient Continuous Normalizing Flows with Temporal Optimization adjoint with Moving Speed [12.168241245313164]
Continuous normalizing flows (CNFs) construct invertible mappings between an arbitrary complex distribution and an isotropic Gaussian distribution.
CNF training has not been tractable on large datasets due to the incremental complexity of neural ODE training.
In this paper, a temporal optimization is proposed that optimizes the evolution time used for the forward propagation of the neural ODE.
arXiv Detail & Related papers (2022-03-19T14:56:41Z)
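The core idea of treating the integration (evolution) time as an optimization variable can be sketched with a hand-rolled RK4 integrator and a trainable end time. TO-FLOW's CNF objective, adjoint handling, and moving-speed scheme are not reproduced; the rotation task and time penalty below are illustrative assumptions.

```python
# Sketch of optimizing the integration (evolution) time of a neural ODE jointly
# with its parameters, using a hand-rolled fixed-step RK4 solver. The end time
# T is a trainable parameter penalized so shorter trajectories are preferred.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
field = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))   # dz/dt = field(z)
raw_T = nn.Parameter(torch.tensor(0.0))                                # end time, via softplus
opt = torch.optim.Adam(list(field.parameters()) + [raw_T], lr=1e-2)

def integrate(z, T, n_steps=8):
    """Classic RK4 with step h = T / n_steps for an autonomous vector field."""
    h = T / n_steps
    for _ in range(n_steps):
        k1 = field(z)
        k2 = field(z + 0.5 * h * k1)
        k3 = field(z + 0.5 * h * k2)
        k4 = field(z + h * k3)
        z = z + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

x = torch.randn(256, 2)
y = x @ torch.tensor([[0.0, -1.0], [1.0, 0.0]])        # toy target: rotate inputs by 90 degrees

for it in range(500):
    T = F.softplus(raw_T)                              # keep the end time positive
    loss = F.mse_loss(integrate(x, T), y) + 0.05 * T   # fit + penalty on integration time
    opt.zero_grad()
    loss.backward()
    opt.step()
print("learned end time:", F.softplus(raw_T).item())
```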
- On feedforward control using physics-guided neural networks: Training cost regularization and optimized initialization [0.0]
Performance of model-based feedforward controllers is typically limited by the accuracy of the inverse system dynamics model.
This paper proposes a regularization method via identified physical parameters.
It is validated on a real-life industrial linear motor, where it delivers better tracking accuracy and extrapolation.
arXiv Detail & Related papers (2022-01-28T12:51:25Z)
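One way to read "regularization via identified physical parameters" is sketched below: a feedforward model combines a physical term with trainable coefficients and a small network, and the coefficients are penalized for drifting away from previously identified values. The hypothetical motor-style model, the identified values m_id and c_id, and the weighting are illustrative, not taken from the paper.

```python
# Hedged sketch of regularizing a physics-guided feedforward model toward
# previously identified physical parameters. The toy "motor" model and its
# identified values are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Physical feedforward part: u = m*acc + c*vel, with trainable m and c.
m = nn.Parameter(torch.tensor(1.0))
c = nn.Parameter(torch.tensor(0.1))
m_id, c_id = 1.2, 0.15                          # values from a prior identification experiment
nn_part = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))   # learns the rest

opt = torch.optim.Adam([m, c] + list(nn_part.parameters()), lr=1e-2)
lam = 10.0                                      # strength of the parameter regularizer

# Toy data: reference velocity/acceleration and the input that would give zero tracking error.
vel = torch.randn(512, 1)
acc = torch.randn(512, 1)
u_star = 1.2 * acc + 0.15 * vel + 0.05 * torch.tanh(3 * vel)   # "true" required input

for it in range(500):
    u_ff = m * acc + c * vel + nn_part(torch.cat([vel, acc], dim=1))
    tracking = ((u_ff - u_star) ** 2).mean()
    reg = (m - m_id) ** 2 + (c - c_id) ** 2     # keep physical parameters near identified values
    loss = tracking + lam * reg
    opt.zero_grad()
    loss.backward()
    opt.step()
```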
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
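The continuous-time view can be illustrated on a bilinear min-max toy game (standing in for a full GAN), where simultaneous gradient descent-ascent corresponds to the explicit Euler discretization of a rotation ODE and a higher-order solver such as RK4 follows the dynamics far more faithfully. This toy game is a common illustration, not the paper's experimental setup.

```python
# Toy illustration of the continuous-time view of adversarial training: for the
# bilinear game min_theta max_psi (theta * psi), simultaneous gradient
# descent-ascent follows the ODE d(theta)/dt = -psi, d(psi)/dt = theta, whose
# exact solutions are circles. The explicit Euler step accumulates integration
# error and spirals outward, while an RK4 step tracks the circle closely.
import numpy as np

def v(state):                       # training vector field (continuous-time dynamics)
    theta, psi = state
    return np.array([-psi, theta])

def euler_step(state, h):
    return state + h * v(state)

def rk4_step(state, h):
    k1 = v(state)
    k2 = v(state + 0.5 * h * k1)
    k3 = v(state + 0.5 * h * k2)
    k4 = v(state + h * k3)
    return state + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

h, steps = 0.1, 500
s_euler = np.array([1.0, 0.0])
s_rk4 = np.array([1.0, 0.0])
for _ in range(steps):
    s_euler = euler_step(s_euler, h)
    s_rk4 = rk4_step(s_rk4, h)

# The exact dynamics conserve theta^2 + psi^2 = 1.
print("Euler radius:", np.hypot(*s_euler))   # grows well above 1 (diverging oscillation)
print("RK4   radius:", np.hypot(*s_rk4))     # stays close to 1
```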
- A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
- On Connections between Regularizations for Improving DNN Robustness [67.28077776415724]
This paper analyzes regularization terms proposed recently for improving the adversarial robustness of deep neural networks (DNNs).
We study possible connections between several effective methods, including input-gradient regularization, Jacobian regularization, curvature regularization, and a cross-Lipschitz functional.
arXiv Detail & Related papers (2020-07-04T23:43:32Z)
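Of the regularizers compared above, input-gradient regularization is simple to sketch: penalize the norm of the loss gradient with respect to the input, computed with a differentiable second gradient pass. The architecture and weighting below are illustrative.

```python
# Minimal sketch of input-gradient regularization, one of the robustness
# regularizers the survey compares: add the squared norm of d(loss)/d(input)
# to the training objective. Architecture and lambda are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
lam = 0.1

x_all = torch.randn(512, 20)
y_all = x_all[:, :5].argmax(dim=1)            # toy 5-class labels

for step in range(200):
    idx = torch.randint(0, 512, (64,))
    x = x_all[idx].clone().requires_grad_(True)
    y = y_all[idx]

    loss = F.cross_entropy(model(x), y)
    # Gradient of the loss w.r.t. the *input*, kept in the graph so the penalty
    # itself can be backpropagated into the model parameters.
    (grad_x,) = torch.autograd.grad(loss, x, create_graph=True)
    total = loss + lam * grad_x.pow(2).sum(dim=1).mean()

    opt.zero_grad()
    total.backward()
    opt.step()
```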
This list is automatically generated from the titles and abstracts of the papers on this site.