A Differential Game Theoretic Neural Optimizer for Training Residual Networks
- URL: http://arxiv.org/abs/2007.08880v1
- Date: Fri, 17 Jul 2020 10:19:17 GMT
- Title: A Differential Game Theoretic Neural Optimizer for Training Residual Networks
- Authors: Guan-Horng Liu, Tianrong Chen and Evangelos A. Theodorou
- Abstract summary: We propose a generalized Differential Dynamic Programming (DDP) neural optimizer that accepts both residual connections and convolutional layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
- Score: 29.82841891919951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Connections between Deep Neural Network (DNN) training and optimal control theory have attracted considerable attention as a principled tool for algorithmic design. The Differential Dynamic Programming (DDP) neural optimizer is a recently proposed method along this line. Despite its empirical success, its applicability has been limited to feedforward networks, and whether such a trajectory-optimization-inspired framework can be extended to modern architectures remains unclear. In this work, we derive a generalized DDP optimizer that accepts both residual connections and convolutional layers. The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented dynamical systems. This Game Theoretic DDP (GT-DDP) optimizer enjoys the same theoretical connection established in previous work, yet generates a richer update rule that better leverages the information available during network propagation. Evaluation on image classification datasets (e.g., MNIST and CIFAR-100) shows an improvement in training convergence and variance reduction over existing methods. Our approach highlights the benefit gained from architecture-aware optimization.
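For context on the optimal control view the abstract builds on, the following is a minimal sketch, not the authors' GT-DDP implementation: a residual network is treated as a discrete-time dynamical system x_{t+1} = x_t + f(x_t; theta_t), the forward pass rolls out a state trajectory, and training minimizes a terminal loss over that trajectory. All dimensions, modules, and data below are illustrative assumptions (PyTorch).

```python
import torch
import torch.nn as nn

class ResidualStage(nn.Module):
    """One control stage: x_{t+1} = x_t + f(x_t; theta_t)."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())

    def forward(self, x):
        # the residual connection acts as one Euler step of controlled dynamics
        return x + self.f(x)

dim, horizon, batch = 32, 5, 8
stages = nn.ModuleList(ResidualStage(dim) for _ in range(horizon))
head = nn.Linear(dim, 10)  # terminal cost = classification loss at the final state

x = torch.randn(batch, dim)   # initial state = input batch
for stage in stages:          # forward pass = rolling out the dynamics
    x = stage(x)

labels = torch.randint(0, 10, (batch,))
loss = nn.functional.cross_entropy(head(x), labels)
loss.backward()  # standard backprop; DDP-style optimizers instead run a
                 # second-order backward pass that yields layer-wise feedback
```

Under this correspondence, the paper's contribution is to read training such a system, once residual connections are present, as cooperative trajectory optimization on a state-augmented version of these dynamics.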
Related papers
- Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories, represented by their pointwise parameters.
We show that, from some point in training onward, training only the scalar batchnorm parameters matches the performance of training the entire network (a minimal sketch of this recipe appears after this list).
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
- Layer Collaboration in the Forward-Forward Algorithm [28.856139738073626]
We study layer collaboration in the forward-forward algorithm.
We show that the current version of the forward-forward algorithm is suboptimal when considering information flow in the network.
We propose an improved version that supports layer collaboration to better utilize the network structure (a sketch of the forward-forward per-layer objective appears after this list).
arXiv Detail & Related papers (2023-05-21T08:12:54Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Dynamic Game Theoretic Neural Optimizer [10.612273480358692]
We propose a novel dynamic game perspective by viewing each layer as a player in a dynamic game characterized by the DNN itself.
Our work marries strengths from both OCT and game theory, paving the way to new algorithmic opportunities from robust optimal control and bandit-based optimization.
arXiv Detail & Related papers (2021-05-08T21:56:14Z)
- Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate that TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z)
- A Deep-Unfolded Reference-Based RPCA Network For Video Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA).
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
arXiv Detail & Related papers (2020-10-02T11:40:09Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale problems in which the predictive model is a deep neural network.
Our method requires far fewer communication rounds while still enjoying theoretical convergence guarantees.
Experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network (a sketch of this generic side-branch pattern appears after this list).
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- DDPNOpt: Differential Dynamic Programming Neural Optimizer [29.82841891919951]
We show that many widely-used training algorithms can be linked to Differential Dynamic Programming (DDP).
In this vein, we propose DDPNOpt, a new class of optimizer for training feedforward and convolutional networks.
arXiv Detail & Related papers (2020-02-20T15:42:15Z)
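As promised under "Hallmarks of Optimization Trajectories in Neural Networks" above, here is a minimal sketch of the general recipe that entry evokes: freeze all weights and train only the scalar scale/shift parameters of the normalization layers. The model (torchvision's resnet18) and hyperparameters are illustrative assumptions, not that paper's protocol.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18()
for p in model.parameters():
    p.requires_grad = False              # freeze the entire network
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.weight.requires_grad = True    # re-enable the BN scale (gamma)...
        m.bias.requires_grad = True      # ...and shift (beta) scalars

bn_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(bn_params, lr=0.1, momentum=0.9)
# a standard training loop over bn_params then updates only these scalars
```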
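The "Layer Collaboration in the Forward-Forward Algorithm" entry presupposes the forward-forward per-layer objective, sketched below: each layer is trained locally so its "goodness" (sum of squared activations) is high on positive samples and low on negative ones. The threshold and dummy data are assumptions; the paper's collaboration scheme itself is not reproduced here.

```python
import torch
import torch.nn as nn

layer = nn.Linear(784, 500)
opt = torch.optim.SGD(layer.parameters(), lr=0.03)
theta = 2.0                                   # goodness threshold (assumed value)

def goodness(x):
    return layer(x).relu().pow(2).sum(dim=1)  # per-sample goodness

x_pos = torch.randn(16, 784)                  # stand-in "positive" (real) data
x_neg = torch.randn(16, 784)                  # stand-in "negative" (corrupted) data
loss = nn.functional.softplus(torch.cat([
    theta - goodness(x_pos),                  # push positive goodness above theta
    goodness(x_neg) - theta,                  # push negative goodness below theta
])).mean()
opt.zero_grad()
loss.backward()                               # gradients stay local to this layer
opt.step()
```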
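Finally, the side-branch pattern referenced under "Dynamic Hierarchical Mimicking" follows the generic DSN recipe sketched here: auxiliary classifier heads forked from intermediate layers and trained jointly with the main head. Shapes, depth, and the 0.3 branch weight are illustrative assumptions; the paper's actual branch design and mimicking losses are more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(3))
side_heads = nn.ModuleList(nn.Linear(64, 10) for _ in range(3))
main_head = nn.Linear(64, 10)

x = torch.randn(8, 64)
target = torch.randint(0, 10, (8,))

loss = torch.tensor(0.0)
for block, side in zip(blocks, side_heads):
    x = torch.relu(block(x))
    loss = loss + 0.3 * F.cross_entropy(side(x), target)  # side-branch supervision
loss = loss + F.cross_entropy(main_head(x), target)       # main objective
loss.backward()
```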