Dynamic Game Theoretic Neural Optimizer
- URL: http://arxiv.org/abs/2105.03788v1
- Date: Sat, 8 May 2021 21:56:14 GMT
- Title: Dynamic Game Theoretic Neural Optimizer
- Authors: Guan-Horng Liu, Tianrong Chen, and Evangelos A. Theodorou
- Abstract summary: We propose a novel dynamic game perspective by viewing each layer as a player in a dynamic game characterized by the DNN itself.
Our work marries strengths from both OCT and game theory, paving the way to new algorithmic opportunities in robust optimal control and bandit-based optimization.
- Score: 10.612273480358692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The connection between training deep neural networks (DNNs) and optimal
control theory (OCT) has attracted considerable attention as a principled tool
of algorithmic design. The few attempts made so far, however, have been limited to architectures where the layer propagation resembles a Markovian dynamical system. This casts doubt on their applicability to modern networks that heavily
rely on non-Markovian dependencies between layers (e.g. skip connections in
residual networks). In this work, we propose a novel dynamic game perspective
by viewing each layer as a player in a dynamic game characterized by the DNN
itself. Through this lens, different classes of optimizers can be seen as
matching different types of Nash equilibria, depending on the implicit
information structure of each (p)layer. The resulting method, called Dynamic
Game Theoretic Neural Optimizer (DGNOpt), not only generalizes OCT-inspired
optimizers to a richer class of networks; it also motivates a new training principle
by solving a multi-player cooperative game. DGNOpt shows convergence
improvements over existing methods on image classification datasets with
residual networks. Our work marries strengths from both OCT and game theory, paving the way to new algorithmic opportunities in robust optimal control and bandit-based optimization.
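To make the layer-as-player view concrete, the sketch below writes down the standard optimal-control formulation of training a T-layer feedforward network and the cooperative-game reading of it; the symbols (f_t, theta_t, ell_t, Phi) are generic notation, not necessarily the paper's.

```latex
% OCT view of training: the activations x_t are the state and the
% layer parameters \theta_t are the control applied at stage t.
\begin{aligned}
  x_{t+1} &= f_t(x_t, \theta_t), \qquad t = 0,\dots,T-1,\\
  \min_{\theta_0,\dots,\theta_{T-1}} \; J(\theta) &= \Phi(x_T) + \sum_{t=0}^{T-1} \ell_t(x_t, \theta_t).
\end{aligned}
% Game view: each layer t is a player choosing \theta_t. When all players
% share the same objective J, the game is cooperative, and a joint minimizer
% is a Nash point: no single layer can decrease J by unilaterally changing
% its own parameters while the other layers are held fixed.
```

Non-Markovian architectures such as residual networks couple a player's dynamics to the outputs of non-adjacent layers, which is the setting the paper's game formulation is meant to cover.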
Related papers
- Layer Collaboration in the Forward-Forward Algorithm [28.856139738073626]
We study layer collaboration in the forward-forward algorithm.
We show that the current version of the forward-forward algorithm is suboptimal when considering information flow in the network.
We propose an improved version that supports layer collaboration to better utilize the network structure.
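For context on why layer collaboration matters here: the original forward-forward algorithm trains each layer on a purely local "goodness" objective, so no information flows back to earlier layers. A minimal sketch of that local objective, assuming the common sum-of-squares goodness and a fixed threshold theta (both conventions from the forward-forward literature, not from this paper):

```python
import torch
import torch.nn.functional as F

def ff_layer_loss(layer, x_pos, x_neg, theta=2.0):
    """Local forward-forward objective for a single layer (e.g. Linear+ReLU):
    push the 'goodness' (sum of squared activations) above `theta` for
    positive samples and below it for negative samples.
    No gradient reaches earlier layers."""
    g_pos = layer(x_pos).pow(2).sum(dim=1)   # goodness of positive data
    g_neg = layer(x_neg).pow(2).sum(dim=1)   # goodness of negative data
    # logistic losses on the signed margins around the threshold
    return F.softplus(theta - g_pos).mean() + F.softplus(g_neg - theta).mean()
```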
arXiv Detail & Related papers (2023-05-21T08:12:54Z)
- ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z)
- WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
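The paper's regularizer is data-dependent and not reproduced here; as a rough illustration of what "within-layer" feedback can look like, the sketch below penalizes pairwise similarity between the activation patterns of units in one layer over a mini-batch (the penalty form and its weighting are assumptions):

```python
import torch
import torch.nn.functional as F

def within_layer_diversity_penalty(activations: torch.Tensor) -> torch.Tensor:
    """activations: (batch, units) output of a single layer.
    Penalizes high cosine similarity between different units' activation
    patterns across the batch, encouraging units to respond differently."""
    a = F.normalize(activations.t(), dim=1)            # (units, batch), unit-norm rows
    sim = a @ a.t()                                     # (units, units) cosine similarities
    off_diag = sim - torch.eye(sim.size(0), device=sim.device)
    return off_diag.pow(2).mean()
```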
arXiv Detail & Related papers (2023-01-03T20:57:22Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Can we learn gradients by Hamiltonian Neural Networks? [68.8204255655161]
We propose a meta-learner based on ODE neural networks that learns gradients.
We demonstrate that our method outperforms a meta-learner based on LSTM for an artificial task and the MNIST dataset with ReLU activations in the optimizee.
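For orientation, the general learned-optimizer pattern behind such meta-learners is sketched below, with a plain MLP standing in for the paper's ODE/Hamiltonian network; the coordinate-wise parameterization is an assumption made for brevity:

```python
import torch
import torch.nn as nn

class LearnedGradientOptimizer(nn.Module):
    """Generic learned-optimizer pattern: a small meta-network maps each raw
    parameter gradient to an update, replacing a hand-designed rule.
    The meta-network would itself be trained by unrolling the optimizee's
    training loop (not shown here)."""
    def __init__(self, hidden=20):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    @torch.no_grad()
    def step(self, params):
        for p in params:
            if p.grad is None:
                continue
            g = p.grad.reshape(-1, 1)              # one scalar gradient per row
            update = self.net(g).reshape(p.shape)  # learned coordinate-wise update
            p.add_(update)
```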
arXiv Detail & Related papers (2021-10-31T18:35:10Z)
- Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks [1.0869257688521987]
Complex Network Theory (CNT) represents Deep Neural Networks (DNNs) as directed weighted graphs to study them as dynamical systems.
We introduce metrics for nodes/neurons and layers, namely Nodes Strength and Layers Fluctuation.
Our framework distills trends in the learning dynamics and separates low-accuracy from high-accuracy networks.
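In complex-network terminology, a node's strength is the sum of the weights of its incident edges. A minimal sketch of reading such a metric off one dense layer, treating |W_ij| as the edge weight between input node j and output node i (the paper's exact definitions may differ):

```python
import numpy as np

def node_strengths(weight_matrix: np.ndarray):
    """weight_matrix: (out_units, in_units) weights of one dense layer,
    viewed as a bipartite weighted graph between input and output nodes."""
    w = np.abs(weight_matrix)
    in_strength = w.sum(axis=0)    # each input node: total weight of edges to all outputs
    out_strength = w.sum(axis=1)   # each output node: total weight of edges from all inputs
    return in_strength, out_strength
```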
arXiv Detail & Related papers (2021-10-06T10:03:32Z)
- A Deep-Unfolded Reference-Based RPCA Network For Video Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA).
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
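For background, RPCA decomposes a data matrix M into a low-rank part L (background) plus a sparse part S (foreground), and deep unfolding turns the iterations of such a solver into network layers. A minimal sketch of one classical alternating proximal step, with hand-set thresholds rather than the learned, reference-based quantities used in the paper:

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca_step(M, L, S, tau=1.0, lam=0.1):
    """One alternating proximal step for M ≈ L + S:
    singular-value thresholding updates the low-rank part,
    entrywise soft-thresholding updates the sparse part."""
    U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
    L = U @ np.diag(soft_threshold(sig, tau)) @ Vt
    S = soft_threshold(M - L, lam)
    return L, S
```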
arXiv Detail & Related papers (2020-10-02T11:40:09Z)
- A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
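For readers unfamiliar with DDP: its backward pass expands the stagewise Q-function Q_t(x,u) = ell_t(x,u) + V_{t+1}(f_t(x,u)) to second order around a nominal trajectory and returns affine feedback policies. The standard, paper-agnostic form of the update is:

```latex
\begin{aligned}
  \delta u_t^{*} &= k_t + K_t\,\delta x_t,\\
  k_t &= -\,Q_{uu}^{-1} Q_u, \qquad K_t = -\,Q_{uu}^{-1} Q_{ux},
\end{aligned}
% with the value function V_t then updated from the same expansion. Applied to
% network training, u_t plays the role of layer t's parameters and x_t the activations.
```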
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
- Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers [18.22501196339569]
We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure.
We demonstrate that our dynamic sparse training algorithm can easily train very sparse neural network models with little performance loss.
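A rough sketch of a "trainable masked layer" in the spirit described: the weights are gated by a mask derived from a learnable threshold, and a straight-through estimator keeps both weights and threshold trainable. The exact parameterization and sparsity regularizer of the paper are not reproduced:

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose weights are pruned by a trainable threshold.
    The forward pass uses a hard binary mask; gradients flow to both the
    weights and the threshold via a straight-through estimator."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.threshold = nn.Parameter(torch.zeros(out_features, 1))  # per-row threshold

    def forward(self, x):
        score = self.weight.abs() - self.threshold
        hard_mask = (score > 0).float()
        soft_mask = torch.sigmoid(score)
        mask = hard_mask + soft_mask - soft_mask.detach()  # straight-through trick
        return x @ (self.weight * mask).t()
```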
arXiv Detail & Related papers (2020-05-14T11:05:21Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
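A minimal sketch of the general side-branch pattern (deeply-supervised-style auxiliary heads attached to intermediate layers); the specific branch designs and mimicking losses of the proposed method are not reproduced here:

```python
import torch
import torch.nn as nn

class BackboneWithSideBranch(nn.Module):
    """Attaches an auxiliary classifier to an intermediate stage so that
    intermediate features receive their own supervision signal."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1))
        self.main_head = nn.Linear(64, num_classes)
        self.side_head = nn.Linear(32 * 8 * 8, num_classes)  # branch off stage1

    def forward(self, x):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        main_logits = self.main_head(h2.flatten(1))
        side_logits = self.side_head(h1.flatten(1))
        return main_logits, side_logits  # train with a loss on each head
```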
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.