Optimizing Neural Networks via Koopman Operator Theory
- URL: http://arxiv.org/abs/2006.02361v3
- Date: Thu, 22 Oct 2020 03:48:46 GMT
- Title: Optimizing Neural Networks via Koopman Operator Theory
- Authors: Akshunna S. Dogra, William T. Redman
- Abstract summary: Koopman operator theory was recently shown to be intimately connected with neural network theory.
In this work we take the first steps in making use of this connection.
We show that Koopman operator theoretic methods allow accurate predictions of the weights and biases of feedforward networks over a non-trivial range of training time.
- Score: 6.09170287691728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Koopman operator theory, a powerful framework for discovering the underlying
dynamics of nonlinear dynamical systems, was recently shown to be intimately
connected with neural network training. In this work, we take the first steps
in making use of this connection. As Koopman operator theory is a linear
theory, a successful implementation of it in evolving network weights and
biases offers the promise of accelerated training, especially in the context of
deep networks, where optimization is inherently a non-convex problem. We show
that Koopman operator theoretic methods allow for accurate predictions of
weights and biases of feedforward, fully connected deep networks over a
non-trivial range of training time. During this window, we find that our
approach is >10x faster than various gradient descent based methods (e.g. Adam,
Adadelta, Adagrad), in line with our complexity analysis. We end by
highlighting open questions in this exciting intersection between dynamical
systems and neural network theory. We highlight additional methods by which our
results could be expanded to broader classes of networks and larger training
intervals, which shall be the focus of future work.
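As a concrete illustration of the idea, the sketch below uses dynamic mode decomposition (DMD), a standard finite-dimensional approximation of the Koopman operator, to fit a linear model to a short trajectory of flattened weight snapshots and then evolve the weights forward without further gradient computations. This is a minimal sketch under our own assumptions (the function names and the stand-in trajectory are illustrative), not the authors' implementation.

```python
import numpy as np

def fit_dmd(snapshots, rank=10):
    """Fit a reduced linear operator A_tilde with x_{t+1} ~ A x_t.

    snapshots: (d, T) array whose columns are the flattened network
    weights and biases recorded at successive training steps.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)  # SVD of the "past" snapshots
    r = min(rank, len(s))
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    A_tilde = U.T @ Y @ Vh.T @ np.diag(1.0 / s)       # reduced-order operator
    return U, A_tilde

def predict_weights(U, A_tilde, w_last, n_steps):
    """Evolve the last observed weight vector forward n_steps linearly."""
    z = U.T @ w_last                 # project into the reduced subspace
    for _ in range(n_steps):
        z = A_tilde @ z              # linear (Koopman-surrogate) evolution
    return U @ z                     # lift back to the full weight space

# Usage on a stand-in weight trajectory (a real run would record snapshots
# of the network parameters during a short gradient-descent warm-up).
rng = np.random.default_rng(0)
snaps = np.cumsum(0.01 * rng.normal(size=(50, 30)), axis=1)
U, A_tilde = fit_dmd(snaps, rank=5)
w_future = predict_weights(U, A_tilde, snaps[:, -1], n_steps=20)
```

Each prediction step is a small matrix-vector product in the reduced space rather than a gradient pass through the full network, which is consistent with the speedup the abstract's complexity analysis claims.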
Related papers
- Peer-to-Peer Learning Dynamics of Wide Neural Networks [10.179711440042123]
We provide an explicit, non-asymptotic characterization of the learning dynamics of wide neural networks trained using popular distributed gradient descent (DGD) algorithms; a toy DGD update is sketched after this entry.
We validate our analytical results by accurately predicting error for classification tasks.
arXiv Detail & Related papers (2024-09-23T17:57:58Z)
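For readers unfamiliar with the setting, here is a minimal sketch of one decentralized gradient descent step for a peer-to-peer network. The mixing matrix, the quadratic local losses, and the step size are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def dgd_step(params, W, grads, lr=0.1):
    """One decentralized gradient descent (DGD) update.

    params: (n_agents, d) local parameter vectors, one row per peer.
    W:      (n_agents, n_agents) doubly stochastic mixing matrix encoding
            the peer-to-peer communication graph.
    grads:  (n_agents, d) local loss gradients evaluated at params.
    """
    # Each agent averages its neighbors' parameters, then takes a local
    # gradient step: x_i <- sum_j W_ij x_j - lr * grad f_i(x_i).
    return W @ params - lr * grads

# Toy usage: 3 peers, quadratic local losses f_i(x) = ||x - c_i||^2 / 2.
W = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
targets = np.array([[1.0], [2.0], [3.0]])
x = np.zeros((3, 1))
for _ in range(200):
    x = dgd_step(x, W, grads=x - targets, lr=0.1)
# The peers cluster near the consensus minimizer (the mean of the targets, 2.0).
```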
- Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods [2.226971382808806]
We develop a theoretical framework grounded in iterative methods for operator equations; a toy fixed-point iteration in this spirit is sketched after this entry.
We demonstrate that popular architectures, such as diffusion models and AlphaFold, inherently employ iterative operator learning.
Our work aims to enhance the understanding of deep learning by merging insights from numerical analysis.
arXiv Detail & Related papers (2023-10-02T20:25:36Z)
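To make the "iterative methods for operator equations" framing concrete, here is a toy Richardson (fixed-point) iteration for a linear operator equation Au = b. The operator and step size below are illustrative choices, not the paper's construction.

```python
import numpy as np

def richardson(A, b, alpha=0.1, n_iters=2000):
    """Solve A u = b by the fixed-point iteration u <- u + alpha (b - A u).

    Converges when the spectral radius of (I - alpha * A) is below 1;
    architectures that refine an estimate layer by layer can be read as
    unrolled versions of this kind of update.
    """
    u = np.zeros_like(b)
    for _ in range(n_iters):
        u = u + alpha * (b - A @ u)
    return u

# Toy usage with a symmetric positive definite operator.
rng = np.random.default_rng(0)
M = rng.normal(size=(20, 20))
A = M @ M.T / 20 + np.eye(20)        # SPD, eigenvalues in a safe range
b = rng.normal(size=20)
u = richardson(A, b)
print(np.linalg.norm(A @ u - b))     # small residual
```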
- Learning Linear Embeddings for Non-Linear Network Dynamics with Koopman Message Passing [0.0]
We present a novel approach based on Koopman operator theory and message passing networks.
We find a linear representation for the dynamical system which is globally valid at any time step.
The linearisations found by our method produce predictions on a suite of network dynamics problems that are several orders of magnitude better than current state-of-the-art techniques.
arXiv Detail & Related papers (2023-05-15T23:00:25Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs, since models must be trained before their performance can be predicted.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks; the sketch after this entry illustrates such a linearization.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
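The linearization these results concern is first-order in the parameters: f(x; theta) ~ f(x; theta0) + grad_theta f(x; theta0) . (theta - theta0). A minimal numpy sketch for a tiny two-layer network follows; the architecture and the finite-difference Jacobian are illustrative simplifications.

```python
import numpy as np

def net(theta, x, hidden=8):
    """Tiny two-layer scalar network; theta packs (W, v)."""
    d = x.shape[0]
    W = theta[:hidden * d].reshape(hidden, d)
    v = theta[hidden * d:]
    return float(v @ np.tanh(W @ x))

def linearize(theta0, x, eps=1e-5):
    """Return f_lin(theta) = f(theta0) + J . (theta - theta0).

    The parameter Jacobian J is estimated by central finite differences,
    which suffices for a toy example (autodiff would be used in practice).
    """
    f0 = net(theta0, x)
    J = np.zeros_like(theta0)
    for i in range(theta0.size):
        e = np.zeros_like(theta0)
        e[i] = eps
        J[i] = (net(theta0 + e, x) - net(theta0 - e, x)) / (2 * eps)
    return lambda theta: f0 + J @ (theta - theta0)

# Near theta0 the linear model tracks the full network closely.
rng = np.random.default_rng(0)
d, hidden = 4, 8
theta0 = 0.5 * rng.normal(size=hidden * d + hidden)
x = rng.normal(size=d)
f_lin = linearize(theta0, x)
theta = theta0 + 1e-3 * rng.normal(size=theta0.shape)
print(net(theta, x), f_lin(theta))   # nearly identical values
```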
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network; a toy version of such a kernel is sketched after this entry.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
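A hedged sketch of the kernel construction this entry describes: the linearized network induces k(x, x') = grad_theta f(x; theta0) . grad_theta f(x'; theta0) (the empirical tangent kernel), which can then be plugged into standard Gaussian process regression. The toy network, finite-difference Jacobian, and data below are illustrative stand-ins, not the paper's model.

```python
import numpy as np

def f(theta, x):
    """Toy two-layer scalar network (illustrative only)."""
    W, v = theta[:16].reshape(4, 4), theta[16:]
    return float(v @ np.tanh(W @ x))

def param_jacobian(theta0, x, eps=1e-5):
    """Central finite-difference Jacobian of f with respect to theta."""
    J = np.zeros_like(theta0)
    for i in range(theta0.size):
        e = np.zeros_like(theta0)
        e[i] = eps
        J[i] = (f(theta0 + e, x) - f(theta0 - e, x)) / (2 * eps)
    return J

def jacobian_kernel(theta0, X):
    """k(x, x') = J(x) . J(x'): the kernel induced by the linearized net."""
    Js = np.stack([param_jacobian(theta0, x) for x in X])
    return Js @ Js.T

def gp_posterior_mean(K_train, K_cross, y, noise=1e-2):
    """Standard GP regression mean, with the Jacobian kernel plugged in."""
    alpha = np.linalg.solve(K_train + noise * np.eye(len(y)), y)
    return K_cross @ alpha

# Toy usage: evaluate the GP at the training inputs themselves.
rng = np.random.default_rng(1)
theta0 = 0.5 * rng.normal(size=20)
X = rng.normal(size=(10, 4))
y = np.sin(X[:, 0])
K = jacobian_kernel(theta0, X)
mean = gp_posterior_mean(K, K, y)    # smoothed predictions at the inputs
```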
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, within training time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.