Representing Neural Network Layers as Linear Operations via Koopman Operator Theory
- URL: http://arxiv.org/abs/2409.01308v1
- Date: Mon, 2 Sep 2024 15:04:33 GMT
- Title: Representing Neural Network Layers as Linear Operations via Koopman Operator Theory
- Authors: Nishant Suresh Aswani, Saif Eddin Jabari, Muhammad Shafique
- Abstract summary: We show that a linear view of neural networks makes understanding and controlling networks much more approachable.
We replace layers of an MLP trained on the Yin-Yang dataset with predictions from a DMD model, achieving a model accuracy of up to 97.3%, compared to the original 98.4%.
In addition, we replace layers in an MLP trained on the MNIST dataset, achieving up to 95.8%, compared to the original 97.2% on the test set.
- Score: 9.558002301188091
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The strong performance of simple neural networks is often attributed to their nonlinear activations. However, a linear view of neural networks makes understanding and controlling networks much more approachable. We draw from a dynamical systems view of neural networks, offering a fresh perspective by using Koopman operator theory and its connections with dynamic mode decomposition (DMD). Together, they offer a framework for linearizing dynamical systems by embedding the system into an appropriate observable space. By reframing a neural network as a dynamical system, we demonstrate that we can replace the nonlinear layer in a pretrained multi-layer perceptron (MLP) with a finite-dimensional linear operator. In addition, we analyze the eigenvalues of DMD and the right singular vectors of SVD, to present evidence that time-delayed coordinates provide a straightforward and highly effective observable space for Koopman theory to linearize a network layer. Consequently, we replace layers of an MLP trained on the Yin-Yang dataset with predictions from a DMD model, achieving a model accuracy of up to 97.3%, compared to the original 98.4%. In addition, we replace layers in an MLP trained on the MNIST dataset, achieving up to 95.8%, compared to the original 97.2% on the test set.
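To make the pipeline in the abstract concrete, here is a minimal sketch of the recipe it describes: record layer activations as snapshots, lift them into time-delayed coordinates, and fit a finite-dimensional linear operator by least squares (exact DMD). The toy network below shares one weight matrix across layers so that depth plays the role of time; that simplification, the shapes, and all names are illustrative assumptions, not the authors' code.
```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples, n_layers = 8, 200, 6

# Toy stand-in for a pretrained MLP: every layer applies the same weights,
# so depth behaves like time in an autonomous dynamical system (an
# idealization; the paper works with ordinary trained MLPs).
W = rng.standard_normal((n_features, n_features)) / np.sqrt(n_features)
acts = [rng.standard_normal((n_features, n_samples))]
for _ in range(n_layers - 1):
    acts.append(np.tanh(W @ acts[-1]))  # nonlinear layer map

# Time-delayed observables: stack two consecutive layers per snapshot.
H = [np.vstack([acts[l], acts[l + 1]]) for l in range(n_layers - 1)]

# Exact DMD: least-squares operator A with H[l+1] ~= A @ H[l], fit on the
# early layer-to-layer transitions and tested on the last one.
X = np.hstack(H[:-2])
Y = np.hstack(H[1:-1])
A = Y @ np.linalg.pinv(X)

# "Replace" the final nonlinear layer with the linear operator.
pred = A @ H[-2]
err = np.linalg.norm(pred[n_features:] - acts[-1]) / np.linalg.norm(acts[-1])
print(f"relative error of the linear layer replacement: {err:.3f}")
```
Swapping the nonlinear map for A in the observable space is the substitution behind the accuracy figures quoted above.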
Related papers
- Understanding Deep Neural Networks via Linear Separability of Hidden Layers [68.23950220548417]
We first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets (a toy sketch of the underlying idea appears after this list).
We demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance.
arXiv Detail & Related papers (2023-07-26T05:29:29Z)
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
- Learning Linear Embeddings for Non-Linear Network Dynamics with Koopman Message Passing [0.0]
We present a novel approach based on Koopman operator theory and message passing networks.
We find a linear representation for the dynamical system which is globally valid at any time step.
The linearisations found by our method produce predictions on a suite of network dynamics problems that are several orders of magnitude better than current state-of-the-art techniques.
arXiv Detail & Related papers (2023-05-15T23:00:25Z)
- ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and only exploit higher-order statistics later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Supervised Training of Siamese Spiking Neural Networks with Earth's Mover Distance [4.047840018793636]
This study adapts the highly-versatile siamese neural network model to the event data domain.
We introduce a supervised training framework for optimizing Earth's Mover Distance between spike trains with spiking neural networks (SNNs); a one-dimensional illustration of this distance appears after this list.
arXiv Detail & Related papers (2022-02-20T00:27:57Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- On the Application of Data-Driven Deep Neural Networks in Linear and Nonlinear Structural Dynamics [28.979990729816638]
The use of deep neural network (DNN) models as surrogates for linear and nonlinear structural dynamical systems is explored.
The focus is on the development of efficient network architectures using fully-connected, sparsely-connected, and convolutional network layers.
It is shown that the proposed DNNs can be used as effective and accurate surrogates for predicting linear and nonlinear dynamical responses under harmonic loadings.
arXiv Detail & Related papers (2021-11-03T13:22:19Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent (a toy sketch of this min-max game appears after this list).
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
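For the linear-separability entry above, a classical fact gives the flavor of the Minkowski-difference construction: two point sets are linearly separable exactly when some hyperplane through the origin puts every pairwise difference a - b strictly on one side. The degree measure below (fraction of differences on the positive side of a perceptron-style direction) is an illustrative stand-in, not the paper's MD-LSM definition.
```python
import numpy as np
from itertools import product

def separability_degree(S1, S2, epochs=200, lr=0.1):
    """Fraction of Minkowski-difference points a separating direction can
    place on the positive side; 1.0 means the sets are linearly separable."""
    D = np.array([a - b for a, b in product(S1, S2)])  # Minkowski difference
    w = np.zeros(D.shape[1])
    for _ in range(epochs):
        mistakes = D[D @ w <= 0]         # differences on the wrong side
        if len(mistakes) == 0:
            break
        w += lr * mistakes.mean(axis=0)  # perceptron-style correction
    return float(np.mean(D @ w > 0))

rng = np.random.default_rng(0)
S1 = rng.standard_normal((30, 2)) + np.array([3.0, 3.0])  # shifted cluster
S2 = rng.standard_normal((30, 2))
print(f"separability degree: {separability_degree(S1, S2):.2f}")  # ~1.00
```
Tracked layer by layer on per-class hidden outputs, a measure like this is the kind of quantity whose synchronicity with training performance the summary refers to.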
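For the siamese spiking-network entry, the loss is easy to state in one dimension: between two spike trains with the same number of spikes, the Earth Mover's Distance reduces to the mean absolute difference of sorted spike times. The sketch shows only this distance; the equal-count restriction is my simplification, and the paper's SNN training framework itself is not reproduced.
```python
import numpy as np

def emd_1d(spikes_a, spikes_b):
    """W1 (Earth Mover's) distance between two spike trains with equal
    spike counts: optimal transport on the line matches sorted times."""
    a, b = np.sort(spikes_a), np.sort(spikes_b)
    return float(np.abs(a - b).mean())

t1 = np.array([2.0, 5.0, 9.0, 14.0])  # spike times (ms) of one sample
t2 = np.array([2.5, 6.0, 8.0, 15.0])  # spike times (ms) of another
print(f"EMD between spike trains: {emd_1d(t1, t2):.2f} ms")
```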
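Finally, for the SEM entry, the min-max game can be shown with the simplest possible players. The paper parameterizes both players with neural networks; the sketch below instead uses linear players on a toy instrumental-variable problem, running gradient descent-ascent on L(theta, omega) = E[(Y - theta^T X) omega^T Z] - (1/2) E[(omega^T Z)^2]. The data-generating process, learning rates, and linear parameterization are all assumptions for illustration.
```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 3
theta_true = np.array([1.0, -2.0, 0.5])

# Toy SEM with endogeneity: instrument Z, confounder u leaks into both
# the regressors X and the outcome Y, so naive regression is biased.
Z = rng.standard_normal((n, d))
u = rng.standard_normal(n)
X = Z + 0.8 * u[:, None] + 0.3 * rng.standard_normal((n, d))
Y = X @ theta_true + u + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)  # solution player (minimizes)
omega = np.zeros(d)  # adversarial test-function player (maximizes)
lr = 0.05
for _ in range(2000):
    for _ in range(5):  # a few ascent steps: the inner problem is concave
        resid = Y - X @ theta
        omega += lr * (Z.T @ resid / n - Z.T @ (Z @ omega) / n)
    theta -= lr * (-X.T @ (Z @ omega) / n)  # descent on the outer player

ols = np.linalg.lstsq(X, Y, rcond=None)[0]  # biased by the confounder u
print("min-max estimate:", np.round(theta, 3))
print("naive OLS:       ", np.round(ols, 3))
print("ground truth:    ", theta_true)
```
With linear players, the game's stationary point coincides with the classical two-stage least-squares estimate, which is why the min-max estimate approximately recovers theta_true while OLS does not.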