Accelerating Toeplitz Neural Network with Constant-time Inference
Complexity
- URL: http://arxiv.org/abs/2311.08756v1
- Date: Wed, 15 Nov 2023 07:50:57 GMT
- Title: Accelerating Toeplitz Neural Network with Constant-time Inference
Complexity
- Authors: Zhen Qin, Yiran Zhong
- Abstract summary: Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in various sequence modeling tasks.
They outperform commonly used Transformer-based models while benefiting from log-linear space-time complexities.
In this paper, we aim to combine the strengths of TNNs and State Space Models (SSMs) by converting TNNs to SSMs during inference.
- Score: 21.88774274472737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in
various sequence modeling tasks. They outperform commonly used
Transformer-based models while benefiting from log-linear space-time
complexities. On the other hand, State Space Models (SSMs) achieve lower
performance than TNNs in language modeling but offer the advantage of constant
inference complexity. In this paper, we aim to combine the strengths of TNNs
and SSMs by converting TNNs to SSMs during inference, thereby enabling TNNs to
achieve the same constant inference complexities as SSMs. To accomplish this,
we formulate the conversion process as an optimization problem and provide a
closed-form solution. We demonstrate how to transform the target equation into
a Vandermonde linear system problem, which can be efficiently solved using the
Discrete Fourier Transform (DFT). Notably, our method requires no training and
maintains numerical stability. It can also be applied to any LongConv-based
model. To assess its effectiveness, we conduct extensive experiments on
language modeling tasks across various settings. Additionally, we compare our
method to other gradient-descent solutions, highlighting the superior numerical
stability of our approach. The source code is available at
https://github.com/OpenNLPLab/ETSC-Exact-Toeplitz-to-SSM-Conversion.
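As a rough illustration of the conversion idea, the snippet below is a minimal sketch, not the paper's exact ETSC procedure: the function names, the roots-of-unity eigenvalue choice, and the state size equal to the kernel length are all illustrative assumptions. It fits a diagonal SSM to a long-convolution (Toeplitz) kernel by solving the resulting Vandermonde system with a single FFT, then runs inference as a constant-time-per-step recurrence:
```python
import numpy as np

def toeplitz_kernel_to_ssm(t):
    """Closed-form fit of a diagonal SSM to a length-N convolution kernel t.

    Fixing the SSM eigenvalues to the N-th roots of unity makes the matching
    condition  sum_i w_i * lam_i**n = t_n  (n = 0..N-1)  a Vandermonde linear
    system whose matrix is N times the inverse-DFT matrix, so the output
    weights w follow from a single FFT -- no training, no iterative solver.
    """
    t = np.asarray(t, dtype=np.complex128)
    N = t.shape[0]
    lam = np.exp(2j * np.pi * np.arange(N) / N)  # diagonal state matrix (illustrative choice)
    w = np.fft.fft(t) / N                        # closed-form Vandermonde solve via the DFT
    return lam, w

def ssm_step(lam, w, state, x_n):
    """One constant-time recurrent step: s_n = lam * s_{n-1} + x_n, y_n = Re(dot(w, s_n))."""
    state = lam * state + x_n
    return np.real(np.dot(w, state)), state

# Sanity check: the recurrence reproduces the causal convolution y_n = sum_m t_{n-m} x_m.
N = 64
rng = np.random.default_rng(0)
t = rng.standard_normal(N)   # Toeplitz / long-convolution coefficients
x = rng.standard_normal(N)   # input sequence
lam, w = toeplitz_kernel_to_ssm(t)

state = np.zeros(N, dtype=np.complex128)
y_rec = []
for x_n in x:
    y_n, state = ssm_step(lam, w, state, x_n)
    y_rec.append(y_n)

y_conv = [np.dot(t[:n + 1][::-1], x[:n + 1]) for n in range(N)]
assert np.allclose(y_rec, y_conv)
```
The roots-of-unity choice is what turns the Vandermonde matrix into a DFT matrix; the paper's closed-form solution additionally addresses numerical stability, which this toy check does not attempt to demonstrate.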
Related papers
- Balanced Neural ODEs: nonlinear model order reduction and Koopman operator approximations [0.0]
Variational Autoencoders (VAEs) are a powerful framework for learning latent representations of reduced dimensionality.
Neural ODEs excel in learning transient system dynamics.
We show that standard Latent ODEs struggle with dimensionality reduction in systems with time-varying inputs.
arXiv Detail & Related papers (2024-10-14T05:45:52Z) - A domain decomposition-based autoregressive deep learning model for unsteady and nonlinear partial differential equations [2.7755345520127936]
We propose a domain-decomposition-based deep learning (DL) framework, named CoMLSim, for accurately modeling unsteady and nonlinear partial differential equations (PDEs).
The framework consists of two key components: (a) a convolutional neural network (CNN)-based autoencoder architecture and (b) an autoregressive model composed of fully connected layers.
arXiv Detail & Related papers (2024-08-26T17:50:47Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Learning Long Sequences in Spiking Neural Networks [0.0]
Spiking neural networks (SNNs) take inspiration from the brain to enable energy-efficient computations.
Recent interest in efficient alternatives to Transformers has given rise to state-of-the-art recurrent architectures named state space models (SSMs).
arXiv Detail & Related papers (2023-12-14T13:30:27Z) - On Tuning Neural ODE for Stability, Consistency and Faster Convergence [0.0]
We propose a first-order Nesterov accelerated gradient (NAG) based ODE solver that is provably tuned with respect to CCS conditions.
We empirically demonstrate the efficacy of our approach: it trains faster while achieving better or comparable performance to standard neural ODEs.
arXiv Detail & Related papers (2023-12-04T06:18:10Z) - RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural
Networks [52.32646357164739]
We propose a deep neural network (DNN) to learn solutions of the AC optimal power flow (AC-OPF).
The proposed sensitivity-informed DNN (SIDNN) is compatible with a broad range of OPF schemes.
It can be seamlessly integrated into other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z) - DiffRNN: Differential Verification of Recurrent Neural Networks [3.4423518864863154]
Recurrent neural networks (RNNs) have become popular in a variety of applications such as image processing, data classification, speech recognition, and as controllers in autonomous systems.
We propose DIFFRNN, the first differential verification method for RNNs to certify the equivalence of two structurally similar neural networks.
We demonstrate the practical efficacy of our technique on a variety of benchmarks and show that DIFFRNN outperforms state-of-the-art verification tools such as POPQORN.
arXiv Detail & Related papers (2020-07-20T14:14:35Z) - Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.