Accelerating Toeplitz Neural Network with Constant-time Inference
Complexity
- URL: http://arxiv.org/abs/2311.08756v1
- Date: Wed, 15 Nov 2023 07:50:57 GMT
- Title: Accelerating Toeplitz Neural Network with Constant-time Inference
Complexity
- Authors: Zhen Qin, Yiran Zhong
- Abstract summary: Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in various sequence modeling tasks.
They outperform commonly used Transformer-based models while benefiting from log-linear space-time complexities.
In this paper, we aim to combine the strengths of TNNs and State Space Models (SSMs) by converting TNNs to SSMs during inference.
- Score: 21.88774274472737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in
various sequence modeling tasks. They outperform commonly used
Transformer-based models while benefiting from log-linear space-time
complexities. On the other hand, State Space Models (SSMs) achieve lower
performance than TNNs in language modeling but offer the advantage of constant
inference complexity. In this paper, we aim to combine the strengths of TNNs
and SSMs by converting TNNs to SSMs during inference, thereby enabling TNNs to
achieve the same constant inference complexities as SSMs. To accomplish this,
we formulate the conversion process as an optimization problem and provide a
closed-form solution. We demonstrate how to transform the target equation into
a Vandermonde linear system problem, which can be efficiently solved using the
Discrete Fourier Transform (DFT). Notably, our method requires no training and
maintains numerical stability. It can also be applied to any LongConv-based
model. To assess its effectiveness, we conduct extensive experiments on
language modeling tasks across various settings. Additionally, we compare our
method to other gradient-descent solutions, highlighting the superior numerical
stability of our approach. The source code is available at
https://github.com/OpenNLPLab/ETSC-Exact-Toeplitz-to-SSM-Conversion.
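As a rough illustration of the conversion idea, the snippet below is a minimal sketch, not the paper's exact ETSC procedure: the function names, the roots-of-unity eigenvalue choice, and the state size equal to the kernel length are all illustrative assumptions. It fits a diagonal SSM to a long-convolution (Toeplitz) kernel by solving the resulting Vandermonde system with a single FFT, then runs inference as a constant-time-per-step recurrence:
```python
import numpy as np

def toeplitz_kernel_to_ssm(t):
    """Closed-form fit of a diagonal SSM to a length-N convolution kernel t.

    Fixing the SSM eigenvalues to the N-th roots of unity makes the matching
    condition  sum_i w_i * lam_i**n = t_n  (n = 0..N-1)  a Vandermonde linear
    system whose matrix is N times the inverse-DFT matrix, so the output
    weights w follow from a single FFT -- no training, no iterative solver.
    """
    t = np.asarray(t, dtype=np.complex128)
    N = t.shape[0]
    lam = np.exp(2j * np.pi * np.arange(N) / N)  # diagonal state matrix (illustrative choice)
    w = np.fft.fft(t) / N                        # closed-form Vandermonde solve via the DFT
    return lam, w

def ssm_step(lam, w, state, x_n):
    """One constant-time recurrent step: s_n = lam * s_{n-1} + x_n, y_n = Re(dot(w, s_n))."""
    state = lam * state + x_n
    return np.real(np.dot(w, state)), state

# Sanity check: the recurrence reproduces the causal convolution y_n = sum_m t_{n-m} x_m.
N = 64
rng = np.random.default_rng(0)
t = rng.standard_normal(N)   # Toeplitz / long-convolution coefficients
x = rng.standard_normal(N)   # input sequence
lam, w = toeplitz_kernel_to_ssm(t)

state = np.zeros(N, dtype=np.complex128)
y_rec = []
for x_n in x:
    y_n, state = ssm_step(lam, w, state, x_n)
    y_rec.append(y_n)

y_conv = [np.dot(t[:n + 1][::-1], x[:n + 1]) for n in range(N)]
assert np.allclose(y_rec, y_conv)
```
The roots-of-unity choice is what turns the Vandermonde matrix into a DFT matrix; the paper's closed-form solution additionally addresses numerical stability, which this toy check does not attempt to demonstrate.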
Related papers
- Balanced Neural ODEs: nonlinear model order reduction and Koopman operator approximations [0.0]
Variational Autoencoders (VAEs) are a powerful framework for learning latent representations of reduced dimensionality.
Neural ODEs excel in learning transient system dynamics.
We show that standard Latent ODEs struggle with dimensionality reduction in systems with time-varying inputs.
arXiv Detail & Related papers (2024-10-14T05:45:52Z) - A domain decomposition-based autoregressive deep learning model for unsteady and nonlinear partial differential equations [2.7755345520127936]
We propose a domain-decomposition-based deep learning (DL) framework, named CoMLSim, for accurately modeling unsteady and nonlinear partial differential equations (PDEs).
The framework consists of two key components: (a) a convolutional neural network (CNN)-based autoencoder architecture and (b) an autoregressive model composed of fully connected layers.
arXiv Detail & Related papers (2024-08-26T17:50:47Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Learning Long Sequences in Spiking Neural Networks [0.0]
Spiking neural networks (SNNs) take inspiration from the brain to enable energy-efficient computations.
Recent interest in efficient alternatives to Transformers has given rise to state-of-the-art recurrent architectures named state space models (SSMs).
arXiv Detail & Related papers (2023-12-14T13:30:27Z) - On Tuning Neural ODE for Stability, Consistency and Faster Convergence [0.0]
We propose a first-order Nesterov accelerated gradient (NAG) based ODE solver that is provably tuned with respect to CCS conditions.
We empirically demonstrate the efficacy of our approach: it trains faster while achieving better or comparable performance to standard neural ODEs.
arXiv Detail & Related papers (2023-12-04T06:18:10Z) - RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural
Networks [52.32646357164739]
We propose a deep neural network (DNN) to learn solutions of the AC optimal power flow (AC-OPF).
The proposed sensitivity-informed DNN (SIDNN) is compatible with a broad range of OPF schemes.
It can be seamlessly integrated into other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z) - DiffRNN: Differential Verification of Recurrent Neural Networks [3.4423518864863154]
Recurrent neural networks (RNNs) have become popular in a variety of applications such as image processing, data classification, speech recognition, and as controllers in autonomous systems.
We propose DIFFRNN, the first differential verification method for RNNs to certify the equivalence of two structurally similar neural networks.
We demonstrate the practical efficacy of our technique on a variety of benchmarks and show that DIFFRNN outperforms state-of-the-art verification tools such as POPQORN.
arXiv Detail & Related papers (2020-07-20T14:14:35Z) - Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.