State-Free Inference of State-Space Models: The Transfer Function Approach
- URL: http://arxiv.org/abs/2405.06147v2
- Date: Sun, 2 Jun 2024 02:48:05 GMT
- Title: State-Free Inference of State-Space Models: The Transfer Function Approach
- Authors: Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli
- Abstract summary: State-free inference does not incur any significant memory or computational cost with an increase in state size.
We achieve this using properties of the proposed frequency domain transfer function parametrization.
We report improved perplexity in language modeling over a long convolutional Hyena baseline.
- Score: 132.83348321603205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers -- parametrized in time-domain -- on the Long Range Arena benchmark, while delivering state-of-the-art downstream performance over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.
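As a concrete illustration of the abstract's central claim, here is a minimal NumPy sketch -- an assumption-laden toy, not the implementation in the linked RTF repository: a SISO transfer function is parametrized by numerator and denominator coefficients, both polynomials are evaluated at the roots of unity via FFTs of their zero-padded coefficient vectors, and the resulting kernel spectrum drives an FFT convolution. Memory and compute depend on the sequence length, not on the state size (the polynomial degree).

```python
import numpy as np

def rtf_convolve(u, b, a):
    """State-free convolution with the rational transfer function
    H(z) = (b0 + b1 z^-1 + ...) / (1 + a1 z^-1 + ...), per the
    abstract: the kernel's spectrum comes from FFTs of zero-padded
    coefficient vectors, so cost is independent of state size."""
    L = len(u)
    n = 2 * L  # pad so the circular convolution matches a linear one
    num = np.fft.rfft(b, n)                           # numerator at roots of unity
    den = np.fft.rfft(np.concatenate(([1.0], a)), n)  # monic denominator
    H = num / den                                     # kernel spectrum, no state materialized
    return np.fft.irfft(H * np.fft.rfft(u, n), n)[:L]

# Toy usage: a "state size" of 4 means degree-4 polynomials, but the
# memory/compute above scale only with the sequence length L.
rng = np.random.default_rng(0)
u = rng.standard_normal(1024)
b = 0.1 * rng.standard_normal(5)   # numerator coefficients
a = 0.1 * rng.standard_normal(4)   # denominator coefficients
y = rtf_convolve(u, b, a)
```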
Related papers
- Generalized Dynamic Brain Functional Connectivity Based on Random Convolutions [15.620523540831021]
We propose a generalized approach to dynamics via a multi-dimensional random convolution (RandCon) DFC method.
RandCon with the smallest kernel size (3 time points) showed notable improvements in performance on simulated data.
Results from real fMRI data indicated that RandCon was more sensitive to gender differences than competing methods; a rough sketch of the idea follows this entry.
arXiv Detail & Related papers (2024-06-24T13:02:36Z)
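The RandCon summary names the ingredients but not the construction; the following is a rough, hypothetical sketch of one plausible reading -- fixed random 1-D kernels convolved with each region's time series, with co-fluctuations of the filtered signals used as a dynamic connectivity estimate. Function and parameter names are illustrative, and the paper's actual aggregation may differ.

```python
import numpy as np

def random_conv_dfc(ts, kernel_size=3, n_kernels=8, seed=0):
    """Rough sketch of the RandCon idea: convolve each region's time
    series with fixed random kernels, then take time-resolved
    co-fluctuations of the convolved signals as a DFC estimate."""
    rng = np.random.default_rng(seed)
    T, R = ts.shape                      # time points x regions
    kernels = rng.standard_normal((n_kernels, kernel_size))
    dfc = np.zeros((n_kernels, T - kernel_size + 1, R, R))
    for k, w in enumerate(kernels):
        # 'valid' convolution of every region with the same kernel
        conv = np.stack([np.convolve(ts[:, r], w, mode="valid")
                         for r in range(R)], axis=1)
        z = (conv - conv.mean(0)) / conv.std(0)   # standardize per region
        dfc[k] = z[:, :, None] * z[:, None, :]    # instantaneous co-fluctuation
    return dfc.mean(axis=0)              # average over random kernels

fmri = np.random.default_rng(1).standard_normal((200, 10))
dynamic_fc = random_conv_dfc(fmri)       # shape (198, 10, 10)
```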
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1); a minimal sketch follows this entry.
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
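Pending the paper's details, here is a loose sketch of one reading of "transform once": a single forward FFT, a stack of learnable pointwise spectral layers, and a single inverse transform at the end, instead of a transform round trip per layer. The class name and parametrization are assumptions, not T1's API.

```python
import numpy as np

class SpectralStack:
    """Loose sketch of 'transform once' frequency-domain learning:
    one forward FFT, several learnable pointwise layers applied to
    the spectrum, one inverse FFT at the end. Per-layer transform
    round trips, as in typical spectral layers, are avoided.
    (Nonlinearities and channel mixing are omitted for brevity.)"""
    def __init__(self, length, depth=3, seed=0):
        rng = np.random.default_rng(seed)
        n_freq = length // 2 + 1
        # complex pointwise weights per layer (an assumption; T1's
        # actual parametrization may differ)
        self.weights = [rng.standard_normal(n_freq)
                        + 1j * rng.standard_normal(n_freq)
                        for _ in range(depth)]

    def __call__(self, x):
        X = np.fft.rfft(x)               # transform once
        for W in self.weights:
            X = W * X                    # stay in the frequency domain
        return np.fft.irfft(X, len(x))   # single inverse transform

y = SpectralStack(length=256)(np.random.default_rng(2).standard_normal(256))
```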
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter counts compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z)
- Active Nearest Neighbor Regression Through Delaunay Refinement [79.93030583257597]
We introduce an algorithm for active function approximation based on nearest neighbor regression.
Our Active Nearest Neighbor Regressor (ANNR) relies on the Voronoi-Delaunay framework from computational geometry to subdivide the space into cells with constant estimated function value; a toy sketch of the refinement loop follows this entry.
arXiv Detail & Related papers (2022-06-16T10:24:03Z)
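A toy sketch of the active refinement loop suggested by the ANNR summary, using scipy.spatial: triangulate the queried points and sample next where vertex values disagree most. The disagreement score and centroid rule are stand-ins for the paper's actual Delaunay-refinement criterion.

```python
import numpy as np
from scipy.spatial import Delaunay

def annr_step(X, y, f):
    """One toy active-learning step in the spirit of ANNR: triangulate
    the queried points, score each Delaunay simplex by how much the
    function values at its vertices disagree (a stand-in for the
    paper's criterion), and query the centroid of the worst simplex."""
    tri = Delaunay(X)
    scores = [np.ptp(y[s]) for s in tri.simplices]   # value spread per simplex
    worst = tri.simplices[int(np.argmax(scores))]
    x_new = X[worst].mean(axis=0)                    # simplex centroid
    return np.vstack([X, x_new]), np.append(y, f(x_new))

f = lambda p: np.sin(3 * p[0]) * np.cos(3 * p[1])    # unknown target
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(12, 2))                 # initial queries
y = np.array([f(p) for p in X])
for _ in range(20):                                  # active refinement
    X, y = annr_step(X, y, f)
```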
- Lightweight Convolutional Neural Networks By Hypercomplex Parameterization [10.420215908252425]
We define the parameterization of hypercomplex convolutional layers to develop lightweight and efficient large-scale convolutional models.
Our method learns the convolution rules and the filter organization directly from data.
We demonstrate the versatility of this approach across multiple application domains through experiments on various image and audio datasets; a sketch of the parameterization follows this entry.
arXiv Detail & Related papers (2021-10-08T14:57:19Z)
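The summary does not spell out the parameterization; a common hypercomplex construction (in the style of parameterized hypercomplex multiplication layers, which may differ from the paper's exact scheme) builds each convolution weight as a sum of Kronecker products between small learned "algebra rule" matrices and reduced filter banks, cutting parameters by roughly a factor of n:

```python
import numpy as np

def phc_weight(A, F):
    """Sketch of a parameterized hypercomplex convolution weight:
    W = sum_i kron(A_i, F_i), where the n x n matrices A_i encode the
    (learned) multiplication rules of the hypercomplex algebra and
    F_i are reduced filter banks. Parameters scale as ~1/n of a full
    convolution; details may differ from the paper's construction."""
    return sum(np.kron(Ai, Fi) for Ai, Fi in zip(A, F))

n = 4                                   # e.g. a quaternion-like algebra
out_c, in_c, k = 8, 8, 3                # full convolution dimensions
rng = np.random.default_rng(4)
A = rng.standard_normal((n, n, n))      # learned algebra rules
F = rng.standard_normal((n, out_c // n, (in_c // n) * k * k))
W = phc_weight(A, F)                    # (out_c, in_c * k * k)
W = W.reshape(out_c, in_c, k, k)        # reshape into conv filters
```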
- Exploiting Multiple Timescales in Hierarchical Echo State Networks [0.0]
Echo state networks (ESNs) are a powerful form of reservoir computing that only require training of linear output weights.
Here we explore the timescales in hierarchical ESNs, where the reservoir is partitioned into two smaller linked reservoirs with distinct properties; a compact sketch follows this entry.
arXiv Detail & Related papers (2021-01-11T22:33:17Z)
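A compact sketch of the two-reservoir architecture the summary describes: two smaller leaky reservoirs linked in series, with different leak rates giving each a distinct timescale. All hyperparameters and scalings here are illustrative; only a linear readout over the collected states would be trained.

```python
import numpy as np

def hierarchical_esn(u, n1=100, n2=100, leak1=0.9, leak2=0.1, seed=5):
    """Two leaky reservoirs in series with different leak rates, so
    each operates on a distinct timescale; only the linear readout
    (not shown) would be trained. Scaling choices are illustrative."""
    rng = np.random.default_rng(seed)
    def reservoir(n):
        W = rng.standard_normal((n, n))
        return W * (0.9 / np.max(np.abs(np.linalg.eigvals(W))))  # spectral radius 0.9
    W1, W2 = reservoir(n1), reservoir(n2)
    Win1 = rng.standard_normal(n1)              # input weights
    W12 = 0.1 * rng.standard_normal((n2, n1))   # link between reservoirs
    x1, x2 = np.zeros(n1), np.zeros(n2)
    states = []
    for ut in u:
        x1 = (1 - leak1) * x1 + leak1 * np.tanh(W1 @ x1 + Win1 * ut)
        x2 = (1 - leak2) * x2 + leak2 * np.tanh(W2 @ x2 + W12 @ x1)
        states.append(np.concatenate([x1, x2]))
    return np.array(states)             # features for a linear readout

states = hierarchical_esn(np.sin(np.linspace(0, 20, 500)))
```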
- Lightning-Fast Gravitational Wave Parameter Inference through Neural Amortization [6.810835072367285]
Latest advances in neural simulation-based inference can speed up the inference time by up to three orders of magnitude.
We find that our model correctly estimates credible intervals for the parameters of simulated gravitational waves.
arXiv Detail & Related papers (2020-10-24T16:48:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.