A Neural State-Space Model Approach to Efficient Speech Separation
- URL: http://arxiv.org/abs/2305.16932v1
- Date: Fri, 26 May 2023 13:47:11 GMT
- Title: A Neural State-Space Model Approach to Efficient Speech Separation
- Authors: Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng
- Abstract summary: We introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).
To extend the SSM technique into speech separation tasks, we first decompose the input mixture into multi-scale representations with different resolutions.
Experiments show that S4M performs comparably to other separation backbones in terms of SI-SDRi.
Our S4M-tiny model (1.8M parameters) even surpasses attention-based Sepformer (26.0M parameters) in noisy conditions with only 9.2% of the multiply-accumulate operations (MACs).
- Score: 34.38911304755453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we introduce S4M, a new efficient speech separation framework
based on neural state-space models (SSM). Motivated by linear time-invariant
systems for sequence modeling, our SSM-based approach can efficiently model
input signals as linear ordinary differential equations (ODEs)
for representation learning. To extend the SSM technique into speech separation
tasks, we first decompose the input mixture into multi-scale representations
with different resolutions. This mechanism enables S4M to learn globally
coherent separation and reconstruction. The experimental results show that S4M
performs comparably to other separation backbones in terms of SI-SDRi, while
having a much lower model complexity with significantly fewer trainable
parameters. In addition, our S4M-tiny model (1.8M parameters) even surpasses
attention-based Sepformer (26.0M parameters) in noisy conditions with only 9.2%
of the multiply-accumulate operations (MACs).
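The abstract describes modeling input signals with linear ODEs. As a minimal sketch, assuming the standard continuous-time linear time-invariant form x'(t) = A x(t) + B u(t), y(t) = C x(t) used in the S4 line of work (the feedthrough term D is omitted), the code below discretizes the system with the bilinear transform and shows that the same discrete system can be run either as a step-by-step recurrence or as a causal convolution with kernel K_j = C Ā^j B̄. This is not S4M's implementation: the function names, the dense random A, and the step size dt are illustrative assumptions, and S4-style layers use a structured A and compute the kernel far more efficiently.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t)."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (dt / 2.0) * A)
    A_bar = inv @ (I + (dt / 2.0) * A)
    B_bar = inv @ (dt * B)
    return A_bar, B_bar

def ssm_recurrent(A_bar, B_bar, C, u):
    """Recurrent view: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k, with x_{-1} = 0."""
    x = np.zeros(A_bar.shape[0])
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A_bar @ x + B_bar[:, 0] * u_k
        y[k] = C[0] @ x
    return y

def ssm_convolution(A_bar, B_bar, C, u):
    """Convolutional view: y_k = sum_{j<=k} K_j u_{k-j} with K_j = C A_bar^j B_bar."""
    L = len(u)
    K = np.empty(L)
    A_pow = np.eye(A_bar.shape[0])
    for j in range(L):
        K[j] = (C @ A_pow @ B_bar)[0, 0]
        A_pow = A_pow @ A_bar
    # The first L entries of the full convolution are the causal outputs.
    return np.convolve(u, K)[:L]

# Illustrative single-input, single-output system with N = 4 states.
rng = np.random.default_rng(0)
A = -np.eye(4) + 0.1 * rng.standard_normal((4, 4))  # eigenvalues near -1, so the system is stable
B = rng.standard_normal((4, 1))
C = rng.standard_normal((1, 4))
u = rng.standard_normal(32)
A_bar, B_bar = discretize_bilinear(A, B, dt=0.1)
print(np.allclose(ssm_recurrent(A_bar, B_bar, C, u), ssm_convolution(A_bar, B_bar, C, u)))  # True
```

The two views trade off differently: the recurrence costs O(N^2) per step and suits streaming, while the convolutional view lets the whole sequence be processed in parallel once the kernel is known, which is the property SSM-based sequence models exploit during training.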
Related papers
- Recursive Learning of Asymptotic Variational Objectives [49.69399307452126]
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data.
Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states.
This approach is more theoretically well-founded than recently proposed online variational SMC methods.
Date: 2024-11-04T16:12:37Z
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
Date: 2024-10-17T22:35:50Z
- GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity.
However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks.
Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
Date: 2024-07-18T17:59:58Z
- HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilize Markov parameters within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
Date: 2024-05-22T20:20:14Z
- Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model [0.0]
A recently developed model, the Structured State Space (S4), demonstrated significant effectiveness in modeling long-range sequences.
We incorporate exponential smoothing (ETS) into an MLP and augment the ETS with additional parameters to reduce inductive bias.
Our models achieve comparable results to S4 on the LRA benchmark.
Date: 2024-03-26T07:23:46Z
- Efficient State Space Model via Fast Tensor Convolution and Block Diagonalization [5.260841516691153]
We propose a new state space layer based on a multiple-input multiple-output (MIMO) SSM, called efficient SSM (eSSM).
Our eSSM is built on the convolutional representation of the MIMO SSM.
In the model efficiency benchmark, eSSM uses only 12.89% of the parameters of LSTM and 13.24% of those of Mamba.
Date: 2024-02-23T12:36:31Z
- Augmenting conformers with structured state-space sequence models for online speech recognition [41.444671189679994]
Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems.
In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4).
We performed systematic ablation studies to compare variants of S4 models and propose two novel approaches that combine them with convolutions.
Our best model achieves WERs of 4.01%/8.53% on test sets from Librispeech, outperforming Conformers with extensively tuned convolution.
Date: 2023-09-15T17:14:17Z
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
Date: 2022-09-26T18:37:13Z
- Simplified State Space Layers for Sequence Modeling [11.215817688691194]
Recently, models using structured state space sequence layers achieved state-of-the-art performance on many long-range tasks.
We revisit the idea that closely following the HiPPO framework is necessary for high performance.
We replace the bank of many independent single-input, single-output (SISO) SSMs used by the S4 layer with one multi-input, multi-output (MIMO) SSM.
S5 matches S4's performance on long-range tasks, including achieving an average of 82.46% on the suite of Long Range Arena benchmarks.
Date: 2022-08-09T17:57:43Z
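Two of the related entries concern diagonal linear recurrences: the analysis of real versus complex diagonal SSMs, and the MLP augmented with exponential smoothing (ETS). The sketch that follows is a minimal illustration under our own assumptions, not code from either paper. It runs a diagonal SSM x_k = λ ⊙ x_{k-1} + b u_k, y_k = Re(c · x_k) and recovers ETS as the one-state, real special case λ = 1 − α, b = α, c = 1 (with the state initialized to zero, whereas ETS is often initialized with the first observation). A complex λ = r e^{iθ} with r < 1 gives the damped oscillating impulse response r^k cos(kθ), one qualitative behavior that a single real decaying state cannot produce. The function names `diagonal_ssm_scan` and `exponential_smoothing` are hypothetical.

```python
import numpy as np

def diagonal_ssm_scan(lam, b, c, u):
    """Diagonal SSM: x_k = lam * x_{k-1} + b * u_k, y_k = Re(sum(c * x_k)), with x_{-1} = 0.

    lam, b, c are 1-D arrays of length N (lam may be complex); u is a real 1-D input.
    """
    x = np.zeros(len(lam), dtype=np.result_type(lam, b, complex))
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = lam * x + b * u_k  # element-wise: each state evolves independently
        y[k] = np.real(np.sum(c * x))
    return y

def exponential_smoothing(u, alpha):
    """ETS recurrence s_k = alpha * u_k + (1 - alpha) * s_{k-1}, with s_{-1} = 0."""
    return diagonal_ssm_scan(np.array([1.0 - alpha]), np.array([alpha]), np.array([1.0]), u)

print(exponential_smoothing(np.array([1.0, 1.0, 1.0]), alpha=0.5))  # approximately [0.5, 0.75, 0.875]

lam = np.array([0.9 * np.exp(1j * np.pi / 2)])  # one complex pole: modulus 0.9, angle pi/2
impulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(diagonal_ssm_scan(lam, np.array([1.0]), np.array([1.0]), impulse))  # approximately [1, 0, -0.81, 0, 0.6561]
```

Because the state matrix is diagonal, each step costs O(N) rather than the O(N^2) of a dense recurrence, which is one reason diagonal parameterizations are popular in this line of work.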
This list is automatically generated from the titles and abstracts of the papers in this site.