A Neural State-Space Model Approach to Efficient Speech Separation
- URL: http://arxiv.org/abs/2305.16932v1
- Date: Fri, 26 May 2023 13:47:11 GMT
- Title: A Neural State-Space Model Approach to Efficient Speech Separation
- Authors: Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng
Siong Chng
- Abstract summary: We introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).
To extend the SSM technique into speech separation tasks, we first decompose the input mixture into multi-scale representations with different resolutions.
Experiments show that S4M performs comparably to other separation backbones in terms of SI-SDRi.
Our S4M-tiny model (1.8M parameters) even surpasses the attention-based Sepformer (26.0M parameters) in noisy conditions with only 9.2% of the multiply-accumulate operations (MACs).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we introduce S4M, a new efficient speech separation framework
based on neural state-space models (SSM). Motivated by linear time-invariant
(LTI) systems for sequence modeling, our SSM-based approach can efficiently represent input signals as linear ordinary differential equations (ODEs)
for representation learning. To extend the SSM technique into speech separation
tasks, we first decompose the input mixture into multi-scale representations
with different resolutions. This mechanism enables S4M to learn globally
coherent separation and reconstruction. The experimental results show that S4M
performs comparably to other separation backbones in terms of SI-SDRi, while
having a much lower model complexity with significantly fewer trainable
parameters. In addition, our S4M-tiny model (1.8M parameters) even surpasses
the attention-based Sepformer (26.0M parameters) in noisy conditions with only
9.2% of the multiply-accumulate operations (MACs).
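The abstract ships no code, so here is a minimal NumPy sketch of the LTI-SSM idea it builds on: a continuous linear ODE discretized with the bilinear (Tustin) transform used by S4-family layers, then run as a recurrence. The toy system, names, and step size are hypothetical illustrations, not S4M's implementation.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t),
    turning the continuous LTI ODE into a discrete recurrence."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (dt / 2.0) * A)
    return inv @ (I + (dt / 2.0) * A), inv @ (dt * B)

def ssm_scan(Ad, Bd, C, u):
    """Discrete recurrence x_k = Ad x_{k-1} + Bd u_k, y_k = C x_k."""
    x = np.zeros(Ad.shape[0])
    ys = []
    for u_k in u:                       # sequential form; S4-style layers
        x = Ad @ x + Bd.ravel() * u_k   # compute the same map as a long conv
        ys.append((C @ x).item())
    return np.array(ys)

# Toy usage (hypothetical system): a 4-state SSM filtering a 1-D signal.
rng = np.random.default_rng(0)
A = -np.eye(4) + 0.1 * rng.standard_normal((4, 4))  # roughly stable dynamics
B = rng.standard_normal((4, 1))
C = rng.standard_normal((1, 4))
Ad, Bd = discretize_bilinear(A, B, dt=0.1)
y = ssm_scan(Ad, Bd, C, rng.standard_normal(64))
```

S4M's multi-scale front end would feed such recurrences with mixture encodings at several resolutions; that decomposition is specific to the paper and is not sketched here.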
Related papers
- GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity.
However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks.
Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z)
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
We develop a class-sensitive selective scan mechanism to guide dynamic adaptation.
Experiments on miniImageNet, CUB-200, and CIFAR-100 demonstrate that our framework outperforms the existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
- There is HOPE to Avoid HiPPOs for Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilizes parameters within Hankel operators.
Our model efficiently implements these innovations by nonuniformly sampling the transfer functions of LTI systems.
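For readers unfamiliar with the object HOPE parameterizes, here is a minimal sketch of a Hankel matrix built from an LTI system's impulse response (Markov parameters). It illustrates the Hankel operator itself, not HOPE's parameterization; all names are hypothetical.

```python
import numpy as np

def impulse_response(Ad, Bd, C, L):
    """Markov parameters h_k = C Ad^k Bd of a discrete LTI system."""
    h, x = [], Bd.ravel()
    for _ in range(2 * L):
        h.append((C @ x).item())
        x = Ad @ x
    return np.array(h)

def hankel(h, L):
    """Hankel operator H[i, j] = h[i + j]: constant anti-diagonals,
    built directly from the system's impulse response."""
    return np.array([[h[i + j] for j in range(L)] for i in range(L)])
```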
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
- Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model [0.0]
A recently developed model, the Structured State Space (S4), demonstrated significant effectiveness in modeling long-range sequences.
We propose augmenting MLPs with exponential smoothing (ETS) to supply a suitable inductive bias for sequence modeling.
Our models achieve comparable results to S4 on the LRA benchmark.
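As a point of reference, the textbook exponential-smoothing recurrence is a one-line sequence mixer; a minimal sketch follows (the paper's ETS variant may differ, e.g., in how the smoothing factor is parameterized or learned).

```python
import numpy as np

def exponential_smoothing(x, alpha=0.5):
    """Simple exponential smoothing, s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    Applied channel-wise before an MLP, it gives a cheap, decaying
    summary of the past in place of a heavier sequence mixer."""
    s = np.empty_like(x, dtype=float)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s
```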
arXiv Detail & Related papers (2024-03-26T07:23:46Z)
- Augmenting conformers with structured state-space sequence models for online speech recognition [41.444671189679994]
Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems.
In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4).
We performed systematic ablation studies to compare variants of S4 models and propose two novel approaches that combine them with convolutions.
Our best model achieves WERs of 4.01%/8.53% on test sets from Librispeech, outperforming Conformers with extensively tuned convolution.
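The defining constraint in the online setting is causality. Below is a minimal sketch of a left-context-only (causal) convolution, the kind of restriction every component of a streaming encoder must respect; it is illustrative only, not the paper's conformer/S4 architecture.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution: y[t] depends only on x[:t + 1].
    Left-padding by K - 1 keeps the layer streamable for online ASR."""
    K = len(kernel)
    padded = np.concatenate([np.zeros(K - 1), x])
    return np.array([padded[t:t + K] @ kernel[::-1] for t in range(len(x))])
```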
arXiv Detail & Related papers (2023-09-15T17:14:17Z)
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z)
- Simplified State Space Layers for Sequence Modeling [11.215817688691194]
Recently, models using structured state space sequence layers achieved state-of-the-art performance on many long-range tasks.
We revisit the idea that closely following the HiPPO framework is necessary for high performance.
We replace the bank of many independent single-input, single-output (SISO) SSMs the S4 layer uses with one multi-input, multi-output (MIMO) SSM.
S5 matches S4's performance on long-range tasks, including achieving an average of 82.46% on the suite of Long Range Arena benchmarks.
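A minimal sketch of the structural difference between the two designs: a bank of independent single-input/single-output updates versus one jointly mixing multi-input/multi-output update. Shapes and names are hypothetical, for illustration only.

```python
import numpy as np

def siso_bank_step(A_bank, B_bank, x_bank, u):
    """S4-style bank: H independent 1-in/1-out SSMs, one per channel.
    A_bank: (H, n, n), B_bank: (H, n), x_bank: (H, n), u: (H,)."""
    return np.einsum('hij,hj->hi', A_bank, x_bank) + B_bank * u[:, None]

def mimo_step(A, B, x, u):
    """S5-style update: a single m-in/m-out SSM mixes all channels jointly.
    A: (n, n), B: (n, m), x: (n,), u: (m,)."""
    return A @ x + B @ u
```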
arXiv Detail & Related papers (2022-08-09T17:57:43Z)
- On the Parameterization and Initialization of Diagonal State Space Models [35.68370606343843]
We show how to parameterize and initialize diagonal state space models.
We show that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension.
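With a diagonal state matrix, the SSM's convolution kernel collapses to a Vandermonde-style sum over elementwise eigenvalue powers; a minimal sketch follows (illustrative only; the paper's actual parameterization and initialization details are not reproduced here).

```python
import numpy as np

def diagonal_ssm_kernel(lam, B, C, L):
    """Kernel K_k = sum_i C_i * lam_i**k * B_i of a diagonal discrete SSM.
    The Vandermonde structure makes it O(n * L) to materialize.
    lam, B, C: arrays of shape (n,), typically complex."""
    powers = lam[None, :] ** np.arange(L)[:, None]   # (L, n) Vandermonde
    return (powers * (B * C)[None, :]).sum(axis=-1).real
```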
arXiv Detail & Related papers (2022-06-23T17:58:39Z)
- Quaternion Factorization Machines: A Lightweight Solution to Intricate Feature Interaction Modelling [76.89779231460193]
The factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without manual feature engineering.
We propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM) for sparse predictive analytics.
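The quaternion variants build on the standard real-valued FM, so here is a minimal sketch of the classic second-order FM with its O(n * k) pairwise-interaction trick; the quaternion algebra that QFM/QNFM add is not shown.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Classic second-order factorization machine,
    y = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j,
    with the pairwise term computed in O(n * k) as
    0.5 * sum_f ((V^T x)_f ** 2 - ((V**2)^T x**2)_f).
    x, w: (n,), V: (n, k)."""
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    return w0 + w @ x + pairwise
```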
arXiv Detail & Related papers (2021-04-05T00:02:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.