Related papers: SPMamba: State-space model is all you need in speech separation

SPMamba: State-space model is all you need in speech separation

URL: http://arxiv.org/abs/2404.02063v1
Date: Tue, 2 Apr 2024 16:04:31 GMT
Title: SPMamba: State-space model is all you need in speech separation
Authors: Kai Li, Guo Chen,
Abstract summary: We propose a network architecture for speech separation using a state-space model. We adopt the TF-GridNet model as the foundational framework and substitute its Transformer component with a bidirectional Mamba module. Our experimental results reveal an important role in the performance aspects of Mamba-based models.
Score: 6.590157910988076
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In speech separation, both CNN- and Transformer-based models have demonstrated robust separation capabilities, garnering significant attention within the research community. However, CNN-based methods have limited modelling capability for long-sequence audio, leading to suboptimal separation performance. Conversely, Transformer-based methods are limited in practical applications due to their high computational complexity. Notably, within computer vision, Mamba-based methods have been celebrated for their formidable performance and reduced computational requirements. In this paper, we propose a network architecture for speech separation using a state-space model, namely SPMamba. We adopt the TF-GridNet model as the foundational framework and substitute its Transformer component with a bidirectional Mamba module, aiming to capture a broader range of contextual information. Our experimental results reveal an important role in the performance aspects of Mamba-based models. SPMamba demonstrates superior performance with a significant advantage over existing separation models in a dataset built on Librispeech. Notably, SPMamba achieves a substantial improvement in separation quality, with a 2.42 dB enhancement in SI-SNRi compared to the TF-GridNet. The source code for SPMamba is publicly accessible at https://github.com/JusperLee/SPMamba .

Related papers

Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection [88.47928738482719]
Linear State Space Models (SSMs) offer remarkable performance gains in sequence modeling.<n>Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations.<n>We introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
arXiv Detail & Related papers (2025-06-22T19:26:55Z)
An Exploration of Mamba for Speech Self-Supervised Models [48.01992287080999]
We explore Mamba-based HuBERT models as alternatives to Transformer-based SSL architectures.<n>HuBERT models enable fine-tuning on long-context ASR with significantly lower compute.<n>These findings highlight Mamba-based SSL as a promising and complementary direction for long-sequence modeling, real-time speech modeling, and speech unit extraction.
arXiv Detail & Related papers (2025-06-14T19:00:44Z)
STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems. Most existing methods based on CNNs and transformers still suffer from substantial computational burdens. We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs. We propose the MobileMamba framework, which balances efficiency and performance. MobileMamba achieves up to 83.6% on Top-1, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
DiffImp: Efficient Diffusion Model for Probabilistic Time Series Imputation with Bidirectional Mamba Backbone [6.428451261614519]
Current DDPM-based probabilistic time series imputation methodologies are confronted with two types of challenges. We integrate the computational efficient state space model, namely Mamba, as the backbone denosing module for DDPMs. Our approach can achieve state-of-the-art time series imputation results on multiple datasets, different missing scenarios and missing ratios.
arXiv Detail & Related papers (2024-10-17T08:48:52Z)
PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation [1.5136939451642137]
This paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for semantic segmentation tasks. PPMamba achieves competitive performance compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T08:08:50Z)
Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity. Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution. Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z)
FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting [6.152779144421304]
We introduce a novel framework named FMamba for multivariate time-series forecasting (MTSF) Technically, we first extract the temporal features of the input variables through an embedding layer, then compute the dependencies among input variables via the fast-attention module. We use Mamba to selectively deal with the input features and further extract the temporal dependencies of the variables through the multi-layer perceptron block (MLP-block) Finally, FMamba obtains the predictive results through the projector, a linear layer.
arXiv Detail & Related papers (2024-07-20T09:14:05Z)
MambaForGCN: Enhancing Long-Range Dependency with State Space Model and Kolmogorov-Arnold Networks for Aspect-Based Sentiment Analysis [0.6885635732944716]
We present a novel approach to enhance long-range dependencies between aspect and opinion words in ABSA (MambaForGCN) Experimental results on three benchmark datasets demonstrate MambaForGCN's effectiveness, outperforming state-of-the-art (SOTA) baseline models.
arXiv Detail & Related papers (2024-07-14T22:23:07Z)
DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba. We show that DeciMamba can extrapolate context lengths 25x longer than the ones seen during training, and does so without utilizing additional computational resources.
arXiv Detail & Related papers (2024-06-20T17:40:18Z)
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
Decision Mamba is a novel multi-grained state space model with a self-evolving policy learning strategy. To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization. The policy evolves by using its own past knowledge to refine the suboptimal actions, thus enhancing its robustness on noisy demonstrations.
arXiv Detail & Related papers (2024-06-08T10:12:00Z)
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection [5.37935922811333]
MambaMixer is a new architecture with data-dependent weights that uses a dual selection mechanism across tokens and channels. As a proof of concept, we design Vision MambaMixer (ViM2) and Time Series MambaMixer (TSM2) architectures based on the MambaMixer block.
arXiv Detail & Related papers (2024-03-29T00:05:13Z)
Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks. We conducted experiments using the textttLlama2-7b-chat model on nine different languages from the MUST-C dataset. The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.