An Investigation of Incorporating Mamba for Speech Enhancement
- URL: http://arxiv.org/abs/2405.06573v1
- Date: Fri, 10 May 2024 16:18:49 GMT
- Title: An Investigation of Incorporating Mamba for Speech Enhancement
- Authors: Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao
- Abstract summary: We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba.
SEMamba demonstrates promising results and attains a PESQ score of 3.55 on the VoiceBank-DEMAND dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the properties of Mamba by integrating it as the core model in both basic and advanced SE systems, along with utilizing signal-level distances as well as metric-oriented loss functions. SEMamba demonstrates promising results and attains a PESQ score of 3.55 on the VoiceBank-DEMAND dataset. When combined with the perceptual contrast stretching technique, the proposed SEMamba yields a new state-of-the-art PESQ score of 3.69.
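For concreteness, here is a minimal sketch of how such a Mamba-based SE regressor can be organized: noisy STFT magnitude frames pass through a stack of state-space blocks that estimate a multiplicative mask, trained against a signal-level spectral distance. Everything below (the `SimpleSSMBlock`, the mask head, all hyperparameters) is an illustrative assumption rather than the authors' SEMamba implementation; a faithful version would swap the stand-in scan for an actual selective-scan Mamba block (e.g., from the official mamba-ssm package) and add a metric-oriented, PESQ-targeting loss term alongside the spectral one.
```python
# Minimal, illustrative sketch of a Mamba-style speech-enhancement
# regressor (NOT the authors' SEMamba code).
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Stand-in for a Mamba block: a diagonal linear recurrence
    h_t = a * h_{t-1} + u_t over time frames, with a residual path.
    Real Mamba uses an input-dependent (selective) scan instead."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.in_proj = nn.Linear(dim, dim)
        self.a = nn.Parameter(0.4 + 0.5 * torch.rand(dim))  # per-channel decay
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, time, dim)
        u = self.in_proj(self.norm(x))
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.size(1)):             # sequential scan over frames
            h = self.a * h + u[:, t]
            states.append(h)
        y = torch.stack(states, dim=1)
        return x + self.out_proj(y)            # residual connection

class MaskEstimator(nn.Module):
    """Predicts a magnitude mask for noisy STFT frames."""
    def __init__(self, n_freq=257, dim=128, n_blocks=4):
        super().__init__()
        self.embed = nn.Linear(n_freq, dim)
        self.blocks = nn.Sequential(*[SimpleSSMBlock(dim) for _ in range(n_blocks)])
        self.head = nn.Sequential(nn.Linear(dim, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag):              # (batch, time, n_freq)
        return self.head(self.blocks(self.embed(noisy_mag)))

# Toy training step using a signal-level (L1 spectral) distance only;
# SEMamba additionally uses metric-oriented (PESQ-targeting) losses.
model = MaskEstimator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
noisy = torch.rand(2, 100, 257)                # fake |STFT| of noisy speech
clean = torch.rand(2, 100, 257)                # fake |STFT| of clean speech
loss = (model(noisy) * noisy - clean).abs().mean()
loss.backward()
opt.step()
```
The abstract does not specify the exact loss combination, so the L1 spectral distance above is only a placeholder for the "signal-level distances" it mentions; the reported PESQ of 3.55 (3.69 with perceptual contrast stretching) comes from the full SEMamba system, not from this sketch.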
Related papers
- MambaGlue: Fast and Robust Local Feature Matching With Mamba [9.397265252815115]
We propose a novel Mamba-based local feature matching approach, called MambaGlue.
Mamba is an emerging state-of-the-art architecture rapidly gaining recognition for its superior speed in both training and inference.
Our MambaGlue achieves a balance between robustness and efficiency in real-world applications.
arXiv Detail & Related papers (2025-02-01T15:43:03Z)
- Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity [56.0251572416922]
State Space Models (SSMs) have emerged as efficient alternatives to Transformers for sequential modeling.
We propose a novel SSM architecture that introduces modality-aware sparsity through modality-specific parameterization of the Mamba block.
We evaluate Mixture-of-Mamba across three multi-modal pretraining settings.
arXiv Detail & Related papers (2025-01-27T18:35:05Z)
- Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision.
In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
- ReMamba: Equip Mamba with Effective Long-Sequence Modeling [50.530839868893786]
We propose ReMamba, which enhances Mamba's ability to comprehend long contexts.
ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process.
arXiv Detail & Related papers (2024-08-28T02:47:27Z)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features.
We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
- Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment [15.320011514412437]
We present the first exploration of the recently popular foundation model, i.e., the State Space Model/Mamba, for image quality assessment.
We propose Q-Mamba by revisiting and adapting the Mamba model for three crucial IQA tasks.
Our proposed StylePrompt enables better perception transfer capability with less computational cost.
arXiv Detail & Related papers (2024-06-13T19:21:01Z)
- Mamba-R: Vision Mamba ALSO Needs Registers [45.41648622999754]
This paper identifies artifacts in the feature maps of Vision Mamba, similar to those previously observed in Vision Transformers.
These artifacts, corresponding to high-norm tokens emerging in low-information background areas of images, appear much more severe in Vision Mamba.
To mitigate this issue, we follow the prior solution of introducing register tokens into Vision Mamba.
arXiv Detail & Related papers (2024-05-23T17:58:43Z)
- MambaOut: Do We Really Need Mamba for Vision? [70.60495392198686]
Mamba, an architecture whose RNN-like token mixer is a state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism.
This paper conceptually concludes that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics.
We construct a series of models, named MambaOut, by stacking Mamba blocks while removing their core token mixer, the SSM.
arXiv Detail & Related papers (2024-05-13T17:59:56Z)
- CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation [18.383760896304604]
This report introduces the first attempt to train a Mamba model utilizing contrastive language-image pretraining (CLIP).
A Mamba model with 67 million parameters is on par with a 307-million-parameter Vision Transformer (ViT) model on zero-shot classification tasks.
arXiv Detail & Related papers (2024-04-30T09:40:07Z)