Exploration of Adapter for Noise Robust Automatic Speech Recognition
- URL: http://arxiv.org/abs/2402.18275v3
- Date: Tue, 4 Jun 2024 08:20:02 GMT
- Title: Exploration of Adapter for Noise Robust Automatic Speech Recognition
- Authors: Hao Shi, Tatsuya Kawahara
- Abstract summary: Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial.
Integrating adapters into neural networks has emerged as a potent technique for transfer learning.
This study thoroughly investigates adapter-based ASR adaptation in noisy environments.
- Score: 25.997785562014883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial. Integrating adapters into neural networks has emerged as a potent technique for transfer learning. This study thoroughly investigates adapter-based ASR adaptation in noisy environments. We conducted experiments using the CHiME-4 dataset. The results show that inserting the adapter in the shallow layer yields superior effectiveness, and there is no significant difference between adapting solely within the shallow layer and adapting across all layers. The simulated data helps the system to improve its performance under real noise conditions. Nonetheless, when the amount of data is the same, the real data is more effective than the simulated data. Multi-condition training is still useful for adapter training. Furthermore, integrating adapters into speech enhancement-based ASR systems yields substantial improvements.
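To make the insertion pattern concrete, here is a minimal PyTorch sketch of the standard residual bottleneck adapter this line of work builds on, attached only to the shallow encoder layers; the bottleneck width, the number of shallow layers, and the module layout are illustrative assumptions, not the paper's exact configuration.

```python
# Residual bottleneck-adapter sketch (PyTorch). The bottleneck width and
# the shallow-layer cutoff are illustrative assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, apply a nonlinearity, up-project, add residually."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # small learned correction

class AdaptedLayer(nn.Module):
    """A frozen pretrained encoder layer followed by a trainable adapter."""
    def __init__(self, layer: nn.Module, dim: int):
        super().__init__()
        self.layer = layer
        for p in self.layer.parameters():
            p.requires_grad = False  # only the adapter is updated
        self.adapter = Adapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.layer(x))

def adapt_shallow_layers(layers: nn.ModuleList, dim: int, shallow_k: int = 4):
    """Insert adapters into the first (shallow) layers only, mirroring the
    finding that shallow insertion is the most effective."""
    return nn.ModuleList(
        AdaptedLayer(layer, dim) if i < shallow_k else layer
        for i, layer in enumerate(layers)
    )
```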
Related papers
- ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks [10.852047082856487]
We introduce ELP-adapter tuning, a novel method for parameter-efficient fine-tuning using three types of adapters.
E-adapters are integrated into transformer-based encoder layers and help to learn fine-grained speech representations that are effective for speech recognition.
L-adapters create paths from each encoder layer to the downstream head and help to extract non-linguistic features from lower encoder layers.
The P-adapter appends pseudo features to CNN features to further improve effectiveness and efficiency.
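As a rough illustration of the L-adapter idea (a path from each encoder layer to the downstream head), here is a hypothetical PyTorch sketch; the per-layer linear projections, softmax layer weighting, and mean pooling over time are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LAdapterHead(nn.Module):
    """One small projection per encoder layer plus a learned weighted sum
    feeding the downstream head (weighting scheme is an assumption)."""
    def __init__(self, num_layers: int, dim: int, num_classes: int):
        super().__init__()
        self.paths = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, hidden_states):  # list of [B, T, dim], one per layer
        outs = torch.stack([p(h) for p, h in zip(self.paths, hidden_states)])
        w = torch.softmax(self.weights, dim=0)          # per-layer weights
        fused = (w[:, None, None, None] * outs).sum(0)  # [B, T, dim]
        return self.head(fused.mean(dim=1))             # pool over time
```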
arXiv Detail & Related papers (2024-07-28T05:26:03Z)
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
- Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN).
CMMN filters the signals to adapt their power spectral density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
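As a back-of-the-envelope sketch of this recipe, assuming the closed-form Wasserstein barycenter for centered Gaussian signals (the squared mean of the square-root PSDs) and a simple frequency-domain filter in place of the paper's temporal convolutional filter:

```python
import numpy as np
from scipy.signal import welch

def cmmn_fit(signals, nperseg=256):
    """Estimate the barycenter PSD of the training signals: for centered
    Gaussian signals it is the squared mean of the square-root PSDs."""
    psds = np.stack([welch(x, nperseg=nperseg)[1] for x in signals])
    return np.mean(np.sqrt(psds), axis=0) ** 2

def cmmn_map(x, bary_psd, nperseg=256):
    """Filter one signal so its PSD matches the barycenter: the mapping is
    a linear filter with magnitude response sqrt(barycenter / psd_x)."""
    _, psd_x = welch(x, nperseg=nperseg)
    h = np.sqrt(bary_psd / np.maximum(psd_x, 1e-12))
    X = np.fft.rfft(x)
    # Interpolate the filter response onto the signal's frequency grid.
    H = np.interp(np.linspace(0, 1, len(X)), np.linspace(0, 1, len(h)), h)
    return np.fft.irfft(X * H, n=len(x))
```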
arXiv Detail & Related papers (2023-05-30T08:24:01Z)
- Parameter-Efficient Sparse Retrievers and Rerankers using Adapters [4.9545244468634655]
We study adapters for SPLADE, a sparse retriever, showing that adapters retain the efficiency and effectiveness otherwise achieved by fine-tuning.
We also address domain adaptation of neural retrieval using adapters on the cross-domain BEIR datasets and TripClick.
arXiv Detail & Related papers (2023-03-23T12:34:30Z)
- Unsupervised Noise adaptation using Data Simulation [21.866522173387715]
We propose a generative adversarial network (GAN)-based method to efficiently learn a converse clean-to-noisy transformation.
Experimental results show that our method effectively mitigates the domain mismatch between training and test sets.
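A minimal adversarial training step for such a clean-to-noisy mapping might look like the sketch below; the MLP architectures, the 80-dimensional filterbank features, and the plain BCE losses are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))  # clean -> pseudo-noisy
D = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 1))   # real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(clean_feats, noisy_feats):
    # Discriminator: real noisy features vs. generated pseudo-noisy ones.
    fake = G(clean_feats).detach()
    d_loss = (bce(D(noisy_feats), torch.ones(len(noisy_feats), 1))
              + bce(D(fake), torch.zeros(len(fake), 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator: produce pseudo-noisy features that fool the discriminator.
    g_loss = bce(D(G(clean_feats)), torch.ones(len(clean_feats), 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```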
arXiv Detail & Related papers (2023-02-23T12:57:20Z)
- CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data.
We propose an efficient tuning method specifically designed for SSL speech models, applying CNN adapters at the feature extractor.
We empirically find that adding CNN adapters to the feature extractor helps adaptation on emotion and speaker tasks.
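A hypothetical sketch of such a CNN adapter, applied residually on top of a frozen feature extractor's output; channel counts and kernel size are assumptions:

```python
import torch
import torch.nn as nn

class CNNAdapter(nn.Module):
    """Lightweight convolutional adapter for a frozen CNN feature extractor;
    the residual connection lets it learn a small correction."""
    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [B, channels, T]
        return x + self.block(x)
```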
arXiv Detail & Related papers (2022-12-01T08:50:12Z)
- Efficient acoustic feature transformation in mismatched environments using a Guided-GAN [1.495380389108477]
We propose a new framework to improve automatic speech recognition systems in resource-scarce environments.
We use a generative adversarial network (GAN) operating on acoustic input features to enhance the features of mismatched data.
With less than one hour of data, an ASR system trained on good-quality data and evaluated on mismatched audio improves by 11.5% to 19.7% relative word error rate (WER).
arXiv Detail & Related papers (2022-10-03T05:33:28Z)
- Adaptable Adapters [74.65986170056945]
State-of-the-art pretrained NLP models contain hundreds of millions to trillions of parameters.
Adaptable adapters contain different activation functions for different layers and different input data.
We show that adaptable adapters achieve on-par performance with the standard adapter architecture.
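One simplified way to realize layer-specific, learnable activations is a soft mixture over candidate functions, sketched below; the paper itself uses learnable rational activations and switches, so this is only an approximation of the idea.

```python
import torch
import torch.nn as nn

class SwitchableActivation(nn.Module):
    """Learnable soft choice among candidate activations; each layer gets
    its own instance, so different layers can settle on different mixes."""
    def __init__(self):
        super().__init__()
        self.candidates = nn.ModuleList([nn.ReLU(), nn.GELU(), nn.Tanh()])
        self.logits = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * f(x) for wi, f in zip(w, self.candidates))
```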
arXiv Detail & Related papers (2022-05-03T14:59:27Z)
- A Likelihood Ratio based Domain Adaptation Method for E2E Models [10.510472957585646]
End-to-end (E2E) automatic speech recognition models like the Recurrent Neural Network Transducer (RNN-T) are becoming a popular choice for streaming ASR applications like voice assistants.
While E2E models are very effective at learning representations of the data they are trained on, their accuracy on unseen domains remains a challenging problem.
In this work, we explore a contextual biasing approach using likelihood ratios that leverages text data sources to adapt the RNN-T model to new domains and entities.
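The likelihood-ratio idea can be summarized as a per-hypothesis score adjustment during decoding: subtract an estimate of the source-domain LM score implicitly learned by the E2E model and add a target-domain LM score. The weights below are illustrative tuning knobs, not values from the paper.

```python
def density_ratio_score(log_p_asr: float,
                        log_p_source_lm: float,
                        log_p_target_lm: float,
                        lam_src: float = 0.3,
                        lam_tgt: float = 0.3) -> float:
    """Likelihood-ratio rescoring of one hypothesis (illustrative weights)."""
    return log_p_asr - lam_src * log_p_source_lm + lam_tgt * log_p_target_lm
```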
arXiv Detail & Related papers (2022-01-10T21:22:39Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
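A common form of spectrogram augmentation is SpecAugment-style time and frequency masking; the sketch below shows the general pattern, with mask counts and widths chosen as assumptions rather than the paper's exact policy.

```python
import numpy as np

def spec_augment(spec: np.ndarray, num_freq_masks: int = 2,
                 num_time_masks: int = 2, max_f: int = 8,
                 max_t: int = 20) -> np.ndarray:
    """Zero out random frequency bands and time spans of a [freq, time]
    spectrogram (mask sizes are typical defaults, not the paper's)."""
    spec = spec.copy()
    n_freq, n_time = spec.shape
    for _ in range(num_freq_masks):
        f = np.random.randint(0, max_f + 1)
        f0 = np.random.randint(0, max(1, n_freq - f))
        spec[f0:f0 + f, :] = 0.0
    for _ in range(num_time_masks):
        t = np.random.randint(0, max_t + 1)
        t0 = np.random.randint(0, max(1, n_time - t))
        spec[:, t0:t0 + t] = 0.0
    return spec
```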
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Exploiting Adapters for Cross-lingual Low-resource Speech Recognition [52.40623653290499]
Cross-lingual speech adaptation aims to solve the problem of leveraging multiple rich-resource languages to build models for a low-resource target language.
We investigate the performance of multiple adapter variants for parameter-efficient cross-lingual speech adaptation.
arXiv Detail & Related papers (2021-05-18T08:30:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.