Related papers: Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning

URL: http://arxiv.org/abs/2506.21576v1
Date: Mon, 16 Jun 2025 05:14:51 GMT
Title: Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning
Authors: Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li
Abstract summary: Large-scale multilingual ASR models like Whisper excel in high-resource settings but face challenges in low-resource scenarios. We explore Soft Prompt Tuning (SPT), a parameter-efficient method to enhance CS ASR while preserving prior knowledge. We evaluate two strategies: (1) full fine-tuning (FFT) of both soft prompts and the entire Whisper model, demonstrating improved cross-lingual capabilities compared to traditional methods, and (2) adhering to SPT's original design by freezing model parameters and only training soft prompts.
Score: 13.192310468477377
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale multilingual ASR models like Whisper excel in high-resource settings but face challenges in low-resource scenarios, such as rare languages and code-switching (CS), due to computational costs and catastrophic forgetting. We explore Soft Prompt Tuning (SPT), a parameter-efficient method to enhance CS ASR while preserving prior knowledge. We evaluate two strategies: (1) full fine-tuning (FFT) of both soft prompts and the entire Whisper model, demonstrating improved cross-lingual capabilities compared to traditional methods, and (2) adhering to SPT's original design by freezing model parameters and only training soft prompts. Additionally, we introduce SPT4ASR, a combination of different SPT variants. Experiments on the SEAME and ASRU2019 datasets show that deep prompt tuning is the most effective SPT approach, and our SPT4ASR methods achieve further error reductions in CS ASR, maintaining parameter efficiency similar to LoRA, without degrading performance on existing languages.
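
The second strategy in the abstract (freeze the model, train only the soft prompts) can be illustrated with a toy example. The sketch below is not the authors' SPT4ASR code and does not involve Whisper. It assumes a tiny frozen "model" made of an embedding table, mean pooling, and a linear readout, and it shows only the simplest, input-level form of prompt tuning rather than the deep prompt tuning the paper reports as most effective. All names (`make_frozen_model`, `forward`, `train_prompt`) and all sizes are hypothetical.

```python
import random


def make_frozen_model(vocab, dim, seed=0):
    """Build a toy frozen model: an embedding table and a readout vector."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab)]
    w = [rng.uniform(-1, 1) for _ in range(dim)]
    return emb, w


def forward(emb, w, prompt, tokens):
    """Prepend soft prompt vectors to the token embeddings, mean-pool, read out."""
    seq = prompt + [emb[t] for t in tokens]
    n = len(seq)
    pooled = [sum(vec[d] for vec in seq) / n for d in range(len(w))]
    return sum(p * wd for p, wd in zip(pooled, w))


def train_prompt(emb, w, data, prompt_len, lr=0.5, steps=200):
    """Fit only the soft prompt with SGD on squared error; emb and w stay frozen."""
    dim = len(w)
    prompt = [[0.0] * dim for _ in range(prompt_len)]
    for _ in range(steps):
        for tokens, target in data:
            n = prompt_len + len(tokens)
            err = forward(emb, w, prompt, tokens) - target
            # For loss = err**2, d(loss)/d(prompt[i][d]) = 2 * err * w[d] / n.
            for vec in prompt:
                for d in range(dim):
                    vec[d] -= lr * 2 * err * w[d] / n
    return prompt


# Hypothetical sizes, chosen only to keep the example small.
emb, w = make_frozen_model(vocab=10, dim=4)
prompt = train_prompt(emb, w, [([1, 2, 3], 2.0)], prompt_len=2)
trainable = sum(len(vec) for vec in prompt)
frozen = sum(len(row) for row in emb) + len(w)
print("trainable prompt parameters:", trainable)
print("frozen model parameters:", frozen)
```

Only the prompt vectors receive gradient updates here, which is why the trainable parameter count depends on the prompt length and embedding width and not on the size of the frozen model.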

Related papers

ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario [72.02391485962127]
Speech Self-Supervised Learning (SSL) models achieve impressive performance on Automatic Speech Recognition (ASR). In low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages. We extend a conventional efficient fine-tuning scheme based on the adapter to handle these issues.
arXiv Detail & Related papers (2024-11-27T10:51:00Z)
Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR [25.566285376879094]
Multimodal models are able to leverage unlabeled text via text-only adaptation with further parameter-efficient ASR fine-tuning. We show cross-lingual transfer from a high-resource language, achieving up to a relative 17% WER reduction over a baseline in a zero-shot setting.
arXiv Detail & Related papers (2024-10-17T11:19:44Z)
Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More [9.230338573494622]
Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models to specific tasks. This paper investigates the potential of SPT for cross-lingual transfer.
arXiv Detail & Related papers (2024-02-06T07:52:30Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning [14.975436239088312]
We propose DePT, which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. We demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline, in some scenarios.
arXiv Detail & Related papers (2023-09-11T00:02:05Z)
Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks. We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme. SPT allocates trainable parameters to task-specific important positions. Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z)
SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning [28.29889045842277]
Multitask prompted learning can help generalization through a diverse set of tasks at once. We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning.
arXiv Detail & Related papers (2022-12-21T11:18:09Z)
Deliberation Model for On-Device Spoken Language Understanding [69.5587671262691]
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU). We show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training.
arXiv Detail & Related papers (2022-04-04T23:48:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.