Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages
- URL: http://arxiv.org/abs/2602.01008v1
- Date: Sun, 01 Feb 2026 04:18:31 GMT
- Title: Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages
- Authors: Yang Xiao, Eun-Jung Holden, Ting Dang
- Abstract summary: We analyze multilingual automatic speech recognition models and reveal a U-shaped adaptability pattern. We propose DAMA, a Depth-Aware Model Adaptation framework that allocates adaptation capacity according to each layer's role. DAMA matches or surpasses state-of-the-art accuracy with 80% fewer trainable parameters, achieves a 29% error reduction under extreme data scarcity, and significantly improves memory, training time, and computational efficiency over baselines.
- Score: 11.808922632545874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent speech foundation models excel at multilingual automatic speech recognition (ASR) for high-resource languages, but adapting them to low-resource languages remains challenging due to data scarcity and efficiency constraints. Full-model fine-tuning is computationally expensive and prone to overfitting, while parameter-efficient methods like LoRA apply adaptation uniformly across layers, overlooking differences in internal representations and thereby compromising both effectiveness and efficiency. We analyze multilingual ASR models and reveal a U-shaped adaptability pattern: early and late layers are language-specific and require more adaptation, while intermediate layers retain shared semantics and need less. Building on this observation, we propose DAMA, a Depth-Aware Model Adaptation framework that allocates adaptation capacity according to each layer's role. DAMA also introduces Singular Value Decomposition (SVD)-based initialization to constrain adaptation and preserve the U-shaped pattern, as well as a frozen middle-layer basis for further efficiency. Evaluated on 18 low-resource languages across two benchmark datasets, DAMA matches or surpasses state-of-the-art accuracy with 80% fewer trainable parameters, achieves a 29% error reduction under extreme data scarcity, and significantly improves memory, training time, and computational efficiency over baselines. These results highlight the benefits of structure-aware adaptation for efficient, scalable multilingual ASR.
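Since the abstract describes the mechanism only at a high level, the sketch below illustrates the general idea in PyTorch: a U-shaped rank schedule that gives early and late layers more low-rank adaptation capacity, SVD-based initialization of the adapter factors, and a fully frozen middle block. Every name, the rank schedule, and the choice of which layers to freeze are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of depth-aware low-rank adaptation; not the DAMA code.
import torch
import torch.nn as nn


def u_shaped_rank(layer_idx: int, num_layers: int, r_max: int = 16, r_min: int = 2) -> int:
    """Give early/late layers larger adapter ranks and middle layers smaller ones."""
    mid = (num_layers - 1) / 2
    dist = abs(layer_idx - mid) / mid  # 0 at the middle, 1 at either end
    return max(r_min, round(r_min + (r_max - r_min) * dist))


class SVDInitLoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update seeded by SVD."""

    def __init__(self, base: nn.Linear, rank: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen

        # Top-r singular components seed the adapter (a PiSSA-style assumption);
        # the residual stays in the frozen base so the initial function is
        # identical to the pretrained layer.
        U, S, Vh = torch.linalg.svd(base.weight.detach(), full_matrices=False)
        sqrt_s = S[:rank].sqrt()
        self.lora_B = nn.Parameter(U[:, :rank] * sqrt_s)             # (out, r)
        self.lora_A = nn.Parameter(sqrt_s.unsqueeze(1) * Vh[:rank])  # (r, in)
        with torch.no_grad():
            base.weight -= self.lora_B @ self.lora_A

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T) @ self.lora_B.T


# Toy usage on a stack of linear layers standing in for encoder blocks.
num_layers, d_model = 12, 256
layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(num_layers)])
adapted = nn.ModuleList()
for i, layer in enumerate(layers):
    if num_layers // 3 <= i < 2 * num_layers // 3:
        # Innermost block kept fully frozen, loosely mirroring the frozen
        # middle-layer basis; which layers to freeze is an assumption here.
        for p in layer.parameters():
            p.requires_grad = False
        adapted.append(layer)
    else:
        adapted.append(SVDInitLoRALinear(layer, rank=u_shaped_rank(i, num_layers)))

trainable = sum(p.numel() for m in adapted for p in m.parameters() if p.requires_grad)
total = sum(p.numel() for m in adapted for p in m.parameters())
print(f"trainable: {trainable}/{total} ({100 * trainable / total:.1f}%)")
```

With these toy settings only a few percent of the parameters remain trainable; concentrating rank where the abstract says adaptation is needed (early and late layers) is the design choice this sketch is meant to illustrate, while the actual ranks and frozen layers in DAMA are determined by the paper itself.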
Related papers
- SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR [65.90944188787786]
Low-rank adaptation (LoRA) is widely used in speech applications, but its state-of-the-art variants, e.g., VeRA, DoRA, PiSSA, and SVFT, are developed mainly for language and vision tasks, with limited validation in speech. This work presents the first comprehensive integration and benchmarking of these PEFT methods within ESPnet. We evaluate all methods on domain-shifted speech recognition tasks, including child speech and dialectal variation, across model scales from 0.1B to 2B.
arXiv Detail & Related papers (2025-09-02T20:51:17Z) - Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages [10.418542753869433]
Low-resource languages (LRLs) face significant challenges in natural language processing (NLP) due to limited data. Current state-of-the-art large language models (LLMs) still struggle with LRLs. Small multilingual models (mLMs) such as mBERT and XLM-R offer greater promise because their capacity is better matched to small training data sizes.
arXiv Detail & Related papers (2025-02-14T13:10:39Z) - Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z) - Adaptive Rank Allocation for Federated Parameter-Efficient Fine-Tuning of Language Models [40.69348434971122]
We propose FedARA, a novel Adaptive Rank Allocation framework for federated parameter-efficient fine-tuning of language models. FedARA consistently outperforms baselines by an average of 6.95% to 8.49% across various datasets and models under heterogeneous data. Experiments on various edge devices demonstrate substantial decreases in total training time and energy consumption by up to 48.90% and 46.95%, respectively.
arXiv Detail & Related papers (2025-01-24T11:19:07Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario [72.02391485962127]
Speech Self-Supervised Learning (SSL) models achieve impressive performance on Automatic Speech Recognition (ASR). In low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages. We extend a conventional efficient fine-tuning scheme based on the adapter to handle these issues.
arXiv Detail & Related papers (2024-11-27T10:51:00Z) - SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition [55.2480439325792]
Speech Meta In-Context LEarning (SMILE) is an innovative framework that combines meta-learning with speech in-context learning (SICL). We show that SMILE consistently outperforms baseline methods in training-free few-shot multilingual ASR tasks.
arXiv Detail & Related papers (2024-09-16T16:04:16Z) - Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification [19.893213508284813]
Self-supervised adaptive pre-training is proposed to adapt the pre-trained model to the target domain and languages of the downstream task.
We show that SAPT improves XLSR performance on the FLEURS benchmark with substantial gains of up to 40.1% for under-represented languages.
arXiv Detail & Related papers (2023-12-12T14:58:08Z) - Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning [28.592569051244375]
Master-ASR simultaneously achieves strong multilingual scalability and low-resource adaptation ability.
Our framework achieves a 0.13 to 2.41 lower character error rate (CER) with 30% smaller inference overhead over state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2023-06-23T16:23:00Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)