GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR
- URL: http://arxiv.org/abs/2603.02464v1
- Date: Mon, 02 Mar 2026 23:10:09 GMT
- Title: GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR
- Authors: Pouya Mehralian, Melissa Farasyn, Anne Breitbarth, Anne-Sophie Ghyselen, Hugo Van hamme,
- Abstract summary: GLoRIA is an adaptation framework that modulates low-rank updates in a pre-trained encoder.<n>On the GCND corpus, GLoRIA outperforms geo-conditioned full fine-tuning.
- Score: 11.705969106735454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Speech Recognition (ASR) in dialect-heavy settings remains challenging due to strong regional variation and limited labeled data. We propose GLoRIA, a parameter-efficient adaptation framework that leverages metadata (e.g., coordinates) to modulate low-rank updates in a pre-trained encoder. GLoRIA injects low-rank matrices into each feed-forward layer, with a gating MLP determining the non-negative contribution of each LoRA rank-1 component based on location metadata. On the GCND corpus, GLoRIA outperforms geo-conditioned full fine-tuning, LoRA, and both dialect-specific and unified full fine-tuning, achieving state-of-the-art word error rates while updating under 10% of parameters. GLoRIA also generalizes well to unseen dialects, including in extrapolation scenarios, and enables interpretable adaptation patterns that can be visualized geospatially. These results show metadata-gated low-rank adaptation is an effective, interpretable, and efficient solution for dialectal ASR.
Related papers
- SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR [65.90944188787786]
Low-rank adaptation (LoRA) is widely used in speech applications, but its state-of-the-art variants, e.g., VeRA, DoRA, PiSSA, and SVFT, are developed mainly for language and vision tasks, with limited validation in speech.<n>This work presents the first comprehensive integration and benchmarking of these PEFT methods within ESPnet.<n>We evaluate all methods on domain-shifted speech recognition tasks, including child speech and dialectal variation, across model scales from 0.1B to 2B.
arXiv Detail & Related papers (2025-09-02T20:51:17Z) - Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models [7.075648770762989]
Fine-tuning large language models with low-rank adaptaion (LoRA) is a cost-effective way to incorporate information from a specific dataset.<n>It is often unclear how well the fine-tuned LLM will generalize, i.e., how well it will perform on unseen datasets.<n>We propose Amortized Bayesian Meta-Learning for LoRA (ABMLL) to improve generalization and scales to large models.
arXiv Detail & Related papers (2025-08-19T21:57:59Z) - Optimizing Retrieval-Augmented Generation (RAG) for Colloquial Cantonese: A LoRA-Based Systematic Review [0.0]
Review examines advances in.<n>Efficient Fine-Tuning (PEFT) to optimize Retrieval-Augmented Generation (RAG) systems like Qwen3, DeepSeek, and Kimi.<n>RAG systems face challenges in understanding and generating authentic Cantonese colloquial expressions due to limited annotated data and linguistic variability.<n>Findings reveal that dynamic and ensemble LoRA adaptations significantly reduce trainable parameters without sacrificing retrieval accuracy and generation quality in dialectal contexts.
arXiv Detail & Related papers (2025-08-12T03:46:16Z) - HAFLQ: Heterogeneous Adaptive Federated LoRA Fine-tuned LLM with Quantization [55.972018549438964]
Federated fine-tuning of pre-trained Large Language Models (LLMs) enables task-specific adaptation across diverse datasets while preserving privacy.<n>We propose HAFLQ (Heterogeneous Adaptive Federated Low-Rank Adaptation Fine-tuned LLM with Quantization), a novel framework for efficient and scalable fine-tuning of LLMs in heterogeneous environments.<n> Experimental results on the text classification task demonstrate that HAFLQ reduces memory usage by 31%, lowers communication cost by 49%, improves accuracy by 50%, and achieves faster convergence compared to the baseline method.
arXiv Detail & Related papers (2024-11-10T19:59:54Z) - Low-Rank Adaptation Secretly Imitates Differentially Private SGD [5.359060261460183]
We show theoretically that low-rank adaptation is equivalent to fine-tuning adapters with noisy batch gradients.<n>We also quantify the variance of the injected noise as a decreasing function of adaptation rank.<n>Low-rank adaptation provides robustness to membership inference attacks w.r.t the fine-tuning data.
arXiv Detail & Related papers (2024-09-26T04:56:49Z) - LA-RAG:Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation [15.520180125182756]
Recent advancements in integrating speech information into large language models (LLMs) have significantly improved automatic speech recognition (ASR) accuracy.
Existing methods often constrained by the capabilities of the speech encoders under varied acoustic conditions, such as accents.
We propose LA-RAG, a novel Retrieval-Augmented Generation (RAG) paradigm for LLM-based ASR.
arXiv Detail & Related papers (2024-09-13T07:28:47Z) - DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z) - LLM-based speaker diarization correction: A generalizable approach [0.0]
We investigate the use of large language models (LLMs) for diarization correction as a post-processing step.<n>The ability of the models to improve diarization accuracy in a holdout dataset from the Fisher corpus as well as an independent dataset was measured.
arXiv Detail & Related papers (2024-06-07T13:33:22Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates and re-balancing strategies prefer tail categories.
We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z) - A Self-Refinement Strategy for Noise Reduction in Grammatical Error
Correction [54.569707226277735]
Existing approaches for grammatical error correction (GEC) rely on supervised learning with manually created GEC datasets.
There is a non-negligible amount of "noise" where errors were inappropriately edited or left uncorrected.
We propose a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models.
arXiv Detail & Related papers (2020-10-07T04:45:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.