SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
- URL: http://arxiv.org/abs/2601.12600v1
- Date: Sun, 18 Jan 2026 22:16:44 GMT
- Title: SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
- Authors: Pu Wang, Shinji Watanabe, Hugo Van hamme
- Abstract summary: This paper introduces SSVD-Outer (SSVD-O), an extension of the structured SVD-guided (SSVD) fine-tuning method. We conduct the first systematic analysis of parameter budget allocation across model subspaces in PEFT for automatic speech recognition. We show that SSVD-O consistently narrows the performance gap to full fine-tuning while improving generalization and mitigating catastrophic forgetting.
- Score: 65.90944188787786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) is a scalable approach for adapting large speech foundation models to new domains. While methods such as LoRA and its state-of-the-art variants reduce adaptation costs, they typically allocate parameters uniformly across model subspaces, which limits their efficiency and scalability in speech applications. Building on our prior work, this paper introduces SSVD-Outer (SSVD-O), an extension of the structured SVD-guided (SSVD) fine-tuning method. SSVD-O combines input acoustic feature space-associated inner transformations with output semantic feature space-associated outer transformations to enable scalable and balanced adaptation. We conduct the first systematic analysis of parameter budget allocation across model subspaces in PEFT for automatic speech recognition (ASR), and investigate the trade-off between learning and forgetting under constrained resources. SSVD-O is benchmarked against LoRA, DoRA, PiSSA, and SSVD on domain-shifted ASR tasks, including child speech and regional accents, across model scales from 0.1B to 2B within the ESPnet framework. Experimental results show that SSVD-O consistently narrows the performance gap to full fine-tuning while improving generalization and mitigating catastrophic forgetting.
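The abstract describes confining the trainable update to SVD-derived subspaces of a frozen pretrained weight, with input-side ("inner") and output-side ("outer") transformations. The exact SSVD-O update rule is defined in the paper; the following is only a minimal generic sketch of SVD-guided low-rank adaptation, with all dimensions and the toy "trained" matrix `A` chosen for illustration.

```python
import numpy as np

# Hedged sketch of SVD-guided parameter-efficient adaptation.
# Not the SSVD-O formulation itself: we only illustrate restricting the
# trainable update to the top-r SVD subspaces of a frozen weight, where the
# left factor U spans the output (semantic) side and V the input (acoustic)
# side mentioned in the abstract.

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2                      # toy dimensions; r = adaptation rank

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
U, S, Vt = np.linalg.svd(W, full_matrices=False)

A = np.zeros((r, r))                          # the only trainable parameters
A[0, 0] = 0.1                                 # stand-in for one gradient step

delta_W = U[:, :r] @ A @ Vt[:r, :]            # structured low-rank update
W_adapted = W + delta_W

# Trainable parameter count: r*r = 4, versus d_out*d_in = 48 for full fine-tuning.
print(A.size, W.size)
```

The point of the sketch is the budget arithmetic: the update lives in an r-dimensional subspace on each side, so the trainable parameter count scales with r rather than with the full weight dimensions.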
Related papers
- Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages [11.808922632545874]
We analyze multilingual automatic speech recognition models and reveal a U-shaped adaptability pattern. We propose DAMA, a Depth-Aware Model Adaptation framework that allocates adaptation capacity according to each layer's role. DAMA matches or surpasses state-of-the-art accuracy with 80% fewer trainable parameters, achieves a 29% error reduction under extreme data scarcity, and significantly improves memory, training time, and computational efficiency over baselines.
arXiv Detail & Related papers (2026-02-01T04:18:31Z) - SSVP: Synergistic Semantic-Visual Prompting for Industrial Zero-Shot Anomaly Detection [55.54007781679915]
We propose Synergistic Semantic-Visual Prompting (SSVP), which efficiently fuses diverse visual encodings to elevate the model's fine-grained perception. SSVP achieves state-of-the-art performance with 93.0% Image-AUROC and 92.2% Pixel-AUROC on MVTec-AD, significantly outperforming existing zero-shot approaches.
arXiv Detail & Related papers (2026-01-14T04:42:19Z) - Steering Vision-Language Pre-trained Models for Incremental Face Presentation Attack Detection [62.89126207012712]
Face Presentation Attack Detection (PAD) demands incremental learning to combat spoofing tactics and domains. Privacy regulations forbid retaining past data, necessitating rehearsal-free learning (RF-IL).
arXiv Detail & Related papers (2025-12-22T04:30:11Z) - SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR [65.90944188787786]
Low-rank adaptation (LoRA) is widely used in speech applications, but its state-of-the-art variants, e.g., VeRA, DoRA, PiSSA, and SVFT, are developed mainly for language and vision tasks, with limited validation in speech. This work presents the first comprehensive integration and benchmarking of these PEFT methods within ESPnet. We evaluate all methods on domain-shifted speech recognition tasks, including child speech and dialectal variation, across model scales from 0.1B to 2B.
arXiv Detail & Related papers (2025-09-02T20:51:17Z) - Merging Memory and Space: A State Space Neural Operator [8.378604588491394]
State Space Neural Operator (SS-NO) is a compact architecture for learning solution operators of time-dependent partial differential equations. We show that SS-NO achieves state-of-the-art performance across diverse PDE benchmarks.
arXiv Detail & Related papers (2025-07-31T11:09:15Z) - AdaSVD: Adaptive Singular Value Decomposition for Large Language Models [75.1196637934987]
Singular Value Decomposition (SVD) has emerged as a promising compression technique for large language models (LLMs). Existing SVD-based methods often struggle to effectively mitigate the errors introduced by SVD truncation. We propose AdaSVD, an adaptive SVD-based LLM compression approach.
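The truncation error that AdaSVD targets comes from the rank-k SVD approximation that underlies all SVD-based compression; AdaSVD's adaptive error compensation itself is not reproduced here. As a baseline illustration only, the following sketch shows plain truncated-SVD weight compression and the storage/error trade-off it creates.

```python
import numpy as np

# Baseline truncated-SVD compression of a weight matrix (the starting point
# that adaptive methods like AdaSVD improve on; their compensation scheme is
# described in the paper and not reproduced here).

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32))             # toy "weight matrix"

U, S, Vt = np.linalg.svd(W, full_matrices=False)
k = 8                                          # retained rank
W_k = (U[:, :k] * S[:k]) @ Vt[:k, :]           # best rank-k approximation

# Storage drops from 64*32 values to k*(64+32) factor entries
# plus k singular values; the discarded tail of S is the truncation error.
orig_params = W.size
compressed_params = k * (W.shape[0] + W.shape[1]) + k
rel_err = np.linalg.norm(W - W_k) / np.linalg.norm(W)
print(orig_params, compressed_params, round(rel_err, 3))
```

By the Eckart-Young theorem the Frobenius error of the rank-k approximation equals the norm of the discarded singular values, which is exactly the quantity adaptive truncation schemes try to redistribute or compensate.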
arXiv Detail & Related papers (2025-02-03T14:34:37Z) - Lightweight Modular Parameter-Efficient Tuning for Open-Vocabulary Object Detection [2.1155908599769764]
We propose UniProj-Det, a lightweight modular framework for parameter-efficient open-vocabulary object detection. UniProj-Det freezes pretrained backbones and introduces a Universal Projection module with a learnable modality token, enabling unified vision-language adaptation at minimal cost.
arXiv Detail & Related papers (2024-08-20T12:27:53Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.