ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval
- URL: http://arxiv.org/abs/2512.19703v1
- Date: Thu, 11 Dec 2025 14:48:30 GMT
- Title: ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval
- Authors: Siyuan Fu, Xuchen Guo, Mingjun Liu, Hongxiang Li, Boyin Tan, Gongxi Zhu, Xianwei Zhuang, Jinghan Ru, Yuxin Xie, Yuguo Yin,
- Abstract summary: The dominant paradigm for Audio-Text Retrieval (ATR) relies on mini-batch-based contrastive learning. The Gradient Locality Bottleneck (GLB) structurally prevents models from leveraging out-of-batch knowledge. The Representation-Drift Mismatch (RDM) arises when a static knowledge base becomes progressively misaligned with the evolving model, turning guidance into noise.
- Score: 19.94287753279928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dominant paradigm for Audio-Text Retrieval (ATR) relies on mini-batch-based contrastive learning. This process, however, is inherently limited by what we formalize as the Gradient Locality Bottleneck (GLB), which structurally prevents models from leveraging out-of-batch knowledge and thus impairs fine-grained and long-tail learning. While external knowledge-enhanced methods can alleviate the GLB, we identify a critical, unaddressed side effect: the Representation-Drift Mismatch (RDM), where a static knowledge base becomes progressively misaligned with the evolving model, turning guidance into noise. To address this dual challenge, we propose the Adaptive Self-improving Knowledge (ASK) framework, a model-agnostic, plug-and-play solution. ASK breaks the GLB via multi-grained knowledge injection, systematically mitigates RDM through dynamic knowledge refinement, and introduces a novel adaptive reliability weighting scheme to ensure that consistent knowledge contributes to optimization. Experimental results on two benchmark datasets show superior, state-of-the-art performance, justifying the efficacy of the proposed ASK framework.
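To make the abstract's three mechanisms concrete, below is a minimal, hypothetical PyTorch sketch of how out-of-batch knowledge injection, dynamic refinement, and adaptive reliability weighting could attach to a standard audio-text contrastive loss. The knowledge-bank structure, the EMA refresh rule, and the agreement-based reliability score are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of the three ASK ingredients named in the abstract:
# (1) out-of-batch knowledge injection, (2) dynamic refinement of stored
# knowledge so it tracks the drifting encoder (the RDM issue), and
# (3) reliability weighting of each injected entry.
import torch
import torch.nn.functional as F

class KnowledgeBank:
    """A fixed-size bank of L2-normalized embeddings (assumed structure)."""
    def __init__(self, size: int, dim: int, momentum: float = 0.99):
        self.feats = F.normalize(torch.randn(size, dim), dim=-1)
        self.momentum = momentum

    def refresh(self, idx: torch.Tensor, new_feats: torch.Tensor):
        # Dynamic refinement (assumed EMA rule): pull stored entries toward
        # the current encoder's view of the same items so the bank stays fresh.
        upd = self.momentum * self.feats[idx] + (1 - self.momentum) * new_feats
        self.feats[idx] = F.normalize(upd, dim=-1)

def ask_style_loss(audio, text, bank: KnowledgeBank, k: int = 64, tau: float = 0.07):
    """InfoNCE over the batch plus k reliability-weighted bank candidates."""
    audio, text = F.normalize(audio, dim=-1), F.normalize(text, dim=-1)
    logits = audio @ text.t() / tau                       # in-batch term
    # Knowledge injection: extra candidates from outside the mini-batch,
    # which the plain in-batch gradient (the GLB) never sees.
    idx = torch.randint(len(bank.feats), (k,))
    kn = bank.feats[idx]                                  # (k, dim)
    extra = audio @ kn.t() / tau                          # (B, k)
    # Adaptive reliability weighting (assumed proxy): down-weight bank
    # entries that no longer agree with the current encoder's text features.
    reliability = (kn @ text.t()).max(dim=1).values.clamp(min=0)  # (k,)
    extra = extra + torch.log(reliability + 1e-6)         # soft masking
    full = torch.cat([logits, extra], dim=1)              # (B, B + k)
    target = torch.arange(len(audio))
    return F.cross_entropy(full, target)
```

In this reading, `refresh` is what counters RDM (the bank tracks the drifting encoder between training steps), while the log-reliability term softly masks bank entries the current model no longer agrees with, so only consistent knowledge shapes the gradient.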
Related papers
- Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards [8.109014000578766]
We present ASR-TRA, a novel Test-time Reinforcement Adaptation framework inspired by causal intervention. Our method achieves higher accuracy while maintaining lower latency than existing TTA baselines. Our approach provides a practical and robust solution for deploying ASR systems in challenging real-world conditions.
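For intuition only, here is a hedged sketch of the general pattern this blurb describes: updating an ASR model at test time with a reward derived from audio-text semantic agreement. `policy.sample`, `audio_encoder`, and `text_encoder` are hypothetical placeholders (e.g., a CLAP-style joint embedding model); the paper's causal-intervention design is not reproduced here.

```python
# Sketch: REINFORCE-style test-time adaptation on one unlabeled utterance,
# rewarded by audio-text embedding similarity. All interfaces are assumed.
import torch
import torch.nn.functional as F

def semantic_reward(audio_emb, hyp_texts, text_encoder):
    """Score each ASR hypothesis by its embedding similarity to the audio."""
    txt = F.normalize(text_encoder(hyp_texts), dim=-1)   # (n, d)
    aud = F.normalize(audio_emb, dim=-1)                 # (d,)
    return txt @ aud                                     # (n,)

def test_time_step(policy, audio, audio_encoder, text_encoder, optimizer, n=4):
    """One adaptation step: sample hypotheses, reward them, update the model."""
    hyps, logps = policy.sample(audio, n=n)  # n hypotheses + their log-probs
    with torch.no_grad():
        r = semantic_reward(audio_encoder(audio), hyps, text_encoder)
        r = r - r.mean()                     # mean baseline cuts variance
    loss = -(r * logps).mean()               # policy-gradient surrogate
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```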
arXiv Detail & Related papers (2026-03-05T14:43:15Z)
- Training-Free Intelligibility-Guided Observation Addition for Noisy ASR [57.74127683005929]
This paper proposes an intelligibility-guided observation addition (OA) method to improve speech recognition in noisy environments. Experiments across diverse SE-ASR combinations and datasets demonstrate strong robustness and improvements over existing OA baselines.
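As a rough illustration of observation addition in general, a training-free variant can blend the enhanced signal back with the raw noisy observation and keep the mix that scores best under an intelligibility proxy. The grid search and the `intelligibility` scorer (e.g., a STOI-like metric) are assumptions, not this paper's method.

```python
# Sketch: intelligibility-guided observation addition via a small grid
# search over mixing weights. Inputs are 1-D NumPy waveforms.
import numpy as np

def observation_addition(noisy, enhanced, intelligibility, alphas=None):
    """Return the noisy/enhanced blend with the best intelligibility score."""
    if alphas is None:
        alphas = np.linspace(0.0, 1.0, 11)       # candidate OA weights
    best, best_mix = -np.inf, enhanced
    for a in alphas:
        mix = (1.0 - a) * enhanced + a * noisy   # add the observation back
        score = intelligibility(mix)             # no reference labels needed
        if score > best:
            best, best_mix = score, mix
    return best_mix
```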
arXiv Detail & Related papers (2026-02-24T14:46:54Z)
- DIS2: Disentanglement Meets Distillation with Classwise Attention for Robust Remote Sensing Segmentation under Missing Modalities [28.992992584085787]
DIS2 is a new paradigm that shifts from dependence on modality-shared features to active, guided compensation of missing features. Compensatory features are explicitly captured that, when fused with the features of the available modality, approximate the ideal fused representation of the full-modality case. Our proposed approach significantly outperforms state-of-the-art methods across benchmarks.
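The compensation idea can be sketched generically as follows: a small head predicts stand-in features for the missing modality from the available one, and the resulting fusion is distilled toward the full-modality fusion. Layer shapes and the MSE objective are illustrative assumptions, not DIS2's actual architecture.

```python
# Sketch: predict missing-modality features from the available modality,
# then train the compensated fusion to match the full-modality "teacher".
import torch
import torch.nn as nn

class CompensatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.compensate = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                        nn.Linear(dim, dim))
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, avail, missing=None):
        # Use real features when present, compensated stand-ins otherwise.
        other = missing if missing is not None else self.compensate(avail)
        return self.fuse(torch.cat([avail, other], dim=-1))

def distill_loss(model, avail, missing):
    """Pull the missing-modality fusion toward the full-modality target."""
    with torch.no_grad():
        teacher = model(avail, missing)   # ideal fused representation
    student = model(avail, None)          # compensated fusion
    return nn.functional.mse_loss(student, teacher)
```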
arXiv Detail & Related papers (2026-01-20T01:33:54Z)
- Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning [78.92934995292113]
We propose a Confidence-Aware Asymmetric Learning (CAL) framework, which balances confidence across known and novel forgery types. CAL consistently outperforms previous methods, achieving new state-of-the-art performance on both known and novel forgery attribution.
arXiv Detail & Related papers (2025-12-14T12:31:28Z)
- Personalized Federated Recommendation With Knowledge Guidance [18.117610268256005]
We propose Federated Recommendation with Knowledge Guidance (FedRKG). FedRKG fuses global knowledge into preserved local embeddings, attaining the personalization benefits of dual knowledge within a single-knowledge memory footprint. Experiments on benchmark datasets demonstrate that FedRKG significantly outperforms state-of-the-art methods.
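A minimal sketch of the single-footprint fusion this blurb describes might look like the following: the client's one stored embedding table absorbs the server's global embeddings through a learned gate. The gating form is an assumption, not FedRKG's actual rule.

```python
# Sketch: fold global knowledge into the local table in place, so only one
# embedding table is ever stored on the client.
import torch
import torch.nn as nn

class FusedEmbedding(nn.Module):
    def __init__(self, n_items: int, dim: int):
        super().__init__()
        self.local = nn.Embedding(n_items, dim)   # the only stored table
        self.gate = nn.Linear(2 * dim, 1)         # per-item fusion weight

    def absorb_global(self, global_table: torch.Tensor):
        """Fuse server embeddings into the preserved local table in place."""
        with torch.no_grad():
            g = torch.sigmoid(self.gate(
                torch.cat([self.local.weight, global_table], dim=-1)))
            self.local.weight.mul_(1 - g).add_(g * global_table)
```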
arXiv Detail & Related papers (2025-11-17T04:35:53Z)
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories). We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z)
- Surgical Knowledge Rewrite in Compact LLMs: An 'Unlearn-then-Learn' Strategy with ($IA^3$) for Localized Factual Modulation and Catastrophic Forgetting Mitigation [0.0]
This paper introduces and evaluates a novel "unlearn-then-learn" strategy for precise knowledge editing in Large Language Models. The two-stage approach is powered by an initial circuit-localization phase that identifies and targets the specific internal components responsible for encoding the conflicting fact.
arXiv Detail & Related papers (2025-08-09T18:48:25Z)
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
- KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG [63.82127103851471]
Retrieval-Augmented Generation (RAG) enables large language models to access broader knowledge sources. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. We present KARE-RAG, which improves knowledge utilization through three key innovations.
arXiv Detail & Related papers (2025-06-03T06:31:17Z)
- Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG [51.120170062795566]
We propose Divide-Then-Align (DTA) to endow RAG systems with the ability to respond with "I don't know" when the query is outside the knowledge boundary. DTA balances accuracy with appropriate abstention, enhancing the reliability and trustworthiness of retrieval-augmented systems.
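In its simplest form, knowledge-boundary behavior reduces to confidence-gated abstention, as in this sketch; the retrieval-score heuristic and fixed threshold are stand-ins for DTA's learned boundary decision, and `retriever`/`generator` are hypothetical callables.

```python
# Sketch: answer only when the query appears covered by retrieved knowledge;
# otherwise abstain with "I don't know".
def answer_or_abstain(query, retriever, generator, threshold=0.5):
    docs, scores = retriever(query)            # top-k passages + relevance
    if not scores or max(scores) < threshold:  # query outside the boundary
        return "I don't know."
    return generator(query, docs)              # grounded answer
```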
arXiv Detail & Related papers (2025-05-27T08:21:21Z)
- RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects [12.5122702720856]
We propose Robust Fine-Tuning (RbFT) to enhance the resilience of large language models against retrieval defects. Experimental results demonstrate that RbFT significantly improves the robustness of RAG systems across diverse retrieval conditions.
arXiv Detail & Related papers (2025-01-30T14:15:09Z)
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
We present KBAlign, a self-supervised framework that enhances RAG systems through efficient model adaptation. Our key insight is to leverage the model's intrinsic capabilities for knowledge alignment through two innovative mechanisms. Experiments demonstrate that KBAlign can achieve 90% of the performance gain obtained through GPT-4-supervised adaptation.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning [14.403812623299027]
Retrieval-Augmented Generation (RAG) offers an effective solution to the hallucination and knowledge-obsolescence issues faced by Large Language Models (LLMs). We propose Parenting, a novel framework that decouples, identifies, and purposefully optimizes parameter subspaces related to adherence and robustness.
arXiv Detail & Related papers (2024-10-14T10:26:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.