MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
- URL: http://arxiv.org/abs/2509.20706v1
- Date: Thu, 25 Sep 2025 03:16:32 GMT
- Title: MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
- Authors: Hsiao-Ying Huang, Yi-Cheng Lin, Hung-yi Lee
- Abstract summary: Large audio-language models (LALMs) show strong zero-shot ability on speech tasks, suggesting promise for speech emotion recognition (SER). We ask: given only unlabeled target-domain audio and an API-only LALM, can a student model be adapted to outperform the LALM in the target domain? We propose MI-Fuse, a denoised label fusion framework that supplements the LALM with a source-domain trained SER classifier as an auxiliary teacher.
- Score: 49.59690207400984
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large audio-language models (LALMs) show strong zero-shot ability on speech tasks, suggesting promise for speech emotion recognition (SER). However, SER in real-world deployments often fails under domain mismatch, where source data are unavailable and powerful LALMs are accessible only through an API. We ask: given only unlabeled target-domain audio and an API-only LALM, can a student model be adapted to outperform the LALM in the target domain? To this end, we propose MI-Fuse, a denoised label fusion framework that supplements the LALM with a source-domain trained SER classifier as an auxiliary teacher. The framework draws multiple stochastic predictions from both teachers, weights their mean distributions by mutual-information-based uncertainty, and stabilizes training with an exponential moving average teacher. Experiments across three public emotion datasets and six cross-domain transfers show consistent gains, with the student surpassing the LALM and outperforming the strongest baseline by 3.9%. This approach strengthens emotion-aware speech systems without sharing source data, enabling realistic adaptation.
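The fusion step is easy to picture in code. Below is a minimal sketch, assuming a softmax weighting of the two teachers by negative mutual information and a Mean-Teacher-style EMA update; the exact weighting form, temperature, and decay used by MI-Fuse are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of categorical distributions along the last axis."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def mutual_information(samples):
    """BALD-style uncertainty over T stochastic predictions of shape (T, C):
    MI = H(mean_t p_t) - mean_t H(p_t). Higher MI = less reliable teacher."""
    return entropy(samples.mean(axis=0)) - entropy(samples).mean()

def fuse_teachers(lalm_samples, ser_samples, temperature=1.0):
    """Weight each teacher's mean distribution by exp(-MI / temperature),
    normalized across teachers; the softmax form is an assumption."""
    means = np.stack([lalm_samples.mean(axis=0), ser_samples.mean(axis=0)])
    mi = np.array([mutual_information(lalm_samples),
                   mutual_information(ser_samples)])
    weights = np.exp(-mi / temperature)
    weights /= weights.sum()
    return weights @ means  # fused pseudo-label over emotion classes

def ema_update(teacher, student, decay=0.999):
    """Mean-Teacher-style EMA of student parameters, stabilizing training
    on the fused pseudo-labels."""
    for name in teacher:
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student[name]

# Toy usage: 8 stochastic draws per teacher over 4 emotion classes.
rng = np.random.default_rng(0)
lalm_draws = rng.dirichlet(np.ones(4), size=8)
ser_draws = rng.dirichlet(np.ones(4), size=8)
print(fuse_teachers(lalm_draws, ser_draws))
```

Each teacher's draws could come from repeated LALM queries and stochastic (e.g., MC-dropout) passes of the source SER classifier; a confident, low-MI teacher dominates the fused label.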
Related papers
- GRAPE: Let GRPO Supervise Query Rewriting by Ranking for Retrieval [19.73916326078242]
The CLIP model has become a cornerstone of large-scale retrieval systems by aligning text and image data in a unified embedding space. To avoid costly retraining, existing methods mainly adopt query-rewriting strategies with large language models (LLMs). We address this challenge with GRAPE, a plug-and-play enhancement approach that incorporates ranking signals into retrieval-guided query rewriting.
arXiv Detail & Related papers (2025-09-27T15:36:59Z) - COLA: Context-aware Language-driven Test-time Adaptation [20.919416740369975]
We investigate a more general source model capable of adaptation to multiple target domains without needing shared labels. This is achieved by using a pre-trained vision-language model (VLM), e.g., CLIP, that can recognize images through matching with class descriptions. We propose a novel method: Context-aware Language-driven TTA (COLA).
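The matching step COLA builds on can be shown with a plain zero-shot CLIP call. The sketch below uses the Hugging Face transformers CLIP API; the checkpoint, image path, and class names are placeholders, and COLA's context-aware additions are not shown.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical target-domain image and class descriptions.
class_names = ["dog", "cat", "car"]
descriptions = [f"a photo of a {c}" for c in class_names]
image = Image.open("target_sample.jpg")

inputs = processor(text=descriptions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_classes) similarities
probs = logits.softmax(dim=-1)[0]
print({c: round(p, 3) for c, p in zip(class_names, probs.tolist())})
```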
arXiv Detail & Related papers (2025-09-22T11:19:17Z) - Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Temporal Grounding [59.09971455857609]
Video Temporal Grounding aims to temporally locate video segments matching a natural language description in a long video. We introduce a Data-Efficient Unlabelled Cross-domain Temporal Grounding method. This method eliminates the need for target annotation and keeps both computational and storage overhead low enough to run in real time.
arXiv Detail & Related papers (2025-08-08T13:47:00Z) - Shh, don't say that! Domain Certification in LLMs [124.61851324874627]
Large language models (LLMs) are often deployed to perform constrained tasks in narrow domains. We introduce domain certification: a guarantee that accurately characterizes the out-of-domain behavior of language models. We then propose a simple yet effective approach, which we call VALID, that provides adversarial bounds as a certificate.
arXiv Detail & Related papers (2025-02-26T17:13:19Z) - ChameleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters [3.729242965449096]
This paper introduces ChameleonLLM, a novel framework that enables inference-time adaptation of large language models. Unlike traditional fine-tuning approaches such as Low-Rank Adaptation (LoRA), our method dynamically generates adaptive modifications to the decoder weights. By intelligently grouping similar inputs and computing context-aware low-rank updates via a hyper-network, ChameleonLLM achieves significant performance gains.
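The described mechanism (grouping similar inputs, then generating a low-rank decoder update from a hyper-network) might look roughly like the following PyTorch sketch; the dimensions, rank, and mean-pooled cluster context are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class LowRankHyperNet(nn.Module):
    """Map a cluster context vector to a rank-r update DeltaW = B @ A
    for a (d_out x d_in) decoder weight (hypothetical dimensions)."""
    def __init__(self, d_ctx, d_in, d_out, rank=8):
        super().__init__()
        self.rank, self.d_in, self.d_out = rank, d_in, d_out
        self.to_a = nn.Linear(d_ctx, rank * d_in)
        self.to_b = nn.Linear(d_ctx, d_out * rank)

    def forward(self, ctx):                        # ctx: (d_ctx,)
        a = self.to_a(ctx).view(self.rank, self.d_in)
        b = self.to_b(ctx).view(self.d_out, self.rank)
        return b @ a                               # (d_out, d_in)

# At inference: cluster similar inputs, pool their hidden states into a
# context vector, and add the generated update to the frozen base weight.
hyper = LowRankHyperNet(d_ctx=768, d_in=768, d_out=768)
cluster_hidden = torch.randn(16, 768)              # one cluster's hidden states
delta_w = hyper(cluster_hidden.mean(dim=0))        # context-aware update
w_base = torch.randn(768, 768)                     # frozen decoder weight
x = torch.randn(16, 768)
adapted_out = x @ (w_base + delta_w).t()           # adapted decoder projection
```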
arXiv Detail & Related papers (2025-02-06T18:57:06Z) - Labels Generated by Large Language Models Help Measure People's Empathy in Vitro [9.536979155245026]
This paper explores using large language models (LLMs) to improve supervised training of mainstream models. We show that replacing or supplementing crowdsourced labels with LLM-generated labels achieves statistically significant accuracy improvements. It further analyses evaluation metric selection and demographic biases to help guide the future development of more equitable empathy computing models.
arXiv Detail & Related papers (2025-01-01T01:06:58Z) - Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift. We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z) - RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning Based on Emotional Information [36.059869205457815]
Methods for cross-domain misinformation detection rely on effort- and resource-intensive fine-tuning and complex model structures. We propose RAEmoLLM, the first retrieval-augmented generation (RAG) LLM framework to address cross-domain misinformation detection using in-context learning based on affective information. RAEmoLLM achieves significant improvements over other few-shot methods on three datasets.
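One plausible reading of retrieval "based on affective information" is nearest-neighbor search in an emotion-embedding space followed by few-shot prompt construction. A hedged sketch, in which the affect embeddings and prompt template are hypothetical:

```python
import numpy as np

def retrieve_by_affect(query_emb, corpus_embs, corpus_items, k=4):
    """Return the k labelled examples whose (hypothetical) emotion
    embeddings are most cosine-similar to the query's."""
    sims = corpus_embs @ query_emb / (
        np.linalg.norm(corpus_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12)
    top = np.argsort(-sims)[:k]
    return [corpus_items[i] for i in top]

def build_icl_prompt(examples, query_text):
    """Assemble a few-shot prompt from (text, label) demonstrations;
    the template is illustrative, not the paper's."""
    shots = "\n\n".join(f"Text: {t}\nLabel: {y}" for t, y in examples)
    return f"{shots}\n\nText: {query_text}\nLabel:"

# Toy usage with random 8-d affect embeddings.
rng = np.random.default_rng(0)
corpus = [("claim A", "misinfo"), ("claim B", "real"), ("claim C", "misinfo")]
embs = rng.normal(size=(3, 8))
shots = retrieve_by_affect(rng.normal(size=8), embs, corpus, k=2)
print(build_icl_prompt(shots, "claim D"))
```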
arXiv Detail & Related papers (2024-06-16T22:49:11Z) - Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning [7.2523602603838535]
Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to a target domain using only unlabeled target data. We propose Reliability-based Curriculum Learning (RCL), a novel framework that integrates multiple MLLMs for knowledge exploitation via pseudo-labeling in SFDA. RCL achieves state-of-the-art (SOTA) performance on multiple SFDA benchmarks, e.g., +9.4% on DomainNet, demonstrating its effectiveness in enhancing adaptability and robustness without requiring access to source data.
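A minimal sketch of a reliability-ordered curriculum, assuming reliability is measured as agreement among the MLLMs' pseudo-labels (the paper's actual reliability measure may differ):

```python
from collections import Counter

def majority_and_reliability(votes):
    """Majority pseudo-label and the fraction of MLLMs that agree on it."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

def build_curriculum(samples, mllm_votes):
    """Order target samples from most to least reliable so training starts
    with high-agreement pseudo-labels and expands gradually."""
    scored = [(s, *majority_and_reliability(v))
              for s, v in zip(samples, mllm_votes)]
    return sorted(scored, key=lambda item: -item[2])

# Example: three target samples, each pseudo-labeled by three MLLMs.
votes = [["happy", "happy", "happy"], ["sad", "angry", "sad"],
         ["angry", "happy", "sad"]]
for sample, label, rel in build_curriculum(["a", "b", "c"], votes):
    print(sample, label, f"reliability={rel:.2f}")
```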
arXiv Detail & Related papers (2024-05-28T17:18:17Z) - Do Membership Inference Attacks Work on Large Language Models? [141.2019867466968]
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.
We perform a large-scale evaluation of MIAs over a suite of language models trained on the Pile, ranging from 160M to 12B parameters.
We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains.
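The simplest instantiation of an MIA is the classic loss-threshold attack: samples the model fits unusually well are flagged as training members. A minimal sketch (the paper evaluates a broader suite of attacks):

```python
import numpy as np

def loss_attack(losses, threshold):
    """Classic LOSS attack: flag samples whose per-example loss falls
    below the threshold as likely training members."""
    return losses < threshold

def best_attack_accuracy(member_losses, nonmember_losses):
    """Best accuracy over all thresholds; a value near 0.5 means the
    attack barely beats random guessing."""
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones(len(member_losses)),
                             np.zeros(len(nonmember_losses))])
    return max((labels == loss_attack(losses, t)).mean()
               for t in np.unique(losses))

# Toy usage: members tend to have slightly lower loss than non-members.
rng = np.random.default_rng(0)
members = rng.normal(2.0, 1.0, size=1000)
nonmembers = rng.normal(2.2, 1.0, size=1000)
print(best_attack_accuracy(members, nonmembers))
```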
arXiv Detail & Related papers (2024-02-12T17:52:05Z) - SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z) - On Universal Black-Box Domain Adaptation [53.7611757926922]
We study an arguably least restrictive setting of domain adaptation in the sense of practical deployment, where only the interface of the source model is available to the target domain, and the label-space relations between the two domains are allowed to be different and unknown.
We propose to unify existing techniques into a self-training framework, regularized by consistency of predictions in local neighborhoods of target samples; a sketch of such a regularizer follows.
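The neighborhood-consistency regularizer can be sketched as a KL term pulling each target prediction toward the average prediction of its nearest feature-space neighbors; k, the similarity metric, and the KL direction below are illustrative choices.

```python
import torch
import torch.nn.functional as F

def neighborhood_consistency_loss(features, probs, k=4):
    """Pull each target sample's prediction toward the mean prediction
    of its k nearest neighbors in feature space."""
    f = F.normalize(features, dim=1)
    sim = f @ f.t()
    sim.fill_diagonal_(-1.0)                      # exclude self-matches
    nn_idx = sim.topk(k, dim=1).indices           # (N, k) neighbor indices
    # Treat neighbor predictions as fixed targets for the KL term.
    neighbor_probs = probs[nn_idx].mean(dim=1).detach()
    return F.kl_div(probs.clamp_min(1e-8).log(), neighbor_probs,
                    reduction="batchmean")

# Toy usage with random features and predictions.
feats = torch.randn(32, 128)
preds = torch.softmax(torch.randn(32, 5), dim=1)
print(neighborhood_consistency_loss(feats, preds).item())
```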
arXiv Detail & Related papers (2021-04-10T02:21:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.