AFD-SLU: Adaptive Feature Distillation for Spoken Language Understanding
- URL: http://arxiv.org/abs/2509.04821v1
- Date: Fri, 05 Sep 2025 05:45:07 GMT
- Title: AFD-SLU: Adaptive Feature Distillation for Spoken Language Understanding
- Authors: Yan Xie, Yibo Cui, Liang Xie, Erwei Yin
- Abstract summary: Spoken Language Understanding (SLU) is a core component of conversational systems, enabling machines to interpret user utterances. We propose an Adaptive Feature Distillation framework that transfers rich semantic representations from a General Text Embeddings (GTE)-based teacher model to a lightweight student model. Experiments on the Chinese profile-based ProSLU benchmark demonstrate that AFD-SLU achieves state-of-the-art results, with 95.67% intent accuracy, 92.02% slot F1 score, and 85.50% overall accuracy.
- Score: 11.066147892754154
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spoken Language Understanding (SLU) is a core component of conversational systems, enabling machines to interpret user utterances. Despite its importance, developing effective SLU systems remains challenging due to the scarcity of labeled training data and the computational burden of deploying Large Language Models (LLMs) in real-world applications. To alleviate these issues, we propose an Adaptive Feature Distillation framework that transfers rich semantic representations from a General Text Embeddings (GTE)-based teacher model to a lightweight student model. Our method introduces a dynamic adapter equipped with a Residual Projection Neural Network (RPNN) to align heterogeneous feature spaces, and a Dynamic Distillation Coefficient (DDC) that adaptively modulates the distillation strength based on real-time feedback from intent and slot prediction performance. Experiments on the Chinese profile-based ProSLU benchmark demonstrate that AFD-SLU achieves state-of-the-art results, with 95.67% intent accuracy, 92.02% slot F1 score, and 85.50% overall accuracy.
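The abstract describes two components: a residual projection adapter that maps student features into the teacher's embedding space, and a coefficient that scales the distillation loss based on current intent/slot performance. The paper does not specify the architecture, so the following is a minimal numpy sketch under assumed shapes; `ResidualProjection`, `dynamic_coefficient`, and all dimensions are hypothetical illustrations of the idea, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class ResidualProjection:
    """Hypothetical residual projection adapter: maps student features
    (dim_s) into the teacher feature space (dim_t) with a skip path."""
    def __init__(self, dim_s, dim_t):
        self.W1 = rng.normal(0, 0.02, (dim_s, dim_t))
        self.W2 = rng.normal(0, 0.02, (dim_t, dim_t))
        self.Ws = rng.normal(0, 0.02, (dim_s, dim_t))  # skip projection

    def __call__(self, h):
        # two-layer projection plus a linear residual connection
        return relu(h @ self.W1) @ self.W2 + h @ self.Ws

def dynamic_coefficient(intent_acc, slot_f1, base=1.0):
    """Hypothetical DDC: distill harder while task metrics are low,
    anneal the distillation pressure as intent/slot performance rises."""
    return base * (1.0 - 0.5 * (intent_acc + slot_f1))

def distill_loss(student_h, teacher_h, adapter, intent_acc, slot_f1):
    proj = adapter(student_h)                 # align feature spaces
    mse = np.mean((proj - teacher_h) ** 2)    # feature-matching loss
    return dynamic_coefficient(intent_acc, slot_f1) * mse

adapter = ResidualProjection(dim_s=128, dim_t=768)
s = rng.normal(size=(4, 128))   # student features for a batch
t = rng.normal(size=(4, 768))   # frozen GTE teacher features
loss_early = distill_loss(s, t, adapter, intent_acc=0.30, slot_f1=0.20)
loss_late = distill_loss(s, t, adapter, intent_acc=0.95, slot_f1=0.92)
print(loss_early > loss_late)  # True: pressure decays as metrics improve
```

The intended behavior is visible in the two calls: the same feature mismatch is penalized more heavily early in training than once the student's intent accuracy and slot F1 are high.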
Related papers
- Advancing Text Classification with Large Language Models and Neural Attention Mechanisms [11.31737492247233]
The framework includes text encoding, contextual representation modeling, attention-based enhancement, and classification prediction. Results show that the proposed method outperforms existing models on all metrics.
arXiv Detail & Related papers (2025-12-10T09:18:41Z) - Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution methods aim to measure how training data impacts a model's predictions. We propose AirRep, a representation-based approach that closes this gap by learning task-specific and model-aligned representations explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
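The AirRep summary mentions attention-based pooling over a group of examples. As a point of reference, a minimal numpy sketch of generic attention pooling is below; the query vector `q` and all shapes are assumptions, not AirRep's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, q):
    """Pool per-example representations H (n, d) into one group vector
    using a learned query q (d,): weights = softmax(H @ q)."""
    w = softmax(H @ q)   # (n,) attention weights over the group
    return w @ H         # (d,) weighted combination

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 16))  # representations of 5 training examples
q = rng.normal(size=16)       # hypothetical learned query
g = attention_pool(H, q)
print(g.shape)  # (16,)
```

The design point is that the pooled vector is a learned, data-dependent weighting of group members rather than a flat mean, which is what lets group-wise influence estimates weight some examples more than others.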
arXiv Detail & Related papers (2025-05-24T05:17:53Z) - Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition [4.353165013945741]
Sign language recognition refers to interpreting sign language glosses from given videos automatically. Recent skeleton-based action recognition has attracted increasing attention due to its ability to handle variations in subjects and backgrounds independently.
arXiv Detail & Related papers (2025-03-26T11:10:29Z) - SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models [74.40683913645731]
Zero-shot multi-label recognition (MLR) with Vision-Language Models (VLMs) faces significant challenges without training data, model tuning, or architectural modifications. Our work proposes a novel solution treating VLMs as black boxes, leveraging scores without training data or ground truth. Analysis of these prompt scores reveals VLM biases and "AND"/"OR" signal ambiguities, notably that maximum scores are surprisingly suboptimal compared to second-highest scores.
arXiv Detail & Related papers (2025-02-24T07:15:05Z) - Training Strategies for Isolated Sign Language Recognition [72.27323884094953]
This paper introduces a comprehensive model training pipeline for Isolated Sign Language Recognition. The constructed pipeline incorporates carefully selected image and video augmentations to tackle the challenges of low data quality and varying sign speeds.
arXiv Detail & Related papers (2024-12-16T08:37:58Z) - Learnable Sparse Customization in Heterogeneous Edge Computing [27.201987866208484]
We propose Learnable Personalized Sparsification for heterogeneous Federated learning (FedLPS). FedLPS learns the importance of model units on local data representation and derives an importance-based sparse pattern to accurately extract personalized data features. Experiments show that FedLPS outperforms status quo approaches in accuracy and training costs.
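The FedLPS summary describes deriving a sparse pattern from learned unit importance. A minimal numpy sketch of magnitude-based sparsification (one common stand-in for an importance score; FedLPS's actual importance measure is not given in the summary) looks like this; `importance_mask` and the keep ratio are hypothetical.

```python
import numpy as np

def importance_mask(W, keep_ratio):
    """Hypothetical importance-based sparsification: keep the top
    keep_ratio fraction of weights by magnitude, zero the rest."""
    k = max(1, int(round(keep_ratio * W.size)))
    thresh = np.sort(np.abs(W), axis=None)[-k]  # k-th largest magnitude
    return (np.abs(W) >= thresh).astype(W.dtype)

rng = np.random.default_rng(2)
W = rng.normal(size=(8, 8))
mask = importance_mask(W, keep_ratio=0.25)
sparse_W = W * mask
print(int(mask.sum()))  # 16 of 64 weights retained
```

In a personalized-federated setting, each client would derive its own mask from local data, so the retained units differ per client while the dense model is shared.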
arXiv Detail & Related papers (2024-12-10T06:14:31Z) - How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario [72.02391485962127]
Speech Self-Supervised Learning (SSL) models achieve impressive performance on Automatic Speech Recognition (ASR). In low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages. We extend a conventional efficient fine-tuning scheme based on the adapter to handle these issues.
arXiv Detail & Related papers (2024-11-27T10:51:00Z) - Lightweight Modular Parameter-Efficient Tuning for Open-Vocabulary Object Detection [2.1155908599769764]
We propose UniProj-Det, a lightweight modular framework for parameter-efficient open-vocabulary object detection. UniProj-Det freezes pretrained backbones and introduces a Universal Projection module with a learnable modality token, enabling unified vision--language adaptation at minimal cost.
arXiv Detail & Related papers (2024-08-20T12:27:53Z) - R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML).
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
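The UAL entry describes its mechanism concretely: the label smoothing value is set per sample according to that sample's uncertainty. A minimal numpy sketch of uncertainty-scaled smoothed targets is below; the linear scaling rule, `max_eps`, and the function name are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def smoothed_targets(label, num_classes, uncertainty, max_eps=0.2):
    """Hypothetical UAL-style target: smoothing strength scales with a
    per-sample uncertainty in [0, 1]; confident samples get hard labels."""
    eps = max_eps * uncertainty
    t = np.full(num_classes, eps / (num_classes - 1))  # mass off the label
    t[label] = 1.0 - eps                               # mass on the label
    return t

confident = smoothed_targets(2, 4, uncertainty=0.0)  # hard one-hot target
uncertain = smoothed_targets(2, 4, uncertainty=1.0)  # maximally smoothed
print(confident[2], uncertain[2])  # 1.0 0.8
```

The effect is that high-uncertainty samples contribute softer training targets, which discourages the model from fitting noisy or ambiguous labels too confidently.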
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - End-to-End Spoken Language Understanding for Generalized Voice Assistants [15.241812584273886]
We present our approach to developing an E2E model for generalized speech recognition in commercial voice assistants (VAs).
We propose a fully differentiable, transformer-based, hierarchical system that can be pretrained at both the ASR and NLU levels.
This is then fine-tuned on both transcription and semantic classification losses to handle a diverse set of intent and argument combinations.
arXiv Detail & Related papers (2021-06-16T17:56:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.