Related papers: Spoken Language Understanding on Unseen Tasks With In-Context Learning

Spoken Language Understanding on Unseen Tasks With In-Context Learning

URL: http://arxiv.org/abs/2505.07731v1
Date: Mon, 12 May 2025 16:38:43 GMT
Title: Spoken Language Understanding on Unseen Tasks With In-Context Learning
Authors: Neeraj Agrawal, Sriram Ganapathy,
Abstract summary: We introduce a novel approach to robust task-agnostic fine-tuning using randomized class labels.<n>We illustrate that the performance of the speech-text LLMs on an unseen task is significantly improved over standard approaches.
Score: 32.375855980608286
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Spoken language understanding (SLU) tasks involve diverse skills that probe the information extraction, classification and/or generation capabilities of models. In this setting, task-specific training data may not always be available. While traditional task-specific SLU models are unable to cater to such requirements, the speech-text large language models (LLMs) offer a promising alternative with emergent abilities. However, out of-the-box, our evaluations indicate that the zero/few-shot performance of prominent open-source speech-text LLMs on SLU tasks are not up to the mark. In this paper, we introduce a novel approach to robust task-agnostic fine-tuning using randomized class labels. With this proposed fine-tuning, we illustrate that the performance of the speech-text LLMs on an unseen task is significantly improved over standard approaches. Critically, the proposed approach avoids the requirement of task-specific data annotations for enabling new tasks in speech-text LLMs.

Related papers

DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs)<n>We present a simple yet effective automatic process for creating speech-text pair data.<n>Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z)
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks [94.10497337235083]
We are first to explore the potential of prompting speech LMs in the domain of speech processing. We reformulate speech processing tasks into speech-to-unit generation tasks. We show that the prompting method can achieve competitive performance compared to the strong fine-tuning method.
arXiv Detail & Related papers (2024-08-23T13:00:10Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models [12.45822383965784]
We introduce UnDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method. Our approach leverages self-distillation to adjust logits and selectively reduce the influence of targeted tokens.
arXiv Detail & Related papers (2024-02-15T16:21:14Z)
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions [64.50935101415776]
We build a single model that jointly performs various spoken language understanding (SLU) tasks. We demonstrate the efficacy of our single multi-task learning model "UniverSLU" for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages.
arXiv Detail & Related papers (2023-10-04T17:10:23Z)
Leveraging Large Language Models for Exploiting ASR Uncertainty [16.740712975166407]
Large language models must either rely on off-the-shelf automatic speech recognition systems for transcription, or be equipped with an in-built speech modality. We tackle speech-intent classification task, where a high word-error-rate can limit the LLM's ability to understand the spoken intent. We propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis.
arXiv Detail & Related papers (2023-09-09T17:02:33Z)
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target [58.59044226658916]
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances. We propose to use discrete units as intermediate guidance to improve textless SLU performance.
arXiv Detail & Related papers (2023-05-29T14:00:24Z)
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community. There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers. Recent work has begun to introduce such benchmark for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
Meta Auxiliary Learning for Low-resource Spoken Language Understanding [11.002938634213734]
Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task. We exploit an ASR and NLU joint training method based on meta auxiliary learning to improve the performance of low-resource SLU task.
arXiv Detail & Related papers (2022-06-26T03:12:33Z)
Low Resource Pipeline for Spoken Language Understanding via Weak Supervision [5.9901156966011975]
In Weak Supervised Learning (WSL), a model is trained over noisy labels obtained from semantic rules and task-specific pre-trained models. We show that task-agnostic prompts are generalizable and can be used to obtain noisy labels for different Spoken Language Understanding (SLU) tasks. We demonstrate that prompt-based methods generate reliable labels for the above SLU tasks and thus can be used as a universal weak source to train a weak-supervised model (WSM) in absence of labeled data.
arXiv Detail & Related papers (2022-06-21T17:36:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.