Improving Textless Spoken Language Understanding with Discrete Units as
Intermediate Target
- URL: http://arxiv.org/abs/2305.18096v2
- Date: Sat, 8 Jul 2023 07:25:14 GMT
- Title: Improving Textless Spoken Language Understanding with Discrete Units as
Intermediate Target
- Authors: Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee
- Abstract summary: Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances.
We propose to use discrete units as intermediate guidance to improve textless SLU performance.
- Score: 58.59044226658916
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Spoken Language Understanding (SLU) is a task that aims to extract semantic
information from spoken utterances. Previous research has made progress in
end-to-end SLU by using paired speech-text data, such as pre-trained Automatic
Speech Recognition (ASR) models or paired text as intermediate targets.
However, acquiring paired transcripts is expensive and impractical for
unwritten languages. On the other hand, Textless SLU extracts semantic
information from speech without utilizing paired transcripts. However, the
absence of intermediate targets and training guidance for textless SLU often
results in suboptimal performance. In this work, inspired by the
content-disentangled discrete units from self-supervised speech models, we
propose to use discrete units as intermediate guidance to improve textless SLU
performance. Our method surpasses the baseline method on five SLU benchmark
corpora. Additionally, we find that unit guidance facilitates few-shot learning
and enhances the model's ability to handle noise.
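The abstract describes deriving discrete units from self-supervised speech features and using the unit sequence as an auxiliary training target. A minimal, hypothetical sketch of the unit-extraction step (not the authors' code): frame-level features are assigned to their nearest k-means centroid, and consecutive repeats are collapsed, as is commonly done with HuBERT-style units. The centroids and feature values below are toy numbers for illustration only.

```python
# Hedged sketch: discrete-unit extraction for an intermediate target.
# In practice the frames would come from a self-supervised speech model
# (e.g. HuBERT) and the centroids from k-means over its features; the
# values here are toy 2-D examples, not real model outputs.

def assign_units(frames, centroids):
    """Map each feature frame to the index of its nearest centroid."""
    units = []
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, cen))
                 for cen in centroids]
        units.append(min(range(len(centroids)), key=dists.__getitem__))
    return units

def deduplicate(units):
    """Collapse consecutive repeated units before using them as targets."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out

# Toy centroids and frame features (hypothetical values).
centroids = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
frames = [(0.1, -0.1), (0.2, 0.1), (0.9, 1.1), (1.1, 0.9), (2.1, 0.1)]

units = assign_units(frames, centroids)
print(units)                # frame-level unit sequence
print(deduplicate(units))   # deduplicated unit targets
```

The deduplicated unit sequence would then serve as the auxiliary prediction target alongside the SLU objective; how exactly the paper weights or structures that auxiliary loss is not specified in this summary.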
Related papers
- Towards ASR Robust Spoken Language Understanding Through In-Context
Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Bridging Speech and Textual Pre-trained Models with Unsupervised ASR [70.61449720963235]
This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models.
We show that unsupervised automatic speech recognition (ASR) can improve the representations from speech self-supervised models.
Notably, on spoken question answering, we reach the state-of-the-art result over the challenging NMSQA benchmark.
arXiv Detail & Related papers (2022-11-06T04:50:37Z)
- Finstreder: Simple and fast Spoken Language Understanding with Finite
State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU), the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z)
- STOP: A dataset for Spoken Task Oriented Semantic Parsing [66.14615249745448]
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model.
We release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex publicly available SLU dataset.
In addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
arXiv Detail & Related papers (2022-06-29T00:36:34Z)
- Meta Auxiliary Learning for Low-resource Spoken Language Understanding [11.002938634213734]
Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task.
We exploit an ASR and NLU joint training method based on meta auxiliary learning to improve the performance of low-resource SLU task.
arXiv Detail & Related papers (2022-06-26T03:12:33Z)
- Speak or Chat with Me: End-to-End Spoken Language Understanding System
with Flexible Inputs [21.658650440278063]
We propose a novel system that can predict intents from flexible types of inputs: speech, ASR transcripts, or both.
Our experiments show significant advantages for these pre-training and fine-tuning strategies, resulting in a system that achieves competitive intent-classification performance.
arXiv Detail & Related papers (2021-04-07T20:48:08Z)
- Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation [15.225080891662675]
Speech comprehension can benefit from the inference capabilities of massive pre-trained language models.
We experimentally verify our hypothesis that the knowledge could be shared from the top layer of the LM to a fully speech-based module.
arXiv Detail & Related papers (2020-05-17T10:50:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.