Meta Auxiliary Learning for Low-resource Spoken Language Understanding
- URL: http://arxiv.org/abs/2206.12774v1
- Date: Sun, 26 Jun 2022 03:12:33 GMT
- Authors: Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang
- Abstract summary: Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task.
We exploit an ASR and NLU joint training method based on meta auxiliary learning to improve the performance of low-resource SLU tasks.
- Score: 11.002938634213734
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Spoken language understanding (SLU) treats automatic speech recognition (ASR)
and natural language understanding (NLU) as a unified task and usually suffers
from data scarcity. We exploit an ASR and NLU joint training method based on
meta auxiliary learning to improve the performance of low-resource SLU tasks by
taking advantage only of abundant manual transcriptions of speech data. One
obvious advantage of such a method is that it provides a flexible framework for
implementing a low-resource SLU training task without requiring access to any
further semantic annotations. In particular, an NLU model is taken as a label
generation network to predict intent and slot tags from texts; a multi-task
network trains the ASR task and the SLU task synchronously from speech; and the
predictions of the label generation network are delivered to the multi-task
network as semantic targets. The effectiveness of the proposed algorithm is
demonstrated with experiments on the public CATSLU dataset, where it produces
ASR hypotheses that are better suited to the downstream NLU task.
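The data flow described in the abstract (label generation network on text, semantic targets fed to a multi-task network trained from speech) can be illustrated with a toy joint objective. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: `INTENT_KEYWORDS`, `label_generation_network`, `joint_loss`, and `slu_weight` are hypothetical names; a keyword lookup stands in for the trained NLU model; only intents are modeled (slot tags would be handled per token in the same way); the ASR loss is a plain token-level cross-entropy rather than a real CTC or attention loss; and the label generator is kept fixed, so any meta-update of it is omitted.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for the trained NLU model used as the label generation
# network; a keyword lookup replaces a real intent classifier for illustration.
INTENT_KEYWORDS = {"play": "music", "navigate": "navigation"}


def label_generation_network(transcript: str) -> str:
    """Predict an intent label from a manual transcription (no semantic annotation needed)."""
    for token in transcript.split():
        if token in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[token]
    return "other"


def cross_entropy(probs: Dict[str, float], target: str) -> float:
    """Negative log-likelihood of `target` under a predicted distribution."""
    return -math.log(probs[target])


def joint_loss(
    asr_token_probs: List[Dict[str, float]],
    transcript: str,
    slu_intent_probs: Dict[str, float],
    slu_weight: float = 0.5,
) -> float:
    """Combine an ASR loss and an SLU loss whose target comes from the label generator."""
    reference = transcript.split()
    if len(asr_token_probs) != len(reference):
        raise ValueError("expected one ASR distribution per reference token")
    # ASR branch: mean token-level cross-entropy against the manual transcription
    # (a simplification of a real CTC or attention-based ASR loss).
    asr_loss = sum(
        cross_entropy(p, ref) for p, ref in zip(asr_token_probs, reference)
    ) / len(reference)
    # SLU branch: the label generation network's prediction serves as the semantic target.
    semantic_target = label_generation_network(transcript)
    slu_loss = cross_entropy(slu_intent_probs, semantic_target)
    return asr_loss + slu_weight * slu_loss


if __name__ == "__main__":
    transcript = "play some jazz"
    asr_probs = [{"play": 0.9, "pay": 0.1}, {"some": 0.8, "sum": 0.2}, {"jazz": 1.0}]
    slu_probs = {"music": 0.7, "navigation": 0.2, "other": 0.1}
    print(round(joint_loss(asr_probs, transcript, slu_probs), 4))
```

For these toy inputs the script prints 0.2878. The point of the design is that the SLU branch never needs a human intent label: its target is derived from the transcription alone, which is what lets the framework rely only on manual transcriptions.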
Related papers
- Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
(2024-01-05T17:58:10Z)
- Leveraging Large Language Models for Exploiting ASR Uncertainty [16.740712975166407]
Large language models must either rely on off-the-shelf automatic speech recognition systems for transcription, or be equipped with an in-built speech modality.
We tackle the speech-intent classification task, where a high word error rate can limit the LLM's ability to understand the spoken intent.
We propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis.
(2023-09-09T17:02:33Z)
- Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target [58.59044226658916]
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances.
We propose to use discrete units as intermediate guidance to improve textless SLU performance.
(2023-05-29T14:00:24Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
(2022-12-20T18:39:59Z)
- Bridging Speech and Textual Pre-trained Models with Unsupervised ASR [70.61449720963235]
This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models.
We show that unsupervised automatic speech recognition (ASR) can improve the representations from speech self-supervised models.
Notably, on spoken question answering, we reach the state-of-the-art result on the challenging NMSQA benchmark.
(2022-11-06T04:50:37Z)
- STOP: A dataset for Spoken Task Oriented Semantic Parsing [66.14615249745448]
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model.
We release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available.
In addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
(2022-06-29T00:36:34Z)
- End-to-End Spoken Language Understanding using RNN-Transducer ASR [14.267028645397266]
We propose an end-to-end trained spoken language understanding (SLU) system that extracts transcripts, intents and slots from an input speech utterance.
It consists of a streaming recurrent neural network transducer (RNNT) based automatic speech recognition (ASR) model connected to a neural natural language understanding (NLU) model through a neural interface.
(2021-06-30T09:20:32Z)
- Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces [17.030832205343195]
We consider the spoken language understanding (SLU) problem of extracting natural language intents from speech directed at voice assistants.
An end-to-end joint SLU model can be built to a required specification, opening up the opportunity to deploy it in hardware-constrained scenarios.
We show that the jointly trained model improves ASR by incorporating semantic information from NLU and also improves NLU by exposing it to ASR confusion encoded in the hidden layer.
(2020-08-14T02:43:57Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
(2020-04-09T09:26:42Z)