STOP: A dataset for Spoken Task Oriented Semantic Parsing
- URL: http://arxiv.org/abs/2207.10643v1
- Date: Wed, 29 Jun 2022 00:36:34 GMT
- Title: STOP: A dataset for Spoken Task Oriented Semantic Parsing
- Authors: Paden Tomasello, Po-Chun Hsu, Akshat Shrivastava, Daniel Lazar, Duc
Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossef Mordechay,
Robin Algayres, Tu Ahn Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman
Mohamed
- Abstract summary: End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model.
We release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available.
In addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
- Score: 66.14615249745448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end spoken language understanding (SLU) predicts intent directly from
audio using a single model. It promises to improve the performance of assistant
systems by leveraging acoustic information lost in the intermediate textual
representation and preventing cascading errors from Automatic Speech
Recognition (ASR). Further, having one unified model has efficiency advantages
when deploying assistant systems on-device. However, the limited number of
public audio datasets with semantic parse labels hinders the research progress
in this area. In this paper, we release the Spoken Task-Oriented semantic
Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly
available. Additionally, we define low-resource splits to establish a benchmark
for improving SLU when limited labeled data is available. Furthermore, in
addition to the human-recorded audio, we are releasing a TTS-generated version
to benchmark the performance for low-resource domain adaptation of end-to-end
SLU systems. Initial experimentation show end-to-end SLU models performing
slightly worse than their cascaded counterparts, which we hope encourages
future work in this direction.
Related papers
- Improving Textless Spoken Language Understanding with Discrete Units as
Intermediate Target [58.59044226658916]
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances.
We propose to use discrete units as intermediate guidance to improve textless SLU performance.
arXiv Detail & Related papers (2023-05-29T14:00:24Z) - End-to-end spoken language understanding using joint CTC loss and
self-supervised, pretrained acoustic encoders [13.722028186368737]
We leverage self-supervised acoustic encoders fine-tuned with Connectionist Temporal Classification to extract textual embeddings.
Our model achieves 4% absolute improvement over the the state-of-the-art (SOTA) dialogue act classification model on the DSTC2 dataset.
arXiv Detail & Related papers (2023-05-04T15:36:37Z) - SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmark for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z) - Two-Pass Low Latency End-to-End Spoken Language Understanding [36.81762807197944]
We incorporated language models pre-trained on unlabeled text data inside E2E-SLU frameworks to build strong semantic representations.
We developed a 2-pass SLU system that makes low latency prediction using acoustic information from the few seconds of the audio in the first pass.
Our code and models are publicly available as part of the ESPnet-SLU toolkit.
arXiv Detail & Related papers (2022-07-14T05:50:16Z) - Finstreder: Simple and fast Spoken Language Understanding with Finite
State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z) - Deliberation Model for On-Device Spoken Language Understanding [69.5587671262691]
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU)
We show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training.
arXiv Detail & Related papers (2022-04-04T23:48:01Z) - End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z) - On the Use of External Data for Spoken Named Entity Recognition [40.93448412171246]
Recent advances in self-supervised speech representations have made it feasible to consider learning models with limited labeled data.
We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline approaches.
arXiv Detail & Related papers (2021-12-14T18:49:26Z) - Do as I mean, not as I say: Sequence Loss Training for Spoken Language
Understanding [22.652754839140744]
Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech.
We propose non-differentiable sequence losses based on SLU metrics as a proxy for semantic error and use the REINFORCE trick to train ASR and SLU models with this loss.
We show that custom sequence loss training is the state-of-the-art on open SLU datasets and leads to 6% relative improvement in both ASR and NLU performance metrics.
arXiv Detail & Related papers (2021-02-12T20:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.