A Study on the Integration of Pipeline and E2E SLU systems for Spoken
Semantic Parsing toward STOP Quality Challenge
- URL: http://arxiv.org/abs/2305.01620v2
- Date: Sat, 6 May 2023 16:35:31 GMT
- Title: A Study on the Integration of Pipeline and E2E SLU systems for Spoken
Semantic Parsing toward STOP Quality Challenge
- Authors: Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng,
Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
- Abstract summary: We describe our proposed spoken semantic parsing system for the quality track (Track 1) of the Spoken Language Understanding Grand Challenge.
Strong automatic speech recognition (ASR) models like Whisper and pretrained language models (LMs) like BART are utilized inside our SLU framework to boost performance.
We also investigate the output-level combination of various models to reach an exact match accuracy of 80.8, which won 1st place in the challenge.
- Score: 33.89616011003973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there have been efforts to introduce new benchmark tasks for spoken
language understanding (SLU), such as semantic parsing. In this paper, we describe
our proposed spoken semantic parsing system for the quality track (Track 1) of the
Spoken Language Understanding Grand Challenge, which is part of the ICASSP Signal
Processing Grand Challenge 2023. We experiment with both end-to-end and
pipeline systems for this task. Strong automatic speech recognition (ASR)
models like Whisper and pretrained language models (LMs) like BART are utilized
inside our SLU framework to boost performance. We also investigate the output-level
combination of various models to reach an exact match accuracy of 80.8,
which won 1st place in the challenge.
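To make the described setup concrete, here is a minimal Python sketch of the pipeline route and the output-level combination: Whisper transcribes the utterance, a BART model fine-tuned for semantic parsing maps the transcript to a parse string, and a simple exact-string majority vote merges hypotheses from several systems. The fine-tuned checkpoint name, the placeholder E2E hypotheses, and the voting rule are illustrative assumptions, not the authors' released system.

```python
from collections import Counter

import whisper  # openai-whisper; assumed installed
from transformers import BartForConditionalGeneration, BartTokenizer


def pipeline_parse(audio_path, asr_model, tokenizer, parser):
    """Pipeline route: Whisper ASR -> transcript -> BART semantic parse."""
    transcript = asr_model.transcribe(audio_path)["text"]
    inputs = tokenizer(transcript, return_tensors="pt")
    parse_ids = parser.generate(**inputs, max_length=128, num_beams=5)
    return tokenizer.decode(parse_ids[0], skip_special_tokens=True)


def combine_outputs(hypotheses):
    """Output-level combination: keep the parse that most systems agree on.
    (A plain majority vote; the paper's actual selection rule may differ.)"""
    return Counter(hypotheses).most_common(1)[0][0]


if __name__ == "__main__":
    asr = whisper.load_model("small")
    tok = BartTokenizer.from_pretrained("facebook/bart-base")
    # "my-bart-semantic-parser" is a hypothetical fine-tuned checkpoint,
    # not a model released with the paper.
    parser = BartForConditionalGeneration.from_pretrained("my-bart-semantic-parser")

    pipeline_hyp = pipeline_parse("utterance.wav", asr, tok, parser)
    e2e_hyps = ["[IN:GET_WEATHER [SL:LOCATION boston ] ]"]  # placeholder E2E outputs
    print(combine_outputs([pipeline_hyp] + e2e_hyps))
```

In practice, combining the single pipeline hypothesis with outputs from one or more E2E models is what allows the vote to recover from transcription errors that a lone system would propagate into the parse.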
Related papers
- Towards ASR Robust Spoken Language Understanding Through In-Context
Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z) - SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z) - Bridging Speech and Textual Pre-trained Models with Unsupervised ASR [70.61449720963235]
This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models.
We show that unsupervised automatic speech recognition (ASR) can improve the representations from speech self-supervised models.
Notably, on spoken question answering, we reach state-of-the-art results on the challenging NMSQA benchmark.
arXiv Detail & Related papers (2022-11-06T04:50:37Z) - End-to-End Spoken Language Understanding: Performance analyses of a
voice command task in a low resource setting [0.3867363075280543]
We present a study identifying the signal features and other linguistic properties used by an E2E model to perform the Spoken Language Understanding task.
The study is carried out in the application domain of a smart home that has to handle non-English (here French) voice commands.
arXiv Detail & Related papers (2022-07-17T13:51:56Z) - Finstreder: Simple and fast Spoken Language Understanding with Finite
State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU), the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z) - WaBERT: A Low-resource End-to-end Model for Spoken Language
Understanding and Speech-to-BERT Alignment [2.7505260301752763]
We propose a novel end-to-end model combining a speech model and a language model for SLU tasks.
WaBERT builds on pre-trained speech and language models, so training from scratch is not needed.
arXiv Detail & Related papers (2022-04-22T02:14:40Z) - Deliberation Model for On-Device Spoken Language Understanding [69.5587671262691]
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU).
We show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training.
arXiv Detail & Related papers (2022-04-04T23:48:01Z) - End-to-End Spoken Language Understanding for Generalized Voice
Assistants [15.241812584273886]
We present our approach to developing an E2E model for generalized speech recognition in commercial voice assistants (VAs).
We propose a fully differentiable, transformer-based, hierarchical system that can be pretrained at both the ASR and NLU levels.
This is then fine-tuned on both transcription and semantic classification losses to handle a diverse set of intent and argument combinations.
arXiv Detail & Related papers (2021-06-16T17:56:47Z) - Towards Semi-Supervised Semantics Understanding from Speech [15.672850567147854]
We propose a framework to learn semantics directly from speech, with semi-supervision from transcribed speech, to address these challenges.
Our framework is built upon pretrained end-to-end (E2E) ASR and self-supervised language models, such as BERT, and fine-tuned on a limited amount of target SLU data.
arXiv Detail & Related papers (2020-11-11T01:48:09Z) - SPLAT: Speech-Language Joint Pre-Training for Spoken Language
Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze the input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)