ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
- URL: http://arxiv.org/abs/2111.14706v1
- Date: Mon, 29 Nov 2021 17:05:49 GMT
- Title: ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
- Authors: Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi
Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc
Thang Vu, Alan W Black, Shinji Watanabe
- Abstract summary: ESPnet-SLU is a project inside the end-to-end speech processing toolkit ESPnet.
It is designed for the quick development of spoken language understanding systems in a single framework.
- Score: 95.39817519115394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Automatic Speech Recognition (ASR) systems are getting better, there is an
increasing interest in using the ASR output for downstream Natural Language
Processing (NLP) tasks. However, there are few open-source toolkits that can be
used to generate reproducible results on different Spoken Language
Understanding (SLU) benchmarks. Hence, there is a need for an open-source
standard that enables a faster start in SLU research. We present
ESPnet-SLU, which is designed for quick development of spoken language
understanding in a single framework. ESPnet-SLU is a project inside the end-to-end
speech processing toolkit ESPnet, a widely used open-source standard
for various speech processing tasks such as ASR, Text to Speech (TTS), and Speech
Translation (ST). We enhance the toolkit with implementations for various
SLU benchmarks that enable researchers to seamlessly mix and match different
ASR and NLU models. We also provide pretrained models with intensively tuned
hyperparameters that can match or even outperform current state-of-the-art
performances. The toolkit is publicly available at
https://github.com/espnet/espnet.
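As a rough sketch of how one might start with the toolkit's recipe workflow (the specific recipe directory and `run.sh` stage options below are assumptions based on the typical ESPnet `egs2` layout, not details confirmed by this abstract):

```shell
# Minimal sketch, assuming the standard ESPnet egs2 recipe layout.
# The SLURP recipe path and the stage flags are illustrative assumptions.
git clone https://github.com/espnet/espnet.git
cd espnet/egs2/slurp/asr1        # an SLU recipe directory (assumed path)
./run.sh --stage 1 --stop_stage 1   # run only the data-preparation stage
```

Each recipe bundles data preparation, training, and decoding behind a single `run.sh`, which is what allows different ASR and NLU components to be swapped via configuration rather than code changes.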
Related papers
- Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages [0.20971479389679337]
Spoken Language Understanding (SLU) models are a core component of voice assistants (VA), such as Alexa, Bixby, and Google Assistant.
In this paper, we introduce a pipeline designed to extend SLU systems to new languages, utilizing Large Language Models (LLMs).
Our approach improved on the MultiATIS++ benchmark, a primary multi-language SLU dataset, in the cloud scenario using an mBERT model.
arXiv Detail & Related papers (2024-04-03T09:13:26Z)
- Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
- OpenSLU: A Unified, Modularized, and Extensible Toolkit for Spoken Language Understanding [57.48730496422474]
Spoken Language Understanding (SLU) is one of the core components of a task-oriented dialogue system.
OpenSLU is an open-source toolkit that provides a unified, modularized, and extensible framework for spoken language understanding.
arXiv Detail & Related papers (2023-05-17T14:12:29Z)
- ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding [86.47555696652618]
This paper presents recent progress on integrating speech separation and enhancement into the ESPnet toolkit.
A new interface has been designed to combine speech enhancement front-ends with other tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU).
Results show that the integration of SE front-ends with back-end tasks is a promising research direction even for tasks besides ASR.
arXiv Detail & Related papers (2022-07-19T18:55:29Z)
- Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z)
- ESPnet-ST: All-in-One Speech Translation Toolkit [57.76342114226599]
ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet.
It implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation.
We provide all-in-one recipes including data pre-processing, feature extraction, training, and decoding pipelines.
arXiv Detail & Related papers (2020-04-21T18:38:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.