ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
- URL: http://arxiv.org/abs/2111.14706v1
- Date: Mon, 29 Nov 2021 17:05:49 GMT
- Title: ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
- Authors: Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi
Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc
Thang Vu, Alan W Black, Shinji Watanabe
- Abstract summary: ESPnet-SLU is a project inside the end-to-end speech processing toolkit ESPnet.
It is designed for the quick development of spoken language understanding systems in a single framework.
- Score: 95.39817519115394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Automatic Speech Recognition (ASR) systems are getting better, there is an
increasing interest in using the ASR output for downstream Natural Language
Processing (NLP) tasks. However, there are few open-source toolkits that can be
used to generate reproducible results on different Spoken Language
Understanding (SLU) benchmarks. Hence, there is a need for an open-source
standard that enables a faster start in SLU research. We present
ESPnet-SLU, which is designed for quick development of spoken language
understanding in a single framework. ESPnet-SLU is a project inside the end-to-end
speech processing toolkit ESPnet, a widely used open-source standard
for various speech processing tasks such as ASR, Text to Speech (TTS), and Speech
Translation (ST). We enhance the toolkit with implementations for various
SLU benchmarks that enable researchers to seamlessly mix and match different
ASR and NLU models. We also provide pretrained models with intensively tuned
hyperparameters that can match or even outperform current state-of-the-art
performances. The toolkit is publicly available at
https://github.com/espnet/espnet.
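As a rough sketch of how one might start with the toolkit's recipe workflow (the specific recipe directory and `run.sh` stage options below are assumptions based on the typical ESPnet `egs2` layout, not details confirmed by this abstract):

```shell
# Minimal sketch, assuming the standard ESPnet egs2 recipe layout.
# The SLURP recipe path and the stage flags are illustrative assumptions.
git clone https://github.com/espnet/espnet.git
cd espnet/egs2/slurp/asr1        # an SLU recipe directory (assumed path)
./run.sh --stage 1 --stop_stage 1   # run only the data-preparation stage
```

Each recipe bundles data preparation, training, and decoding behind a single `run.sh`, which is what allows different ASR and NLU components to be swapped via configuration rather than code changes.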
Related papers
- Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages [0.20971479389679337]
Spoken Language Understanding (SLU) models are a core component of voice assistants (VA), such as Alexa, Bixby, and Google Assistant.
In this paper, we introduce a pipeline designed to extend SLU systems to new languages, utilizing Large Language Models (LLMs).
Our approach improved on the MultiATIS++ benchmark, a primary multi-language SLU dataset, in the cloud scenario using an mBERT model.
arXiv Detail & Related papers (2024-04-03T09:13:26Z)
- Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
- OpenSLU: A Unified, Modularized, and Extensible Toolkit for Spoken Language Understanding [57.48730496422474]
Spoken Language Understanding (SLU) is one of the core components of a task-oriented dialogue system.
OpenSLU is an open-source toolkit that provides a unified, modularized, and extensible framework for spoken language understanding.
arXiv Detail & Related papers (2023-05-17T14:12:29Z)
- ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding [86.47555696652618]
This paper presents recent progress on integrating speech separation and enhancement into the ESPnet toolkit.
A new interface has been designed to combine speech enhancement front-ends with other tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU).
Results show that the integration of SE front-ends with back-end tasks is a promising research direction even for tasks besides ASR.
arXiv Detail & Related papers (2022-07-19T18:55:29Z)
- Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z)
- ESPnet-ST: All-in-One Speech Translation Toolkit [57.76342114226599]
ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet.
It implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation.
We provide all-in-one recipes including data pre-processing, feature extraction, training, and decoding pipelines.
arXiv Detail & Related papers (2020-04-21T18:38:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.