ESPnet-ST: All-in-One Speech Translation Toolkit
- URL: http://arxiv.org/abs/2004.10234v2
- Date: Wed, 30 Sep 2020 12:28:18 GMT
- Title: ESPnet-ST: All-in-One Speech Translation Toolkit
- Authors: Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson
Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe
- Abstract summary: ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet.
It implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation.
We provide all-in-one recipes including data pre-processing, feature extraction, training, and decoding pipelines.
- Score: 57.76342114226599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ESPnet-ST, which is designed for the quick development of
speech-to-speech translation systems in a single framework. ESPnet-ST is a new
project inside the end-to-end speech processing toolkit ESPnet, which integrates
or newly implements automatic speech recognition, machine translation, and
text-to-speech functions for speech translation. We provide all-in-one recipes
including data pre-processing, feature extraction, training, and decoding
pipelines for a wide range of benchmark datasets. Our reproducible results can
match or even outperform current state-of-the-art performance; the
pre-trained models are downloadable. The toolkit is publicly available at
https://github.com/espnet/espnet.
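To illustrate how the downloadable pre-trained models could be used for decoding, here is a minimal Python sketch. It assumes the espnet_model_zoo package and an ESPnet2-style st_inference.Speech2Text interface; the model tag and file names are illustrative placeholders, not taken from the paper.

```python
# Minimal sketch: decode one utterance with a pre-trained speech translation model.
# Assumptions (not from the paper): the espnet_model_zoo package, the
# espnet2.bin.st_inference.Speech2Text interface, and the model tag below
# are illustrative; consult the ESPnet repository for the exact API and tags.
import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.st_inference import Speech2Text  # assumed module path

downloader = ModelDownloader()
# Hypothetical model tag; replace with a tag listed in the ESPnet model zoo.
speech2text = Speech2Text(
    **downloader.download_and_unpack("espnet/example_must_c_en_de_st")
)

speech, rate = soundfile.read("input.wav")  # 16 kHz mono audio expected
nbests = speech2text(speech)                # n-best hypotheses
translation, *_ = nbests[0]                 # best translation text
print(translation)
```

In practice, the all-in-one recipes under egs/ in the repository drive the same pipeline (data pre-processing, feature extraction, training, and decoding) through shell scripts.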
Related papers
- ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit [61.52122386938913]
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit.
This paper describes the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2.
arXiv Detail & Related papers (2023-04-10T14:05:22Z) - SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder
Based Speech-Text Pre-training [106.34112664893622]
We propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder.
Our proposed SpeechUT is fine-tuned and evaluated on automatic speech recognition (ASR) and speech translation (ST) tasks.
arXiv Detail & Related papers (2022-10-07T17:57:45Z) - ESPnet-SE++: Speech Enhancement for Robust Speech Recognition,
Translation, and Understanding [86.47555696652618]
This paper presents recent progress on integrating speech separation and enhancement into the ESPnet toolkit.
A new interface has been designed to combine speech enhancement front-ends with other tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU).
Results show that the integration of SE front-ends with back-end tasks is a promising research direction even for tasks besides ASR.
arXiv Detail & Related papers (2022-07-19T18:55:29Z) - ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet [95.39817519115394]
ESPnet-SLU is a project inside the end-to-end speech processing toolkit ESPnet.
It is designed for quick development of spoken language understanding in a single framework.
arXiv Detail & Related papers (2021-11-29T17:05:49Z) - SpeechBrain: A General-Purpose Speech Toolkit [73.0404642815335]
SpeechBrain is an open-source and all-in-one speech toolkit.
It is designed to facilitate the research and development of neural speech processing technologies.
It achieves competitive or state-of-the-art performance in a wide range of speech benchmarks.
arXiv Detail & Related papers (2021-06-08T18:22:56Z) - NeurST: Neural Speech Translation Toolkit [13.68036533544182]
NeurST is an open-source toolkit for neural speech translation developed by ByteDance AI Lab.
It focuses mainly on end-to-end speech translation and is easy to use, modify, and extend for advanced speech translation research and products.
arXiv Detail & Related papers (2020-12-18T02:33:58Z)