BSTC: A Large-Scale Chinese-English Speech Translation Dataset
- URL: http://arxiv.org/abs/2104.03575v2
- Date: Fri, 9 Apr 2021 05:47:33 GMT
- Title: BSTC: A Large-Scale Chinese-English Speech Translation Dataset
- Authors: Ruiqing Zhang, Xiyang Wang, Chuanqiang Zhang, Zhongjun He, Hua Wu, Zhi Li, Haifeng Wang, Ying Chen, Qinfei Li
- Abstract summary: BSTC (Baidu Speech Translation Corpus) is a large-scale Chinese-English speech translation dataset.
This dataset is constructed based on a collection of licensed videos of talks or lectures, including about 68 hours of Mandarin data.
We have asked three experienced interpreters to simultaneously interpret the testing talks in a mock conference setting.
- Score: 26.633433687767553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents BSTC (Baidu Speech Translation Corpus), a large-scale
Chinese-English speech translation dataset. This dataset is constructed based
on a collection of licensed videos of talks or lectures, including about 68
hours of Mandarin data, their manual transcripts and translations into English,
as well as automated transcripts by an automatic speech recognition (ASR)
model. We have further asked three experienced interpreters to simultaneously
interpret the testing talks in a mock conference setting. This corpus is
expected to promote the research of automatic simultaneous translation as well
as the development of practical systems. We have organized simultaneous
translation tasks and used this corpus to evaluate automatic simultaneous
translation systems.
Related papers
- Cross-Lingual Transfer Learning for Speech Translation [7.802021866251242]
This paper examines how to expand the speech translation capability of speech foundation models with restricted data.
Whisper, a speech foundation model with strong performance on speech recognition and English translation, is used as the example model.
Using speech-to-speech retrieval to analyse the audio representations generated by the encoder, we show that utterances from different languages are mapped to a shared semantic space.
arXiv Detail & Related papers (2024-07-01T09:51:48Z)
- TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion.
We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process.
Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
- Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts [50.00305136008848]
We propose a framework for parallel corpus mining, which provides a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera.
For both English-Japanese and English-Chinese lecture translations, we extracted parallel corpora of approximately 50,000 lines and created development and test sets.
This study also suggests guidelines for gathering and cleaning corpora, mining parallel sentences, cleaning noise in the mined data, and creating high-quality evaluation splits.
arXiv Detail & Related papers (2023-11-07T03:50:25Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- End-to-End Speech Translation of Arabic to English Broadcast News [2.375764121997739]
Speech translation (ST) is the task of translating acoustic speech signals in a source language into text in a foreign language.
This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system.
arXiv Detail & Related papers (2022-12-11T11:35:46Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task [36.51221186190272]
We describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign.
Our system is built by leveraging transfer learning across modalities, tasks and languages.
arXiv Detail & Related papers (2021-07-14T19:43:44Z)
- Lost in Interpreting: Speech Translation from Source or Interpreter? [0.0]
We release 10 hours of recordings and transcripts of European Parliament speeches in English, with simultaneous interpreting into Czech and German.
We evaluate quality and latency of speaker-based and interpreter-based spoken translation systems from English to Czech.
arXiv Detail & Related papers (2021-06-17T09:32:49Z)
- Consecutive Decoding for Speech-to-text Translation [51.155661276936044]
COnSecutive Transcription and Translation (COSTT) is an integral approach for speech-to-text translation.
The key idea is to generate source transcript and target translation text with a single decoder.
Our method is verified on three mainstream datasets.
arXiv Detail & Related papers (2020-09-21T10:10:45Z)
- Designing the Business Conversation Corpus [20.491255702901288]
We aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus.
A detailed analysis of the corpus is provided along with challenging examples for automatic translation.
We also experiment with adding the corpus in a machine translation training scenario and show how the resulting system benefits from its use.
arXiv Detail & Related papers (2020-08-05T05:19:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.