Simultaneous Translation for Unsegmented Input: A Sliding Window
Approach
- URL: http://arxiv.org/abs/2210.09754v1
- Date: Tue, 18 Oct 2022 11:07:28 GMT
- Title: Simultaneous Translation for Unsegmented Input: A Sliding Window
Approach
- Authors: Sukanta Sen, Ondřej Bojar, and Barry Haddow
- Abstract summary: We present a sliding window approach to translate raw ASR outputs (online or offline) without needing to rely on an automatic segmenter.
Experiments on English-to-German and English-to-Czech show that our approach improves 1.3--2.0 BLEU points over the usual ASR-segmenter pipeline.
- Score: 8.651762907847848
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In the cascaded approach to spoken language translation (SLT), the ASR output
is typically punctuated and segmented into sentences before being passed to MT,
since the latter is typically trained on written text. However, erroneous
segmentation, due to poor sentence-final punctuation by the ASR system, leads
to degradation in translation quality, especially in the simultaneous (online)
setting where the input is continuously updated. To reduce the influence of
automatic segmentation, we present a sliding window approach to translate raw
ASR outputs (online or offline) without needing to rely on an automatic
segmenter. We train translation models using parallel windows (instead of
parallel sentences) extracted from the original training data. At test time, we
translate at the window level and join the translated windows using a simple
approach to generate the final translation. Experiments on English-to-German
and English-to-Czech show that our approach improves 1.3--2.0 BLEU points over
the usual ASR-segmenter pipeline, and the fixed-length window considerably
reduces flicker compared to a baseline retranslation-based online SLT system.
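The pipeline described in the abstract can be sketched as follows. The window size, stride, and the suffix/prefix-overlap join heuristic are illustrative assumptions, since the abstract only states that translated windows are joined "using a simple approach":

```python
# Sketch of a sliding-window pipeline for translating unsegmented ASR
# output. Window size, stride, and the overlap-based join are illustrative
# assumptions, not the paper's exact procedure.

def sliding_windows(tokens, size=10, stride=5):
    """Cut an unsegmented token stream into fixed-length overlapping windows."""
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return windows

def join_translations(translated_windows):
    """Merge consecutive translated windows on their longest token overlap."""
    merged = list(translated_windows[0])
    for window in translated_windows[1:]:
        overlap = 0
        # Find the longest suffix of the merged output that matches a
        # prefix of the next window, then append only the new tokens.
        for k in range(min(len(merged), len(window)), 0, -1):
            if merged[-k:] == window[:k]:
                overlap = k
                break
        merged.extend(window[overlap:])
    return merged

# At test time each window would first be passed through the MT model;
# here the "translations" are the windows themselves for brevity.
```

Because consecutive windows share source context, their translations tend to agree on the overlapping span, which is what makes a simple overlap-based join workable; the paper's actual join procedure may differ.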
Related papers
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- DiariST: Streaming Speech Translation with Speaker Diarization [53.595990270899414]
We propose DiariST, the first streaming ST and SD solution.
It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vector.
Our system achieves strong ST and SD performance compared to offline systems based on Whisper, while performing streaming inference on overlapping speech.
arXiv Detail & Related papers (2023-09-14T19:33:27Z)
- Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments [49.38965743465124]
This paper introduces a streaming Transformer-Transducer that jointly generates automatic speech recognition (ASR) and speech translation (ST) outputs using a single decoder.
Experiments in monolingual and multilingual settings demonstrate that our approach achieves the best quality-latency balance.
arXiv Detail & Related papers (2023-07-07T02:26:18Z)
- Non-Parametric Domain Adaptation for End-to-End Speech Translation [72.37869362559212]
End-to-End Speech Translation (E2E-ST) has received increasing attention due to the potential of its less error propagation, lower latency, and fewer parameters.
We propose a novel non-parametric method that leverages domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system.
arXiv Detail & Related papers (2022-05-23T11:41:02Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the joint submission of the University of Sydney and JD to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- A Technical Report: BUT Speech Translation Systems [2.9327503320877457]
The paper describes BUT's speech translation systems.
The systems are English-to-German offline speech translation systems.
A large degradation is observed when translating ASR hypothesis compared to the oracle input text.
arXiv Detail & Related papers (2020-10-22T10:52:31Z)
- Cascaded Models With Cyclic Feedback For Direct Speech Translation [14.839931533868176]
We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data.
A comparison to end-to-end speech translation using components of identical architecture and the same data shows gains of up to 3.8 BLEU points on LibriVoxDeEn and up to 5.1 BLEU points on CoVoST for German-to-English speech translation.
arXiv Detail & Related papers (2020-10-21T17:18:51Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose GLAT, a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8-15 times speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
- Dynamic Masking for Improved Stability in Spoken Language Translation [8.591381243212712]
A possible solution is to add a fixed delay, or "mask", to the output of the MT system.
We show how this mask can be set dynamically, improving the latency-flicker trade-off without sacrificing translation quality.
arXiv Detail & Related papers (2020-05-30T12:23:10Z)
- Jointly Trained Transformers models for Spoken Language Translation [2.3886615435250302]
This work trains SLT systems with an ASR objective as an auxiliary loss, with the two networks connected through neural hidden representations.
This architecture improves BLEU from 36.8 to 44.5.
All the experiments are reported on English-Portuguese speech translation task using How2 corpus.
arXiv Detail & Related papers (2020-04-25T11:28:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.