Shiftable Context: Addressing Training-Inference Context Mismatch in
Simultaneous Speech Translation
- URL: http://arxiv.org/abs/2307.01377v1
- Date: Mon, 3 Jul 2023 22:11:51 GMT
- Title: Shiftable Context: Addressing Training-Inference Context Mismatch in
Simultaneous Speech Translation
- Authors: Matthew Raffel, Drew Penney, Lizhong Chen
- Abstract summary: Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation.
We propose Shiftable Context to ensure consistent segment and context sizes are maintained throughout training and inference.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer models using segment-based processing have been an effective
architecture for simultaneous speech translation. However, such models create a
context mismatch between training and inference environments, hindering
potential translation accuracy. We solve this issue by proposing Shiftable
Context, a simple yet effective scheme to ensure that consistent segment and
context sizes are maintained throughout training and inference, even with the
presence of partially filled segments due to the streaming nature of
simultaneous translation. Shiftable Context is also broadly applicable to
segment-based transformers for streaming tasks. Our experiments on the
English-German, English-French, and English-Spanish language pairs from the
MUST-C dataset demonstrate that when applied to the Augmented Memory
Transformer, a state-of-the-art model for simultaneous speech translation, the
proposed scheme achieves an average increase of 2.09, 1.83, and 1.95 BLEU
scores across each wait-k value for the three language pairs, respectively,
with a minimal impact on computation-aware Average Lagging.
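One plausible reading of the mechanism described above, sketched in Python: keep the training-time segment and left-context sizes fixed, and when a streaming segment arrives only partially filled, shift the context boundary left so the total attended window stays the same size as in training. This is an illustrative sketch under that assumption, not the authors' actual implementation; `SEG`, `CTX`, and `window` are hypothetical names.

```python
# Hypothetical sketch of "shiftable context" (not the paper's code).
# Assumption: during training every window is CTX context frames plus
# SEG segment frames; at inference a partial final segment would shrink
# the window, so the context is shifted left to cover the shortfall.

SEG = 8  # frames per segment used in training (assumed)
CTX = 4  # left-context frames used in training (assumed)

def window(frames, n_done):
    """Return (context, segment) for the newest, possibly partial segment.

    frames: all frames received so far; n_done: frames already consumed
    by completed segments.
    """
    seg = frames[n_done:]                  # current segment, maybe partial
    short = SEG - len(seg)                 # frames missing from the segment
    ctx_len = CTX + max(0, short)          # shift context to keep total fixed
    ctx = frames[max(0, n_done - ctx_len):n_done]
    return ctx, seg
```

With enough history available, the invariant `len(ctx) + len(seg) == CTX + SEG` holds whether the segment is full or partial, which is the training/inference consistency the abstract describes.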
Related papers
- Long-Form End-to-End Speech Translation via Latent Alignment
Segmentation [6.153530338207679]
Current simultaneous speech translation models can process only a few seconds of audio at a time.
We propose a novel segmentation approach for a low-latency end-to-end speech translation.
We show that the proposed approach achieves state-of-the-art quality at no additional computational cost.
arXiv Detail & Related papers (2023-09-20T15:10:12Z)
- End-to-End Simultaneous Speech Translation with Differentiable Segmentation [21.03142288187605]
SimulST outputs translation while receiving the streaming speech inputs.
Segmenting the speech inputs at unfavorable moments can disrupt acoustic integrity and adversely affect the performance of the translation model.
We propose Differentiable segmentation (DiSeg) for SimulST to directly learn segmentation from the underlying translation model.
arXiv Detail & Related papers (2023-05-25T14:25:12Z)
- Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
This paper studies the multimedia problem of temporal sentence grounding.
It aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z)
- Non-Parametric Domain Adaptation for End-to-End Speech Translation [72.37869362559212]
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters.
We propose a novel non-parametric method that leverages domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system.
arXiv Detail & Related papers (2022-05-23T11:41:02Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer [33.876412404781846]
RealTranS is an end-to-end model for simultaneous speech translation.
It maps speech features into text space with a weighted-shrinking operation and a semantic encoder.
Experiments show that RealTranS with the Wait-K-Stride-N strategy outperforms prior end-to-end models.
arXiv Detail & Related papers (2021-06-09T06:35:46Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z)
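Both the main abstract (evaluation "across each wait-k value") and the RealTranS entry (Wait-K-Stride-N) reference wait-k style read/write policies. A minimal sketch of the standard wait-k schedule, with illustrative names (`wait_k_schedule`, `translate_step` are assumptions, not an API from any of the papers above):

```python
# Minimal sketch of the wait-k simultaneous translation policy:
# read k source tokens first, then alternate one write per read.
# (A full decoder would also emit remaining target tokens after the
# source stream ends; that tail is omitted for brevity.)

def wait_k_schedule(k, source, translate_step):
    """Interleave reads and writes; translate_step maps the tokens
    read so far (and the output so far) to the next target token."""
    read, output = [], []
    for i, tok in enumerate(source):
        read.append(tok)                 # READ one source token
        if i + 1 >= k:                   # start writing after k reads
            output.append(translate_step(read, output))  # WRITE one token
    return output
```

Larger k trades latency for more source context per write, which is why results are typically reported averaged across several wait-k values, as in the abstract above.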
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.