Machine Translation Verbosity Control for Automatic Dubbing
- URL: http://arxiv.org/abs/2110.03847v1
- Date: Fri, 8 Oct 2021 01:19:10 GMT
- Title: Machine Translation Verbosity Control for Automatic Dubbing
- Authors: Surafel M. Lakew, Marcello Federico, Yue Wang, Cuong Hoang, Yogesh
Virkar, Roberto Barra-Chicote, Robert Enyedi
- Abstract summary: We propose new methods to control the verbosity of machine translation output.
For experiments we use a public data set to dub English speeches into French, Italian, German and Spanish.
We report extensive subjective tests that measure the impact of MT verbosity control on the final quality of dubbed video clips.
- Score: 11.85772502779967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic dubbing aims at seamlessly replacing the speech in a video document
with synthetic speech in a different language. The task implies many
challenges, one of which is generating translations that not only convey the
original content, but also match the duration of the corresponding utterances.
In this paper, we focus on the problem of controlling the verbosity of machine
translation output, so that subsequent steps of our automatic dubbing pipeline
can generate dubs of better quality. We propose new methods to control the
verbosity of MT output and compare them against the state of the art with both
intrinsic and extrinsic evaluations. For our experiments we use a public data
set to dub English speeches into French, Italian, German and Spanish. Finally,
we report extensive subjective tests that measure the impact of MT verbosity
control on the final quality of dubbed video clips.
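The paper compares several verbosity-control methods but publishes no code with this abstract. As a minimal, hypothetical sketch of the general idea, one can rerank MT hypotheses by how far their character-length ratio to the source deviates from a target verbosity ratio (the candidate strings, model scores, and penalty weight below are illustrative assumptions, not the authors' method):

```python
def verbosity_penalty(src: str, hyp: str, target_ratio: float = 1.0) -> float:
    """Absolute deviation of the hypothesis/source character ratio
    from the desired verbosity ratio (1.0 = same length)."""
    ratio = len(hyp) / max(len(src), 1)
    return abs(ratio - target_ratio)

def rerank(src, hypotheses, weight: float = 5.0):
    """Rerank (hypothesis, model_score) pairs, trading model score
    against closeness to the target verbosity."""
    return sorted(
        hypotheses,
        key=lambda h: h[1] - weight * verbosity_penalty(src, h[0]),
        reverse=True,
    )

# Toy example: the shorter French candidate wins despite a worse model score.
candidates = [
    ("Bonjour tout le monde, comment allez-vous aujourd'hui ?", -1.2),
    ("Bonjour à tous.", -1.5),
]
best = rerank("Hello everyone.", candidates)[0][0]
```

A character-ratio penalty is only a crude proxy for spoken duration; the point is that verbosity can be traded off against translation quality at reranking time.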
Related papers
- TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion.
We propose two separate encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process.
Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
- SBAAM! Eliminating Transcript Dependency in Automatic Subtitling [23.444615994847947]
Subtitling plays a crucial role in enhancing the accessibility of audiovisual content.
Past attempts to automate this process rely, to varying degrees, on automatic transcripts.
We introduce the first direct model capable of producing automatic subtitles.
arXiv Detail & Related papers (2024-05-17T12:42:56Z)
- SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
arXiv Detail & Related papers (2023-08-22T17:44:18Z)
- Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing [71.02335065794384]
We propose a model that directly optimizes both the translations and their speech durations.
We show that this system generates speech that better matches the timing of the original speech, compared to prior work, while simplifying the system architecture.
arXiv Detail & Related papers (2023-02-25T04:23:25Z)
- Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing [6.26764826816895]
We investigate how humans perform the task of dubbing video content from one language into another.
We leverage a novel corpus of 319.57 hours of video from 54 professionally produced titles.
arXiv Detail & Related papers (2022-12-23T04:12:52Z)
- VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing [73.56970726406274]
Video dubbing aims to translate the original speech in a film or television program into the speech in a target language.
To ensure that the translated speech is well aligned with the corresponding video, the length/duration of the translated speech should be as close as possible to that of the original speech.
We propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation.
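As a rough, hypothetical illustration of duration-aware length control (not VideoDubber's actual model, which considers speech duration per token inside the MT system), spoken duration can be approximated with a crude syllable-count heuristic and compared against the source utterance's timing:

```python
VOWELS = set("aeiouyàâéèêëîïôùûü")

def syllable_estimate(word: str) -> int:
    """Crude syllable count: runs of consecutive vowels. A rough proxy
    for spoken length; real systems use phoneme-level duration models."""
    count, prev_vowel = 0, False
    for ch in word.lower():
        is_vowel = ch in VOWELS
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    return max(count, 1)

def estimated_duration(sentence: str, sec_per_syllable: float = 0.25) -> float:
    """Estimated speaking time of a sentence in seconds."""
    return sec_per_syllable * sum(syllable_estimate(w) for w in sentence.split())

# Compare a candidate dub against the original utterance's duration.
source_duration = 2.0  # seconds of original English audio (hypothetical value)
gap = abs(estimated_duration("Bonjour tout le monde") - source_duration)
```

The per-syllable rate and source duration here are invented for illustration; in practice such estimates would come from forced alignment or a TTS duration model.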
arXiv Detail & Related papers (2022-11-30T12:09:40Z)
- Direct Speech Translation for Automatic Subtitling [17.095483965591267]
We propose the first direct ST model for automatic subtitling that generates subtitles in the target language along with their timestamps with a single model.
Our experiments on 7 language pairs show that our approach outperforms a cascade system in the same data condition.
arXiv Detail & Related papers (2022-09-27T06:47:42Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- Prosody-Aware Neural Machine Translation for Dubbing [9.49303003480503]
We introduce the task of prosody-aware machine translation which aims at generating translations suitable for dubbing.
Dubbing of a spoken sentence requires transferring the content as well as the prosodic structure of the source into the target language to preserve timing information.
We propose implicit and explicit modeling approaches to integrate prosody information into neural machine translation.
arXiv Detail & Related papers (2021-12-16T01:11:08Z)
- Efficient Inference for Multilingual Neural Machine Translation [60.10996883354372]
We consider several ways to make multilingual NMT faster at inference without degrading its quality.
Our experiments demonstrate that combining a shallow decoder with vocabulary filtering yields inference that is more than twice as fast, with no loss in translation quality.
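A hedged sketch of the vocabulary-filtering idea: at decoding time, restrict the next-token search to a precomputed subset of the vocabulary plausible for the target language, so the output layer need only be evaluated over that subset. The toy vocabulary and scores below are invented for illustration:

```python
def filter_vocab(logits: list[float], allowed_ids: set[int]) -> int:
    """Pick the highest-scoring next token, considering only a
    target-language subset of the vocabulary (vocabulary filtering)."""
    return max(allowed_ids, key=lambda i: logits[i])

vocab = ["the", "le", "la", "Haus", "casa"]
logits = [2.0, 1.5, 0.7, 2.5, 0.1]   # raw next-token scores (toy values)
french_ids = {1, 2}                  # indices of French tokens only
next_id = filter_vocab(logits, french_ids)
```

Here "Haus" has the highest raw score but is excluded by the French-only filter, so "le" is selected; in a real system the allowed set would be derived from training-data statistics for the target language.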
arXiv Detail & Related papers (2021-09-14T13:28:13Z)
- From Speech-to-Speech Translation to Automatic Dubbing [28.95595497865406]
We present enhancements to a speech-to-speech translation pipeline in order to perform automatic dubbing.
Our architecture features neural machine translation that generates output of preferred length, prosodic alignment of the translation with the original speech segments, and neural text-to-speech with fine-tuning of the duration of each utterance.
arXiv Detail & Related papers (2020-01-19T07:03:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.