Sing it, Narrate it: Quality Musical Lyrics Translation
- URL: http://arxiv.org/abs/2410.22066v1
- Date: Tue, 29 Oct 2024 14:23:56 GMT
- Title: Sing it, Narrate it: Quality Musical Lyrics Translation
- Authors: Zhuorui Ye, Jinhan Li, Rongwu Xu,
- Abstract summary: Existing song translation approaches often prioritize singability constraints at the expense of translation quality.
This paper aims to enhance translation quality while maintaining key singability features.
- Score: 0.5735035463793009
- License:
- Abstract: Translating lyrics for musicals presents unique challenges due to the need to ensure high translation quality while adhering to singability requirements such as length and rhyme. Existing song translation approaches often prioritize these singability constraints at the expense of translation quality, which is crucial for musicals. This paper aims to enhance translation quality while maintaining key singability features. Our method consists of three main components. First, we create a dataset to train reward models for the automatic evaluation of translation quality. Second, to enhance both singability and translation quality, we implement a two-stage training process with filtering techniques. Finally, we introduce an inference-time optimization framework for translating entire songs. Extensive experiments, including both automatic and human evaluations, demonstrate significant improvements over baseline methods and validate the effectiveness of each component in our approach.
Related papers
- GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks [52.30565320125514]
GTSinger is a large global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores.
We collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset.
We conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion.
arXiv Detail & Related papers (2024-09-20T18:18:14Z) - Advancing Translation Preference Modeling with RLHF: A Step Towards
Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z) - LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method.
We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model.
Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English.
arXiv Detail & Related papers (2023-06-29T17:01:51Z) - Songs Across Borders: Singable and Controllable Neural Lyric Translation [17.878364279808604]
This paper bridges the singability quality gap by formalizing lyric translation into a constrained translation problem.
We convert theoretical guidance and practical techniques from translatology literature to prompt-driven NMT approaches.
Our model achieves 99.85%, 99.00%, and 95.52% on length accuracy, rhyme accuracy, and word boundary recall.
arXiv Detail & Related papers (2023-05-26T10:50:17Z) - Translate the Beauty in Songs: Jointly Learning to Align Melody and
Translate Lyrics [38.35809268026605]
We propose Lyrics-Melody Translation with Adaptive Grouping (LTAG) as a holistic solution to automatic song translation.
It is a novel encoder-decoder framework that can simultaneously translate the source lyrics and determine the number of aligned notes at each decoding step.
Experiments conducted on an English-Chinese song translation data set show the effectiveness of our model in both automatic and human evaluation.
arXiv Detail & Related papers (2023-03-28T03:17:59Z) - Automatic Song Translation for Tonal Languages [23.08861476320527]
We develop a benchmark for English--Mandarin song translation and develop an unsupervised AST system.
Both automatic and human evaluations show GagaST successfully balances semantics and singability.
arXiv Detail & Related papers (2022-03-25T02:25:33Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - Improving Speech Translation by Understanding and Learning from the
Auxiliary Text Translation Task [26.703809355057224]
We conduct a detailed analysis to understand the impact of the auxiliary task on the primary task within the multitask learning framework.
Our analysis confirms that multitask learning tends to generate similar decoder representations from different modalities.
Inspired by these findings, we propose three methods to improve translation quality.
arXiv Detail & Related papers (2021-07-12T23:53:40Z) - Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilized unlabeled data to improve the performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z) - Sign Language Transformers: Joint End-to-end Sign Language Recognition
and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.