Automatic Song Translation for Tonal Languages
- URL: http://arxiv.org/abs/2203.13420v1
- Date: Fri, 25 Mar 2022 02:25:33 GMT
- Title: Automatic Song Translation for Tonal Languages
- Authors: Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie,
Jordan Boyd-Graber
- Abstract summary: We develop a benchmark for English--Mandarin song translation and develop an unsupervised AST system.
Both automatic and human evaluations show GagaST successfully balances semantics and singability.
- Score: 23.08861476320527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper develops automatic song translation (AST) for tonal languages and
addresses the unique challenge of aligning words' tones with the melody of a song
in addition to conveying the original meaning. We propose three criteria for
effective AST -- preserving meaning, singability and intelligibility -- and
design metrics for these criteria. We develop a new benchmark for
English--Mandarin song translation and an unsupervised AST system,
Guided AliGnment for Automatic Song Translation (GagaST), which combines
pre-training with three decoding constraints. Both automatic and human
evaluations show GagaST successfully balances semantics and singability.
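The singability metrics and decoding constraints are not spelled out in this summary. As a purely illustrative sketch, assuming a per-phrase note count is available from the melody, the snippet below shows how a syllable-count penalty could be folded into beam rescoring; the helper names and the penalty weight are hypothetical and are not GagaST's actual constraints.

```python
# Purely illustrative sketch: a syllable-count singability penalty of the kind an
# AST decoder could apply per musical phrase. This is NOT GagaST's actual
# constraint set; helper names and the penalty weight are hypothetical.

def count_syllables_zh(line: str) -> int:
    """In Mandarin lyrics, each CJK character is sung as roughly one syllable/note."""
    return sum(1 for ch in line if '\u4e00' <= ch <= '\u9fff')

def singability_penalty(candidate: str, n_notes: int, weight: float = 1.0) -> float:
    """Penalize candidates whose syllable count deviates from the phrase's note count."""
    return weight * abs(count_syllables_zh(candidate) - n_notes)

def rescore(beam: list[tuple[str, float]], n_notes: int) -> list[tuple[str, float]]:
    """Combine model log-probability with the singability penalty and re-rank."""
    scored = [(text, logp - singability_penalty(text, n_notes)) for text, logp in beam]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# A 5-note phrase: the 5-character candidate wins after rescoring.
print(rescore([("你是我的阳光灿烂", -1.2), ("你是我阳光", -1.5)], n_notes=5))
```

In Mandarin, one character is typically one sung syllable, so matching character count to note count is a natural first-order singability check; tone-contour constraints of the kind the paper targets would be layered onto the same rescoring step.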
Related papers
- Sing it, Narrate it: Quality Musical Lyrics Translation [0.5735035463793009]
Existing song translation approaches often prioritize singability constraints at the expense of translation quality.
This paper aims to enhance translation quality while maintaining key singability features.
arXiv Detail & Related papers (2024-10-29T14:23:56Z)
- REFFLY: Melody-Constrained Lyrics Editing Model [50.03960548399128]
We introduce REFFLY, the first revision framework designed to edit arbitrary plain-text drafts into high-quality, full-fledged song lyrics.
Our approach ensures that the generated lyrics retain the original meaning of the draft, align with the melody, and adhere to the desired song structures.
arXiv Detail & Related papers (2024-08-30T23:22:34Z)
- Lyrics Transcription for Humans: A Readability-Aware Benchmark [1.2499537119440243]
We introduce Jam-ALT, a comprehensive lyrics transcription benchmark.
The benchmark features a complete revision of the JamendoLyrics dataset, along with evaluation metrics designed to capture and assess the lyric-specific nuances.
We apply the benchmark to recent transcription systems and present additional error analysis, as well as an experimental comparison with a classical music dataset.
arXiv Detail & Related papers (2024-07-30T14:20:09Z)
- Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark [2.6297569393407416]
We introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset.
First, it provides a complete revision of the transcripts, geared specifically towards ALT evaluation.
Second, it adds a suite of evaluation metrics designed, unlike the traditional word error rate, to capture formatting-related phenomena.
arXiv Detail & Related papers (2023-11-23T13:13:48Z)
- DiariST: Streaming Speech Translation with Speaker Diarization [53.595990270899414]
We propose DiariST, the first streaming ST and SD solution.
It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vector.
Our system achieves a strong ST and SD capability compared to offline systems based on Whisper, while performing streaming inference for overlapping speech.
arXiv Detail & Related papers (2023-09-14T19:33:27Z)
- LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method.
We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model.
Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English.
arXiv Detail & Related papers (2023-06-29T17:01:51Z)
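The LyricWhiz and Jam-ALT entries above both reference word error rate. As background, here is a minimal, generic WER implementation via word-level Levenshtein distance; it is not the scoring code used by either paper, and formatting-aware benchmarks are designed to also capture phenomena that plain WER ignores.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as word-level Levenshtein distance. Generic definition, not the exact
    scoring script used by the papers above."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("shine on me baby", "shine on my baby"))  # 0.25: one substitution out of four words
```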
- AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment [67.10208647482109]
The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings.
This paper proposes AlignSTS, an STS model based on explicit cross-modal alignment.
Experiments show that AlignSTS achieves superior performance in terms of both objective and subjective metrics.
arXiv Detail & Related papers (2023-05-08T06:02:10Z)
- Translate the Beauty in Songs: Jointly Learning to Align Melody and Translate Lyrics [38.35809268026605]
We propose Lyrics-Melody Translation with Adaptive Grouping (LTAG) as a holistic solution to automatic song translation.
It is a novel encoder-decoder framework that can simultaneously translate the source lyrics and determine the number of aligned notes at each decoding step.
Experiments conducted on an English-Chinese song translation data set show the effectiveness of our model in both automatic and human evaluation.
arXiv Detail & Related papers (2023-03-28T03:17:59Z)
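The LTAG entry above describes a decoder that emits a translated token and the number of aligned melody notes at each step. The sketch below shows one generic way to attach two prediction heads to a shared decoder state; the PyTorch module, layer sizes, and maximum note count are assumptions for illustration, not the paper's actual architecture.

```python
# Generic two-head decoder output layer, loosely inspired by the LTAG idea of
# jointly predicting a target token and the number of melody notes it spans.
# Dimensions and the maximum note count are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class TokenAndNoteCountHead(nn.Module):
    def __init__(self, d_model: int = 512, vocab_size: int = 32000, max_notes: int = 8):
        super().__init__()
        self.token_head = nn.Linear(d_model, vocab_size)  # next-token distribution
        self.notes_head = nn.Linear(d_model, max_notes)   # how many notes the token spans

    def forward(self, decoder_state: torch.Tensor):
        # decoder_state: (batch, seq_len, d_model) hidden states from any seq2seq decoder
        return self.token_head(decoder_state), self.notes_head(decoder_state)

# Usage sketch: both heads would be trained with cross-entropy against the
# reference translation and its note alignment.
head = TokenAndNoteCountHead()
state = torch.randn(2, 10, 512)
token_logits, note_logits = head(state)
print(token_logits.shape, note_logits.shape)  # torch.Size([2, 10, 32000]) torch.Size([2, 10, 8])
```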
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z) - Self-Attention with Cross-Lingual Position Representation [112.05807284056337]
Position encoding (PE) is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
Due to word order divergences across languages, modeling cross-lingual positional relationships might help self-attention networks (SANs) tackle this problem.
We augment SANs with cross-lingual position representations to model the bilingually aware latent structure of the input sentence.
arXiv Detail & Related papers (2020-04-28T05:23:43Z)
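As background for the last entry, which modifies how positions are represented across languages, here is the standard sinusoidal position encoding from the original Transformer; the cross-lingual position representation proposed in that paper is not reproduced here.

```python
# Standard sinusoidal position encoding (Vaswani et al., 2017), shown only as
# background for the cross-lingual position representation discussed above.
import numpy as np

def sinusoidal_position_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]    # (seq_len, 1)
    dims = np.arange(d_model)[None, :]         # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])      # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])      # odd dimensions: cosine
    return pe

print(sinusoidal_position_encoding(seq_len=4, d_model=8).shape)  # (4, 8)
```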
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.