Lyrics Transcription for Humans: A Readability-Aware Benchmark
- URL: http://arxiv.org/abs/2408.06370v1
- Date: Tue, 30 Jul 2024 14:20:09 GMT
- Title: Lyrics Transcription for Humans: A Readability-Aware Benchmark
- Authors: Ondřej Cífka, Hendrik Schreiber, Luke Miner, Fabian-Robert Stöter,
- Abstract summary: We introduce Jam-ALT, a comprehensive lyrics transcription benchmark.
The benchmark features a complete revision of the JamendoLyrics dataset, along with evaluation metrics designed to capture and assess the lyric-specific nuances.
We apply the benchmark to recent transcription systems and present additional error analysis, as well as an experimental comparison with a classical music dataset.
- Score: 1.2499537119440243
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Writing down lyrics for human consumption involves not only accurately capturing word sequences, but also incorporating punctuation and formatting for clarity and to convey contextual information. This includes song structure, emotional emphasis, and contrast between lead and background vocals. While automatic lyrics transcription (ALT) systems have advanced beyond producing unstructured strings of words and are able to draw on wider context, ALT benchmarks have not kept pace and continue to focus exclusively on words. To address this gap, we introduce Jam-ALT, a comprehensive lyrics transcription benchmark. The benchmark features a complete revision of the JamendoLyrics dataset, in adherence to industry standards for lyrics transcription and formatting, along with evaluation metrics designed to capture and assess the lyric-specific nuances, laying the foundation for improving the readability of lyrics. We apply the benchmark to recent transcription systems and present additional error analysis, as well as an experimental comparison with a classical music dataset.
Related papers
- REFFLY: Melody-Constrained Lyrics Editing Model [50.03960548399128]
We introduce REFFLY, the first revision framework designed to edit arbitrary forms of plain text draft into high-quality, full-fledged song lyrics.
Our approach ensures that the generated lyrics retain the original meaning of the draft, align with the melody, and adhere to the desired song structures.
arXiv Detail & Related papers (2024-08-30T23:22:34Z) - A Real-Time Lyrics Alignment System Using Chroma And Phonetic Features
For Classical Vocal Performance [7.488651253072641]
The goal of real-time lyrics alignment is to take live singing audio as input and pinpoint the exact position within given lyrics on the fly.
The task can benefit real-world applications such as the automatic subtitling of live concerts or operas.
This paper presents a real-time lyrics alignment system for classical vocal performances with two contributions.
arXiv Detail & Related papers (2024-01-17T13:25:32Z) - Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark [2.6297569393407416]
We introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset.
First, a complete revision of the transcripts, geared specifically towards ALT evaluation.
Second, a suite of evaluation metrics designed, unlike the traditional word error rate, to capture such phenomena.
arXiv Detail & Related papers (2023-11-23T13:13:48Z) - LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method.
We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model.
Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English.
arXiv Detail & Related papers (2023-06-29T17:01:51Z) - Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z) - Unsupervised Melody-Guided Lyrics Generation [84.22469652275714]
We propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data.
We leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process.
arXiv Detail & Related papers (2023-05-12T20:57:20Z) - Bridging Music and Text with Crowdsourced Music Comments: A
Sequence-to-Sequence Framework for Thematic Music Comments Generation [18.2750732408488]
We exploit the crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music.
To enhance the authenticity and thematicity of generated texts, we propose a discriminator and a novel topic evaluator.
arXiv Detail & Related papers (2022-09-05T14:51:51Z) - SongMASS: Automatic Song Writing with Pre-training and Alignment
Constraint [54.012194728496155]
SongMASS is proposed to overcome the challenges of lyric-to-melody generation and melody-to-lyric generation.
It leverages masked sequence to sequence (MASS) pre-training and attention based alignment modeling.
We show that SongMASS generates lyric and melody with significantly better quality than the baseline method.
arXiv Detail & Related papers (2020-12-09T16:56:59Z) - Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN)
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.