Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of
Polyphonic Music
- URL: http://arxiv.org/abs/2204.03307v1
- Date: Thu, 7 Apr 2022 09:15:46 GMT
- Title: Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of
Polyphonic Music
- Authors: Xiaoxue Gao, Chitralekha Gupta and Haizhou Li
- Abstract summary: We propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network.
The proposed network adopts pre-trained model parameters, and incorporates the genre adapters between layers to capture different genre peculiarities for lyrics-genre pairs.
Our experiments show that the proposed genre-conditioned network outperforms the existing lyrics transcription systems.
- Score: 73.73045854068384
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Lyrics transcription of polyphonic music is challenging not only because the
singing vocals are corrupted by the background music, but also because the
background music and the singing style vary across music genres, such as pop,
metal, and hip hop, which affects lyrics intelligibility of the song in
different ways. In this work, we propose to transcribe the lyrics of polyphonic
music using a novel genre-conditioned network. The proposed network adopts
pre-trained model parameters, and incorporates the genre adapters between
layers to capture different genre peculiarities for lyrics-genre pairs, thereby
only requiring lightweight genre-specific parameters for training. Our
experiments show that the proposed genre-conditioned network outperforms the
existing lyrics transcription systems.
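To make the idea of genre adapters between layers concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the adapter's form, so this assumes a residual bottleneck adapter (a common design), and the names `GenreAdapter`, `GenreConditionedLayer`, and `GENRES`, the dimensions, and the random stand-in for pre-trained weights are all hypothetical.

```python
import math
import random

GENRES = ("pop", "metal", "hiphop")  # hypothetical genre labels


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, used here only as placeholder weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GenreAdapter:
    """Assumed form: residual bottleneck, x + up(relu(down(x)))."""

    def __init__(self, dim, bottleneck, rng):
        self.down = rand_matrix(bottleneck, dim, rng)
        # Zero-initialised up-projection, so the adapter starts as the identity.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.down, x)]
        delta = matvec(self.up, hidden)
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self):
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)


class GenreConditionedLayer:
    """A frozen (pre-trained) layer followed by a genre-selected adapter."""

    def __init__(self, dim, bottleneck, rng):
        # Stand-in for pre-trained weights; these would stay frozen in training.
        self.frozen = rand_matrix(dim, dim, rng)
        # One lightweight adapter per genre; only these would be trained.
        self.adapters = {g: GenreAdapter(dim, bottleneck, rng) for g in GENRES}

    def __call__(self, x, genre):
        h = [math.tanh(v) for v in matvec(self.frozen, x)]
        return self.adapters[genre](h)


rng = random.Random(0)
layer = GenreConditionedLayer(dim=8, bottleneck=2, rng=rng)
frame = [rng.uniform(-1, 1) for _ in range(8)]  # one hypothetical acoustic frame
out_pop = layer(frame, "pop")
out_metal = layer(frame, "metal")
```

In this sketch each adapter holds 32 parameters against 64 in the frozen layer, which illustrates why only lightweight genre-specific parameters need training. Because the up-projection starts at zero, every genre initially produces the same output as the unadapted layer, and the outputs diverge only once the adapters are trained.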
Related papers
- Multi-label Cross-lingual automatic music genre classification from lyrics with Sentence BERT [0.13654846342364302]
We present a multi-label, cross-lingual genre classification system based on multilingual sentence embeddings generated by sBERT.
Using a bilingual Portuguese-English dataset with eight overlapping genres, we demonstrate the system's ability to train on lyrics in one language and predict genres in another.
arXiv Detail & Related papers (2025-01-07T13:22:35Z)
- Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks [0.5524804393257919]
This paper investigates the capabilities of text-to-audio music generation models in producing long-form music with prompts that change over time.
We introduce Babel Bardo, a system that uses Large Language Models (LLMs) to transform speech transcriptions into music descriptions for controlling a text-to-music model.
arXiv Detail & Related papers (2024-11-06T14:29:49Z)
- SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics.
The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM.
Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
arXiv Detail & Related papers (2024-09-09T19:37:07Z)
- Synthetic Lyrics Detection Across Languages and Genres [4.987546582439803]
The use of large language models (LLMs) to generate music content, particularly lyrics, has gained in popularity.
Previous research has explored content detection in various domains, but no work has focused on the modality of lyrics in music.
We curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists.
arXiv Detail & Related papers (2024-06-21T15:19:21Z)
- Syllable-level lyrics generation from melody exploiting character-level language model [14.851295355381712]
We propose to exploit fine-tuning character-level language models for syllable-level lyrics generation from symbolic melody.
In particular, our method endeavors to incorporate linguistic knowledge of the language model into the beam search process of a syllable-level Transformer generator network.
arXiv Detail & Related papers (2023-10-02T02:53:29Z)
- LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method.
We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model.
Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English.
arXiv Detail & Related papers (2023-06-29T17:01:51Z)
- Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z)
- Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings [36.090928638883454]
Music-to-text synaesthesia aims to generate descriptive texts from music recordings with the same sentiment for further understanding.
We build a computational model to generate sentences that can describe the content of the music recording.
To tackle the highly non-discriminative classical music, we design a group topology-preservation loss.
arXiv Detail & Related papers (2022-10-02T06:06:55Z)
- Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z)
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)