Related papers: Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music

URL: http://arxiv.org/abs/2204.03307v1
Date: Thu, 7 Apr 2022 09:15:46 GMT
Title: Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music
Authors: Xiaoxue Gao, Chitralekha Gupta and Haizhou Li
Abstract summary: We propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network. The proposed network adopts pre-trained model parameters, and incorporates the genre adapters between layers to capture different genre peculiarities for lyrics-genre pairs. Our experiments show that the proposed genre-conditioned network outperforms the existing lyrics transcription systems.
Score: 73.73045854068384
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Lyrics transcription of polyphonic music is challenging not only because the singing vocals are corrupted by the background music, but also because the background music and the singing style vary across music genres, such as pop, metal, and hip hop, which affects lyrics intelligibility of the song in different ways. In this work, we propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network. The proposed network adopts pre-trained model parameters, and incorporates the genre adapters between layers to capture different genre peculiarities for lyrics-genre pairs, thereby only requiring lightweight genre-specific parameters for training. Our experiments show that the proposed genre-conditioned network outperforms the existing lyrics transcription systems.
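The adapter idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the adapter architecture, so the residual bottleneck form used here is an assumption, as are all class names, the layer sizes, and the zero initialisation. A frozen "pre-trained" layer is followed by one small adapter per genre, and only the adapter weights would be trained.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

class GenreAdapter:
    """Hypothetical bottleneck adapter computing h + W_up * relu(W_down * h)."""
    def __init__(self, dim, bottleneck, rng):
        self.W_down = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Zero-initialised up-projection (an assumption): the adapter
        # starts out as the identity map on the frozen layer's output.
        self.W_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        delta = matvec(self.W_up, relu(matvec(self.W_down, h)))
        return [a + b for a, b in zip(h, delta)]

    def num_params(self):
        return (len(self.W_down) * len(self.W_down[0])
                + len(self.W_up) * len(self.W_up[0]))

class GenreConditionedLayer:
    """A frozen linear layer followed by one adapter per genre."""
    def __init__(self, dim, bottleneck, genres, seed=0):
        rng = random.Random(seed)
        # Stand-in for pre-trained weights; these would stay frozen.
        self.W_frozen = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
        self.adapters = {g: GenreAdapter(dim, bottleneck, rng) for g in genres}

    def forward(self, x, genre):
        h = matvec(self.W_frozen, x)
        # Route the hidden vector through the adapter of the given genre.
        return self.adapters[genre](h)

layer = GenreConditionedLayer(dim=8, bottleneck=2,
                              genres=["pop", "metal", "hip hop"])
x = [1.0] * 8
out_pop = layer.forward(x, "pop")
frozen_params = len(layer.W_frozen) * len(layer.W_frozen[0])
adapter_params = layer.adapters["pop"].num_params()
```

In this toy setting each adapter holds 2 * 8 * 2 = 32 weights against 64 in the frozen layer. The saving grows with the hidden size, since the adapter scales with dim * bottleneck rather than dim * dim.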

Related papers

Multi-label Cross-lingual automatic music genre classification from lyrics with Sentence BERT [0.13654846342364302]
We present a multi-label, cross-lingual genre classification system based on multilingual sentence embeddings generated by sBERT. Using a bilingual Portuguese-English dataset with eight overlapping genres, we demonstrate the system's ability to train on lyrics in one language and predict genres in another.
arXiv Detail & Related papers (2025-01-07T13:22:35Z)
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks [0.5524804393257919]
This paper investigates the capabilities of text-to-audio music generation models in producing long-form music with prompts that change over time. We introduce Babel Bardo, a system that uses Large Language Models (LLMs) to transform speech transcriptions into music descriptions for controlling a text-to-music model.
arXiv Detail & Related papers (2024-11-06T14:29:49Z)
SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics. The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM. Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
arXiv Detail & Related papers (2024-09-09T19:37:07Z)
REFFLY: Melody-Constrained Lyrics Editing Model [50.03960548399128]
This paper introduces REFFLY, the first revision framework for editing and generating melody-aligned lyrics. We train the lyric revision module using our synthesized melody-aligned lyrics dataset. To further enhance the revision ability, we propose training-free heuristics aimed at preserving both semantic meaning and musical consistency.
arXiv Detail & Related papers (2024-08-30T23:22:34Z)
Syllable-level lyrics generation from melody exploiting character-level language model [14.851295355381712]
We propose to exploit fine-tuning character-level language models for syllable-level lyrics generation from symbolic melody. In particular, our method endeavors to incorporate linguistic knowledge of the language model into the beam search process of a syllable-level Transformer generator network.
arXiv Detail & Related papers (2023-10-02T02:53:29Z)
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method. We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English.
arXiv Detail & Related papers (2023-06-29T17:01:51Z)
Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints. Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z)
Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings [36.090928638883454]
Music-to-text synaesthesia aims to generate descriptive texts from music recordings with the same sentiment for further understanding. We build a computational model to generate sentences that can describe the content of the music recording. To tackle the highly non-discriminative classical music, we design a group topology-preservation loss.
arXiv Detail & Related papers (2022-10-02T06:06:55Z)
MuLan: A Joint Embedding of Music Audio and Natural Language [15.753767984842014]
This paper presents a new generation of models that link audio annotations directly to natural language descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings.
arXiv Detail & Related papers (2022-08-26T03:13:21Z)
Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation. ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z)
Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN). We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
Multilingual Music Genre Embeddings for Effective Cross-Lingual Music Item Annotation [9.709229853995987]
By learning multilingual music genre embeddings, we enable cross-lingual music genre translation without relying on a parallel corpus. Our method is effective in translating music genres across tag systems in multiple languages.
arXiv Detail & Related papers (2020-09-16T15:39:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.