Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music
- URL: http://arxiv.org/abs/2204.03307v1
- Date: Thu, 7 Apr 2022 09:15:46 GMT
- Title: Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music
- Authors: Xiaoxue Gao, Chitralekha Gupta and Haizhou Li
- Score: 73.73045854068384
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Lyrics transcription of polyphonic music is challenging not only because the
singing vocals are corrupted by the background music, but also because the
background music and the singing style vary across music genres, such as pop,
metal, and hip hop, which affects the intelligibility of the lyrics in
different ways. In this work, we propose to transcribe the lyrics of polyphonic
music using a novel genre-conditioned network. The proposed network adopts
pre-trained model parameters and incorporates genre adapters between
layers to capture the peculiarities of each genre from lyrics-genre pairs, thereby
requiring only lightweight genre-specific parameters for training. Our
experiments show that the proposed genre-conditioned network outperforms the
existing lyrics transcription systems.
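The abstract does not spell out the adapter architecture. As a rough sketch only, assuming the common bottleneck-adapter design (down-projection, ReLU, up-projection, residual connection), which may differ from the paper's, a genre-conditioned forward pass can select one adapter per layer by genre while the pre-trained layers stay shared and frozen. The class names, dimensions, and the randomly initialized stand-ins for pre-trained layers are hypothetical, and no training loop is shown.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Uniform random weights in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GenreAdapter:
    """Hypothetical bottleneck adapter: h + W_up @ relu(W_down @ h)."""

    def __init__(self, dim: int, bottleneck: int, rng: random.Random) -> None:
        self.down = random_matrix(bottleneck, dim, rng)  # dim -> bottleneck
        self.up = random_matrix(dim, bottleneck, rng)    # bottleneck -> dim

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.down, h)]  # ReLU in the bottleneck
        return [a + b for a, b in zip(h, matvec(self.up, z))]  # residual add

    def num_params(self) -> int:
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])


class GenreConditionedEncoder:
    """Shared frozen layers, each followed by an adapter chosen by genre."""

    def __init__(self, layers: List[Matrix], genres: List[str], bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        dim = len(layers[0])  # assumes square dim x dim layers
        self.layers = layers  # stand-in for pre-trained weights; never updated here
        self.adapters: Dict[str, List[GenreAdapter]] = {
            g: [GenreAdapter(dim, bottleneck, rng) for _ in layers] for g in genres
        }

    def forward(self, x: Vector, genre: str) -> Vector:
        h = x
        for layer, adapter in zip(self.layers, self.adapters[genre]):
            h = [math.tanh(v) for v in matvec(layer, h)]  # shared frozen layer
            h = adapter(h)  # genre-specific adapter between layers
        return h

    def frozen_params(self) -> int:
        return sum(len(m) * len(m[0]) for m in self.layers)

    def trainable_params(self, genre: str) -> int:
        return sum(a.num_params() for a in self.adapters[genre])


# Usage: 3 shared 16x16 layers, three genres, bottleneck width 2.
rng = random.Random(42)
dim = 16
layers = [random_matrix(dim, dim, rng, scale=0.5) for _ in range(3)]
enc = GenreConditionedEncoder(layers, ["pop", "metal", "hiphop"], bottleneck=2)
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
out_pop = enc.forward(x, "pop")
out_metal = enc.forward(x, "metal")
print(enc.frozen_params(), enc.trainable_params("pop"))  # 768 192
```

In this toy setting each genre adds 192 adapter parameters on top of 768 shared frozen ones, which illustrates why the genre-specific parameters can stay lightweight relative to the shared model.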
Related papers
- Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks [0.5524804393257919]
This paper investigates the capabilities of text-to-audio music generation models in producing long-form music with prompts that change over time.
We introduce Babel Bardo, a system that uses Large Language Models (LLMs) to transform speech transcriptions into music descriptions for controlling a text-to-music model.
Date: 2024-11-06T14:29:49Z
- SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics.
The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) that captures the information of both vocals and accompaniment for song generation, and a series of attention mask strategies for the DSLM.
Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
Date: 2024-09-09T19:37:07Z
- Syllable-level lyrics generation from melody exploiting character-level language model [14.851295355381712]
We propose fine-tuning character-level language models for syllable-level lyrics generation from symbolic melody.
In particular, our method endeavors to incorporate linguistic knowledge of the language model into the beam search process of a syllable-level Transformer generator network.
Date: 2023-10-02T02:53:29Z
- LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method.
We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, which the authors describe as the most performant chat-based large language model at the time of writing.
Our experiments show that LyricWhiz significantly reduces the Word Error Rate (WER) compared with existing methods on English lyrics.
Date: 2023-06-29T17:01:51Z
- Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
Date: 2023-05-30T17:20:25Z
- Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings [36.090928638883454]
Music-to-text synaesthesia aims to generate descriptive text from a music recording that conveys the same sentiment, to support further understanding of the music.
We build a computational model that generates sentences describing the content of a music recording.
To handle classical music, which is highly non-discriminative, we design a group topology-preservation loss.
Date: 2022-10-02T06:06:55Z
- MuLan: A Joint Embedding of Music Audio and Natural Language [15.753767984842014]
This paper presents a new generation of models that link music audio directly to natural language descriptions.
MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings.
Date: 2022-08-26T03:13:21Z
- Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
Date: 2022-08-11T08:44:47Z
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
Date: 2020-10-28T02:35:40Z
- Multilingual Music Genre Embeddings for Effective Cross-Lingual Music Item Annotation [9.709229853995987]
By learning multilingual music genre embeddings, we enable cross-lingual music genre translation without relying on a parallel corpus.
Our method is effective in translating music genres across tag systems in multiple languages.
Date: 2020-09-16T15:39:04Z
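The LyricWhiz entry reports its gains as a reduction in Word Error Rate (WER), the usual metric for lyrics transcription: the word-level edit distance between the reference and the transcribed lyrics, counted as substitutions S, deletions D, and insertions I, divided by the number of reference words N, so WER = (S + D + I) / N. A minimal sketch in Python, assuming plain whitespace tokenization with no case or punctuation normalization (published evaluations usually normalize first); the function name `word_error_rate` is illustrative, not taken from any of the papers above.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, via word-level Levenshtein distance.

    Tokenizes on whitespace only; no lowercasing or punctuation stripping.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed so
    # far and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("always" -> "all") plus one insertion ("ways"): 2 / 5 words.
print(word_error_rate("i will always love you", "i will all ways love you"))  # 0.4
```

Because insertions are counted against the reference length, WER can exceed 1.0 when a system hallucinates many extra words.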