A Comprehensive Survey on Deep Music Generation: Multi-level
Representations, Algorithms, Evaluations, and Future Directions
- URL: http://arxiv.org/abs/2011.06801v1
- Date: Fri, 13 Nov 2020 08:01:20 GMT
- Title: A Comprehensive Survey on Deep Music Generation: Multi-level
Representations, Algorithms, Evaluations, and Future Directions
- Authors: Shulei Ji, Jing Luo, Xinyu Yang
- Abstract summary: This paper attempts to provide an overview of various composition tasks under different music generation levels using deep learning.
In addition, we summarize datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several future directions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of deep learning techniques to generate content of various
kinds (such as images and text) has become a trend. Music in particular, the
topic of this paper, has attracted the attention of countless researchers. The
whole process of producing music can be divided into three stages,
corresponding to the three levels of music generation: score generation
produces scores, performance generation adds performance characteristics to the
scores, and audio generation converts scores with performance characteristics
into audio by assigning timbre, or generates music in audio format directly.
Previous surveys have explored the network models employed in the field of
automatic music generation. However, the development history, the model
evolution, and the pros and cons of the same music generation task have not
been clearly illustrated. This paper attempts to provide an overview of various
composition tasks under different music generation levels, covering most of the
currently popular music generation tasks using deep learning. In addition, we
summarize the datasets suitable for diverse tasks, discuss the music
representations, the evaluation methods as well as the challenges under
different levels, and finally point out several future directions.
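To make the three-level decomposition concrete, the following minimal Python sketch models the pipeline as three composed stages; all names and the toy synthesis step are illustrative placeholders, not an interface from any surveyed system.

```python
# Minimal sketch of the three-level decomposition described in the abstract.
# All names are hypothetical placeholders, not APIs from any surveyed system.

from dataclasses import dataclass
from typing import List

@dataclass
class Note:
    pitch: int          # MIDI pitch number
    start: float        # onset in beats
    duration: float     # length in beats
    velocity: int = 64  # dynamics, filled in at the performance stage

def generate_score() -> List[Note]:
    """Stage 1: score generation -- produce notes with nominal timing."""
    return [Note(pitch=60 + i, start=float(i), duration=1.0) for i in range(8)]

def add_performance(score: List[Note]) -> List[Note]:
    """Stage 2: performance generation -- add expressive timing/dynamics."""
    for i, note in enumerate(score):
        note.velocity = 48 + (i % 4) * 8   # toy dynamic contour
        note.start += 0.02 * (i % 2)       # toy micro-timing deviation
    return score

def render_audio(performance: List[Note], sr: int = 22050) -> list:
    """Stage 3: audio generation -- assign timbre and synthesize samples."""
    # A real system would use a synthesizer or neural vocoder here.
    return [0.0] * sr  # placeholder: one second of silence

audio = render_audio(add_performance(generate_score()))
```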
Related papers
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standardized assessment of the representations of all open-source pre-trained models developed on music recordings as baselines.
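As a rough illustration of such a four-level taxonomy, the sketch below organizes example MIR tasks by level; the task-to-level assignments are guesses for illustration, not MARBLE's official mapping.

```python
# Illustrative sketch of a four-level MIR task taxonomy in the spirit of
# MARBLE. The task-to-level assignments are guesses, not the benchmark's
# official mapping.

taxonomy = {
    "acoustic": ["pitch estimation", "source separation"],
    "performance": ["beat tracking", "playing-technique detection"],
    "score": ["chord recognition", "melody extraction"],
    "high_level_description": ["genre classification", "emotion recognition"],
}

def level_of(task: str) -> str:
    """Look up which hierarchy level a task belongs to."""
    for level, tasks in taxonomy.items():
        if task in tasks:
            return level
    raise KeyError(f"unknown task: {task}")

print(level_of("genre classification"))  # -> high_level_description
```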
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
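To give an idea of what token interleaving over parallel codebook streams can look like, here is a sketch of a delay-style pattern, one of the interleaving patterns discussed in the MusicGen paper; the padding convention and array layout are assumptions.

```python
import numpy as np

# Sketch of a "delay" interleaving pattern over K parallel codebook streams,
# one of the patterns explored for single-stage LMs such as MusicGen.
# The PAD value and array layout are assumptions for illustration.

PAD = -1

def delay_interleave(codes: np.ndarray) -> np.ndarray:
    """codes: (K, T) integer tokens from K codebooks.
    Returns (K, T + K - 1) where codebook k is shifted right by k steps,
    so at each step the LM predicts one token per codebook in parallel."""
    K, T = codes.shape
    out = np.full((K, T + K - 1), PAD, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

codes = np.arange(12).reshape(4, 3)   # K=4 codebooks, T=3 time steps
print(delay_interleave(codes))
```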
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
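A minimal sketch of a GETScore-like 2D layout, assuming a toy token vocabulary: tracks are rows, time steps are columns, and any source-to-target generation task reduces to masking the target rows.

```python
import numpy as np

# Sketch of a GETScore-style 2D token grid: one row per track, one column
# per time step. Token IDs and the EMPTY/MASK symbols are assumptions;
# GETMusic's actual vocabulary (compound pitch/duration tokens) is richer.

EMPTY = 0
n_tracks, n_steps = 3, 16
score = np.full((n_tracks, n_steps), EMPTY, dtype=np.int64)

# Place a few toy note tokens: (track, step) -> token id.
score[0, 0] = 60   # melody note at step 0
score[1, 0] = 48   # bass note at step 0
score[2, 4] = 72   # a third track enters at step 4

# Any source->target track combination becomes a masking problem: keep the
# source rows, mask the target rows, and let the non-autoregressive model
# fill in the masked positions.
MASK = -1
target_tracks = [2]
masked = score.copy()
masked[target_tracks, :] = MASK
```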
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - Multi-Genre Music Transformer -- Composing Full Length Musical Piece [0.0]
The objective of the project is to implement a Multi-Genre Transformer that learns to produce music pieces through a more adaptive learning process.
We built a multi-genre compound-word dataset and trained a linear transformer on it.
We call this the Multi-Genre Transformer; it was able to generate new full-length musical pieces that are diverse and comparable to the original tracks.
arXiv Detail & Related papers (2023-01-06T05:27:55Z) - A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, summing to around 80 hours of data.
The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata regarding instrumentation, geography and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z) - Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
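The sketch below shows what a conditional generator with this input/output contract might look like; every module choice and dimension is an assumption for illustration, not D2M-GAN's actual architecture (which adversarially generates quantized music representations).

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a conditional generator in the spirit of D2M-GAN:
# it consumes pooled dance-video features and body-motion features and emits
# logits over a sequence of discrete music tokens. All dimensions and module
# choices are assumptions, not the paper's actual architecture.

class DanceToMusicGenerator(nn.Module):
    def __init__(self, video_dim=512, motion_dim=128,
                 hidden=256, vocab=1024, seq_len=64):
        super().__init__()
        self.fuse = nn.Linear(video_dim + motion_dim, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)  # logits over music tokens
        self.seq_len = seq_len

    def forward(self, video_feat, motion_feat):
        # video_feat: (B, video_dim), motion_feat: (B, motion_dim)
        h = torch.tanh(self.fuse(torch.cat([video_feat, motion_feat], dim=-1)))
        x = h.unsqueeze(1).repeat(1, self.seq_len, 1)  # condition every step
        out, _ = self.rnn(x)
        return self.head(out)  # (B, seq_len, vocab)

gen = DanceToMusicGenerator()
logits = gen(torch.randn(2, 512), torch.randn(2, 128))
```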
arXiv Detail & Related papers (2022-04-01T17:53:39Z) - Evaluating Deep Music Generation Methods Using Data Augmentation [13.72212417973239]
We focus on a homogeneous, objective framework for evaluating samples of algorithmically generated music.
We do not seek to assess the musical merit of generated music, but instead explore whether generated samples contain meaningful information pertaining to emotion or mood/theme.
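One way to probe for such information is to train a mood classifier on real, labeled music features and measure how confidently it labels generated samples; the scikit-learn sketch below illustrates this general idea only, not the paper's exact augmentation-based protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch: probe generated music for mood information. Train a classifier on
# features of real, mood-labeled music, then check whether its predictions on
# generated samples are confident. All data here is random, for illustration.

rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 32))    # e.g. pooled audio embeddings
real_moods = rng.integers(0, 4, size=200)  # 4 hypothetical mood classes

probe = LogisticRegression(max_iter=1000).fit(real_feats, real_moods)

gen_feats = rng.normal(size=(50, 32))      # features of generated music
probs = probe.predict_proba(gen_feats)
confidence = probs.max(axis=1).mean()      # higher -> clearer mood signal
print(f"mean probe confidence on generated samples: {confidence:.2f}")
```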
arXiv Detail & Related papers (2021-12-31T20:35:46Z) - Multi-task Learning with Metadata for Music Mood Classification [0.0]
Mood recognition is an important problem in music informatics and has key applications in music discovery and recommendation.
We propose a multi-task learning approach in which a shared model is simultaneously trained for mood and metadata prediction tasks.
Applying our technique to existing state-of-the-art convolutional neural networks for mood classification consistently improves their performance.
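A minimal PyTorch sketch of this kind of multi-task setup, assuming pooled audio features and illustrative head sizes: a shared encoder feeds a mood head and an auxiliary metadata head trained jointly.

```python
import torch
import torch.nn as nn

# Sketch of the multi-task setup described above: a shared encoder with one
# head for mood and one for metadata prediction, trained jointly. All shapes
# and the binary metadata target are illustrative assumptions.

class SharedMoodMetadataModel(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, n_moods=8, n_meta=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mood_head = nn.Linear(hidden, n_moods)  # main task
        self.meta_head = nn.Linear(hidden, n_meta)   # auxiliary task

    def forward(self, x):
        h = self.encoder(x)
        return self.mood_head(h), self.meta_head(h)

model = SharedMoodMetadataModel()
x = torch.randn(4, 128)
mood_logits, meta_logits = model(x)
loss = (nn.functional.cross_entropy(mood_logits, torch.randint(0, 8, (4,)))
        + 0.5 * nn.functional.binary_cross_entropy_with_logits(
              meta_logits, torch.randint(0, 2, (4, 50)).float()))
loss.backward()  # gradients flow through both heads into the shared encoder
```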
arXiv Detail & Related papers (2021-10-10T11:36:34Z) - Personalized Popular Music Generation Using Imitation and Structure [1.971709238332434]
We propose a statistical machine learning model that is able to capture and imitate the structure, melody, chord, and bass style from a given example seed song.
An evaluation using 10 pop songs shows that our new representations and methods are able to create high-quality stylistic music.
arXiv Detail & Related papers (2021-05-10T23:43:00Z) - From Artificial Neural Networks to Deep Learning for Music Generation --
History, Concepts and Trends [0.0]
This paper provides a tutorial on music generation based on deep learning techniques.
It analyzes some early works from the late 1980s using artificial neural networks for music generation.
arXiv Detail & Related papers (2020-04-07T00:33:56Z) - Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with
Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)