MusPy: A Toolkit for Symbolic Music Generation
- URL: http://arxiv.org/abs/2008.01951v1
- Date: Wed, 5 Aug 2020 06:16:13 GMT
- Authors: Hao-Wen Dong, Ke Chen, Julian McAuley, Taylor Berg-Kirkpatrick
- Abstract summary: MusPy is an open source Python library for symbolic music generation.
In this paper, we present statistical analysis of the eleven datasets currently supported by MusPy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present MusPy, an open source Python library for symbolic
music generation. MusPy provides easy-to-use tools for essential components in
a music generation system, including dataset management, data I/O, data
preprocessing and model evaluation. In order to showcase its potential, we
present statistical analysis of the eleven datasets currently supported by
MusPy. Moreover, we conduct a cross-dataset generalizability experiment by
training an autoregressive model on each dataset and measuring held-out
likelihood on the others, a process that is made easier by MusPy's dataset
management system. The results provide a map of domain overlap between various
commonly used datasets and show that some datasets contain more representative
cross-genre samples than others. Along with the dataset analysis, these results
might serve as a guide for choosing datasets in future research. Source code
and documentation are available at https://github.com/salu133445/muspy.
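
The cross-dataset experiment described in the abstract (train on each dataset, measure held-out likelihood on the others) can be illustrated with a minimal sketch. This is not the paper's model or MusPy's API: it swaps in a smoothed bigram model over MIDI pitch numbers, and the function names (`train_bigram`, `avg_neg_log_likelihood`, `cross_dataset_matrix`) and the two toy "datasets" are hypothetical, invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sequences, vocab_size, alpha=1.0):
    """Fit an add-alpha smoothed bigram model over integer tokens.

    Toy stand-in for an autoregressive model (assumption, not the paper's).
    Returns a function log_prob(prev, cur).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1

    def log_prob(prev, cur):
        row = counts.get(prev, {})
        total = sum(row.values())
        return math.log((row.get(cur, 0) + alpha) / (total + alpha * vocab_size))

    return log_prob


def avg_neg_log_likelihood(log_prob, sequences):
    """Average negative log-likelihood per predicted token (lower is better)."""
    total, n = 0.0, 0
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            total -= log_prob(prev, cur)
            n += 1
    return total / n


def cross_dataset_matrix(train_sets, test_sets, vocab_size):
    """Train one model per dataset and score it on every held-out set."""
    matrix = {}
    for src, train in train_sets.items():
        model = train_bigram(train, vocab_size)
        matrix[src] = {
            tgt: avg_neg_log_likelihood(model, test)
            for tgt, test in test_sets.items()
        }
    return matrix


# Hypothetical toy data: pitch sequences as MIDI note numbers (0-127).
scale = [60, 62, 64, 65, 67, 69, 71, 72]      # ascending C major scale
leaps = [60, 67, 60, 67, 60, 67, 60, 67]      # alternating fifths
train_sets = {"A": [scale] * 20, "B": [leaps] * 20}
test_sets = {"A": [scale], "B": [leaps]}

matrix = cross_dataset_matrix(train_sets, test_sets, vocab_size=128)
for src, row in matrix.items():
    print(src, {tgt: round(nll, 2) for tgt, nll in row.items()})
```

Each row of `matrix` corresponds to a training dataset and each column to a held-out dataset, so low off-diagonal values would suggest that the training dataset generalizes well to the other domain.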
Related papers
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- Minimalist Data Wrangling with Python [4.429175633425273]
Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science.
It provides a high-level overview and discusses key concepts in detail.
arXiv Detail & Related papers (2022-11-09T01:24:39Z)
- FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings [51.09574369310246]
Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models.
We propose a novel cross-silo dataset suite focused on healthcare, FLamby, to bridge the gap between theory and practice of cross-silo FL.
Our flexible and modular suite allows researchers to easily download datasets, reproduce results and re-use the different components for their research.
arXiv Detail & Related papers (2022-10-10T12:17:30Z)
- DataLab: A Platform for Data Analysis and Intervention [96.75253335629534]
DataLab is a unified data-oriented platform that allows users to interactively analyze the characteristics of data.
DataLab has features for dataset recommendation and global vision analysis.
So far, DataLab covers 1,715 datasets and 3,583 of their transformed versions.
arXiv Detail & Related papers (2022-02-25T18:32:19Z)
- NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels [33.659146748289444]
We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information.
We show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets.
arXiv Detail & Related papers (2021-10-13T16:12:18Z)
- Multitask learning for instrument activation aware music source separation [83.30944624666839]
We propose a novel multitask structure to investigate using instrument activation information to improve source separation performance.
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on a mixture of the Mixing Secrets and MedleyDB datasets.
arXiv Detail & Related papers (2020-08-03T02:35:00Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)