Related papers: MusPy: A Toolkit for Symbolic Music Generation

URL: http://arxiv.org/abs/2008.01951v1
Date: Wed, 5 Aug 2020 06:16:13 GMT
Title: MusPy: A Toolkit for Symbolic Music Generation
Authors: Hao-Wen Dong, Ke Chen, Julian McAuley, Taylor Berg-Kirkpatrick
Abstract summary: MusPy is an open source Python library for symbolic music generation. In this paper, we present statistical analysis of the eleven datasets currently supported by MusPy.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we present MusPy, an open source Python library for symbolic music generation. MusPy provides easy-to-use tools for essential components in a music generation system, including dataset management, data I/O, data preprocessing and model evaluation. In order to showcase its potential, we present statistical analysis of the eleven datasets currently supported by MusPy. Moreover, we conduct a cross-dataset generalizability experiment by training an autoregressive model on each dataset and measuring held-out likelihood on the others, a process made easier by MusPy's dataset management system. The results provide a map of domain overlap between various commonly used datasets and show that some datasets contain more representative cross-genre samples than others. Along with the dataset analysis, these results might serve as a guide for choosing datasets in future research. Source code and documentation are available at https://github.com/salu133445/muspy.
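The cross-dataset generalizability experiment described in the abstract can be illustrated with a deliberately small stand-in. This is a minimal sketch, not MusPy's API and not the paper's models: it assumes each piece has already been reduced to a sequence of MIDI pitch numbers (0 to 127), and it swaps the paper's neural autoregressive models for an add-one-smoothed bigram (first-order Markov) model. The dataset names `scales` and `drones`, the toy sequences, and the helpers `train_bigram`, `mean_log_likelihood` and `cross_dataset_matrix` are all hypothetical. Each model is trained on one dataset's training split and scored by mean per-transition log-likelihood on every dataset's test split, which yields the train/test matrix the abstract describes as a map of domain overlap.

```python
import math
from collections import Counter

VOCAB_SIZE = 128  # assumption: tokens are MIDI pitch numbers 0-127


def train_bigram(sequences, vocab_size=VOCAB_SIZE):
    """Fit a hypothetical add-one-smoothed first-order Markov model P(curr | prev)."""
    context_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, curr)] += 1

    def log_prob(prev, curr):
        # Laplace smoothing keeps unseen transitions at a small nonzero probability,
        # so scoring data from another domain never produces log(0).
        return math.log((pair_counts[(prev, curr)] + 1) / (context_counts[prev] + vocab_size))

    return log_prob


def mean_log_likelihood(log_prob, sequences):
    """Average per-transition log-likelihood of held-out sequences."""
    total, count = 0.0, 0
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            total += log_prob(prev, curr)
            count += 1
    return total / count


def cross_dataset_matrix(datasets):
    """Train on each dataset's training split, then score every dataset's test split."""
    models = {name: train_bigram(split["train"]) for name, split in datasets.items()}
    return {
        (source, target): mean_log_likelihood(models[source], datasets[target]["test"])
        for source in datasets
        for target in datasets
    }


# Hypothetical toy "datasets": ascending scales vs. a repeated drone note.
toy_datasets = {
    "scales": {"train": [[60, 62, 64, 65, 67, 69, 71, 72]] * 3, "test": [[62, 64, 65, 67, 69]]},
    "drones": {"train": [[48] * 6] * 3, "test": [[48] * 4]},
}

matrix = cross_dataset_matrix(toy_datasets)
for (source, target), score in sorted(matrix.items()):
    print(f"train={source:7s} test={target:7s} mean log-likelihood={score:.3f}")
```

Higher (less negative) values mean the test split looks more like the training data; in this toy run each model scores its own domain above the other one. With real corpora, the same loop would run over the datasets a library such as MusPy manages, and it is the off-diagonal entries that indicate how much two domains overlap.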

Related papers

MLPrE -- A tool for preprocessing and exploratory data analysis prior to machine learning model construction
We present Machine Learning Preprocessing and Exploratory Data Analysis (MLPrE). DataFrames were utilized to hold data during processing and ensure scalability. A total of 69 stages were implemented in MLPrE, of which we highlight and demonstrate key stages using six diverse datasets.
arXiv (2025-10-29)
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic. Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv (2025-05-26)
What Makes Good Synthetic Training Data for Zero-Shot Stereo Matching?
We report the effects on zero-shot stereo matching performance using standard benchmarks. We validate our findings by collecting the best settings and creating a large-scale dataset. We open-source our system to enable further research on procedural stereo datasets.
arXiv (2025-04-23)
dnamite: A Python Package for Neural Additive Models
This paper introduces dnamite, a Python package that implements Neural Additive Models (NAMs). We describe the methodology underlying dnamite, its design principles, and its implementation. We demonstrate the utility of dnamite in a real-world setting where feature selection and survival analysis are both important.
arXiv (2025-03-06)
Mixtera: A Data Plane for Foundation Model Training
We build and present Mixtera, a data plane for foundation model training. We show that Mixtera does not bottleneck training and scales to 256 GH200 superchips. We also explore the role of mixtures for vision-language models.
arXiv (2025-02-27)
Diffusion Models as Data Mining Tools
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining. We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure. This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv (2024-07-20)
LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data
We propose LUMA, a unique multimodal dataset, featuring audio, image, and textual data from 50 classes. It extends the well-known CIFAR 10/100 dataset with audio samples extracted from three audio corpora, and text data generated using the Gemma-7B Large Language Model (LLM). The LUMA dataset enables the controlled injection of varying types and degrees of uncertainty to achieve and tailor specific experiments and benchmarking initiatives.
arXiv (2024-06-14)
infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information
infoVerse is a universal framework for dataset characterization. infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information. In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv (2023-05-30)
Minimalist Data Wrangling with Python
Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science. It provides a high-level overview and discusses key concepts in detail.
arXiv (2022-11-09)
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models. We propose a novel cross-silo dataset suite focused on healthcare, FLamby, to bridge the gap between theory and practice of cross-silo FL. Our flexible and modular suite allows researchers to easily download datasets, reproduce results and re-use the different components for their research.
arXiv (2022-10-10)
DataLab: A Platform for Data Analysis and Intervention
DataLab is a unified data-oriented platform that allows users to interactively analyze the characteristics of data. DataLab has features for dataset recommendation and global vision analysis. So far, DataLab covers 1,715 datasets and 3,583 transformed versions of them.
arXiv (2022-02-25)
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels
We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets.
arXiv (2021-10-13)
Multitask learning for instrument activation aware music source separation
We propose a novel multitask structure to investigate using instrument activation information to improve source separation performance. We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset. The results show that our proposed multitask model outperforms the baseline Open-Unmix model on the mixture of Mixing Secrets and MedleyDB dataset.
arXiv (2020-08-03)
dMelodies: A Music Dataset for Disentanglement Learning
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains. This will also provide a means for evaluating algorithms specifically designed for music. The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv (2020-07-29)

This list is automatically generated from the titles and abstracts of the papers on this site.