Related papers: MidiTok Visualizer: a tool for visualization and analysis of tokenized MIDI symbolic music

Related papers

Calliope: An Online Generative Music System for Symbolic Multi-Track Composition [5.649205001069577]
Calliope is a web application that assists in performing a variety of multi-track composition tasks. The user can upload (Musical Instrument Digital Interface) MIDI files, visualize and edit MIDI tracks, and generate partial (via bar in-filling) or complete multi-track content.
arXiv Detail & Related papers (2025-04-18T20:06:18Z)
The GigaMIDI Dataset with Features for Expressive Music Performance Detection [5.585625844344932]
The GigaMIDI dataset contains over 1.4 million unique MIDI files, encompassing 1.8 billion MIDI note events and over 5.3 million MIDI tracks. This curated iteration of GigaMIDI encompasses expressively-performed instrument tracks detected by NOMML, constituting 31% of the GigaMIDI dataset.
arXiv Detail & Related papers (2025-02-24T23:39:40Z)
MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition [4.152843247686306]
MIDI-GPT is a generative system designed for computer-assisted music composition. It supports the infilling of musical material at the track and bar level, and can condition generation on attributes including instrument type, musical style, note density, polyphony level, and note duration. We present experimental results that demonstrate that MIDI-GPT is able to consistently avoid duplicating the musical material it was trained on, generate music that is stylistically similar to the training dataset, and that attribute controls allow enforcing various constraints on the generated material.
arXiv Detail & Related papers (2025-01-28T15:17:36Z)
Accompanied Singing Voice Synthesis with Fully Text-controlled Melody [61.147446955297625]
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices. We present MelodyLM, the first TTSong model that generates high-quality song pieces with fully text-controlled melodies.
arXiv Detail & Related papers (2024-07-02T08:23:38Z)
MidiCaps: A large-scale MIDI dataset with text captions [6.806050368211496]
This work aims to enable research that combines LLMs with symbolic music by presenting, the first openly available large-scale MIDI dataset with text captions. Inspired by recent advancements in captioning techniques, we present a curated dataset of over 168k MIDI files with textual descriptions.
arXiv Detail & Related papers (2024-06-04T12:21:55Z)
DiffMoog: a Differentiable Modular Synthesizer for Sound Matching [48.33168531500444]
DiffMoog is a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments. Being differentiable, it allows integration into neural networks, enabling automated sound matching. We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework.
arXiv Detail & Related papers (2024-01-23T08:59:21Z)
Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, as one of the pioneers in VAE methods that effectively model and generate long multi-track symbolic music. We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset [1.3607388598209322]
We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. We first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions.
arXiv Detail & Related papers (2023-07-27T11:24:47Z)
Composer's Assistant: An Interactive Transformer for Multi-Track MIDI Infilling [0.0]
Composer's Assistant is a system for interactive human-computer composition in the REAPER digital audio workstation. We train a T5-like model to accomplish the task of multi-track MIDI infilling. Composer's Assistant consists of this model together with scripts that enable interaction with the model in REAPER.
arXiv Detail & Related papers (2023-01-29T19:45:10Z)
A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition [76.65908232134203]
Symbolic Music Emotion Recognition(SMER) is to predict music emotion from symbolic data, such as MIDI and MusicXML. In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
arXiv Detail & Related papers (2022-01-15T07:45:10Z)
Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music [0.25559196081940677]
This paper proposes a bi-directional LSTM model with attention mechanism capable of generating similar type of music based on MIDI data. The music generated by the model follows the theme/style of the music the model is trained on.
arXiv Detail & Related papers (2020-11-02T06:43:28Z)
PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence. MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling. We call our system for pop music accompaniment generation as PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
Foley Music: Learning to Generate Music from Videos [115.41099127291216]
Foley Music is a system that can synthesize plausible music for a silent video clip about people playing musical instruments. We first identify two key intermediate representations for a successful video to music generator: body keypoints from videos and MIDI events from audio recordings. We present a Graph$-$Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements.
arXiv Detail & Related papers (2020-07-21T17:59:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.