miditok: A Python package for MIDI file tokenization
- URL: http://arxiv.org/abs/2310.17202v1
- Date: Thu, 26 Oct 2023 07:37:44 GMT
- Title: miditok: A Python package for MIDI file tokenization
- Authors: Nathan Fradet, Jean-Pierre Briot, Fabien Chhel, Amal El Fallah
Seghrouchni, Nicolas Gutowski
- Abstract summary: MidiTok is an open-source library for tokenizing symbolic music.
It features the most popular music tokenizations under a unified API.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent progress in natural language processing has been adapted to the
symbolic music modality. Language models, such as Transformers, have been used
with symbolic music for a variety of tasks, including music generation, modeling,
and transcription, with state-of-the-art performance. These models are beginning
to be used in production applications. To encode and decode music for the
backbone model, they rely on tokenizers, whose role is to serialize music into
sequences of distinct elements called tokens. MidiTok is an open-source library
for tokenizing symbolic music with great flexibility and extended features. It
features the most popular music tokenizations under a unified API. It is designed
to be easy to use and extensible for everyone.
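In practice, a tokenize/detokenize round trip with the library looks roughly like the sketch below. It follows MidiTok's v2-era API (REMI, TokenizerConfig, and tokens_to_midi are real names, but signatures have shifted across releases), so treat it as illustrative rather than definitive.

```python
# Minimal sketch of a tokenize/detokenize round trip with MidiTok.
# Method names follow the v2.x API; they have shifted across releases
# (e.g. decoding is tokenizer.decode() in later versions).
from miditok import REMI, TokenizerConfig
from miditoolkit import MidiFile

# All tokenizations (REMI, TSD, MIDI-Like, ...) share this configuration object.
config = TokenizerConfig(num_velocities=16, use_chords=True, use_tempos=True)
tokenizer = REMI(config)

midi = MidiFile("song.mid")                 # hypothetical input file
tokens = tokenizer(midi)                    # serialize notes into token sequences
decoded = tokenizer.tokens_to_midi(tokens)  # deserialize back to a MIDI object
decoded.dump("song_decoded.mid")
```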
Related papers
- TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument [19.395289629201056]
TokenSynth is a novel neural synthesizer that generates audio tokens from MIDI tokens and CLAP embeddings.
Our model is capable of performing instrument cloning, text-to-instrument synthesis, and text-guided timbre manipulation.
arXiv Detail & Related papers (2025-02-13T03:40:30Z)
- Text2midi: Generating Symbolic Music from Captions [7.133321587053803]
This paper introduces text2midi, an end-to-end model to generate MIDI files from textual descriptions.
We utilize a pretrained LLM encoder to process captions, which then condition an autoregressive transformer decoder to produce MIDI sequences.
We conduct comprehensive empirical evaluations, incorporating both automated and human studies, that show our model generates MIDI files of high quality.
arXiv Detail & Related papers (2024-12-21T08:09:12Z)
- Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation [2.668651175000492]
Representing symbolic music with compound tokens, where each token consists of several different sub-tokens, offers the advantage of reducing sequence length.
We introduce the Nested Music Transformer (NMT), an architecture tailored for decoding compound tokens autoregressively, similar to processing flattened tokens, but with low memory usage.
Experiment results showed that applying the NMT to compound tokens can enhance the performance in terms of better perplexity in processing various symbolic music datasets and discrete audio tokens from the MAESTRO dataset.
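As a generic illustration of the sequence-length trade-off described above (made-up tokens, not the NMT paper's vocabulary or code):

```python
# Illustrative only: compound vs. flattened token sequences for three notes.
# Each compound token groups one note's sub-tokens into a single decoding step.
notes = [
    ("Pitch_60", "Vel_80", "Dur_4"),
    ("Pitch_64", "Vel_80", "Dur_4"),
    ("Pitch_67", "Vel_96", "Dur_8"),
]

flattened = [sub for note in notes for sub in note]  # one sub-token per step
compound = list(notes)                               # one note per step

print(len(flattened))  # 9 decoding steps
print(len(compound))   # 3 decoding steps, each emitting 3 sub-tokens
```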
arXiv Detail & Related papers (2024-08-02T11:02:38Z)
- MidiCaps: A large-scale MIDI dataset with text captions [6.806050368211496]
This work aims to enable research that combines LLMs with symbolic music by presenting MidiCaps, the first openly available large-scale MIDI dataset with text captions.
Inspired by recent advancements in captioning techniques, we present a curated dataset of over 168k MIDI files with textual descriptions.
arXiv Detail & Related papers (2024-06-04T12:21:55Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
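For intuition, here is a sketch of a "delay"-style interleaving pattern over parallel token streams; it shows the general idea of token interleaving, not MusicGen's exact patterns:

```python
# Sketch of a "delay" interleaving pattern over K parallel token streams.
# PAD and the layout are illustrative; MusicGen's actual patterns differ in detail.
PAD = -1

def delay_interleave(streams):
    """Offset stream k by k steps so one LM step covers all codebooks."""
    k = len(streams)
    t = len(streams[0])
    out = []
    for step in range(t + k - 1):
        frame = [
            streams[cb][step - cb] if 0 <= step - cb < t else PAD
            for cb in range(k)
        ]
        out.append(frame)
    return out

# Two codebook streams of four tokens each:
print(delay_interleave([[1, 2, 3, 4], [5, 6, 7, 8]]))
# [[1, -1], [2, 5], [3, 6], [4, 7], [-1, 8]]
```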
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
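A rough way to picture this layout (hypothetical tokens, not the paper's code):

```python
# Illustrative 2D layout in the spirit of GETScore: one row per track,
# one column per time step. Token values here are made up for the example.
PAD = "<pad>"

score = [
    # t=0        t=1        t=2        t=3
    ["Pitch_60", PAD,       "Pitch_62", PAD      ],  # melody track
    ["Chord_C",  PAD,       "Chord_G",  PAD      ],  # harmony track
    [PAD,        "Drum_36", PAD,        "Drum_38"],  # drum track
]

# Any source-target combination amounts to choosing which rows to condition on
# and which to predict, e.g. generate the drum row given melody and harmony:
source_rows, target_rows = [0, 1], [2]
```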
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- Byte Pair Encoding for Symbolic Music [0.0]
Byte Pair Encoding significantly decreases the sequence length while increasing the vocabulary size.
We leverage the embedding capabilities of such models with more expressive tokens, resulting in both better results and faster inference in generation and classification tasks.
The source code is shared on Github, along with a companion website.
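For intuition, a single BPE merge step over symbolic-music tokens can be sketched generically as follows (illustrative only; the paper's implementation has its own vocabulary handling):

```python
from collections import Counter

# Generic BPE sketch over symbolic-music tokens: repeatedly merge the most
# frequent adjacent pair into a new token, shrinking sequences while growing
# the vocabulary. Token names are hypothetical.
def bpe_merge_once(seq):
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return seq
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
            merged.append(f"{a}+{b}")  # new vocabulary entry
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

seq = ["Pitch_60", "Dur_4", "Pitch_62", "Dur_4", "Pitch_60", "Dur_4"]
print(bpe_merge_once(seq))
# ['Pitch_60+Dur_4', 'Pitch_62', 'Dur_4', 'Pitch_60+Dur_4']
```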
arXiv Detail & Related papers (2023-01-27T20:22:18Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as the backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies comparable to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to understanding music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
- Foley Music: Learning to Generate Music from Videos [115.41099127291216]
Foley Music is a system that can synthesize plausible music for a silent video clip of people playing musical instruments.
We first identify two key intermediate representations for a successful video to music generator: body keypoints from videos and MIDI events from audio recordings.
We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements.
arXiv Detail & Related papers (2020-07-21T17:59:06Z)