From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation
- URL: http://arxiv.org/abs/2304.08953v2
- Date: Tue, 25 Apr 2023 11:16:53 GMT
- Title: From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation
- Authors: Adarsh Kumar and Pedro Sarmento
- Abstract summary: Subword tokenization has been widely successful in text-based natural language processing tasks with Transformer-based models.
We apply subword tokenization on top of musical tokenization schemes and find that it enables the generation of longer songs while improving their overall structure.
Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Subword tokenization has been widely successful in text-based natural
language processing (NLP) tasks with Transformer-based models. As Transformer
models become increasingly popular in symbolic music-related studies, it is
imperative to investigate the efficacy of subword tokenization in the symbolic
music domain. In this paper, we explore subword tokenization techniques, such
as byte-pair encoding (BPE), in symbolic music generation and its impact on the
overall structure of generated songs. Our experiments are based on three types
of MIDI datasets: single-track melody only, multi-track with a single
instrument, and multi-track multi-instrument. We apply subword tokenization on
top of musical tokenization schemes and find that it enables the generation of
longer songs while at the same time improving the overall structure of the
generated music in terms of objective metrics such as the structure indicator
(SI) and pitch class entropy. We also compare two subword tokenization methods, BPE
and Unigram, and observe that both methods lead to consistent improvements. Our
study suggests that subword tokenization is a promising technique for symbolic
music generation and may have broader implications for music composition,
particularly in cases involving complex data such as multi-track songs.
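For readers who want to experiment, below is a minimal sketch (not the authors' code) of applying BPE and Unigram on top of a prior musical tokenization, using the Hugging Face `tokenizers` library. The REMI-style token names, the toy corpus, and the trick of mapping each musical token to a single character (so that subword merges combine adjacent musical events rather than characters inside a token name) are illustrative assumptions.

```python
# Minimal sketch: BPE and Unigram subword tokenization on top of a prior
# musical tokenization. Toy corpus and REMI-style token names are assumed.
from tokenizers import Tokenizer, models, trainers

# Each "song" is a list of musical tokens from a prior tokenization step.
songs = [
    ["Bar", "Position_0", "Pitch_60", "Velocity_24", "Duration_4",
     "Position_8", "Pitch_64", "Velocity_24", "Duration_4"],
    ["Bar", "Position_0", "Pitch_60", "Velocity_24", "Duration_4",
     "Position_8", "Pitch_67", "Velocity_24", "Duration_4"],
] * 50  # repeat so pair statistics are meaningful

# Map each base musical token to one character; with no pre-tokenizer,
# learned subwords then span *sequences of musical events*.
base_vocab = sorted({t for song in songs for t in song})
to_char = {t: chr(0x100 + i) for i, t in enumerate(base_vocab)}
from_char = {c: t for t, c in to_char.items()}
corpus = ["".join(to_char[t] for t in song) for song in songs]

def train(model, trainer):
    tok = Tokenizer(model)
    tok.train_from_iterator(corpus, trainer)
    return tok

bpe = train(models.BPE(),
            trainers.BpeTrainer(vocab_size=20, show_progress=False))
uni = train(models.Unigram(),
            trainers.UnigramTrainer(vocab_size=20, show_progress=False))

for name, tok in [("BPE", bpe), ("Unigram", uni)]:
    enc = tok.encode(corpus[0])
    print(f"{name}: {len(corpus[0])} base tokens -> {len(enc.tokens)} subwords")
    # Expand the first super-token back into the musical events it covers.
    print("  first subword =", [from_char[c] for c in enc.tokens[0]])
```

The pitch class entropy metric cited above is also easy to reproduce; the sketch below follows the common definition (Shannon entropy of the 12-bin pitch-class histogram), which is assumed here to match the paper's usage.

```python
# Pitch class entropy: Shannon entropy (base 2) of the histogram of
# MIDI pitches folded onto the 12 pitch classes.
import math
from collections import Counter

def pitch_class_entropy(midi_pitches):
    counts = Counter(p % 12 for p in midi_pitches)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A tonal fragment concentrates on few pitch classes, so entropy stays low.
print(pitch_class_entropy([60, 64, 67, 60, 62, 64]))  # ~1.92 bits
```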
Related papers
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey [6.416887247454113]
This survey reviews NLP methods applied to symbolic music generation and information retrieval studies.
We first propose an overview of representations of symbolic music adapted from natural language sequential representations.
We describe these models, in particular deep learning models, through different prisms, highlighting music-specialized mechanisms.
arXiv Detail & Related papers (2024-02-27T12:48:01Z)
- Structure-informed Positional Encoding for Music Generation [0.0]
We propose a structure-informed positional encoding framework for music generation with Transformers.
We test it on two symbolic music generation tasks: next-timestep prediction and accompaniment generation.
Our methods improve the melodic and structural consistency of the generated pieces.
arXiv Detail & Related papers (2024-02-20T13:41:35Z)
- Impact of time and note duration tokenizations on deep learning symbolic music modeling [0.0]
We analyze the common tokenization methods and experiment with time and note duration representations.
We demonstrate that explicit information leads to better results depending on the task.
arXiv Detail & Related papers (2023-10-12T16:56:37Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
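As a concrete illustration of token interleaving over parallel codebook streams, here is a sketch of the "delay" pattern (the default pattern reported in the MusicGen paper); the toy token ids and the PAD sentinel are assumptions.

```python
# Sketch of the "delay" interleaving pattern: codebook k is shifted right
# by k steps, so a single-stage LM can emit one column (K tokens) per step.
PAD = -1

def delay_interleave(streams):
    """streams: K token lists of equal length T, one per codebook."""
    K, T = len(streams), len(streams[0])
    return [[streams[k][t - k] if 0 <= t - k < T else PAD
             for k in range(K)]
            for t in range(T + K - 1)]

# 4 codebooks, 5 timesteps of toy token ids.
streams = [[10, 11, 12, 13, 14],
           [20, 21, 22, 23, 24],
           [30, 31, 32, 33, 34],
           [40, 41, 42, 43, 44]]
for col in delay_interleave(streams):
    print(col)  # e.g. [10, -1, -1, -1], then [11, 20, -1, -1], ...
```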
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
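A toy sketch of a GETScore-like layout (not GETMusic's actual code; track names and tokens are invented for illustration):

```python
# GETScore-style 2D organization: one row per track, one column per
# timestep; a pad token marks positions where a track is silent.
EMPTY = "<pad>"
score = {
    "melody": ["Pitch_72", "Pitch_74", EMPTY,      "Pitch_76"],
    "bass":   ["Pitch_48", EMPTY,      "Pitch_43", EMPTY],
}
for track, row in score.items():   # tracks stacked vertically
    print(f"{track:>6} | " + " ".join(f"{t:>8}" for t in row))  # time ->
```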
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- Byte Pair Encoding for Symbolic Music [0.0]
Byte Pair Encoding significantly decreases the sequence length while increasing the vocabulary size.
We leverage the embedding capabilities of such models with more expressive tokens, resulting in both better results and faster inference in generation and classification tasks.
The source code is shared on Github, along with a companion website.
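To make the sequence-length vs. vocabulary-size trade-off concrete, here is a from-scratch sketch of the core BPE merge loop over musical tokens (toy data, not the paper's implementation):

```python
# Core BPE loop: repeatedly merge the most frequent adjacent token pair
# into a new super-token, shrinking the sequence and growing the vocab.
from collections import Counter

def bpe_merges(seq, num_merges):
    seq, vocab = list(seq), set(seq)
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                merged.append(a + "+" + b)   # new super-token
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
        vocab.add(a + "+" + b)
    return seq, vocab

tokens = ["Pitch_60", "Dur_4", "Pitch_64", "Dur_4"] * 8
for n in (0, 1, 2, 4):
    seq, vocab = bpe_merges(tokens, n)
    print(f"{n} merges: sequence length {len(seq)}, vocab size {len(vocab)}")
```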
arXiv Detail & Related papers (2023-01-27T20:22:18Z)
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the computational cost.
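A generic sketch of such a fine-/coarse-grained attention mask (a simplified illustration of the idea, not Museformer's implementation; the structure-related bar offsets (1, 2, 4), the per-bar summary-token layout, and full within-bar attention are assumptions):

```python
# Build a boolean attention mask where note tokens attend to every token
# of structure-related past bars (fine) and only to a one-token summary
# of all other past bars (coarse). Summary-token rows omitted for brevity.
import numpy as np

def fine_coarse_mask(n_bars, toks_per_bar, related_offsets=(1, 2, 4)):
    stride = toks_per_bar + 1          # note tokens + 1 summary token per bar
    n = n_bars * stride
    mask = np.zeros((n, n), dtype=bool)
    for q_bar in range(n_bars):
        q = slice(q_bar * stride, q_bar * stride + toks_per_bar)
        for k_bar in range(q_bar + 1):             # causal: past bars only
            k0 = k_bar * stride
            if k_bar == q_bar or q_bar - k_bar in related_offsets:
                mask[q, k0:k0 + toks_per_bar] = True    # fine: all tokens
            else:
                mask[q, k0 + toks_per_bar] = True       # coarse: summary only
    return mask

print(fine_coarse_mask(n_bars=5, toks_per_bar=2).astype(int))
```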
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies comparable to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)