Exploring Tokenization Methods for Multitrack Sheet Music Generation
- URL: http://arxiv.org/abs/2410.17584v1
- Date: Wed, 23 Oct 2024 06:19:48 GMT
- Title: Exploring Tokenization Methods for Multitrack Sheet Music Generation
- Authors: Yashan Wang, Shangda Wu, Xingjian Du, Maosong Sun
- Abstract summary: This study explores the tokenization of multitrack sheet music in ABC notation.
In terms of both computational efficiency and musicality, experimental results show that bar-stream patching performs best overall.
- Score: 48.8206920811097
- Abstract: This study explores the tokenization of multitrack sheet music in ABC notation, introducing two methods--bar-stream and line-stream patching. We compare these methods against existing techniques, including bar patching, byte patching, and Byte Pair Encoding (BPE). In terms of both computational efficiency and the musicality of the generated compositions, experimental results show that bar-stream patching performs best overall compared to the others, which makes it a promising tokenization strategy for sheet music generation.
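To make the contrast between two of the baselines named in the abstract concrete, here is a minimal sketch of byte patching and bar patching applied to a single line of ABC notation. The function names `byte_patches` and `bar_patches`, the patch size, and the padding character are illustrative assumptions, not the authors' implementation. The sketch assumes byte patching means fixed-size character chunks and bar patching means one patch per bar. Bar-stream and line-stream patching are not shown because the abstract does not describe how they work.

```python
import re


def byte_patches(text: str, patch_size: int = 4, pad: str = "\0") -> list[str]:
    """Cut text into fixed-size character patches (assumed byte-patching scheme)."""
    patches = [text[i:i + patch_size] for i in range(0, len(text), patch_size)]
    # Pad the final patch so that every patch has the same length.
    if patches and len(patches[-1]) < patch_size:
        patches[-1] = patches[-1].ljust(patch_size, pad)
    return patches


def bar_patches(line: str) -> list[str]:
    """Cut one ABC music line into patches that each end at a bar line.

    Assumption: bars are delimited by '|' characters; other ABC bar-line
    variants (repeat signs, '|]') are not treated specially here.
    """
    # Either "content followed by one or more '|'" or a trailing unterminated bar.
    return re.findall(r"[^|]*\|+|[^|]+$", line)


abc_line = "CDEF|GABc|c4|"
print(byte_patches(abc_line))  # fixed-size patches; a bar may be cut in the middle
print(bar_patches(abc_line))   # one patch per bar; patch length varies
```

Fixed-size patches keep the sequence length predictable but ignore musical boundaries, whereas per-bar patches follow the music at the cost of variable patch length.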
Related papers
- Audio-to-Score Conversion Model Based on Whisper methodology [0.0]
This thesis innovatively introduces the "Orpheus' Score", a custom notation system that converts music information into tokens.
Experiments show that compared to traditional algorithms, the model has significantly improved accuracy and performance.
arXiv Detail & Related papers (2024-10-22T17:31:37Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Batching BPE Tokenization Merges [55.2480439325792]
BatchBPE is an open-source pure Python implementation of the Byte Pair algorithm.
It is used to train a high quality tokenizer on a basic laptop.
arXiv Detail & Related papers (2024-08-05T09:37:21Z)
- Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding [54.532578213126065]
Most document understanding methods preserve all tokens within sub-images and treat them equally.
This neglects their different informativeness and leads to a significant increase in the number of image tokens.
We propose Token-level Correlation-guided Compression, a parameter-free and plug-and-play methodology to optimize token processing.
arXiv Detail & Related papers (2024-07-19T16:11:15Z)
- From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation [1.9188864062289432]
Subword tokenization has been widely successful in text-based natural language processing tasks with Transformer-based models.
We apply subword tokenization on post-musical tokenization schemes and find that it enables the generation of longer songs at the same time.
Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition.
arXiv Detail & Related papers (2023-04-18T12:46:12Z)
- Symbolic Music Structure Analysis with Graph Representations and Changepoint Detection Methods [1.1677169430445211]
We propose three methods to segment symbolic music by its form or structure: Norm, G-PELT and G-Window.
We have found that encoding symbolic music with graph representations and computing the novelty of Adjacency Matrices represent the structure of symbolic music pieces well.
arXiv Detail & Related papers (2023-03-24T09:45:11Z)
- An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation [4.941630596191806]
This paper presents an analysis of the influence of pitch and meter on the performance of a token-based sequential music generation model.
For complexity, the single token approach and the multiple token approach are compared; for grid resolution, 0 (ablation), 1 (bar-level), 4 (downbeat-level), 12 (8th-triplet-level), up to 64 (64th-note-grid-level) are compared, as are duration resolutions of up to 16 subdivisions per beat.
Results suggest that the class-octave encoding significantly outperforms the taken-for-granted MIDI encoding on pitch-related metrics.
arXiv Detail & Related papers (2023-01-31T03:19:50Z)
- Byte Pair Encoding for Symbolic Music [0.0]
Byte Pair Encoding significantly decreases the sequence length while increasing the vocabulary size.
We leverage the embedding capabilities of such models with more expressive tokens, resulting in both better results and faster inference in generation and classification tasks.
The source code is shared on GitHub, along with a companion website.
arXiv Detail & Related papers (2023-01-27T20:22:18Z)
- TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment [68.08689660963468]
A new algorithm called Token-Aware Cascade contrastive learning (TACo) improves contrastive learning using two novel techniques.
We set a new state of the art on three public text-video retrieval benchmarks: YouCook2, MSR-VTT and ActivityNet.
arXiv Detail & Related papers (2021-08-23T07:24:57Z)
- Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)