F-StrIPE: Fast Structure-Informed Positional Encoding for Symbolic Music Generation
- URL: http://arxiv.org/abs/2502.10491v1
- Date: Fri, 14 Feb 2025 13:15:18 GMT
- Title: F-StrIPE: Fast Structure-Informed Positional Encoding for Symbolic Music Generation
- Authors: Manvi Agarwal, Changhong Wang, Gael Richard
- Abstract summary: We propose F-StrIPE, a structure-informed PE scheme that works in linear complexity.
We illustrate the empirical merits of F-StrIPE using melody harmonization for symbolic music.
- Score: 1.3108652488669736
- Abstract: While music remains a challenging domain for generative models like Transformers, recent progress has been made by exploiting suitable musically-informed priors. One technique to leverage information about musical structure in Transformers is inserting such knowledge into the positional encoding (PE) module. However, Transformers carry a quadratic cost in sequence length. In this paper, we propose F-StrIPE, a structure-informed PE scheme that works in linear complexity. Using existing kernel approximation techniques based on random features, we show that F-StrIPE is a generalization of Stochastic Positional Encoding (SPE). We illustrate the empirical merits of F-StrIPE using melody harmonization for symbolic music.
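The linear complexity comes from kernel approximation with random features. Below is a minimal sketch of that underlying trick (Performer-style positive random features for the softmax kernel), not of F-StrIPE itself: the structure-informed terms and SPE's stochastic construction are omitted, and all names are illustrative.

```python
import numpy as np

def softmax_kernel_features(x, proj, eps=1e-6):
    # Positive random features: phi(q) @ phi(k).T approximates
    # exp(q @ k.T / sqrt(d)) in expectation over the projections.
    d = x.shape[-1]
    x = x / d ** 0.25
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(x @ proj.T - sq_norm) / np.sqrt(proj.shape[0]) + eps

def linear_attention(q, k, v, n_features=64, seed=0):
    # O(n) in sequence length: contract keys with values before queries.
    proj = np.random.default_rng(seed).standard_normal((n_features, q.shape[-1]))
    q_f = softmax_kernel_features(q, proj)
    k_f = softmax_kernel_features(k, proj)
    kv = k_f.T @ v                      # (n_features, d_v), independent of n
    normalizer = q_f @ k_f.sum(axis=0)  # per-query normalization
    return (q_f @ kv) / normalizer[:, None]

rng = np.random.default_rng(1)
q, k, v = rng.standard_normal((3, 128, 32))
print(linear_attention(q, k, v).shape)  # (128, 32)
```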
Related papers
- Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment [0.0]
Music102 enhances chord progression accompaniment through a D12-equivariant transformer.
By encoding prior music knowledge, the model maintains equivariance across both melody and chord sequences.
This work showcases the adaptability of self-attention mechanisms and layer normalization to the discrete musical domain.
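For intuition, $D_{12}$ here is the dihedral group acting on the 12 pitch classes by transposition and inversion; an equivariant model must commute with these actions. A hypothetical illustration of the group action (not the authors' network) follows.

```python
import numpy as np

def transpose(chroma, t):
    # Shift a 12-dim pitch-class (chroma) vector up by t semitones.
    return np.roll(chroma, t)

def invert(chroma):
    # Reflect pitch classes: pc -> -pc (mod 12).
    return chroma[(-np.arange(12)) % 12]

c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1                           # C, E, G
print(np.nonzero(transpose(c_major, 2))[0])      # [2 6 9] -> D major
print(np.nonzero(invert(c_major))[0])            # [0 5 8] -> F minor
```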
arXiv Detail & Related papers (2024-10-23T03:11:01Z)
- End-to-end Piano Performance-MIDI to Score Conversion with Transformers [26.900974153235456]
We present an end-to-end deep learning approach that constructs detailed musical scores directly from real-world piano performance-MIDI files.
We introduce a modern transformer-based architecture with a novel tokenized representation for symbolic music data.
Our method is also the first to directly predict notational details like trill marks or stem direction from performance data.
arXiv Detail & Related papers (2024-09-30T20:11:37Z)
- Structure-informed Positional Encoding for Music Generation [0.0]
We propose a structure-informed positional encoding framework for music generation with Transformers.
We test these encodings on two symbolic music generation tasks: next-timestep prediction and accompaniment generation.
Our methods improve the melodic and structural consistency of the generated pieces.
arXiv Detail & Related papers (2024-02-20T13:41:35Z)
- Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers [71.32827362323205]
We propose a new class of linear Transformers called Learner-Transformers (Learners).
They incorporate a wide range of relative positional encoding mechanisms (RPEs).
These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces.
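The key step can be shown compactly: writing a relative positional modulation g(i - j) as a short Fourier sum makes it factorize into per-position features, which is what lets RPE be folded into linear attention. In the sketch below, the learned frequencies and weights are replaced by random stand-ins.

```python
import numpy as np

n, m = 16, 8
rng = np.random.default_rng(0)
omega = rng.standard_normal(m)  # learned frequencies (random stand-ins here)
w = rng.random(m)               # learned nonnegative weights (stand-ins)

pos = np.arange(n)
# Per-position features: phi[p] . conj(phi[q]) = sum_m w_m * exp(1j*omega_m*(p-q))
phi = np.sqrt(w) * np.exp(1j * np.outer(pos, omega))   # (n, m)
M_factored = (phi @ phi.conj().T).real                 # rank-m, built in O(n m)
M_direct = np.array([[np.sum(w * np.cos(omega * (p - q))) for q in pos] for p in pos])
print(np.allclose(M_factored, M_direct))  # True: a Toeplitz RPE matrix, factorized
```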
arXiv Detail & Related papers (2023-02-03T18:57:17Z)
- A K-variate Time Series Is Worth K Words: Evolution of the Vanilla Transformer Architecture for Long-term Multivariate Time Series Forecasting [52.33042819442005]
Transformers have become the de facto solution for multivariate time series forecasting (MTSF), especially in the long-term setting.
In this study, we point out that the current tokenization strategy in MTSF Transformer architectures ignores the token inductive bias of Transformers.
We evolve the basic architecture of the vanilla MTSF Transformer through a series of modifications.
Surprisingly, the evolved simple Transformer architecture is highly effective and successfully avoids the over-smoothing phenomenon seen in the vanilla MTSF Transformer.
arXiv Detail & Related papers (2022-12-06T07:00:31Z)
- A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling [0.0]
We propose the Fundamental Music Embedding (FME) for symbolic music based on a bias-adjusted sinusoidal encoding.
Taking advantage of the proposed FME, we propose a novel attention mechanism based on the relative index, pitch and onset embeddings.
Experimental results show that our proposed RIPO transformer outperforms state-of-the-art transformers.
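A hedged sketch of what a bias-adjusted sinusoidal embedding can look like: the classic sinusoidal code applied to a scalar musical attribute (pitch, onset, ...) plus a bias term standing in for the learnable adjustment; the exact parameterization of FME may differ.

```python
import numpy as np

def sinusoidal_embedding(value, dim, base=10000.0, bias=None):
    # Classic sinusoidal code for a scalar attribute such as MIDI pitch;
    # `bias` stands in for the learnable adjustment described in the paper.
    i = np.arange(dim // 2)
    angles = value / base ** (2 * i / dim)
    emb = np.concatenate([np.sin(angles), np.cos(angles)])
    return emb if bias is None else emb + bias

pitch_emb = sinusoidal_embedding(60, dim=16)  # MIDI pitch 60 = C4
print(pitch_emb.shape)                        # (16,)
```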
arXiv Detail & Related papers (2022-12-02T05:04:31Z)
- Your Transformer May Not be as Powerful as You Expect [88.11364619182773]
We mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions.
We present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is.
We develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies these conditions.
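Reading the abstract, URPE reweights the softmax attention probabilities elementwise with a learned matrix indexed by relative position. A minimal sketch under that reading, with illustrative parameter names:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def urpe_attention(q, k, v, c_params):
    # q, k, v: (n, d); c_params: (2n - 1,) learned scalars, one per
    # relative offset in [-(n-1), n-1].
    n, d = q.shape
    attn = softmax(q @ k.T / np.sqrt(d))                      # (n, n)
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]   # i - j
    c = c_params[offsets + n - 1]                             # Toeplitz matrix C
    return (attn * c) @ v                                     # elementwise reweighting

n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, n, d))
c_params = rng.standard_normal(2 * n - 1)
print(urpe_attention(q, k, v, c_params).shape)  # (8, 4)
```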
arXiv Detail & Related papers (2022-05-26T14:51:30Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that the proposed approach can generate coherent, novel, complex, and harmonious symphonies, comparable to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Error Correction Code Transformer [92.10654749898927]
We propose the first extension of the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths.
We embed each channel output into a high-dimensional space so that the information carried by individual bits can be better represented and processed separately.
The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based on the observation that relative positional encodings form a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT).
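The observation is easy to verify numerically: a matrix whose entries depend only on i - j (as RPE terms do) can be applied to a vector in O(n log n) by embedding it in a circulant matrix and using the FFT. A self-contained sketch, independent of the paper's code:

```python
import numpy as np

def toeplitz_matvec_fft(col, row, x):
    # T[i, j] = col[i - j] for i >= j, row[j - i] for i < j (col[0] == row[0]).
    # Embed T in a 2n x 2n circulant; circulant matvec = FFT convolution.
    n = len(x)
    c = np.concatenate([col, [0.0], row[:0:-1]])      # circulant's first column
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, 2 * n))
    return y[:n].real

n = 6
rng = np.random.default_rng(0)
col, row, x = rng.standard_normal((3, n))
row[0] = col[0]
# Dense reference for comparison:
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
print(np.allclose(T @ x, toeplitz_matvec_fft(col, row, x)))  # True
```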
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
- Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions [37.66340344198797]
We build a Pop Music Transformer that composes Pop piano music with better rhythmic structure than existing Transformer models.
In particular, we seek to impose a metrical structure in the input data, so that Transformers can be more easily aware of the beat-bar-phrase hierarchical structure in music.
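For illustration, the beat-based (REMI) representation makes the metrical grid explicit with Bar and Position tokens instead of raw time-shift events. The schematic sequence below conveys the flavor only; the token names are simplified and not the paper's exact vocabulary.

```python
# One bar of a schematic REMI-style event sequence (token names simplified;
# the actual vocabulary also includes tempo and velocity classes/values).
sequence = [
    "Bar",                                   # explicit bar line
    "Position_1/16", "Chord_C_Major",        # downbeat, chord annotation
    "Note_Velocity_20", "Note_On_60", "Note_Duration_8",  # C4 quarter note
    "Position_9/16",                         # beat 3 of a 4/4 bar
    "Note_Velocity_18", "Note_On_64", "Note_Duration_8",  # E4 quarter note
    "Bar",                                   # next bar boundary is explicit
]
print(len(sequence), "tokens")
```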
arXiv Detail & Related papers (2020-02-01T14:12:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.