Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation
- URL: http://arxiv.org/abs/2504.05364v1
- Date: Mon, 07 Apr 2025 11:51:29 GMT
- Title: Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation
- Authors: Manvi Agarwal, Changhong Wang, Gael Richard
- Abstract summary: We present a unified framework based on kernel methods to analyze both families of efficient PEs. We develop a novel PE method called RoPEPool, capable of extracting causal relationships from temporal sequences. For empirical validation, we use a symbolic music generation task, namely, melody harmonization.
- Score: 1.3108652488669736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While music remains a challenging domain for generative models like Transformers, a two-pronged approach has recently proved successful: inserting musically-relevant structural information into the positional encoding (PE) module and using kernel approximation techniques based on Random Fourier Features (RFF) to lower the computational cost from quadratic to linear. Yet, it is not clear how such RFF-based efficient PEs compare with those based on rotation matrices, such as Rotary Positional Encoding (RoPE). In this paper, we present a unified framework based on kernel methods to analyze both families of efficient PEs. We use this framework to develop a novel PE method called RoPEPool, capable of extracting causal relationships from temporal sequences. Using RFF-based PEs and rotation-based PEs, we demonstrate how seemingly disparate PEs can be jointly studied by considering the content-context interactions they induce. For empirical validation, we use a symbolic music generation task, namely, melody harmonization. We show that RoPEPool, combined with highly-informative structural priors, outperforms all methods.
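The abstract contrasts RFF-based efficient PEs with rotation-based ones such as RoPE. As a rough illustration of the rotation-based family, here is a minimal NumPy sketch of rotary positional encoding; the function name and the half-split pairing layout are illustrative choices, not taken from the paper:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional encoding to a (seq_len, dim) array.

    Each feature pair is rotated by an angle growing linearly with
    position, so inner products between rotated queries and keys
    depend only on the relative offset between positions.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair frequencies, decaying geometrically as in the RoPE formulation.
    inv_freq = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied to each (x1_i, x2_i) pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

The relative-position property can be checked directly: placing the same query/key contents at positions (0, 2) and at (3, 5) yields the same inner product, since both pairs share the offset 2.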
Related papers
- F-StrIPE: Fast Structure-Informed Positional Encoding for Symbolic Music Generation [1.3108652488669736]
We propose F-StrIPE, a structure-informed PE scheme that works in linear complexity. We illustrate the empirical merits of F-StrIPE using melody harmonization for symbolic music.
arXiv Detail & Related papers (2025-02-14T13:15:18Z) - Fast Gradient Computation for RoPE Attention in Almost Linear Time [27.28314860714307]
We develop the first almost linear time algorithm for backward computations in RoPE-based attention under bounded entries. Our approach builds on recent advancements in fast RoPE attention computations.
arXiv Detail & Related papers (2024-12-23T06:20:22Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation [30.797422827190278]
We present a new PEFT method called Robust Adaptation (RoSA) inspired by robust principal component analysis.
RoSA trains $\textit{low-rank}$ and $\textit{highly-sparse}$ components on top of a set of fixed pretrained weights.
We show that RoSA outperforms LoRA, pure sparse fine-tuning, and alternative hybrid methods at the same parameter budget.
arXiv Detail & Related papers (2024-01-09T17:09:01Z) - NPEFF: Non-Negative Per-Example Fisher Factorization [52.44573961263344]
We introduce a novel interpretability method called NPEFF that is readily applicable to any end-to-end differentiable model.
We demonstrate that NPEFF has interpretable tunings through experiments on language and vision models.
arXiv Detail & Related papers (2023-10-07T02:02:45Z) - ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism.
In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z) - Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers [71.32827362323205]
We propose a new class of linear Transformers called Learner-Transformers (Learners).
They incorporate a wide range of relative positional encoding mechanisms (RPEs).
These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces.
arXiv Detail & Related papers (2023-02-03T18:57:17Z) - IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation [44.04110765492441]
We devise an efficient encoder-decoder based network, termed IFRNet, for fast intermediate frame synthesis.
Experiments on various benchmarks demonstrate the excellent performance and fast inference speed of the proposed approaches.
arXiv Detail & Related papers (2022-05-29T10:18:18Z) - Functional Regularization for Reinforcement Learning via Learned Fourier Features [98.90474131452588]
We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis.
We show that it improves the sample efficiency of both state-based and image-based RL.
arXiv Detail & Related papers (2021-12-06T18:59:52Z) - A Generalizable Model-and-Data Driven Approach for Open-Set RFF Authentication [74.63333951647581]
Radio-frequency fingerprints (RFFs) are a promising solution for realizing low-cost physical-layer authentication.
Machine learning-based methods have been proposed for RFF extraction and discrimination.
We propose a new end-to-end deep learning framework for extracting RFFs from raw received signals.
arXiv Detail & Related papers (2021-08-10T03:59:37Z) - Relative Positional Encoding for Transformers with Linear Complexity [30.48367640796256]
Relative positional encoding (RPE) was proposed as beneficial for classical Transformers.
RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix.
In this paper, we present a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE.
arXiv Detail & Related papers (2021-05-18T09:52:32Z) - Learning to Learn Kernels with Variational Random Features [118.09565227041844]
We introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability.
We formulate the optimization of MetaVRF as a variational inference problem.
We show that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
arXiv Detail & Related papers (2020-06-11T18:05:29Z)
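Several entries above (RFF-based efficient PEs, MetaVRF) rest on the same kernel-approximation idea: random Fourier features turn an RBF kernel evaluation into an inner product of explicit feature maps. A minimal sketch of this standard construction, with illustrative names and parameters:

```python
import numpy as np

def rff_features(x, num_features, gamma=1.0, seed=0):
    """Map inputs to random Fourier features approximating an RBF kernel.

    k(x, y) = exp(-gamma * ||x - y||^2) is approximated by phi(x) @ phi(y),
    using cosines of random projections; frequencies are drawn from the
    Fourier transform of the kernel, here N(0, 2 * gamma * I).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    w = rng.normal(scale=np.sqrt(2 * gamma), size=(d, num_features))
    b = rng.uniform(0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ w + b)
```

Because the approximation error shrinks roughly as `1 / sqrt(num_features)`, a few thousand features already track the exact kernel closely; this explicit map is what lets kernelized attention replace the quadratic attention matrix with linear-time products of feature maps.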
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.