Related papers: Algebraic Positional Encodings

Algebraic Positional Encodings

URL: http://arxiv.org/abs/2312.16045v3
Date: Thu, 31 Oct 2024 08:55:49 GMT
Title: Algebraic Positional Encodings
Authors: Konstantinos Kogkalidis, Jean-Philippe Bernardy, Vikas Garg,
Abstract summary: We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. We conduct a series of experiments to demonstrate the practical applicability of our approach. Results suggest performance on par with or surpassing the current state-of-the-art.
Score: 7.0975366862235445
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible mapping from the algebraic specification of a domain to an interpretation as orthogonal operators. This design preserves the algebraic characteristics of the source domain, ensuring that the model upholds its desired structural properties. Our scheme can accommodate various structures, ncluding sequences, grids and trees, as well as their compositions. We conduct a series of experiments to demonstrate the practical applicability of our approach. Results suggest performance on par with or surpassing the current state-of-the-art, without hyper-parameter optimizations or "task search" of any kind. Code is available at https://github.com/konstantinosKokos/ape.

Related papers

Hierarchical Modeling and Architecture Optimization: Review and Unified Framework [0.6291443816903801]
This paper reviews literature on structured input spaces and proposes a unified framework that generalizes existing approaches.<n>A variable is described as meta if its value governs the presence of other decreed variables, enabling the modeling of conditional and hierarchical structures.
arXiv Detail & Related papers (2025-06-27T20:38:57Z)
Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning [0.0]
We introduce a new structure for compositional embeddings built on directional non-commutative monoidal operators.<n>Our construction defines a distinct composition operator circ_i for each axis i, ensuring associative combination along each axis without imposing global commutativity.<n>All axis-specific operators commute with one another, enforcing a global interchange law that enables consistent crossaxis compositions.
arXiv Detail & Related papers (2025-05-21T13:27:14Z)
ffstruc2vec: Flat, Flexible and Scalable Learning of Node Representations from Structural Identities [0.0]
This paper introduces ffstruc2vec, a scalable deep-learning framework for learning node embedding vectors that preserve structural identities. Its flat, efficient architecture allows high flexibility in capturing diverse types of structural patterns, enabling broad adaptability to various downstream application tasks. The proposed framework significantly outperforms existing approaches across diverse unsupervised and supervised tasks in practical applications.
arXiv Detail & Related papers (2025-04-01T18:47:16Z)
AutoBayes: A Compositional Framework for Generalized Variational Inference [0.0]
We introduce a new compositional framework for generalized variational inference. We explain that exact Bayesian inference and the loss functions typical of variational inference satisfy chain rules akin to that of reverse-mode automatic differentiation.
arXiv Detail & Related papers (2025-03-24T12:05:45Z)
Group and Shuffle: Efficient Structured Orthogonal Parametrization [3.540195249269228]
We introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works. We empirically validate our method on different domains, including adapting of text-to-image diffusion models and downstream task fine-tuning in language modeling.
arXiv Detail & Related papers (2024-06-14T13:29:36Z)
HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization [69.33162366130887]
Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. We introduce a novel method designed to supplement the model with domain-level and task-specific characteristics. This approach aims to guide the model in more effectively separating invariant features from specific characteristics, thereby boosting the generalization.
arXiv Detail & Related papers (2024-01-18T04:23:21Z)
Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational AutoEncoders [5.037881619912574]
In this paper, we investigate latent space separation methods for structural syntactic injection in Transformer-based VAEs. Specifically, we explore how syntactic structures can be leveraged in the encoding stage through the integration of graph-based and sequential models. Our empirical evaluation, carried out on natural language sentences and mathematical expressions, reveals that the proposed end-to-end VAE architecture can result in a better overall organisation of the latent space.
arXiv Detail & Related papers (2023-11-14T22:47:23Z)
Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations. We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-cube optimization with certificates, which handles smooth functions on the hypercube or on the torus. By exploiting the regularity of the target function intrinsic in the decay of its spectrum, we allow at the same time to obtain precise certificates and leverage the advanced and powerful neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z)
Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications. We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefineds. Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
Frame Averaging for Equivariant Shape Space Learning [85.42901997467754]
A natural way to incorporate symmetries in shape space learning is to ask that the mapping to the shape space (encoder) and mapping from the shape space (decoder) are equivariant to the relevant symmetries. We present a framework for incorporating equivariance in encoders and decoders by introducing two contributions.
arXiv Detail & Related papers (2021-12-03T06:41:19Z)
Differentiable Spline Approximations [48.10988598845873]
Differentiable programming has significantly enhanced the scope of machine learning. Standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable. We show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications.
arXiv Detail & Related papers (2021-10-04T16:04:46Z)
Building powerful and equivariant graph neural networks with structural message-passing [74.93169425144755]
We propose a powerful and equivariant message-passing framework based on two ideas. First, we propagate a one-hot encoding of the nodes, in addition to the features, in order to learn a local context matrix around each node. Second, we propose methods for the parametrization of the message and update functions that ensure permutation equivariance.
arXiv Detail & Related papers (2020-06-26T17:15:16Z)
NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-End Learning and Control [22.120942106939122]
We propose the use of adaptive search as a building block for general, non- neural optimization operations. We benchmark it against two existing alternatives on a synthetic energy-based structured task, and showcase its use in optimal control applications.
arXiv Detail & Related papers (2020-06-22T03:40:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.