Relational inductive biases on attention mechanisms
- URL: http://arxiv.org/abs/2507.04117v1
- Date: Sat, 05 Jul 2025 17:46:52 GMT
- Title: Relational inductive biases on attention mechanisms
- Authors: Víctor Mijangos, Ximena Gutierrez-Vasques, Verónica E. Arriola, Ulises Rodríguez-Domínguez, Alexis Cervantes, José Luis Almanzara
- Abstract summary: We focus on characterizing the relational biases present in attention mechanisms. We show that different attention layers are characterized by the underlying relationships they assume on the input data.
- Score: 1.1545092788508224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inductive learning aims to construct general models from specific examples, guided by biases that influence hypothesis selection and determine generalization capacity. In this work, we focus on characterizing the relational inductive biases present in attention mechanisms, understood as assumptions about the underlying relationships between data elements. From the perspective of geometric deep learning, we analyze the most common attention mechanisms in terms of their equivariance properties with respect to permutation subgroups, which allows us to propose a classification based on their relational biases. Under this perspective, we show that different attention layers are characterized by the underlying relationships they assume on the input data.
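The equivariance claim at the heart of the abstract can be illustrated directly. Below is a minimal numpy sketch (mine, not the paper's) of the best-known instance: dot-product self-attention without positional encodings is equivariant under the full symmetric group, meaning that permuting the input rows simply permutes the output rows the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head dot-product self-attention over the rows of X (n x d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # (n x n) attention weights
    return A @ V

n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

P = np.eye(n)[rng.permutation(n)]          # random permutation matrix
lhs = self_attention(P @ X, Wq, Wk, Wv)    # permute the inputs, then attend
rhs = P @ self_attention(X, Wq, Wk, Wv)    # attend, then permute the outputs
assert np.allclose(lhs, rhs)               # equivariance under the full group S_n
```

Adding positional encodings or masking breaks this full-group symmetry, which is the kind of distinction the abstract's subgroup analysis formalizes.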
Related papers
- On the influence of dependent features in classification problems: a game-theoretic perspective [46.1232919707345]
This paper deals with a new measure of the influence of each feature on the response variable in classification problems.
We consider a sample of individuals characterized by specific features, each feature encompassing a finite range of values, and classified based on a binary response variable.
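The influence measure itself is this paper's contribution; as a purely illustrative stand-in for the game-theoretic machinery, the sketch below computes exact Shapley values for a user-supplied payoff over feature subsets (e.g. the accuracy of a classifier restricted to those features). The function names and the toy payoff are hypothetical, not the paper's definitions.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_influence(value, n_features):
    """Exact Shapley values for a set function `value` over feature subsets.

    value(S) returns the payoff of coalition S, e.g. the accuracy of a
    classifier using only those features. Exponential cost: toy sizes only.
    """
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                w = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                     / factorial(n_features))
                phi[i] += w * (value(frozenset(S) | {i}) - value(frozenset(S)))
    return phi

# Toy payoff: features 0 and 1 are individually useless but jointly predictive.
v = lambda S: 1.0 if {0, 1} <= S else 0.0
print(shapley_influence(v, 3))  # [0.5, 0.5, 0.0]: the pair splits the credit
```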
arXiv Detail & Related papers (2024-08-05T14:02:26Z)
- On the Anatomy of Attention [0.0]
We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models.
Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations.
arXiv Detail & Related papers (2024-07-02T16:50:26Z)
- On Understanding Attention-Based In-Context Learning for Categorical Data [49.40350941996942]
We develop a network composed of attention blocks, with each block employing a self-attention layer followed by a cross-attention layer, with associated skip connections. This model can exactly perform multi-step functional GD inference for in-context inference with categorical observations.
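The block structure described above is concrete enough to sketch. The numpy toy below is an assumption-laden simplification (single head, random weights, no layer norm or feed-forward sublayer), but it wires up exactly the stated pattern: self-attention, then cross-attention, each wrapped in a skip connection.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q_in, KV_in, Wq, Wk, Wv):
    """Dot-product attention: queries from Q_in, keys/values from KV_in."""
    Q, K, V = Q_in @ Wq, KV_in @ Wk, KV_in @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V

def block(X, context, params):
    """One block: self-attention, then cross-attention, each with a skip."""
    X = X + attention(X, X, *params["self"])         # self-attention + skip
    X = X + attention(X, context, *params["cross"])  # cross-attention + skip
    return X

rng = np.random.default_rng(0)
d = 8
params = {k: tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
          for k in ("self", "cross")}
X, context = rng.normal(size=(4, d)), rng.normal(size=(6, d))
out = block(X, context, params)  # (4, 8); stacking such blocks gives the model
```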
arXiv Detail & Related papers (2024-05-27T15:03:21Z)
- Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graph that relates them.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
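The paper couples this principle to a full identifiability analysis; as a minimal, hypothetical sketch of the regularization idea alone, the code below puts an L1 penalty on a learned mechanism graph and optimizes it with the standard soft-thresholding (proximal) step, which is what actually drives edges to exact zero.

```python
import numpy as np

def prox_l1(G, step, lam):
    """Soft-thresholding: the proximal step for an objective with an
    L1 sparsity penalty lam * |G|_1 on the mechanism graph G,
    where G[i, j] is the strength of the edge z_j -> z_i."""
    return np.sign(G) * np.maximum(np.abs(G) - step * lam, 0.0)

rng = np.random.default_rng(0)
G = rng.normal(size=(5, 5))          # dense initial graph over 5 latents
for _ in range(50):
    grad = 0.1 * G                   # stand-in for the gradient of the fit term
    G = prox_l1(G - 0.05 * grad, step=0.05, lam=0.5)
print((np.abs(G) > 0).sum(), "of", G.size, "edges survive the sparsity penalty")
```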
arXiv Detail & Related papers (2024-01-10T02:38:21Z)
- On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
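The closed-form results are derived inside the paper's framework; the two statistics themselves are easy to state empirically. A hypothetical numpy sketch of what is being predicted:

```python
import numpy as np

def accuracy(preds, y):
    """Fraction of inputs a single hypothesis labels correctly."""
    return (preds == y).mean()

def agreement(preds_a, preds_b):
    """Fraction of inputs on which two hypotheses predict the same label,
    whether or not either is correct."""
    return (preds_a == preds_b).mean()

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
h1 = np.where(rng.random(1000) < 0.9, y, 1 - y)  # ~90%-accurate hypothesis
h2 = np.where(rng.random(1000) < 0.8, y, 1 - y)  # ~80%-accurate hypothesis
print(accuracy(h1, y), accuracy(h2, y), agreement(h1, h2))
# With independent errors, agreement ~= 0.9*0.8 + 0.1*0.2 = 0.74;
# deviations from this baseline reflect correlated errors between hypotheses.
```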
arXiv Detail & Related papers (2023-06-07T21:35:26Z)
- Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning [79.4957965474334]
A key goal of unsupervised representation learning is "inverting" a data generating process to recover its latent properties.
This paper asks, "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?"
We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms.
arXiv Detail & Related papers (2021-10-29T14:04:08Z)
- Distinguishing rule- and exemplar-based generalization in learning systems [10.396761067379195]
We investigate two distinct inductive biases: feature-level bias and exemplar-vs-rule bias.
We find that most standard neural network models have a propensity towards exemplar-based extrapolation.
We discuss the implications of these findings for research on data augmentation, fairness, and systematic generalization.
arXiv Detail & Related papers (2021-10-08T18:37:59Z)
- Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods that formalize the goal of recovering latent variables and provide estimation procedures for practical application.
We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
- On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
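As a toy, linear illustration of the post-hoc route (the real setting is not linear, and all names here are mine, not the paper's), the sketch below entangles two correlated ground-truth factors and shows that a handful of labels suffices to fit a correcting map back to the factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
cov = [[1.0, 0.8], [0.8, 1.0]]                 # correlated ground-truth factors
factors = rng.multivariate_normal([0.0, 0.0], cov, size=n)
M = rng.normal(size=(2, 2))                    # unknown mixing learned
latents = factors @ M.T                        # by a pretend encoder

# Post-hoc correction: estimate a linear map from a few labeled examples.
n_labeled = 20
W, *_ = np.linalg.lstsq(latents[:n_labeled], factors[:n_labeled], rcond=None)
corrected = latents @ W
print(np.abs(corrected - factors).mean())      # ~0: factors recovered
```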
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this list (including all of the information it contains) and is not responsible for any consequences of its use.