Related papers: A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization

URL: http://arxiv.org/abs/2506.06179v1
Date: Fri, 06 Jun 2025 15:44:10 GMT
Title: A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization
Authors: Muhammed Ustaomeroglu, Guannan Qu
Abstract summary: We show that a single-layer linear self-attention can efficiently represent, learn, and generalize functions capturing pairwise interactions. Our analysis reveals that self-attention acts as a mutual interaction learner under minimal assumptions on the diversity of interaction patterns observed during training. We introduce HyperFeatureAttention, a novel neural network module designed to learn couplings of different feature-level interactions between entities.
Score: 6.015898117103069
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-attention has emerged as a core component of modern neural architectures, yet its theoretical underpinnings remain elusive. In this paper, we study self-attention through the lens of interacting entities, ranging from agents in multi-agent reinforcement learning to alleles in genetic sequences, and show that a single layer linear self-attention can efficiently represent, learn, and generalize functions capturing pairwise interactions, including out-of-distribution scenarios. Our analysis reveals that self-attention acts as a mutual interaction learner under minimal assumptions on the diversity of interaction patterns observed during training, thereby encompassing a wide variety of real-world domains. In addition, we validate our theoretical insights through experiments demonstrating that self-attention learns interaction functions and generalizes across both population distributions and out-of-distribution scenarios. Building on our theories, we introduce HyperFeatureAttention, a novel neural network module designed to learn couplings of different feature-level interactions between entities. Furthermore, we propose HyperAttention, a new module that extends beyond pairwise interactions to capture multi-entity dependencies, such as three-way, four-way, or general n-way interactions.
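The representation claim above can be illustrated with a small sketch. Assume one-hot entity embeddings (entity types indexed 0..d-1), no softmax, and a scalar value projection. Under these assumptions a single linear self-attention layer computes y_i = Σ_j (x_iᵀ C x_j)(w_vᵀ x_j), where C stands in for the query-key product. Setting C to a table of pairwise interaction strengths then reproduces any target of the form y_i = Σ_j F(a_i, a_j). The names `C`, `w_v`, `F`, `T`, and `three_way_attention` are hypothetical. The three-way function is only an illustrative analogue of multi-entity dependencies, not the paper's HyperAttention or HyperFeatureAttention definition.

```python
# Minimal sketch, assuming one-hot entity embeddings, no softmax, and a
# scalar value projection. Illustrative only; not the paper's construction.


def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def bilinear(x, C, y):
    """Compute x^T C y; C stands in for the query-key product W_Q W_K^T."""
    return sum(x[a] * C[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))


def trilinear(x, y, z, T):
    """Contract a d x d x d score tensor T with three vectors."""
    d = len(x)
    return sum(
        T[a][b][c] * x[a] * y[b] * z[c]
        for a in range(d) for b in range(d) for c in range(d)
    )


def linear_self_attention(X, C, w_v):
    """y_i = sum_j (x_i^T C x_j) * (w_v . x_j): pairwise scores times values."""
    return [sum(bilinear(x_i, C, x_j) * dot(w_v, x_j) for x_j in X) for x_i in X]


def three_way_attention(X, T, w_v):
    """Hypothetical three-way analogue (not the paper's HyperAttention):
    y_i = sum_{j,k} T(x_i, x_j, x_k) * (w_v . x_j) * (w_v . x_k)."""
    return [
        sum(trilinear(x_i, x_j, x_k, T) * dot(w_v, x_j) * dot(w_v, x_k)
            for x_j in X for x_k in X)
        for x_i in X
    ]


# Hypothetical interaction table: F[a][b] is the effect of an entity of
# type b on an entity of type a. With one-hot inputs and w_v = 1, using
# C = F makes the attention output equal sum_j F[a_i][a_j] exactly.
F = [[0.0, 1.0, -0.5],
     [2.0, 0.0, 0.5],
     [1.0, -1.0, 3.0]]
types = [0, 2, 1, 2]
num_types = len(F)
X = [one_hot(t, num_types) for t in types]
w_v = [1.0] * num_types

pairwise_out = linear_self_attention(X, F, w_v)
pairwise_target = [sum(F[a_i][a_j] for a_j in types) for a_i in types]
print(pairwise_out)  # [0.0, 6.0, 3.0, 6.0]

# Hypothetical three-way score G(a, b, c) = a + 2b - c, stored as a tensor.
T = [[[float(a + 2 * b - c) for c in range(num_types)]
      for b in range(num_types)] for a in range(num_types)]
three_way_out = three_way_attention(X, T, w_v)
three_way_target = [sum(T[a_i][a_j][a_k] for a_j in types for a_k in types)
                    for a_i in types]
print(three_way_out)  # [20.0, 52.0, 36.0, 52.0]
```

With one-hot inputs, each score entry simply selects one cell of C (or T), so the learned matrix can be read directly as an interaction table. With dense embeddings the same formula becomes a bilinear (or trilinear) parameterization shared across entity types.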

Related papers

Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection [51.52749744031413]
Human-Object Interaction (HOI) detection aims to identify humans and objects within images and interpret their interactions. Existing HOI methods rely heavily on large datasets with manual annotations to learn interactions from visual cues. We propose a novel training-free HOI detection framework for Dynamic Scoring with enhanced semantics.
arXiv Detail & Related papers (2025-07-23T12:30:19Z)
Relation Learning and Aggregate-attention for Multi-person Motion Prediction [13.052342503276936]
Multi-person motion prediction considers not just the skeleton structures or human trajectories but also the interactions with others. Previous methods often overlook that the joint relations within an individual (intra-relation) and interactions among groups (inter-relation) are distinct types of representations. We introduce a new collaborative framework for multi-person motion prediction that explicitly models these relations.
arXiv Detail & Related papers (2024-11-06T07:48:30Z)
Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
It has long been known in both neuroscience and AI that "binding" between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. We introduce Artificial Kuramoto Oscillatory Neurons, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, uncertainty quantification, and reasoning.
arXiv Detail & Related papers (2024-10-17T17:47:54Z)
Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
CREIMBO: Cross-Regional Ensemble Interactions in Multi-view Brain Observations [3.3713037259290255]
Current analysis methods often fail to harness the richness of such data. CREIMBO identifies the hidden composition of per-session neural ensembles through graph-driven dictionary learning. We demonstrate CREIMBO's ability to recover true components in synthetic data.
arXiv Detail & Related papers (2024-05-27T17:48:32Z)
Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework. These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents. Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
Inferring Relational Potentials in Interacting Systems [56.498417950856904]
We propose Neural Interaction Inference with Potentials (NIIP) as an alternative approach to discover such interactions. NIIP assigns low energy to the subset of trajectories which respect the relational constraints observed. It allows trajectory manipulation, such as interchanging interaction types across separately trained models, as well as trajectory forecasting.
arXiv Detail & Related papers (2023-10-23T00:44:17Z)
Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data. Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds. We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
Collective Relational Inference for learning heterogeneous interactions [8.215734914005845]
We propose a novel probabilistic method for relational inference, which possesses two distinctive characteristics compared to existing methods. We evaluate the proposed methodology across several benchmark datasets and demonstrate that it outperforms existing methods in accurately inferring interaction types. Overall, the proposed model is data-efficient and generalizable to large systems when trained on smaller ones.
arXiv Detail & Related papers (2023-04-30T19:45:04Z)
InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions [49.097973114627344]
We present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions. We propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame.
arXiv Detail & Related papers (2023-04-12T08:12:29Z)
Rethinking Trajectory Prediction via "Team Game" [118.59480535826094]
We present a novel formulation for multi-agent trajectory prediction, which explicitly introduces the concept of interactive group consensus. On two multi-agent settings, i.e., team sports and pedestrians, the proposed framework consistently achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2022-10-17T07:16:44Z)
Learning Heterogeneous Interaction Strengths by Trajectory Prediction with Graph Neural Network [0.0]
We propose the attentive relational inference network (RAIN) to infer continuously weighted interaction graphs without any ground-truth interaction strengths. We show that our RAIN model with the PA mechanism accurately infers continuous interaction strengths for simulated physical systems in an unsupervised manner.
arXiv Detail & Related papers (2022-08-28T09:13:33Z)
DIDER: Discovering Interpretable Dynamically Evolving Relations [14.69985920418015]
This paper introduces DIDER, Discovering Interpretable Dynamically Evolving Relations, a generic end-to-end interaction modeling framework with intrinsic interpretability. We evaluate DIDER on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-08-22T20:55:56Z)
Learning Interaction Variables and Kernels from Observations of Agent-Based Systems [14.240266845551488]
We propose a learning technique that, given observations of states and velocities along trajectories of agents, yields both the variables upon which the interaction kernel depends and the interaction kernel itself. This yields an effective dimension reduction which avoids the curse of dimensionality from the high-dimensional observation data. We demonstrate the learning capability of our method on a variety of first-order interacting systems.
arXiv Detail & Related papers (2022-08-04T16:31:01Z)
Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions. In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems. Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.