Infusing Lattice Symmetry Priors in Attention Mechanisms for
Sample-Efficient Abstract Geometric Reasoning
- URL: http://arxiv.org/abs/2306.03175v1
- Date: Mon, 5 Jun 2023 18:32:53 GMT
- Title: Infusing Lattice Symmetry Priors in Attention Mechanisms for
Sample-Efficient Abstract Geometric Reasoning
- Authors: Mattia Atzeni, Mrinmaya Sachan, Andreas Loukas
- Abstract summary: The Abstraction and Reasoning Corpus (ARC) has been postulated as an important step towards general AI.
We argue that solving these tasks requires extreme generalization that can only be achieved by proper accounting for core knowledge priors.
We introduce LatFormer, a model that incorporates lattice symmetry priors in attention masks.
- Score: 45.4605460163454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Abstraction and Reasoning Corpus (ARC) (Chollet, 2019) and its most
recent language-complete instantiation (LARC) have been postulated as an
important step towards general AI. Yet, even state-of-the-art machine learning
models struggle to achieve meaningful performance on these problems, falling
behind non-learning based approaches. We argue that solving these tasks
requires extreme generalization that can only be achieved by proper accounting
for core knowledge priors. As a step towards this goal, we focus on geometry
priors and introduce LatFormer, a model that incorporates lattice symmetry
priors in attention masks. We show that, for any transformation of the
hypercubic lattice, there exists a binary attention mask that implements that
group action. Hence, our study motivates a modification to the standard
attention mechanism, where attention weights are scaled using soft masks
generated by a convolutional network. Experiments on synthetic geometric
reasoning show that LatFormer requires two orders of magnitude less data than
standard attention and transformers. Moreover, our results on ARC and LARC
tasks that incorporate geometric priors provide preliminary evidence that these
complex datasets are not beyond the reach of deep learning models.
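The abstract's two technical claims can be made concrete with a short sketch: a binary attention mask whose rows each contain a single 1 acts as a permutation of the grid tokens, and therefore implements a lattice transformation such as a 90-degree rotation; and attention weights can be rescaled by a soft mask produced by a small convolutional network. The code below is a minimal illustrative sketch, not the authors' LatFormer implementation; the names rotation_mask and MaskedAttention are hypothetical, and the mask-generating CNN is fed a constant input purely to keep the example self-contained.

```python
# Minimal PyTorch sketch of the two ideas in the abstract; illustrative only,
# not the LatFormer implementation. rotation_mask and MaskedAttention are
# hypothetical names introduced for this example.
import torch


def rotation_mask(n: int) -> torch.Tensor:
    """Binary (n*n, n*n) attention mask implementing a 90-degree rotation of
    an n x n grid: each row has a single 1 at the source cell it copies."""
    mask = torch.zeros(n * n, n * n)
    for i in range(n):
        for j in range(n):
            src = i * n + j            # flattened index of cell (i, j)
            dst = j * n + (n - 1 - i)  # flattened index of its rotated image
            mask[dst, src] = 1.0
    return mask


class MaskedAttention(torch.nn.Module):
    """Scaled dot-product attention whose weights are multiplied by a soft
    mask generated by a small CNN, then renormalized (a rough stand-in for
    the modification described in the abstract)."""

    def __init__(self, dim: int, n: int):
        super().__init__()
        self.q = torch.nn.Linear(dim, dim)
        self.k = torch.nn.Linear(dim, dim)
        self.v = torch.nn.Linear(dim, dim)
        self.n = n
        self.mask_cnn = torch.nn.Sequential(      # produces soft masks
            torch.nn.Conv2d(1, 8, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(8, 1, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n*n, dim), one token per lattice cell.
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        # Soft mask over token pairs; the CNN sees a constant input here only
        # to keep the sketch self-contained.
        ones = torch.ones(x.shape[0], 1, self.n * self.n, self.n * self.n)
        soft_mask = torch.sigmoid(self.mask_cnn(ones)).squeeze(1)
        attn = attn * soft_mask
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return attn @ v


# Applying the binary mask as attention weights permutes (rotates) the grid:
n = 3
tokens = torch.arange(n * n, dtype=torch.float32).unsqueeze(-1)  # one value per cell
rotated = rotation_mask(n) @ tokens
print(rotated.squeeze(-1).reshape(n, n))  # the 3x3 grid rotated by 90 degrees
```

Because the binary mask is a permutation matrix, it is itself a valid attention pattern, which is the sense in which a lattice group action can be implemented inside attention.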
Related papers
- ReGLA: Refining Gated Linear Attention [42.97193398172823]
Linear attention has been designed to reduce the quadratic space-time complexity that is inherent in standard transformers.
We developed a feature mapping function to address some crucial issues that previous suggestions overlooked.
We also explored the saturation phenomenon of the gating mechanism and augmented it with a refining module.
arXiv Detail & Related papers (2025-02-03T18:03:13Z)
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs).
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z)
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training [31.495803865226158]
Adversarial Training (AT) is rarely, if ever, deployed in practical AI systems for two primary reasons.
AT results in a drop in generalization in vision models whereas, in encoder-based language models, generalization either improves or remains unchanged.
We show that SMAAT requires only 25-33% of the GPU time compared to standard AT, while significantly improving robustness across all applications.
arXiv Detail & Related papers (2024-05-27T12:48:30Z)
- Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics [10.673414267895355]
We present a novel approach for compressing overparameterized models.
Our algorithm improves the training efficiency by more than 2x, without compromising generalization.
arXiv Detail & Related papers (2023-11-08T23:57:03Z)
- Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning [8.626019848533707]
This paper focuses on evaluating and benchmarking the robustness of visual representations in the context of object assembly tasks.
We employ a general framework in visuomotor policy learning that utilizes visual pretraining models as vision encoders.
Our study investigates the robustness of this framework when applied to a dual-arm manipulation setup, specifically to the grasp variations.
arXiv Detail & Related papers (2023-10-15T20:41:07Z)
- Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z)
- Learning Mechanically Driven Emergent Behavior with Message Passing Neural Networks [0.0]
We introduce the Asymmetric Buckling Columns dataset.
The goal is to classify the direction of symmetry breaking under compression after the onset of instability.
In addition to investigating GNN model architecture, we study the effect of different input data representation approaches.
arXiv Detail & Related papers (2022-02-03T02:46:16Z)
- Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on par with or superior to supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)