Chain and Causal Attention for Efficient Entity Tracking
- URL: http://arxiv.org/abs/2410.05565v1
- Date: Mon, 7 Oct 2024 23:54:10 GMT
- Title: Chain and Causal Attention for Efficient Entity Tracking
- Authors: Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen
- Abstract summary: We propose an efficient and frugal enhancement to the standard attention mechanism.
By considering attention as an adjacency matrix, our model can track entity states with a single layer.
Our contributions include theoretical insights, an improved attention mechanism, and empirical validation.
- Score: 46.577761606415805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the limitations of transformers for entity-tracking tasks in large language models. We identify a theoretical constraint, showing that transformers require at least $\log_2 (n+1)$ layers to handle entity tracking with $n$ state changes. To address this issue, we propose an efficient and frugal enhancement to the standard attention mechanism, enabling it to manage long-term dependencies more efficiently. By considering attention as an adjacency matrix, our model can track entity states with a single layer. Empirical results demonstrate significant improvements on entity-tracking datasets while maintaining competitive performance on standard natural language modeling. Our modified attention allows us to achieve the same performance with drastically fewer layers. Additionally, our enhanced mechanism reveals structured internal representations of attention. Extensive experiments on both toy and complex datasets validate our approach. Our contributions include theoretical insights, an improved attention mechanism, and empirical validation.
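To make the lower bound concrete: tracking an entity through $n = 7$ state changes requires at least $\log_2(7+1) = 3$ standard attention layers. The sketch below illustrates one plausible reading of the proposed mechanism, assuming the causal attention matrix is treated as the adjacency matrix of a dependency graph and all multi-hop paths are aggregated in closed form; the function name, the strict lower-triangular masking, and the explicit matrix inverse are my assumptions, not necessarily the paper's exact formulation.

```python
import torch

def chain_causal_attention(q, k, v):
    # q, k, v: (seq_len, head_dim) tensors for a single attention head.
    n, d = q.shape[-2], q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / d**0.5
    causal = torch.triu(torch.ones(n, n), diagonal=1).bool()
    A = scores.masked_fill(causal, float("-inf")).softmax(dim=-1)
    # Treat A as an adjacency matrix. Dropping self-loops makes A strictly
    # lower triangular (hence nilpotent), so (I - A) is unit lower
    # triangular and always invertible. The first token, which has no
    # predecessors, simply gets a zero output in this sketch.
    A = torch.tril(A, diagonal=-1)
    eye = torch.eye(n, dtype=A.dtype)
    # (I - A)^{-1} - I = A + A^2 + ... + A^{n-1} sums paths of every
    # length, so chains of state changes compose within a single layer.
    A_chain = torch.linalg.inv(eye - A) - eye
    return A_chain @ v
```

In practice one would likely renormalize the rows of the aggregated matrix or mix it with standard attention; the sketch only shows why a single layer can suffice once attention is read as graph reachability.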
Related papers
- Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer, as sketched below.
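A minimal sketch of that idea, assuming single-head attention and omitting causal masking; the function and argument names (k_prev, v_prev for the preceding layer's keys and values) are mine, not the paper's:

```python
import torch

def skip_layer_attention(q, k_cur, v_cur, k_prev, v_prev):
    # Queries attend jointly over the current layer's keys/values and
    # those of one preceding layer, concatenated along the sequence axis.
    k = torch.cat([k_prev, k_cur], dim=-2)
    v = torch.cat([v_prev, v_cur], dim=-2)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v
```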
arXiv Detail & Related papers (2024-06-17T07:24:38Z)
- Towards Better Text-to-Image Generation Alignment via Attention Modulation [16.020834525343997]
We propose an attribution-focusing mechanism: a training-free, phase-wise method that modulates attention in diffusion models.
An object-focused masking scheme and a phase-wise dynamic weight control mechanism are integrated into the cross-attention modules, as sketched below.
The experimental results in various alignment scenarios demonstrate that our model attains better image-text alignment with minimal additional computational cost.
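A rough, training-free sketch of how such modulation might slot into a cross-attention module; object_mask and phase_weight are hypothetical names for the object-focused mask and the phase-dependent control scalar, and the log-bias realization is my assumption:

```python
import torch

def modulated_cross_attention(q, k, v, object_mask, phase_weight):
    # q: (n_img, d) image queries; k, v: (n_txt, d) text keys/values.
    # object_mask: (n_txt,) tensor, 1.0 on tokens of the focused object.
    # phase_weight: scalar in [0, 1], varied across denoising phases.
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    # Down-weight attention to non-object tokens, scaled by the
    # phase-dependent weight; no parameters are trained.
    scores = scores + phase_weight * torch.log(object_mask + 1e-9)
    return scores.softmax(dim=-1) @ v
```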
arXiv Detail & Related papers (2024-04-22T06:18:37Z)
- EnriCo: Enriched Representation and Globally Constrained Inference for Entity and Relation Extraction [3.579132482505273]
Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs.
Existing approaches often fall short in two key aspects: richness of representation and coherence of output structure.
In our work, we introduce EnriCo, which mitigates these shortcomings.
arXiv Detail & Related papers (2024-04-18T20:15:48Z)
- Representation Alignment Contrastive Regularization for Multi-Object Tracking [29.837560662395713]
High performance in multi-object tracking algorithms relies heavily on modeling spatio-temporal relationships during the data association stage.
This work aims to simplify deep learning-based spatio-temporal relationship models and to introduce interpretability into data association designs.
arXiv Detail & Related papers (2024-04-03T08:33:08Z)
- Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking [53.66999416757543]
We study how fine-tuning affects the internal mechanisms implemented in language models.
Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
arXiv Detail & Related papers (2024-02-22T18:59:24Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two practical, real-world instance-level retrieval tasks.
We then train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves unsupervised object discovery performance competitive with a SlotAttention model on two datasets, and that it disentangles objects in a third dataset where SlotAttention fails, all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.