Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
- URL: http://arxiv.org/abs/2508.20407v1
- Date: Thu, 28 Aug 2025 04:10:19 GMT
- Title: Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
- Authors: Zhongpan Tang
- Abstract summary: This paper introduces a novel linear attention architecture, TLinFormer. By reconfiguring neuron connection patterns, TLinFormer achieves strict linear complexity while computing exact attention scores. TLinFormer exhibits overwhelming advantages in key metrics such as inference latency, KV cache efficiency, and memory footprint.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer architecture has become a cornerstone of modern artificial intelligence, but its core self-attention mechanism suffers from a complexity bottleneck that scales quadratically with sequence length, severely limiting its application in long-sequence tasks. To address this challenge, existing linear attention methods typically sacrifice model performance by relying on data-agnostic kernel approximations or restrictive context selection. This paper returns to the first principles of connectionism, starting from the topological structure of information flow, to introduce a novel linear attention architecture-\textbf{TLinFormer}. By reconfiguring neuron connection patterns, TLinFormer achieves strict linear complexity while computing exact attention scores and ensuring information flow remains aware of the full historical context. This design aims to bridge the performance gap prevalent between existing efficient attention methods and standard attention. Through a series of experiments, we systematically evaluate the performance of TLinFormer against a standard Transformer baseline on long-sequence inference tasks. The results demonstrate that TLinFormer exhibits overwhelming advantages in key metrics such as \textbf{inference latency}, \textbf{KV cache efficiency}, \textbf{memory footprint}, and \textbf{overall speedup}.
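The abstract contrasts TLinFormer's exact attention with kernel-approximation methods, but the paper's connectivity scheme itself is not detailed in this listing. For context, here is a minimal NumPy sketch of the standard kernel trick that gives existing linear attention its linear cost: reordering the matrix products avoids ever forming the N x N score matrix. The feature map `phi` below is an illustrative ELU-like positive map, not anything from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 512, 64
Q, K, V = rng.standard_normal((3, N, d))

def softmax_attention(Q, K, V):
    # Standard attention: materializes an N x N score matrix -> O(N^2 * d)
    scores = Q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V):
    # Kernel trick: attention becomes phi(Q) @ (phi(K)^T V), computed
    # right-to-left in O(N * d^2) -- linear in sequence length N.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # illustrative positive feature map
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                # (d, d): no N x N matrix is ever formed
    Z = Qf @ Kf.sum(axis=0)      # (N,): per-row normalizer
    return (Qf @ KV) / Z[:, None]

out_soft = softmax_attention(Q, K, V)
out_lin = linear_attention(Q, K, V)
print(out_soft.shape, out_lin.shape)  # both (N, d)
```

The outputs differ numerically because `phi` only approximates softmax scores; that approximation gap is precisely the performance sacrifice the abstract attributes to data-agnostic kernel methods, and what TLinFormer claims to avoid while keeping linear complexity.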
Related papers
- PRISM: Parallel Residual Iterative Sequence Model [52.26239951489612]
We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form. We prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck.
arXiv Detail & Related papers (2026-02-11T12:39:41Z) - LUNA: Linear Universal Neural Attention with Generalization Guarantees [27.74721677870656]
LUNA achieves state-of-the-art average accuracy among efficient Transformers under compute parity. LUNA also excels at post-hoc conversion, replacing softmax attention in fine-tuned BERT and ViT-B/16 checkpoints.
arXiv Detail & Related papers (2025-12-08T21:49:55Z) - Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling [0.0]
The Gated Associative Memory (GAM) network is a novel, fully parallel architecture for sequence modeling. We implement GAM from scratch and conduct a rigorous comparative analysis against a standard Transformer model and a modern linear-time baseline. Our experiments demonstrate that GAM is consistently faster than both baselines in training speed and achieves a superior or competitive final validation perplexity across all datasets.
arXiv Detail & Related papers (2025-08-30T20:59:46Z) - Provable In-Context Learning of Nonlinear Regression with Transformers [66.99048542127768]
In-context learning (ICL) is the ability to perform unseen tasks using task-specific prompts without updating parameters. Recent research has actively explored the training dynamics behind ICL, with much of the focus on relatively simple tasks. This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z) - Log-Linear Attention [81.09631871212211]
This paper develops log-linear attention, an attention mechanism that balances linear attention's efficiency with the expressiveness of softmax attention. We show that with a particular growth function, log-linear attention admits a similarly matmul-rich parallel form whose compute cost is log-linear in sequence length. Log-linear attention is a general framework and can be applied on top of existing linear attention variants.
arXiv Detail & Related papers (2025-06-05T08:44:51Z) - Accurate and Efficient Multivariate Time Series Forecasting via Offline Clustering [22.545533166145706]
We introduce the Forecaster with Offline Clustering Using Segments (FOCUS), a novel approach to MTS forecasting that simplifies long-range dependency modeling. It achieves state-of-the-art accuracy while significantly reducing computational costs.
arXiv Detail & Related papers (2025-05-09T02:34:06Z) - Towards Transformer-Based Aligned Generation with Self-Coherence Guidance [51.42269790543461]
We introduce a training-free approach for enhancing alignment in Transformer-based Text-Guided Diffusion Models (TGDMs). Existing TGDMs often struggle to generate semantically aligned images, particularly when dealing with complex text prompts or multi-concept attribute binding challenges. Our method addresses these challenges by directly optimizing cross-attention maps during the generation process.
arXiv Detail & Related papers (2025-03-22T07:03:57Z) - PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection [68.8373788348678]
Visual instruction tuning adapts pre-trained Multimodal Large Language Models to follow human instructions. PRISM is the first training-free framework for efficient visual instruction selection. It reduces the end-to-end time for data selection and model tuning to just 30% of conventional pipelines.
arXiv Detail & Related papers (2025-02-17T18:43:41Z) - Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences [60.489682735061415]
We propose CHELA, which replaces state space models with short-long convolutions and implements linear attention in a divide-and-conquer manner.
Our experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-06-12T12:12:38Z) - Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers [0.0]
We present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of Path Integral formalism.<n>We obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments.<n>We validate the effectiveness of this approach through the Passkey retrieval task and a summarization task, demonstrating that the proposed method preserves historical information while exhibiting memory usage that scales linearly with sequence length.
arXiv Detail & Related papers (2024-05-07T19:05:26Z) - Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization [6.799413002613627]
Todyformer is a novel Transformer-based neural network tailored for dynamic graphs.
It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers.
We show that Todyformer consistently outperforms the state-of-the-art methods for downstream tasks.
arXiv Detail & Related papers (2024-02-02T23:05:30Z) - FLatten Transformer: Vision Transformer using Focused Linear Attention [80.61335173752146]
Linear attention offers a much more efficient alternative with its linear complexity.
Current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead.
We propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness.
arXiv Detail & Related papers (2023-08-01T10:37:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.