ReSSFormer: A Recursive Sparse Structured Transformer for Scalable and Long-Context Reasoning
- URL: http://arxiv.org/abs/2510.01585v1
- Date: Thu, 02 Oct 2025 02:05:30 GMT
- Title: ReSSFormer: A Recursive Sparse Structured Transformer for Scalable and Long-Context Reasoning
- Authors: Haochen You, Baojing Liu
- Abstract summary: We present ReSSFormer, a Recursive Sparse Structured Transformer that integrates three complementary innovations. ReSSFormer replaces conventional depth stacking with recurrent inference, substitutes full attention with token- and expert-level sparsity, and models latent token topology directly from content.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Transformer architectures have demonstrated impressive scalability across domains, they continue to face challenges in long-context reasoning, computational efficiency, and structural generalization - largely due to rigid layer stacking, dense attention, and reliance on positional encodings. We present ReSSFormer, a Recursive Sparse Structured Transformer that integrates three complementary innovations: Recurrent Reasoning & Memory Unit (R2MU) for iterative reasoning with bounded depth, Adaptive Sparse Attention Module (ASAM) for efficient and focused context selection, and Self-Organizing Encoder Structure (SOES) for position-free structure induction. ReSSFormer replaces conventional depth stacking with recurrent inference, substitutes full attention with token- and expert-level sparsity, and models latent token topology directly from content. Across language modeling, multi-hop QA, and structure-sensitive tasks, ReSSFormer consistently outperforms strong baselines under comparable FLOPs and parameter budgets, highlighting its scalability, efficiency, and structural flexibility.
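Since the abstract names the modules but not their internals, the following minimal PyTorch sketch (not the authors' implementation) illustrates two of the stated ideas: a weight-tied block applied recurrently for a bounded number of steps in place of depth stacking, and token-level sparsity that keeps only the top-k attention scores per query. All class names, shapes, and hyperparameters are assumptions for illustration.
```python
# Illustrative sketch only: the module names echo R2MU and ASAM from the abstract,
# but all internals are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveSparseAttention(nn.Module):
    """Single-head attention that keeps only the top-k scores per query (ASAM-style token sparsity)."""

    def __init__(self, dim: int, top_k: int = 8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        # Mask everything except the top-k keys for each query token.
        k_eff = min(self.top_k, scores.size(-1))
        thresh = scores.topk(k_eff, dim=-1).values[..., -1:]
        scores = scores.masked_fill(scores < thresh, float("-inf"))
        return self.out(F.softmax(scores, dim=-1) @ v)


class RecurrentReasoningUnit(nn.Module):
    """One weight-tied block applied for a bounded number of steps (R2MU-style recurrence)."""

    def __init__(self, dim: int, steps: int = 4, top_k: int = 8):
        super().__init__()
        self.attn = AdaptiveSparseAttention(dim, top_k)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.steps = steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The same parameters are reused at every step, replacing conventional depth stacking.
        for _ in range(self.steps):
            x = x + self.attn(self.norm1(x))
            x = x + self.ffn(self.norm2(x))
        return x


tokens = torch.randn(2, 128, 64)                       # (batch, sequence, dim); no positional encoding added
print(RecurrentReasoningUnit(dim=64)(tokens).shape)    # torch.Size([2, 128, 64])
```
Expert-level sparsity (mixture-of-experts routing) and the content-derived token topology of SOES are omitted here; the sketch only shows how recurrence bounds effective depth while top-k sparsity bounds the attended context.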
Related papers
- Structure-R1: Dynamically Leveraging Structural Knowledge in LLM Reasoning through Reinforcement Learning [29.722512436773638]
We propose Structure-R1, a framework that transforms retrieved content into structured representations optimized for reasoning.
We show that Structure-R1 consistently achieves competitive performance with a 7B-scale backbone model.
Our theoretical analysis demonstrates how structured representations enhance reasoning by improving information density and contextual clarity.
arXiv Detail & Related papers (2025-10-16T23:19:28Z)
- CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning [67.18702329644526]
CoT Referring enhances model reasoning across modalities through a structured chain-of-thought training data format.
We restructure the training data to enforce a new output form, providing new annotations for existing datasets.
We also integrate detection and segmentation capabilities into a unified MLLM framework, training it with a novel adaptive weighted loss to optimize performance.
arXiv Detail & Related papers (2025-10-03T08:50:21Z)
- Unlocking Symbol-Level Precoding Efficiency Through Tensor Equivariant Neural Network [84.22115118596741]
We propose an end-to-end deep learning (DL) framework with low inference complexity for symbol-level precoding (SLP).
We show that the proposed framework captures the substantial performance gains of optimal SLP, while achieving an approximately 80-times speedup over conventional methods.
arXiv Detail & Related papers (2025-10-02T15:15:50Z)
- Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes [60.92139345612904]
We present Light-SQ, a novel superquadric-based optimization framework.
We propose a block-regrow-fill strategy guided by structure-aware volumetric decomposition.
Experiments demonstrate that Light-SQ enables efficient, high-fidelity, and editable shape abstraction with superquadrics.
arXiv Detail & Related papers (2025-09-29T16:18:32Z)
- Lexical Manifold Reconfiguration in Large Language Models: A Novel Architectural Approach for Contextual Modulation [0.0]
A structured approach was developed for dynamically reconfiguring token embeddings through continuous geometric transformations.
A manifold-based transformation mechanism was integrated to regulate lexical positioning, allowing embeddings to undergo controlled shifts.
Empirical evaluations demonstrated that embedding reconfiguration contributed to reductions in perplexity, improved lexical coherence, and enhanced sentence-level continuity.
arXiv Detail & Related papers (2025-02-12T22:11:07Z)
- ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships.
Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands.
We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z)
- Structured Context Recomposition for Large Language Models Using Probabilistic Layer Realignment [0.0]
This paper introduces a probabilistic layer realignment strategy that dynamically adjusts learned representations within transformer layers.
It mitigates abrupt topic shifts and logical inconsistencies, particularly in scenarios where sequences exceed standard attention window constraints.
While SCR incurs a moderate increase in processing time, memory overhead remains within feasible limits, making it suitable for practical deployment in autoregressive generative applications.
arXiv Detail & Related papers (2025-01-29T12:46:42Z)
- Autonomous Structural Memory Manipulation for Large Language Models Using Hierarchical Embedding Augmentation [0.0]
This study introduces hierarchical embedding augmentation as a means to redefine the representation of tokens through multi-level semantic structures.
Results reveal substantial improvements in computational efficiency, with marked reductions in processing overhead for longer input sequences.
The ability to dynamically adjust token representations and memory configurations contributed to the model's robustness under varied and unpredictable input conditions.
arXiv Detail & Related papers (2025-01-23T22:20:36Z)
- Understanding Token-level Topological Structures in Transformer-based Time Series Forecasting [52.364260925700485]
Transformer-based methods have achieved state-of-the-art performance in time series forecasting (TSF).
It remains unclear whether existing Transformers fully leverage the intrinsic topological structure among tokens throughout intermediate layers.
We propose the Topology Enhancement Method (TEM), a novel Transformer-based TSF method that explicitly and adaptively preserves token-level topology.
arXiv Detail & Related papers (2024-04-16T07:21:39Z)
- Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings [60.698130703909804]
Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset.
We propose SQ-Transformer that explicitly encourages systematicity in the embeddings and attention layers.
We show that SQ-Transformer achieves stronger compositional generalization than the vanilla Transformer on multiple low-complexity semantic parsing and machine translation datasets.
arXiv Detail & Related papers (2024-02-09T15:53:15Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
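For the last entry above, a toy sketch of the general idea, under heavy assumptions: internal parse-tree nodes are formed by accumulating (here, simply averaging) the leaves they cover, and each token then attends over leaves and nodes together. The span list, the averaging rule, and the function names are illustrative only and do not reproduce the paper's constant-time accumulation scheme.
```python
# Toy illustration only: a drastic simplification of "Hierarchical Accumulation";
# the real method composes tree nodes inside self-attention with specialised
# aggregation, which is not reproduced here.
import torch
import torch.nn.functional as F


def accumulate_nodes(tokens: torch.Tensor, spans: list[tuple[int, int]]) -> torch.Tensor:
    """Build one vector per parse-tree node by averaging the leaves it covers."""
    return torch.stack([tokens[s:e].mean(dim=0) for s, e in spans])


def tree_aware_attention(tokens: torch.Tensor, spans: list[tuple[int, int]]) -> torch.Tensor:
    """Each token attends over all leaves plus the accumulated tree-node vectors."""
    nodes = accumulate_nodes(tokens, spans)             # (num_nodes, dim)
    keys = torch.cat([tokens, nodes], dim=0)            # leaves and internal nodes
    scores = tokens @ keys.T / tokens.size(-1) ** 0.5   # (seq, seq + num_nodes)
    return F.softmax(scores, dim=-1) @ keys


x = torch.randn(5, 16)                        # 5 leaf tokens with 16-dim embeddings
spans = [(0, 2), (3, 5), (2, 5), (0, 5)]      # hypothetical constituency spans
print(tree_aware_attention(x, spans).shape)   # torch.Size([5, 16])
```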
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.