SA-DiffuSeq: Addressing Computational and Scalability Challenges in Long-Document Generation with Sparse Attention
- URL: http://arxiv.org/abs/2512.20724v1
- Date: Tue, 23 Dec 2025 19:35:02 GMT
- Title: SA-DiffuSeq: Addressing Computational and Scalability Challenges in Long-Document Generation with Sparse Attention
- Authors: Alexandros Christoforos, Chadbourne Davis
- Abstract summary: SA-DiffuSeq is a diffusion framework that integrates sparse attention to improve scalability for long document modeling. Our results indicate that incorporating structured sparsity into diffusion models is a promising direction for efficient and expressive long text generation.
- Score: 45.88028371034407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based approaches to long-form text generation suffer from prohibitive computational cost and memory overhead as sequence length increases. We introduce SA-DiffuSeq, a diffusion framework that integrates sparse attention to fundamentally improve scalability for long-document modeling. By selectively allocating attention within the diffusion process, SA-DiffuSeq significantly reduces computational complexity while maintaining semantic coherence and generation quality. A key component of our method is a soft absorbing state tailored to sparse attention dynamics, which stabilizes diffusion trajectories and accelerates sequence reconstruction. This design improves sampling efficiency and enhances precision in long-range dependency modeling. Extensive experiments demonstrate that SA-DiffuSeq consistently surpasses state-of-the-art diffusion baselines in both training efficiency and sampling speed, with especially strong gains on extended sequences. These properties make SA-DiffuSeq well suited for demanding long-form applications such as scientific writing, large-scale code generation, and multi-turn long-context dialogue. Overall, our results indicate that incorporating structured sparsity into diffusion models is a promising direction for efficient and expressive long-text generation.
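The abstract does not say which sparsity pattern SA-DiffuSeq uses. As a rough illustration of why sparse attention lowers cost on long sequences, the sketch below assumes a sliding-window pattern with optional global tokens, a common choice in long-document Transformers. The function names `sliding_window_mask` and `count_attended_pairs` and the parameters `window` and `global_tokens` are hypothetical and are not taken from the paper.

```python
def sliding_window_mask(seq_len, window, global_tokens=()):
    """Build a boolean attention mask (assumed pattern, not the paper's).

    mask[i][j] is True when token i may attend to token j, i.e. when the
    two positions are within `window` of each other or either one is a
    designated global token.
    """
    global_set = set(global_tokens)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            is_local = abs(i - j) <= window
            is_global = i in global_set or j in global_set
            mask[i][j] = is_local or is_global
    return mask


def count_attended_pairs(mask):
    """Count the query-key pairs that the mask keeps."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    seq_len, window = 1024, 16
    sparse_pairs = count_attended_pairs(sliding_window_mask(seq_len, window))
    dense_pairs = seq_len * seq_len
    print(f"dense pairs:  {dense_pairs}")
    print(f"sparse pairs: {sparse_pairs}")
    print(f"fraction kept: {sparse_pairs / dense_pairs:.4f}")
```

Under this assumed pattern, the number of kept pairs grows roughly linearly with sequence length for a fixed window, whereas dense attention grows quadratically. That gap is the kind of saving the abstract refers to when it describes reduced computational complexity on extended sequences.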
Related papers
- Causal Autoregressive Diffusion Language Model [70.7353007255797]
CARD reformulates the diffusion process within a strictly causal attention mask, enabling dense, per-token supervision in a single forward pass.
Our results demonstrate that CARD achieves ARM-level data efficiency while unlocking the latency benefits of parallel generation.
Date: 2026-01-29T17:38:29Z
- MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts [45.88028371034407]
MoE-DiffuSeq is a mixture of experts based framework for enhancing diffusion models in long document generation.
MoE-DiffuSeq integrates sparse attention with a mixture of experts architecture, enabling efficient and scalable long sequence modeling.
Date: 2025-12-23T18:50:54Z
- SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation [62.14510717860079]
We propose a Synergistic Diffusion-Autoregression paradigm that unifies the training efficiency of autoregressive models with the parallel inference capability of diffusion.
SDAR performs a lightweight paradigm conversion that transforms a well-trained autoregressive (AR) model into a blockwise diffusion model through brief, data-efficient adaptation.
Building on this insight, SDAR achieves efficient AR-to-diffusion conversion with minimal cost, preserving AR-level performance while enabling parallel generation.
Date: 2025-10-07T17:29:28Z
- Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models [74.15250326312179]
Diffusion Large Language Models offer efficient parallel generation and capable global modeling.
The dominant application of DLLMs is hindered by the need for a statically predefined generation length.
We introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion.
Date: 2025-08-01T17:56:07Z
- Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models [12.446047799880587]
Token-level diffusion doesn't model word-order dependencies explicitly.
Passage-level diffusion struggles with learning robust representations for long-form text.
We propose Segment-Level Diffusion, a framework that enhances diffusion-based text generation.
Date: 2024-12-15T22:47:44Z
- UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba [7.594115034632109]
We propose UmambaTSF, a novel long-term time series forecasting framework.
It integrates multi-scale feature extraction capabilities of U-shaped encoder-decoder multilayer perceptrons (MLP) with Mamba's long sequence representation.
UmambaTSF achieves state-of-the-art performance and excellent generality on widely used benchmark datasets.
Date: 2024-10-15T04:56:43Z
- Discrete Diffusion Language Model for Efficient Text Summarization [19.267738861590487]
We introduce a novel semantic-aware noising process that enables Transformer backbones to handle long sequences effectively.
Our approaches achieve state-of-the-art performance on three benchmark summarization datasets: Gigaword, CNN/DailyMail, and Arxiv.
Date: 2024-06-25T09:55:22Z
- DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
Date: 2023-10-09T15:29:10Z
This list is automatically generated from the titles and abstracts of the papers in this site.