Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation
- URL: http://arxiv.org/abs/2603.02727v2
- Date: Thu, 05 Mar 2026 00:01:22 GMT
- Title: Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation
- Authors: Hongbo Zheng, Afshin Bozorgpour, Dorit Merhof, Minjia Zhang,
- Abstract summary: PVT-GDLA is a decoder-centric Transformer that restores sharp, long-range dependencies in linear time. It achieves state-of-the-art accuracy across CT, MRI, ultrasound, and dermoscopy benchmarks under equal training budgets.
- Score: 15.30336007288786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical image segmentation requires models that preserve fine anatomical boundaries while remaining efficient for clinical deployment. While transformers capture long-range dependencies, they suffer from quadratic attention cost and large data requirements, whereas CNNs are compute-friendly yet struggle with global reasoning. Linear attention offers $\mathcal{O}(N)$ scaling, but often exhibits training instability and attention dilution, yielding diffuse maps. We introduce PVT-GDLA, a decoder-centric Transformer that restores sharp, long-range dependencies in linear time. Its core, Gated Differential Linear Attention (GDLA), computes two kernelized attention paths on complementary query/key subspaces and subtracts them with a learnable, channel-wise scale to cancel common-mode noise and amplify relevant context. A lightweight, head-specific gate injects nonlinearity and input-adaptive sparsity to mitigate attention sink, and a parallel local token-mixing branch with depthwise convolution strengthens neighboring-token interactions to improve boundary fidelity, all while retaining $\mathcal{O}(N)$ complexity and low parameter overhead. Coupled with a pretrained Pyramid Vision Transformer (PVT) encoder, PVT-GDLA achieves state-of-the-art accuracy across CT, MRI, ultrasound, and dermoscopy benchmarks under equal training budgets, with comparable parameters but lower FLOPs than CNN-, Transformer-, hybrid-, and linear-attention baselines. PVT-GDLA provides a practical path to fast, scalable, high-fidelity medical segmentation in clinical environments and other resource-constrained settings.
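The snippet below is a minimal PyTorch sketch of the GDLA idea as described in the abstract: two kernelized linear-attention paths on complementary query/key subspaces, subtracted with a learnable channel-wise scale, modulated by a lightweight gate, and combined with a depthwise-convolution local token-mixing branch. It is an illustrative sketch, not the authors' implementation; all module and parameter names (`GatedDifferentialLinearAttention`, `lam`, `gate`, `local`) are assumptions, and the paper's exact kernel, normalization, and subspace split may differ.

```python
# Hypothetical sketch of Gated Differential Linear Attention (GDLA), based on the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedDifferentialLinearAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % (2 * heads) == 0
        self.heads = heads
        self.sub = dim // (2 * heads)          # per-head subspace size
        # One projection each for Q/K/V; Q and K are later split into two complementary subspaces.
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Learnable per-head scale for the subtracted (differential) path.
        self.lam = nn.Parameter(torch.full((heads, 1, 1), 0.5))
        # Lightweight gate for input-adaptive sparsity (assumed sigmoid gating).
        self.gate = nn.Linear(dim, heads * self.sub)
        # Parallel local token-mixing branch: depthwise conv along the token axis.
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.proj = nn.Linear(heads * self.sub, dim)

    @staticmethod
    def _linear_attn(q, k, v, eps=1e-6):
        # Kernelized linear attention: contract keys with values first,
        # so the cost is O(N) in the number of tokens instead of O(N^2).
        q, k = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum("bhnd,bhne->bhde", k, v)
        z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
        return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

    def forward(self, x):                      # x: (B, N, C) tokens
        B, N, _ = x.shape
        h, d = self.heads, self.sub
        # Split Q/K/V into two complementary per-head subspaces.
        q = self.to_q(x).view(B, N, h, 2, d).permute(0, 2, 3, 1, 4)
        k = self.to_k(x).view(B, N, h, 2, d).permute(0, 2, 3, 1, 4)
        v = self.to_v(x).view(B, N, h, 2, d).permute(0, 2, 3, 1, 4)
        a1 = self._linear_attn(q[:, :, 0], k[:, :, 0], v[:, :, 0])
        a2 = self._linear_attn(q[:, :, 1], k[:, :, 1], v[:, :, 1])
        # Differential combination: subtract the second path to cancel common-mode noise.
        out = a1 - self.lam * a2               # (B, h, N, d)
        out = out.permute(0, 2, 1, 3).reshape(B, N, h * d)
        out = out * torch.sigmoid(self.gate(x))           # gated, input-adaptive sparsity
        out = self.proj(out)
        # Add the local depthwise-conv branch for neighboring-token interactions.
        return out + self.local(x.transpose(1, 2)).transpose(1, 2)


# Usage example on random decoder tokens.
tokens = torch.randn(2, 196, 64)
print(GatedDifferentialLinearAttention(dim=64, heads=4)(tokens).shape)  # torch.Size([2, 196, 64])
```

The key property illustrated is the reordering inside `_linear_attn`: keys and values are contracted before the queries are applied, so memory and compute grow linearly with the token count, which is what allows the differential and gated paths to be added without giving up the $\mathcal{O}(N)$ budget claimed in the abstract.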
Related papers
- Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series [15.981619117274667]
Accurate analysis of medical time series (MedTS) data, such as electroencephalography (EEG) and electrocardiography (ECG), plays a pivotal role in healthcare applications. Recent advances in deep learning have leveraged Transformer-based models to capture temporal dependencies, yet these models can fall short on MedTS. This limitation stems from a structural mismatch: MedTS signals are inherently centralized, whereas the Transformer's attention mechanism is decentralized. We propose CoTAR, a centralization-based module designed to replace decentralized attention.
arXiv Detail & Related papers (2026-02-09T04:39:22Z) - Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping [61.459927600301654]
Multi-condition control is bottlenecked by the conventional "concatenate-and-attend" strategy. Our analysis reveals that much of this cross-modal interaction is spatially or semantically redundant. We propose Position-aligned and Keyword-scoped Attention (PKA), a highly efficient framework designed to eliminate these redundancies.
arXiv Detail & Related papers (2026-02-06T16:39:10Z) - LINA: Linear Autoregressive Image Generative Models with Continuous Tokens [56.80443965097921]
Autoregressive models with continuous tokens form a promising paradigm for visual generation, especially for text-to-image (T2I) synthesis. We study how to design compute-efficient linear attention within this framework. We present LINA, a simple and compute-efficient T2I model built entirely on linear attention, capable of generating high-fidelity 1024x1024 images from user instructions.
arXiv Detail & Related papers (2026-01-30T06:44:33Z) - MedLiteNet: Lightweight Hybrid Medical Image Segmentation Model [17.73370811236741]
We introduce MedLiteNet, a lightweight CNN-Transformer hybrid tailored for dermoscopic segmentation. The encoder stacks depth-wise Mobile Inverted Bottleneck blocks to curb computation, inserts a bottleneck-level cross-scale token-mixing unit to exchange information between resolutions, and embeds a boundary-aware self-attention module to sharpen lesion contours.
arXiv Detail & Related papers (2025-09-03T05:59:13Z) - U-R-VEDA: Integrating UNET, Residual Links, Edge and Dual Attention, and Vision Transformer for Accurate Semantic Segmentation of CMRs [0.0]
We propose a deep-learning-based enhanced UNet model, U-R-Veda, which integrates convolution transformations, vision transformer, residual links, channel attention, and spatial attention. The model significantly improves the semantic segmentation of cardiac magnetic resonance (CMR) images. Performance results show that U-R-Veda achieves an average accuracy of 95.2% based on the Dice similarity coefficient (DSC).
arXiv Detail & Related papers (2025-06-25T04:10:09Z) - Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis [9.090504201460817]
Histopathology Whole Slide Image (WSI) analysis serves as the gold standard for clinical cancer diagnosis in the daily routines of doctors.
Previous methods typically employ Multiple Instance Learning to enable slide-level prediction given only slide-level labels.
To alleviate the computational complexity of long sequences in large WSIs, methods like HIPT use region-slicing, and TransMIL employs approximation of full self-attention.
arXiv Detail & Related papers (2024-10-18T06:12:36Z) - Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network that combines convolutional neural network (CNN) and transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z) - Unlocking Fine-Grained Details with Wavelet-based High-Frequency Enhancement in Transformers [4.208461204572879]
Medical image segmentation is a critical task that plays a vital role in diagnosis, treatment planning, and disease monitoring.
We address the local feature deficiency of the Transformer model by carefully re-designing the self-attention map.
We propose a multi-scale context enhancement block within skip connections to adaptively model inter-scale dependencies.
arXiv Detail & Related papers (2023-08-25T15:42:19Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z) - Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation [67.19443246236048]
Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases.
Some small-sized airway branches (e.g., bronchus and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation.
This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function.
arXiv Detail & Related papers (2022-09-05T16:38:13Z) - Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound-based catheter segmentation method.
The proposed method achieved state-of-the-art performance with an efficiency of 0.25 seconds per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)