MirrorLA: Reflecting Feature Map for Vision Linear Attention
- URL: http://arxiv.org/abs/2602.04346v1
- Date: Wed, 04 Feb 2026 09:14:09 GMT
- Title: MirrorLA: Reflecting Feature Map for Vision Linear Attention
- Authors: Weikang Meng, Liangyu Huo, Yadan Luo, Yaowei Wang, Yingjian Li, Zheng Zhang,
- Abstract summary: Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation. MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
- Score: 49.41670925034762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We identify the root cause of this degradation as the non-negativity constraint imposed on kernel feature maps: standard projections like ReLU act as "passive truncation" operators, indiscriminately discarding semantic information residing in the negative domain. We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation. By leveraging learnable Householder reflections, MirrorLA rotates the feature geometry into the non-negative orthant to maximize information retention. Our approach restores representational density through a cohesive, multi-scale design: it first optimizes local discriminability via block-wise isometries, stabilizes long-context dynamics using variance-aware modulation to diversify activations, and finally, integrates dispersed subspaces via cross-head reflections to induce global covariance mixing. MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
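As a rough illustration of the core idea, the sketch below applies a Householder reflection $H = I - 2vv^\top/\|v\|^2$ before the ReLU feature map of a standard linear attention layer, so that features are reoriented rather than merely truncated. This is a minimal NumPy sketch, not the authors' implementation: the names and toy shapes are hypothetical, a single global reflection stands in for the paper's block-wise isometries, and the variance-aware modulation and cross-head reflections are omitted.

```python
import numpy as np

def householder(v):
    """Reflection H = I - 2 v v^T / ||v||^2; orthogonal, so it preserves norms."""
    v = v / np.linalg.norm(v)
    return np.eye(v.size) - 2.0 * np.outer(v, v)

def linear_attention(Q, K, V, feature_map):
    """Linear attention: out_i = phi(q_i) (sum_j phi(k_j)^T v_j) / (phi(q_i) sum_j phi(k_j))."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V                                 # (d, d_v) global summary, O(n d^2)
    z = phi_k.sum(axis=0)                            # (d,) normalizer
    return (phi_q @ kv) / ((phi_q @ z)[:, None] + 1e-6)

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))

v = rng.normal(size=d)                               # a learnable direction in MirrorLA
H = householder(v)

relu_map   = lambda X: np.maximum(X, 0.0)            # "passive truncation"
mirror_map = lambda X: np.maximum(X @ H.T, 0.0)      # reflect first, then clip

out = linear_attention(Q, K, V, mirror_map)
```

Because $H$ is an isometry ($H^2 = I$), the reflection reshapes the feature geometry without distorting norms; only the subsequent ReLU discards information, and the learnable direction $v$ can be trained to minimize that loss.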
Related papers
- AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning [8.698253005940503]
We propose Activation-Guided Zeroth-Order Optimization (AGZO). Unlike prior methods, AGZO extracts a compact, activation-informed subspace on the fly during the forward pass and restricts perturbations to this low-rank subspace. AGZO consistently outperforms state-of-the-art ZO baselines and significantly narrows the performance gap with first-order fine-tuning.
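The subspace-restricted two-point estimator that AGZO builds on can be sketched as follows. All names here are illustrative, and the basis `U` is a random stand-in for AGZO's activation-informed subspace:

```python
import numpy as np

def zo_subspace_step(f, w, U, mu=1e-3, lr=1e-2, rng=None):
    """Two-point zeroth-order step with the perturbation confined to span(U)."""
    rng = rng or np.random.default_rng()
    u = U @ rng.normal(size=U.shape[1])              # low-rank random direction
    g = (f(w + mu * u) - f(w - mu * u)) / (2 * mu)   # directional-derivative estimate
    return w - lr * g * u                            # descend along u only

d, r = 10, 2
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(d, r)))         # orthonormal r-dim subspace basis
f = lambda w: float(w @ w)                           # toy loss, minimized at 0

w = U @ rng.normal(size=r)                           # start inside span(U)
for _ in range(300):
    w = zo_subspace_step(f, w, U, rng=rng)
```

Only two loss evaluations are needed per step, and every update stays inside `span(U)`, which is what keeps the estimator's variance proportional to the subspace rank rather than the full parameter dimension.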
arXiv Detail & Related papers (2026-01-24T02:28:15Z)
- Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face significant image quality degradation under a low sampling budget. We propose the Ensemble Parallel Direction solver (dubbed EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z)
- RefLSM: Linearized Structural-Prior Reflectance Model for Medical Image Segmentation and Bias-Field Correction [10.716406019360441]
We propose a novel variational Reflectance-based Level Set Model (RefLSM) for medical image segmentation. RefLSM explicitly integrates Retinex-inspired reflectance decomposition into the segmentation framework. We show that RefLSM achieves superior segmentation accuracy, robustness, and computational efficiency compared to state-of-the-art level set methods.
arXiv Detail & Related papers (2025-12-08T06:06:29Z)
- Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainability [0.0]
We introduce Sparse Isotonic Shapley Regression (SISR), a unified nonlinear explanation framework. SISR learns a monotonic transformation to restore additivity, obviating the need for a closed-form specification, and enforces an L0 sparsity constraint on the Shapley vector. SISR stabilizes attributions across payoff schemes and correctly filters irrelevant features where standard Shapley values suffer severe rank and sign distortions.
arXiv Detail & Related papers (2025-12-02T08:34:43Z)
- Delving into Cascaded Instability: A Lipschitz Continuity View on Image Restoration and Object Detection Synergy [95.93943805282868]
We propose Lipschitz-regularized YOLO (LR-YOLO), a Lipschitz-regularized object detection (LROD) framework that integrates image restoration directly into the detector's feature learning. Experiments on haze and low-light benchmarks demonstrate that LR-YOLO consistently improves detection stability, optimization smoothness, and overall accuracy.
arXiv Detail & Related papers (2025-10-28T09:41:42Z)
- Enhancing CLIP Robustness via Cross-Modality Alignment [54.01929554563447]
We propose Cross-modality Alignment (COLA), an optimal transport-based framework for vision-language models. COLA restores global image-text alignment and local structural consistency in the feature space. COLA is training-free and compatible with existing fine-tuned models.
arXiv Detail & Related papers (2025-10-28T03:47:44Z)
- Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention [54.42902794496325]
Linear attention, a variant of softmax attention, demonstrates promise in global context modeling. We propose Rank Enhanced Linear Attention (RELA), a simple yet effective method that enriches feature representations by integrating a lightweight depthwise convolution. Building upon RELA, we propose an efficient and effective image restoration Transformer, named LAformer.
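The combination RELA describes can be sketched as a ReLU linear attention whose output is enriched with a per-channel (depthwise) convolution branch. This is a hedged toy sketch under assumed shapes, not the paper's architecture; all function names are hypothetical:

```python
import numpy as np

def depthwise_conv1d(x, kernels):
    """Per-channel 1-D convolution with 'same' padding; x: (n, d), kernels: (d, k)."""
    pad = kernels.shape[1] // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty_like(x)
    for c in range(x.shape[1]):                      # one small kernel per channel
        out[:, c] = np.convolve(xp[:, c], kernels[c], mode="valid")
    return out

def rank_enhanced_linear_attention(Q, K, V, kernels):
    """ReLU linear attention plus a lightweight local branch on V."""
    phi = lambda X: np.maximum(X, 0.0)
    pq, pk = phi(Q), phi(K)
    global_out = (pq @ (pk.T @ V)) / ((pq @ pk.sum(axis=0))[:, None] + 1e-6)
    return global_out + depthwise_conv1d(V, kernels) # local mixing raises effective rank

rng = np.random.default_rng(0)
n, d, k = 16, 8, 3
Q, K, V = rng.normal(size=(3, n, d))
out = rank_enhanced_linear_attention(Q, K, V, rng.normal(size=(d, k)))
```

The implicit attention matrix of the global branch has rank at most `d`; the depthwise branch adds token mixing that does not factor through that bottleneck, which is the rank-enhancement intuition.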
arXiv Detail & Related papers (2025-05-22T02:57:23Z)
- An Accelerated Alternating Partial Bregman Algorithm for ReLU-based Matrix Decomposition [0.0]
In this paper, we investigate the sparse low-rank structure of ReLU-rectified non-negative matrices. We propose a novel regularization term that incorporates structures useful in clustering and compression tasks. We derive corresponding closed-form solutions while ensuring the $L$-smooth property holds for any $L \ge 1$.
arXiv Detail & Related papers (2025-03-04T08:20:34Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
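As a toy illustration of why interpolating toward the iterate of a fast inner run can tame nonmonotone dynamics (a sketch under assumed toy dynamics, not the paper's algorithm), consider a rotational vector field on which plain gradient steps diverge:

```python
import numpy as np

def interpolated_training(field, w, outer=100, inner=5, lr=0.5, alpha=0.5):
    """Outer iterate moves by linear interpolation toward a fast inner iterate."""
    for _ in range(outer):
        v = w.copy()
        for _ in range(inner):            # inner optimizer: plain gradient steps
            v = v - lr * field(v)
        w = (1 - alpha) * w + alpha * v   # averaging damps the rotation
    return w

A = np.array([[0.1, 1.0], [-1.0, 0.1]])  # rotational, nonmonotone dynamics
field = lambda w: A @ w
w0 = np.array([1.0, 1.0])

stable = interpolated_training(field, w0)

naive = w0.copy()
for _ in range(500):                      # plain steps with the same lr blow up
    naive = naive - 0.5 * field(naive)
```

On this field the plain-step map has spectral radius above 1 (the iterates spiral outward), while the interpolated outer map contracts, so `stable` converges to the equilibrium that `naive` overshoots.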
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Robust Locality-Aware Regression for Labeled Data Classification [5.432221650286726]
We propose a new discriminant feature extraction framework, namely Robust Locality-Aware Regression (RLAR)
In our model, we introduce a retargeted regression to perform the marginal representation learning adaptively instead of using the general average inter-class margin.
To alleviate the disturbance of outliers and prevent overfitting, we measure the regression term and locality-aware term together with the regularization term by the L2,1 norm.
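For reference, the L2,1 norm used above is simply the sum of the rows' Euclidean norms, which makes whole rows (e.g. outlier samples) cheap to zero out. A minimal sketch:

```python
import numpy as np

def l21_norm(X):
    """L2,1 norm: sum of the Euclidean norms of the rows of X.
    Penalizing it drives entire rows to zero, suppressing outlier samples."""
    return float(np.linalg.norm(X, axis=1).sum())

X = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
# row norms are 5, 0 and 1, so the L2,1 norm is 6
```

Unlike the Frobenius norm, the L2,1 norm grows linearly rather than quadratically in a row's magnitude, which is why it is less sensitive to outliers.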
arXiv Detail & Related papers (2020-06-15T11:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.