LADY: Linear Attention for Autonomous Driving Efficiency without Transformers
- URL: http://arxiv.org/abs/2512.15038v2
- Date: Thu, 18 Dec 2025 04:52:38 GMT
- Title: LADY: Linear Attention for Autonomous Driving Efficiency without Transformers
- Authors: Jihao Huang, Xi Xia, Zhiyuan Li, Tianle Liu, Jingke Wang, Junbo Chen, Tengju Ye,
- Abstract summary: LADY is the first fully linear attention-based generative model for end-to-end autonomous driving. We introduce a lightweight linear cross-attention mechanism that enables effective cross-modal information exchange. The model has been deployed and validated on edge devices, demonstrating its practicality in resource-limited scenarios.
- Score: 12.89500537893449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end paradigms have demonstrated great potential for autonomous driving, and most existing methods are built upon Transformer architectures. However, Transformers incur a quadratic attention cost, limiting their ability to model long spatial and temporal sequences, particularly on resource-constrained edge platforms. As autonomous driving inherently demands efficient temporal modeling, this challenge severely limits their deployment and real-time performance. Recently, linear attention mechanisms have gained increasing attention due to their superior spatiotemporal complexity. However, existing linear attention architectures are limited to self-attention, lacking support for cross-modal and cross-temporal interactions, both of which are crucial for autonomous driving. In this work, we propose LADY, the first fully linear attention-based generative model for end-to-end autonomous driving. LADY enables fusion of long-range temporal context at inference with constant computational and memory costs, regardless of the history length of camera and LiDAR features. Additionally, we introduce a lightweight linear cross-attention mechanism that enables effective cross-modal information exchange. Experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that LADY achieves state-of-the-art performance with constant-time and memory complexity, offering improved planning performance and significantly reduced computational cost. Additionally, the model has been deployed and validated on edge devices, demonstrating its practicality in resource-limited scenarios.
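The constant-cost temporal fusion claimed in the abstract follows from the standard causal linear attention recurrence, where the key-value history is compressed into a fixed-size state. The following is a minimal sketch of that general mechanism, assuming a simple elu+1 feature map; it illustrates the generic technique, not LADY's actual architecture, whose details are not given here.

```python
# Sketch of causal linear attention with a constant-size recurrent state.
# Memory and per-step compute are O(d * d_v), independent of how many
# past frames have been processed. This is the generic mechanism only;
# the feature map phi and the state layout are illustrative assumptions.
import numpy as np

def feature_map(x):
    # Positive feature map phi(x) = elu(x) + 1, a common choice in
    # linear attention papers (an assumption here, not LADY's choice).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_stream(queries, keys, values):
    """Process a stream of (q, k, v) steps, keeping only a fixed-size
    state (S, z) instead of the full key/value history."""
    d = queries.shape[-1]
    d_v = values.shape[-1]
    S = np.zeros((d, d_v))   # running sum of outer(phi(k_s), v_s)
    z = np.zeros(d)          # running sum of phi(k_s), for normalization
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_q, phi_k = feature_map(q), feature_map(k)
        S += np.outer(phi_k, v)
        z += phi_k
        outputs.append(phi_q @ S / (phi_q @ z + 1e-6))
    return np.array(outputs)
```

Because the state (S, z) never grows with the sequence, fusing an arbitrarily long history of camera or LiDAR features costs the same per step, which is the property the abstract highlights for edge deployment.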
Related papers
- InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation [53.47253633654885]
InstaDrive is a novel framework that enhances driving video realism through two key advancements. By incorporating these instance-aware mechanisms, InstaDrive achieves state-of-the-art video generation quality. Our project page is https://shanpoyang654.io/InstaDrive/page.html.
arXiv Detail & Related papers (2026-02-03T08:22:13Z) - Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation [12.450883696383878]
Imitation learning (IL) has emerged as a central paradigm in autonomous driving. IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors. Over successive planning cycles, small, often imperceptible errors compound, potentially resulting in severe failures. We propose Sequence of Experts (SoE) to enhance closed-loop performance without increasing model size or data requirements.
arXiv Detail & Related papers (2025-12-15T08:50:23Z) - Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z) - AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving [37.260140808367716]
We propose AutoDrive-R$2$, a novel VLA framework that enhances both reasoning and self-reflection capabilities of autonomous driving systems.<n>We first propose an innovative CoT dataset named nuScenesR$2$-6K for supervised fine-tuning.<n>We then employ the Group Relative Policy Optimization (GRPO) algorithm within a physics-grounded reward framework to ensure reliable smoothness and realistic trajectory planning.
arXiv Detail & Related papers (2025-09-02T04:32:24Z) - ViLaD: A Large Vision Language Diffusion Framework for End-to-End Autonomous Driving [14.486548540613791]
We introduce ViLaD, a novel Large Vision Language Diffusion framework for end-to-end autonomous driving. ViLaD enables parallel generation of entire driving decision sequences, significantly reducing computational latency. We conduct comprehensive experiments on the nuScenes dataset, where ViLaD outperforms state-of-the-art autoregressive VLM baselines in both planning accuracy and inference speed.
arXiv Detail & Related papers (2025-08-18T04:01:56Z) - GMF-Drive: Gated Mamba Fusion with Spatial-Aware BEV Representation for End-to-End Autonomous Driving [5.450011907283289]
This paper introduces GMF-Drive, an end-to-end framework that overcomes these challenges through two principled innovations. First, we supersede the information-limited histogram-based LiDAR representation with a geometrically-augmented pillar format. Second, we propose a novel hierarchical mamba fusion architecture that substitutes an expensive transformer with a highly efficient, spatially-aware state-space model.
arXiv Detail & Related papers (2025-08-08T08:17:18Z) - Lightweight Temporal Transformer Decomposition for Federated Autonomous Driving [11.79541267274746]
We propose a method that processes sequential image frames and temporal steering data by breaking down large attention maps into smaller matrices. This approach reduces model complexity, enabling efficient weight updates for convergence and real-time predictions. Experiments on three datasets demonstrate that our method outperforms recent approaches by a clear margin while achieving real-time performance.
arXiv Detail & Related papers (2025-06-30T05:14:16Z) - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up. It is composed of three unified operations: task self-attention, sensor cross-attention, and temporal cross-attention. It achieves state-of-the-art performance in both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes with high FPS.
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning strategy and fully exploiting complementarity across features, our method achieves both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z) - From Imitation to Exploration: End-to-end Autonomous Driving based on World Model [24.578178308010912]
RAMBLE is an end-to-end world model-based RL method for driving decision-making. It can handle complex and dynamic traffic scenarios. It achieves state-of-the-art performance in route completion rate on the CARLA Leaderboard 1.0 and completes all 38 scenarios on the CARLA Leaderboard 2.0.
arXiv Detail & Related papers (2024-10-03T06:45:59Z) - Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences [60.489682735061415]
We propose CHELA, which replaces state space models with short-long convolutions and implements linear attention in a divide-and-conquer manner.
Our experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-06-12T12:12:38Z) - Scalable Vehicle Re-Identification via Self-Supervision [66.2562538902156]
Vehicle Re-Identification is one of the key elements in city-scale vehicle analytics systems.
Many state-of-the-art solutions for vehicle re-id mostly focus on improving the accuracy on existing re-id benchmarks and often ignore computational complexity.
We propose a simple yet effective hybrid solution empowered by self-supervised training which only uses a single network during inference time.
arXiv Detail & Related papers (2022-05-16T12:14:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.