Lightweight Temporal Transformer Decomposition for Federated Autonomous Driving
- URL: http://arxiv.org/abs/2506.23523v1
- Date: Mon, 30 Jun 2025 05:14:16 GMT
- Title: Lightweight Temporal Transformer Decomposition for Federated Autonomous Driving
- Authors: Tuong Do, Binh X. Nguyen, Quang D. Tran, Erman Tjiputra, Te-Chuan Chiu, Anh Nguyen,
- Abstract summary: We propose a method that processes sequential image frames and temporal steering data by breaking down large attention maps into smaller matrices. This approach reduces model complexity, enabling efficient weight updates for convergence and real-time predictions. Experiments on three datasets demonstrate that our method outperforms recent approaches by a clear margin while achieving real-time performance.
- Score: 11.79541267274746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional vision-based autonomous driving systems often face difficulties in navigating complex environments when relying solely on single-image inputs. To overcome this limitation, incorporating temporal data, such as past image frames or steering sequences, has proven effective in enhancing robustness and adaptability in challenging scenarios. While previous high-performance methods exist, they often rely on resource-intensive fusion networks, making them impractical for training and unsuitable for federated learning. To address these challenges, we propose lightweight temporal transformer decomposition, a method that processes sequential image frames and temporal steering data by breaking down large attention maps into smaller matrices. This approach reduces model complexity, enabling efficient weight updates for convergence and real-time predictions while leveraging temporal information to enhance autonomous driving performance. Intensive experiments on three datasets demonstrate that our method outperforms recent approaches by a clear margin while achieving real-time performance. Additionally, real robot experiments further confirm the effectiveness of our method.
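The abstract's core idea, breaking a large attention map into smaller matrices, can be illustrated with a minimal low-rank factorization sketch. The rank-`r` truncated-SVD formulation, shapes, and function name below are illustrative assumptions, not the authors' exact decomposition:

```python
import numpy as np

def lowrank_attention(q, k, v, r):
    """Approximate softmax(q k^T / sqrt(d)) @ v by factorizing the
    (n, n) attention map into two small matrices of rank r.

    q, k, v: (n, d) arrays. With r << n, the factors cost O(n*r)
    memory instead of the O(n^2) full map. (The full map is still
    materialized once here, purely for clarity of the sketch.)
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)      # (n, n) attention map
    # Truncated SVD: attn ~= left @ right, two small matrices.
    u, s, vt = np.linalg.svd(attn, full_matrices=False)
    left = u[:, :r] * s[:r]                       # (n, r)
    right = vt[:r]                                # (r, n)
    # Apply the factors right-to-left so the (n, n) map is never rebuilt.
    return left @ (right @ v)                     # (n, d)
```

With `r` equal to the sequence length the factorization is exact, so the output matches full attention; smaller `r` trades accuracy for the reduced complexity the paper targets.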
Related papers
- InstaRevive: One-Step Image Enhancement via Dynamic Score Matching [66.97989469865828]
InstaRevive is an image enhancement framework that employs score-based diffusion distillation to harness potent generative capability. Our framework delivers high-quality and visually appealing results across a diverse array of challenging tasks and datasets.
arXiv Detail & Related papers (2025-04-22T01:19:53Z) - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up. It is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention. It achieves state-of-the-art performance in both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes with high FPS.
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - Deflickering Vision-Based Occupancy Networks through Lightweight Spatio-Temporal Correlation [15.726401007342087]
Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving. Recent approaches have incorporated historical data to mitigate the issue, but they often incur high computational costs and may introduce noisy information that interferes with object detection. We propose OccLinker, a novel plugin framework designed to seamlessly integrate with existing VONs for boosting performance. Our method efficiently consolidates historical static and motion cues, learns sparse latent correlations with current features through a dual cross-attention mechanism, and produces correction occupancy components to refine the base network's predictions.
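The dual cross-attention step described above, where current features attend separately to historical static and motion cues, can be sketched as follows. The function names, the residual-sum fusion, and all shapes are assumptions for illustration, not OccLinker's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    """Scaled dot-product cross-attention.
    query: (n, d) current features; context: (m, d) cues -> (n, d)."""
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d))  # (n, m)
    return weights @ context

def dual_cross_attention(current, static_cues, motion_cues):
    """Fuse historical static and motion cues into current features
    via two cross-attention passes and a residual sum."""
    return (current
            + cross_attention(current, static_cues)
            + cross_attention(current, motion_cues))
```

Keeping the two cue streams in separate attention passes lets each set of historical features be weighted independently before fusion, which is the gist of the "dual" mechanism the summary names.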
arXiv Detail & Related papers (2025-02-21T13:07:45Z) - Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception [18.403242474776764]
This work presents an innovative real-time streaming perception method, Transtreaming, which addresses the challenge of real-time object detection with dynamic computational delay.
The proposed model outperforms the existing state-of-the-art methods, even in single-frame detection scenarios.
Transtreaming meets the stringent real-time processing requirements on all kinds of devices.
arXiv Detail & Related papers (2024-09-10T15:26:38Z) - Event-Aided Time-to-Collision Estimation for Autonomous Driving [28.13397992839372]
We present a novel method that estimates the time to collision using a neuromorphic event-based camera.
The proposed algorithm consists of a two-step approach for efficient and accurate geometric model fitting on event data.
Experiments on both synthetic and real data demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-07-10T02:37:36Z) - Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations [53.797896854533384]
Class-agnostic motion prediction methods directly predict the motion of the entire point cloud.
While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming.
We introduce three simple spatial and temporal regularization losses, which facilitate the self-supervised training process effectively.
arXiv Detail & Related papers (2024-03-20T02:58:45Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, despite its high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Blur Interpolation Transformer for Real-World Motion from Blur [52.10523711510876]
We propose a blur interpolation transformer (BiT) to unravel the underlying temporal correlation in blur.
Based on multi-scale residual Swin transformer blocks, we introduce dual-end temporal supervision and temporally symmetric ensembling strategies.
In addition, we design a hybrid camera system to collect the first real-world dataset of one-to-many blur-sharp video pairs.
arXiv Detail & Related papers (2022-11-21T13:10:10Z) - Scalable Vehicle Re-Identification via Self-Supervision [66.2562538902156]
Vehicle Re-Identification is one of the key elements in city-scale vehicle analytics systems.
Most state-of-the-art solutions for vehicle re-id focus on improving accuracy on existing re-id benchmarks and often ignore computational complexity.
We propose a simple yet effective hybrid solution empowered by self-supervised training which only uses a single network during inference time.
arXiv Detail & Related papers (2022-05-16T12:14:42Z) - LUNAR: Cellular Automata for Drifting Data Streams [19.98517714325424]
We propose LUNAR, a streamified version of cellular automata.
It is able to act as a real incremental learner while adapting to drifting conditions.
arXiv Detail & Related papers (2020-02-06T09:10:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.