New Spiking Architecture for Multi-Modal Decision-Making in Autonomous Vehicles
- URL: http://arxiv.org/abs/2512.01882v1
- Date: Mon, 01 Dec 2025 17:04:56 GMT
- Title: New Spiking Architecture for Multi-Modal Decision-Making in Autonomous Vehicles
- Authors: Aref Ghoreishee, Abhishek Mishra, Lifeng Zhou, John Walsh, Nagarajan Kandasamy
- Abstract summary: This work proposes an end-to-end multi-modal reinforcement learning framework for high-level decision-making in autonomous vehicles. The framework integrates heterogeneous sensory input, including camera images, LiDAR point clouds, and vehicle heading information, through a cross-attention transformer-based perception module.
- Score: 11.558832874246646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes an end-to-end multi-modal reinforcement learning framework for high-level decision-making in autonomous vehicles. The framework integrates heterogeneous sensory input, including camera images, LiDAR point clouds, and vehicle heading information, through a cross-attention transformer-based perception module. Although transformers have become the backbone of modern multi-modal architectures, their high computational cost limits their deployment in resource-constrained edge environments. To overcome this challenge, we propose a spiking temporal-aware transformer-like architecture that uses ternary spiking neurons for computationally efficient multi-modal fusion. Comprehensive evaluations across multiple tasks in the Highway Environment demonstrate the effectiveness and efficiency of the proposed approach for real-time autonomous decision-making.
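As a rough illustration of the fusion described in the abstract, the following minimal PyTorch sketch combines camera and LiDAR token streams through cross-attention whose inputs are first ternarized to {-1, 0, +1}, in the spirit of ternary spiking neurons. All module names, the threshold value, and the straight-through gradient are assumptions made for illustration; the sketch omits the temporal spiking dynamics and is not the authors' implementation.

```python
# Illustrative sketch only -- not the paper's code. Module names, threshold, and the
# straight-through estimator are assumptions; temporal spiking dynamics are omitted.
import torch
import torch.nn as nn


class TernarySpike(nn.Module):
    """Maps real-valued features to ternary spikes {-1, 0, +1} with a fixed threshold."""

    def __init__(self, threshold: float = 0.5):
        super().__init__()
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Straight-through estimator: ternary values forward, identity gradient backward.
        spikes = (x > self.threshold).float() - (x < -self.threshold).float()
        return x + (spikes - x).detach()


class CrossAttentionFusion(nn.Module):
    """Fuses camera tokens (queries) with LiDAR tokens (keys/values) via cross-attention."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.spike = TernarySpike()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam_tokens: torch.Tensor, lidar_tokens: torch.Tensor) -> torch.Tensor:
        q = self.spike(cam_tokens)            # sparse ternary queries
        kv = self.spike(lidar_tokens)         # sparse ternary keys/values
        fused, _ = self.attn(q, kv, kv)
        return self.norm(cam_tokens + fused)  # residual connection


if __name__ == "__main__":
    cam = torch.randn(2, 64, 128)     # (batch, camera tokens, feature dim)
    lidar = torch.randn(2, 256, 128)  # (batch, LiDAR tokens, feature dim)
    out = CrossAttentionFusion()(cam, lidar)
    print(out.shape)                  # torch.Size([2, 64, 128])
```

Because ternary values are restricted to {-1, 0, +1}, the attention matrix products reduce to additions and subtractions, and zero activations can be skipped entirely, which is the kind of sparse, multiply-free computation that spiking-oriented hardware exploits.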
Related papers
- Towards Safety-Compliant Transformer Architectures for Automotive Systems [31.658299857884316]
This paper presents a conceptual framework for integrating Transformers into automotive systems from a safety perspective. We outline how multimodal Foundation Models can leverage sensor diversity and redundancy to improve fault tolerance and robustness.
arXiv Detail & Related papers (2026-01-26T14:12:27Z) - Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems [75.78934957242403]
Self-driving vehicles and drones require true Spatial Intelligence from multi-modal onboard sensor data. This paper presents a framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal.
arXiv Detail & Related papers (2025-12-30T17:58:01Z) - Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z) - Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradients. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z) - ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge. We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z) - A CLIP-based Uncertainty Modal Modeling (UMM) Framework for Pedestrian Re-Identification in Autonomous Driving [6.223368492604449]
The Uncertainty Modal Modeling (UMM) framework integrates a multimodal token mapper, a synthetic modality augmentation strategy, and a cross-modal cue interactive learner. UMM achieves strong robustness, generalization, and computational efficiency under uncertain modality conditions.
arXiv Detail & Related papers (2025-08-15T04:50:27Z) - Aerial Reliable Collaborative Communications for Terrestrial Mobile Users via Evolutionary Multi-Objective Deep Reinforcement Learning [59.660724802286865]
Unmanned aerial vehicles (UAVs) have emerged as potential aerial base stations (BSs) to improve terrestrial communications. This work employs collaborative beamforming through a UAV-enabled virtual antenna array to improve transmission performance from the UAV to terrestrial mobile users.
arXiv Detail & Related papers (2025-02-09T09:15:47Z) - A Coalition Game for On-demand Multi-modal 3D Automated Delivery System [4.378407481656902]
We introduce a coalition game for a fleet of UAVs and ADRs operating in two overlaying networks to address last-mile delivery in urban environments. We investigate cooperation structures among the modes to capture how strategic collaboration can improve overall routing efficiency. Several numerical experiments on last-mile delivery applications have been conducted, showing the results from the case study in the city of Mississauga.
arXiv Detail & Related papers (2024-12-23T03:50:29Z) - Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving [3.770103075126785]
We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation.
We present a Sensor-Agnostic Graph-Aware Kalman Filter, the first online state estimation technique designed to fuse multi-modal graphs.
We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets.
arXiv Detail & Related papers (2024-11-06T06:58:17Z) - Parameterized Decision-making with Multi-modal Perception for Autonomous Driving [12.21578713219778]
We propose a parameterized decision-making framework with multi-modal perception based on deep reinforcement learning, called AUTO.
A hybrid reward function takes into account safety, traffic efficiency, passenger comfort, and impact to guide the framework toward optimal actions (an illustrative reward sketch follows this list).
arXiv Detail & Related papers (2023-12-19T08:27:02Z) - Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end, open-set (any environment/scene) autonomous driving that can provide driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z) - Eco-Driving Control of Connected and Automated Vehicles using Neural Network based Rollout [0.0]
Connected and autonomous vehicles have the potential to minimize energy consumption.
Existing methods for the eco-driving problem generally suffer from high computational and memory requirements.
This work proposes a hierarchical multi-horizon optimization framework implemented via a neural network.
arXiv Detail & Related papers (2023-10-16T23:13:51Z) - Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video frame interpolation framework that allows content-aware aggregation weights and considers long-range dependencies via self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video frame interpolation.
In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
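The Parameterized Decision-making (AUTO) entry above mentions a hybrid reward over safety, traffic efficiency, passenger comfort, and impact. The sketch below shows one way such a weighted reward could be assembled; the individual terms, field names, and weights are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch (not from the AUTO paper): a hybrid reward weighting safety,
# traffic efficiency, passenger comfort, and impact on surrounding traffic.
from dataclasses import dataclass


@dataclass
class StepInfo:
    collision: bool        # whether the ego vehicle collided this step
    speed: float           # ego speed (m/s)
    target_speed: float    # desired cruising speed (m/s)
    jerk: float            # magnitude of jerk (m/s^3), proxy for comfort
    forced_braking: float  # deceleration imposed on following vehicles (m/s^2)


def hybrid_reward(info: StepInfo,
                  w_safety: float = 1.0,
                  w_efficiency: float = 0.5,
                  w_comfort: float = 0.2,
                  w_impact: float = 0.3) -> float:
    # Each term is negative (a penalty); the weighted sum trades the objectives off.
    safety = -10.0 if info.collision else 0.0
    efficiency = -abs(info.speed - info.target_speed) / max(info.target_speed, 1e-6)
    comfort = -info.jerk
    impact = -info.forced_braking
    return (w_safety * safety + w_efficiency * efficiency
            + w_comfort * comfort + w_impact * impact)


if __name__ == "__main__":
    step = StepInfo(collision=False, speed=22.0, target_speed=25.0,
                    jerk=0.4, forced_braking=0.1)
    print(hybrid_reward(step))
```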