Related papers: LiDAR-based End-to-end Temporal Perception for Vehicle-Infrastructure Cooperation

LiDAR-based End-to-end Temporal Perception for Vehicle-Infrastructure Cooperation

URL: http://arxiv.org/abs/2411.14927v1
Date: Fri, 22 Nov 2024 13:34:29 GMT
Title: LiDAR-based End-to-end Temporal Perception for Vehicle-Infrastructure Cooperation
Authors: Zhenwei Yang, Jilei Mao, Wenxian Yang, Yibo Ai, Yu Kong, Haibao Yu, Weidong Zhang,
Abstract summary: We introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for Vehicle-Temporal Cooperation (VIC) LET-VIC leverages Vehicle-to-Everything (V2X) communication to enhance temporal perception by fusing spatial and temporal data from both vehicle and infrastructure sensors. Experiments on the V2X-Seq-SPD dataset demonstrate that LET-VIC significantly outperforms baseline models, achieving at least a 13.7% improvement in mAP and a 13.1% improvement in AMOTA without considering communication delays.
Score: 16.465037559349323
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Temporal perception, the ability to detect and track objects over time, is critical in autonomous driving for maintaining a comprehensive understanding of dynamic environments. However, this task is hindered by significant challenges, including incomplete perception caused by occluded objects and observational blind spots, which are common in single-vehicle perception systems. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for Vehicle-Infrastructure Cooperation (VIC). LET-VIC leverages Vehicle-to-Everything (V2X) communication to enhance temporal perception by fusing spatial and temporal data from both vehicle and infrastructure sensors. First, it spatially integrates Bird's Eye View (BEV) features from vehicle-side and infrastructure-side LiDAR data, creating a comprehensive view that mitigates occlusions and compensates for blind spots. Second, LET-VIC incorporates temporal context across frames, allowing the model to leverage historical data for enhanced tracking stability and accuracy. To further improve robustness, LET-VIC includes a Calibration Error Compensation (CEC) module to address sensor misalignments and ensure precise feature alignment. Experiments on the V2X-Seq-SPD dataset demonstrate that LET-VIC significantly outperforms baseline models, achieving at least a 13.7% improvement in mAP and a 13.1% improvement in AMOTA without considering communication delays. This work offers a practical solution and a new research direction for advancing temporal perception in autonomous driving through vehicle-infrastructure cooperation.

Related papers

EffiComm: Bandwidth Efficient Multi Agent Communication [11.311414617703308]
Collaborative perception allows connected vehicles to exchange sensor information and overcome each vehicle's blind spots.<n>We introduce EffiComm, an end-to-end framework that transmits less than 40% of the data required by prior art while maintaining state-of-the-art 3D object detection accuracy.
arXiv Detail & Related papers (2025-07-25T15:03:26Z)
Automating Traffic Monitoring with SHM Sensor Networks via Vision-Supervised Deep Learning [0.0]
Bridges, as critical components of civil infrastructure, are increasingly affected by deterioration.<n>Recent advances in deep learning have enabled progress toward continuous, automated monitoring.<n>We propose a fully automated deep-learning pipeline for continuous traffic monitoring using structural health monitoring (SHM) sensor networks.
arXiv Detail & Related papers (2025-06-23T18:27:14Z)
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving [51.47621083057114]
SOLVE is an innovative framework that synergizes Vision-Language Models with end-to-end (E2E) models to enhance autonomous vehicle planning.<n>Our approach emphasizes knowledge sharing at the feature level through a shared visual encoder, enabling comprehensive interaction between VLM and E2E components.
arXiv Detail & Related papers (2025-05-22T15:44:30Z)
SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z)
Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark [15.405137983083875]
Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations. This paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions.
arXiv Detail & Related papers (2025-03-10T07:00:07Z)
InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios [13.821143687548494]
This paper introduces a new 3D infrastructure-side collaborative perception dataset, abbreviated as inscope. InScope encapsulates a 20-day capture duration with 303 tracking trajectories and 187,787 3D bounding boxes annotated by experts.
arXiv Detail & Related papers (2024-07-31T13:11:14Z)
V2I-Calib: A Novel Calibration Approach for Collaborative Vehicle and Infrastructure LiDAR Systems [19.919120489121987]
This paper introduces a novel approach to V2I calibration, leveraging spatial association information among perceived objects. Central to this method is the innovative Overall Intersection over Union (oIoU) metric, which quantifies the correlation between targets identified by vehicle and infrastructure systems. Our approach involves identifying common targets within the perception results of vehicle and infrastructure LiDAR systems through the construction of an affinity matrix.
arXiv Detail & Related papers (2024-07-14T13:34:00Z)
Unified End-to-End V2X Cooperative Autonomous Driving [21.631099800753795]
UniE2EV2X is a V2X-integrated end-to-end autonomous driving system that consolidates key driving modules within a unified network. The framework employs a deformable attention-based data fusion strategy, effectively facilitating cooperation between vehicles and infrastructure. We implement the UniE2EV2X framework on the challenging DeepAccident, a simulation dataset designed for V2X cooperative driving.
arXiv Detail & Related papers (2024-05-07T03:01:40Z)
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising [49.86409475232849]
Trajectory prediction is fundamental in computer vision and autonomous driving. Existing approaches in this field often assume precise and complete observational data. We present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique.
arXiv Detail & Related papers (2024-04-02T18:30:29Z)
V2X-AHD:Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogenous Distillation Network [13.248981195106069]
We propose a multi-view vehicle-road cooperation perception system, vehicle-to-everything cooperative perception (V2X-AHD) The V2X-AHD can effectively improve the accuracy of 3D object detection and reduce the number of network parameters, according to this study.
arXiv Detail & Related papers (2023-10-10T13:12:03Z)
MSight: An Edge-Cloud Infrastructure-based Perception System for Connected Automated Vehicles [58.461077944514564]
This paper presents MSight, a cutting-edge roadside perception system specifically designed for automated vehicles. MSight offers real-time vehicle detection, localization, tracking, and short-term trajectory prediction. Evaluations underscore the system's capability to uphold lane-level accuracy with minimal latency.
arXiv Detail & Related papers (2023-10-08T21:32:30Z)
Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments. Our approach enhances LiDAR-based detection models using spatial quantized historical features. Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
Integrated Sensing, Computation, and Communication for UAV-assisted Federated Edge Learning [52.7230652428711]
Federated edge learning (FEEL) enables privacy-preserving model training through periodic communication between edge devices and the server. Unmanned Aerial Vehicle (UAV)mounted edge devices are particularly advantageous for FEEL due to their flexibility and mobility in efficient data collection.
arXiv Detail & Related papers (2023-06-05T16:01:33Z)
Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics [77.34726150561087]
This work aims to carry out a study on the current scenario of camera and radar-based perception for ADAS and autonomous vehicles. Concepts and characteristics related to both sensors, as well as to their fusion, are presented. We give an overview of the Deep Learning-based detection and segmentation tasks, and the main datasets, metrics, challenges, and open questions in vehicle perception.
arXiv Detail & Related papers (2023-03-08T00:48:32Z)
MASS: Mobility-Aware Sensor Scheduling of Cooperative Perception for Connected Automated Driving [19.66714697653504]
A new paradigm, Cooperative Perception (CP), comes to the rescue by sharing sensor data from a cooperative vehicle (CoV) Existing methods rely on the exchange of meta-information, such as visibility maps, to predict the perception gains from nearby vehicles. We propose a new approach, learning while scheduling, for distributed scheduling of CP. The proposed MASS algorithm achieves the best average perception gain and improves recall by up to 4.2 percentage points compared to other learning-based algorithms.
arXiv Detail & Related papers (2023-02-25T09:03:05Z)
Adaptive Feature Fusion for Cooperative Perception using LiDAR Point Clouds [0.0]
Cooperative perception allows a Connected Autonomous Vehicle to interact with the other CAVs in the vicinity. It can compensate for the limitations of the conventional vehicular perception such as blind spots, low resolution, and weather effects. We evaluate the performance of cooperative perception for both vehicle and pedestrian detection using the CODD dataset.
arXiv Detail & Related papers (2022-07-30T01:53:05Z)
StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception. We consider multiple velocities driving scene and propose Velocity-awared streaming AP (VsAP) to jointly evaluate the accuracy. Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection [8.681912341444901]
DAIR-V2X is the first large-scale, multi-modality, multi-view dataset from real scenarios for Vehicle-Infrastructure Cooperative Autonomous Driving. DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations.
arXiv Detail & Related papers (2022-04-12T07:13:33Z)
Multi-Stream Attention Learning for Monocular Vehicle Velocity and Inter-Vehicle Distance Estimation [25.103483428654375]
Vehicle velocity and inter-vehicle distance estimation are essential for ADAS (Advanced driver-assistance systems) and autonomous vehicles. Recent studies focus on using a low-cost monocular camera to perceive the environment around the vehicle in a data-driven fashion. MSANet is proposed to extract different aspects of features, e.g., spatial and contextual features, for joint vehicle velocity and inter-vehicle distance estimation.
arXiv Detail & Related papers (2021-10-22T06:14:12Z)
Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework. We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design. We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
arXiv Detail & Related papers (2021-05-20T17:52:37Z)
Improving Perception via Sensor Placement: Designing Multi-LiDAR Systems for Autonomous Vehicles [16.45799795374353]
We propose an easy-to-compute information-theoretic surrogate cost metric based on Probabilistic Occupancy Grids (POG) to optimize LiDAR placement for maximal sensing. Our results confirm that sensor placement is an important factor in 3D point cloud-based object detection and could lead to a variation of performance by 10% 20% on the state-of-the-art perception algorithms.
arXiv Detail & Related papers (2021-05-02T01:52:18Z)
PnPNet: End-to-End Perception and Prediction with Tracking in the Loop [82.97006521937101]
We tackle the problem of joint perception and motion forecasting in the context of self-driving vehicles. We propose Net, an end-to-end model that takes as input sensor data, and outputs at each time step object tracks and their future level.
arXiv Detail & Related papers (2020-05-29T17:57:25Z)
LIBRE: The Multiple 3D LiDAR Dataset [54.25307983677663]
We present LIBRE: LiDAR Benchmarking and Reference, a first-of-its-kind dataset featuring 10 different LiDAR sensors. LIBRE will contribute to the research community to provide a means for a fair comparison of currently available LiDARs. It will also facilitate the improvement of existing self-driving vehicles and robotics-related software.
arXiv Detail & Related papers (2020-03-13T06:17:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.