Direct-CP: Directed Collaborative Perception for Connected and Autonomous Vehicles via Proactive Attention
- URL: http://arxiv.org/abs/2409.08840v1
- Date: Fri, 13 Sep 2024 13:53:52 GMT
- Title: Direct-CP: Directed Collaborative Perception for Connected and Autonomous Vehicles via Proactive Attention
- Authors: Yihang Tao, Senkang Hu, Zhengru Fang, Yuguang Fang
- Abstract summary: We propose Direct-CP, a proactive and direction-aware CP system aimed at improving CP performance in specific directions.
Our key idea is to enable an ego vehicle to proactively signal its directions of interest and readjust its attention to enhance local directional CP performance.
Our approach achieves 19.8% higher local perception accuracy in the directions of interest and 2.5% higher overall perception accuracy than state-of-the-art methods on collaborative 3D object detection tasks.
- Score: 7.582576346284436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative perception (CP) leverages visual data from connected and autonomous vehicles (CAVs) to enhance an ego vehicle's field of view (FoV). Despite recent progress, current CP methods expand the ego vehicle's 360-degree perceptual range almost equally, which faces two key challenges. First, in areas with uneven traffic distribution, focusing on directions with little traffic offers limited benefits. Second, under limited communication budgets, allocating excessive bandwidth to less critical directions lowers perception accuracy in more vital areas. To address these issues, we propose Direct-CP, a proactive and direction-aware CP system aimed at improving CP in specific directions. Our key idea is to enable an ego vehicle to proactively signal its directions of interest and readjust its attention to enhance local directional CP performance. To achieve this, we first propose an RSU-aided direction masking mechanism that assists an ego vehicle in identifying vital directions. Additionally, we design a direction-aware selective attention module to wisely aggregate pertinent features based on the ego vehicle's directional priorities, the communication budget, and the positional data of CAVs. Moreover, we introduce a direction-weighted detection loss (DWLoss) to capture the divergence between directional CP outcomes and the ground truth, facilitating effective model training. Extensive experiments on the V2X-Sim 2.0 dataset demonstrate that our approach achieves 19.8% higher local perception accuracy in the directions of interest and 2.5% higher overall perception accuracy than state-of-the-art methods on collaborative 3D object detection tasks.
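The three components the abstract names fit together naturally, and a compact sketch helps make that concrete. Below is a minimal PyTorch illustration of how an RSU-provided direction mask, budget-constrained selective attention over collaborators' per-sector features, and a direction-weighted loss could interact. The sector decomposition, tensor shapes, top-k budget gating, and the `alpha` weighting scheme are all our assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DirectionAwareSelectiveAttention(nn.Module):
    """Hedged sketch of Direct-CP-style directed collaboration.

    The BEV plane around the ego vehicle is split into S angular sectors.
    An RSU-provided mask marks sectors of interest, and attention over
    collaborators is gated by that mask and by a communication budget.
    Shapes and the top-k gating are illustrative assumptions.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        self.query = nn.Linear(feat_dim, feat_dim)
        self.key = nn.Linear(feat_dim, feat_dim)

    def forward(self, ego_feats, cav_feats, direction_mask, budget):
        # ego_feats:      (S, D)    per-sector ego BEV features
        # cav_feats:      (N, S, D) per-sector features from N collaborators
        # direction_mask: (S,) weights in [0, 1]; 1 = direction of interest
        # budget:         max number of collaborator-sector links to keep
        q = self.query(ego_feats)                       # (S, D)
        k = self.key(cav_feats)                         # (N, S, D)
        scores = torch.einsum('sd,nsd->ns', q, k)       # (N, S) relevance
        scores = scores * direction_mask                # de-emphasize idle sectors
        if scores.numel() > budget:                     # enforce the comm. budget
            cutoff = scores.flatten().topk(budget).values.min()
            scores = scores.masked_fill(scores < cutoff, float('-inf'))
        attn = torch.softmax(scores, dim=0)             # weight over collaborators
        attn = torch.nan_to_num(attn)                   # sectors with no kept links
        return ego_feats + torch.einsum('ns,nsd->sd', attn, cav_feats)

def direction_weighted_loss(per_sector_loss, direction_mask, alpha=2.0):
    """Sketch of a direction-weighted detection loss (DWLoss): up-weight the
    per-sector detection loss in sectors of interest by `alpha` (assumed)."""
    weights = 1.0 + (alpha - 1.0) * direction_mask      # (S,)
    return (weights * per_sector_loss).sum() / weights.sum()
```

As a toy call, `DirectionAwareSelectiveAttention(64)(torch.randn(8, 64), torch.randn(3, 8, 64), torch.tensor([1., 1., 0., 0., 0., 0., 1., 1.]), budget=10)` aggregates at most ten collaborator-sector features, concentrated on the four masked sectors.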
Related papers
- Enhancing Lane Segment Perception and Topology Reasoning with Crowdsourcing Trajectory Priors [12.333249510969289]
In this paper, we investigate prior augmentation from a novel perspective of trajectory priors.
We design a confidence-based fusion module that takes alignment into account during the fusion process.
The results indicate that our method significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-11-26T07:05:05Z)
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- Enhanced Cooperative Perception for Autonomous Vehicles Using Imperfect Communication [0.24466725954625887]
We propose a novel approach to realize an optimized Cooperative Perception (CP) under constrained communications.
At the core of our approach is recruiting the best helper from the list of available front vehicles to augment the visual range.
Our results demonstrate the efficacy of our two-step optimization process in improving the overall performance of cooperative perception.
arXiv Detail & Related papers (2024-04-10T15:37:15Z)
- FENet: Focusing Enhanced Network for Lane Detection [0.0]
This research pioneers networks augmented with Focusing Sampling, Partial Field of View Evaluation, Enhanced FPN architecture and Directional IoU Loss.
Experiments demonstrate that our Focusing Sampling strategy, unlike uniform approaches, emphasizes vital distant details.
Future directions include collecting on-road data and integrating complementary dual frameworks to drive further breakthroughs guided by human perception principles.
arXiv Detail & Related papers (2023-12-28T17:52:09Z)
- V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogenous Distillation Network [13.248981195106069]
We propose a multi-view vehicle-road cooperative perception system, vehicle-to-everything cooperative perception (V2X-AHD).
V2X-AHD effectively improves the accuracy of 3D object detection while reducing the number of network parameters.
arXiv Detail & Related papers (2023-10-10T13:12:03Z)
- MSight: An Edge-Cloud Infrastructure-based Perception System for Connected Automated Vehicles [58.461077944514564]
This paper presents MSight, a cutting-edge roadside perception system specifically designed for automated vehicles.
MSight offers real-time vehicle detection, localization, tracking, and short-term trajectory prediction.
Evaluations underscore the system's capability to uphold lane-level accuracy with minimal latency.
arXiv Detail & Related papers (2023-10-08T21:32:30Z)
- Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving [91.91552963872596]
We propose a new multi-modal visual grounding task, termed LiDAR Grounding.
It jointly learns the LiDAR-based object detector with the language features and predicts the targeted region directly from the detector.
Our work offers deeper insight into the LiDAR-based grounding task, and we expect it to present a promising direction for the autonomous driving community.
arXiv Detail & Related papers (2023-05-25T06:22:10Z)
- Trajectory Design for UAV-Based Internet-of-Things Data Collection: A Deep Reinforcement Learning Approach [93.67588414950656]
In this paper, we investigate an unmanned aerial vehicle (UAV)-assisted Internet-of-Things (IoT) system in a 3D environment.
We present a TD3-based trajectory design algorithm for completion time minimization (TD3-TDCTM).
Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional non-learning-based baseline methods.
arXiv Detail & Related papers (2021-07-23T03:33:29Z)
- DeepWORD: A GCN-based Approach for Owner-Member Relationship Detection in Autonomous Driving [2.895229237964064]
We propose an innovative relationship prediction method, DeepWORD, by designing a graph convolutional network (GCN).
Specifically, we utilize the feature maps with local correlation as the input of nodes to improve the information richness.
We establish an annotated owner-member relationship dataset called WORD as a large-scale benchmark, which will be available soon.
arXiv Detail & Related papers (2021-03-30T06:12:29Z)
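For readers unfamiliar with the building block DeepWORD composes, a minimal graph-convolution layer might look as follows; the mean-aggregation normalization and the idea of feeding pooled local feature maps as node features are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Minimal graph-convolution layer of the kind a GCN-based
    relationship predictor stacks; node features would be pooled
    local feature maps. Purely illustrative."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (N, in_dim) node features
        # adj: (N, N) adjacency matrix including self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        x = (adj @ x) / deg            # mean aggregation over neighbors
        return torch.relu(self.linear(x))
```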
- Coordinate Attention for Efficient Mobile Network Design [96.40415345942186]
We propose a novel attention mechanism for mobile networks by embedding positional information into channel attention.
Unlike channel attention that transforms a feature tensor to a single feature vector via 2D global pooling, the coordinate attention factorizes channel attention into two 1D feature encoding processes.
Our coordinate attention is beneficial to ImageNet classification and performs better in downstream tasks such as object detection and semantic segmentation.
arXiv Detail & Related papers (2021-03-04T09:18:02Z)
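The factorization described above is fully specified by its pooling scheme, so a compact sketch is easy to give. Below is a minimal PyTorch rendering of coordinate attention's two 1D encodings; the reduction ratio, ReLU activation, and mean pooling are common-practice choices that may differ from the paper's released code.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of coordinate attention: channel attention factorized into
    two 1D encodings along height and width, preserving positional
    information that 2D global pooling would discard."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        # 1D pooling along each spatial axis instead of 2D global pooling.
        x_h = x.mean(dim=3, keepdim=True)                  # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).transpose(2, 3)  # (n, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)                   # (n, c, h+w, 1)
        y = self.act(self.bn(self.conv1(y)))               # shared transform
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = self.conv_h(y_h).sigmoid()                   # (n, c, h, 1)
        a_w = self.conv_w(y_w.transpose(2, 3)).sigmoid()   # (n, c, 1, w)
        return x * a_h * a_w                               # position-aware reweighting
```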
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)