TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
- URL: http://arxiv.org/abs/2412.10308v2
- Date: Tue, 25 Mar 2025 09:18:04 GMT
- Title: TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
- Authors: Yan Xia, Yunxiang Lu, Rui Song, Oussema Dhaouadi, João F. Henriques, Daniel Cremers
- Abstract summary: We propose a novel image-to-point cloud registration (I2P) method, TrafficLoc, in a coarse-to-fine matching fashion. To overcome the lack of large-scale real-world intersection datasets, we first introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla. Our TrafficLoc greatly improves the performance over the SOTA I2P methods (up to 86%) on Carla Intersection and generalizes well to real-world data.
- Score: 49.43995864524434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the problem of localizing traffic cameras within a 3D reference map and propose a novel image-to-point cloud registration (I2P) method, TrafficLoc, in a coarse-to-fine matching fashion. To overcome the lack of large-scale real-world intersection datasets, we first introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla. We find that current I2P methods struggle with cross-modal matching under large viewpoint differences, especially at traffic intersections. TrafficLoc thus employs a novel Geometry-guided Attention Loss (GAL) to focus only on the corresponding geometric regions under different viewpoints during 2D-3D feature fusion. To address feature inconsistency in paired image patch-point groups, we further propose Inter-intra Contrastive Learning (ICL) to better separate 2D patch and 3D group features within each modality, and introduce Dense Training Alignment (DTA) with a soft-argmax operator to improve position regression. Extensive experiments show our TrafficLoc greatly improves the performance over the SOTA I2P methods (up to 86%) on Carla Intersection and generalizes well to real-world data. TrafficLoc also achieves new SOTA performance on the KITTI and NuScenes datasets, demonstrating its superiority across both in-vehicle and traffic cameras. Our project page is publicly available at https://tum-luk.github.io/projects/trafficloc/.
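The soft-argmax named in DTA is a standard differentiable replacement for hard peak-picking: the position is regressed as the softmax-weighted expectation of coordinates, so dense supervision can flow through every score. A minimal PyTorch sketch follows; the tensor names, shapes, and temperature are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(score_map: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """score_map: (B, H, W) matching scores; returns (B, 2) sub-pixel (x, y)."""
    b, h, w = score_map.shape
    probs = F.softmax(score_map.view(b, -1) / temperature, dim=-1).view(b, h, w)
    ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
    # Expected coordinates under the softmax distribution: fully differentiable,
    # so an L2 loss on the regressed position trains the whole score map.
    y = (probs.sum(dim=2) * ys).sum(dim=1)
    x = (probs.sum(dim=1) * xs).sum(dim=1)
    return torch.stack([x, y], dim=-1)
```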
Related papers
- Geographical Information Alignment Boosts Traffic Analysis via Transpose Cross-attention [4.323740171581589]
We propose a plug-and-play module for common GNN frameworks, termed Geographic Information Alignment (GIA).
This module can efficiently fuse node features and geographic position information through a novel Transpose Cross-attention mechanism.
Our method obtains gains ranging from 1.3% to 10.9% in F1 score and 0.3% to 4.8% in AUC.
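The Transpose Cross-attention variant is not specified in this summary, but the general pattern of fusing node features with encoded positions via cross-attention can be sketched as below; this is a generic single-head version with assumed names, not the paper's module.

```python
import torch
import torch.nn as nn

class GeoCrossAttention(nn.Module):
    """Generic cross-attention: node features attend to geographic encodings."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)        # queries from node features
        self.kv = nn.Linear(dim, 2 * dim)   # keys/values from position encodings
        self.scale = dim ** -0.5

    def forward(self, node_feats: torch.Tensor, geo_enc: torch.Tensor) -> torch.Tensor:
        # node_feats, geo_enc: (N, dim) per-node tensors
        q = self.q(node_feats)
        k, v = self.kv(geo_enc).chunk(2, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return node_feats + attn @ v        # residual fusion of geographic context
```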
arXiv Detail & Related papers (2024-12-03T21:04:49Z)
- Neural Semantic Map-Learning for Autonomous Vehicles [85.8425492858912]
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment.
Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field.
We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
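An SDF evaluates to zero on surfaces, which suggests how the joint alignment can work: each submap's rigid pose is refined by driving its surface samples onto the zero level set. A toy sketch under that assumption (a plain MLP stands in for the paper's sparse feature grids; no confidence weighting):

```python
import torch
import torch.nn as nn

# Stand-in scene SDF, frozen while one submap's pose is refined.
sdf = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 1))
for p in sdf.parameters():
    p.requires_grad_(False)

t = torch.zeros(3, requires_grad=True)   # submap translation to refine
w = torch.zeros(3, requires_grad=True)   # small-angle rotation vector
opt = torch.optim.Adam([t, w], lr=1e-3)

def skew(v: torch.Tensor) -> torch.Tensor:
    zero = v.new_zeros(())
    return torch.stack([torch.stack([zero, -v[2], v[1]]),
                        torch.stack([v[2], zero, -v[0]]),
                        torch.stack([-v[1], v[0], zero])])

points = torch.randn(1024, 3)            # surface samples from one noisy submap
for _ in range(200):
    R = torch.eye(3) + skew(w)           # first-order rotation approximation
    loss = sdf(points @ R.T + t).abs().mean()  # surface should sit on the zero level set
    opt.zero_grad(); loss.backward(); opt.step()
```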
arXiv Detail & Related papers (2024-10-10T10:10:03Z)
- GLACE: Global Local Accelerated Coordinate Encoding [66.87005863868181]
Scene coordinate regression methods are effective in small-scale scenes but face significant challenges in large-scale scenes.
We propose GLACE, which integrates pre-trained global and local encodings and enables SCR to scale to large scenes with only a single small-sized network.
Our method achieves state-of-the-art results on large-scale scenes with a low-map-size model.
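For context, scene coordinate regression reduces relocalization to predicting a 3D scene point per pixel and recovering the pose with PnP + RANSAC. A minimal OpenCV sketch (function name and thresholds are assumptions, not GLACE's code):

```python
import numpy as np
import cv2

def pose_from_scene_coords(xy: np.ndarray, xyz: np.ndarray, K: np.ndarray):
    """xy: (N, 2) pixel locations; xyz: (N, 3) regressed scene coordinates;
    K: (3, 3) camera intrinsics. Returns rotation vector, translation, inliers."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        xyz.astype(np.float64), xy.astype(np.float64), K, distCoeffs=None,
        reprojectionError=3.0, iterationsCount=1000)
    return rvec, tvec, inliers
```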
arXiv Detail & Related papers (2024-06-06T17:59:50Z)
- Elastic Interaction Energy-Informed Real-Time Traffic Scene Perception [8.429178814528617]
We propose EIEGSeg, a network training strategy based on a topology-aware energy loss function.
EIEGSeg is designed for multi-class segmentation in real-time traffic scene perception.
Our results demonstrate that EIEGSeg consistently improves the performance, especially on real-time, lightweight networks.
arXiv Detail & Related papers (2023-10-02T01:30:42Z)
- CLiNet: Joint Detection of Road Network Centerlines in 2D and 3D [5.543544712471748]
This work introduces a new approach that detects road network centerlines from image data by localizing the features jointly in 2D and 3D.
To develop and evaluate our approach, a large urban driving dataset dubbed AV Breadcrumbs is automatically labeled by leveraging vector map representations and projective geometry to annotate over 900,000 images.
arXiv Detail & Related papers (2023-02-04T23:30:04Z)
- PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark [109.03773439461615]
PersFormer is an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module.
We release one of the first large-scale real-world 3D lane datasets, called OpenLane, with high-quality annotation and scenario diversity.
arXiv Detail & Related papers (2022-03-21T16:12:53Z)
- 3D Scene Understanding at Urban Intersection using Stereo Vision and Digital Map [4.640144833676576]
We introduce a stereo vision and 3D digital map based approach to spatially and temporally analyze the traffic situation at urban intersections.
We qualitatively and quantitatively evaluate our proposed technique on real traffic data collected in an urban canyon in Tokyo to demonstrate the efficacy of the system.
arXiv Detail & Related papers (2021-12-10T02:05:15Z)
- Road Network Guided Fine-Grained Urban Traffic Flow Inference [108.64631590347352]
Accurate inference of fine-grained traffic flow from coarse-grained data is an emerging yet crucial problem.
We propose a novel Road-Aware Traffic Flow Magnifier (RATFM) that exploits the prior knowledge of road networks.
Our method can generate high-quality fine-grained traffic flow maps.
arXiv Detail & Related papers (2021-09-29T07:51:49Z)
- Automatic Map Update Using Dashcam Videos [1.6911482053867475]
We propose an SfM-based solution for automatic map update, with a focus on real-time change detection and localization.
Our system can locate the objects detected from 2D images in a 3D space, utilizing sparse SfM point clouds.
arXiv Detail & Related papers (2021-09-24T18:00:57Z)
- Robust 2D/3D Vehicle Parsing in CVIS [54.825777404511605]
We present a novel approach to robustly detect and perceive vehicles in different camera views as part of a cooperative vehicle-infrastructure system (CVIS).
Our formulation is designed for arbitrary camera views and makes no assumptions about intrinsic or extrinsic parameters.
In practice, our approach outperforms SOTA methods on 2D detection, instance segmentation, and 6-DoF pose estimation.
arXiv Detail & Related papers (2021-03-11T03:35:05Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization aims to spot images of the same geographic target taken from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)
- Crowdsourced 3D Mapping: A Combined Multi-View Geometry and Self-Supervised Learning Approach [10.610403488989428]
We propose a framework that estimates the 3D positions of semantically meaningful landmarks without assuming known camera intrinsics.
We utilize multi-view geometry as well as deep learning based self-calibration, depth, and ego-motion estimation for traffic sign positioning.
We achieve average single-journey relative and absolute positioning accuracies of 39 cm and 1.26 m, respectively.
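The multi-view-geometry half of such a pipeline is classical two-view triangulation of each detected landmark from estimated camera poses. A minimal OpenCV sketch (function and variable names are assumptions, not the authors' code):

```python
import numpy as np
import cv2

def triangulate_sign(P1: np.ndarray, P2: np.ndarray,
                     pt1: np.ndarray, pt2: np.ndarray) -> np.ndarray:
    """P1, P2: (3, 4) projection matrices K[R|t] from estimated ego-motion;
    pt1, pt2: (2,) detections of the same traffic sign in two frames."""
    X_h = cv2.triangulatePoints(P1, P2,
                                pt1.reshape(2, 1).astype(np.float64),
                                pt2.reshape(2, 1).astype(np.float64))
    return (X_h[:3] / X_h[3]).ravel()   # dehomogenize (4, 1) -> (3,) point
```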
arXiv Detail & Related papers (2020-07-25T12:10:16Z)
- Traffic Prediction Framework for OpenStreetMap using Deep Learning based Complex Event Processing and Open Traffic Cameras [4.6453787256723365]
We propose a deep learning-based Complex Event Processing (CEP) method that relies on publicly available video camera streams for traffic estimation.
The proposed framework performs near-real-time object detection and object property extraction across camera clusters in parallel to derive multiple measures related to traffic.
The system achieves a near-real-time performance of 1.42 seconds median latency and an average F-score of 0.80.
arXiv Detail & Related papers (2020-07-12T17:10:43Z)
- Monocular Vision based Crowdsourced 3D Traffic Sign Positioning with Unknown Camera Intrinsics and Distortion Coefficients [11.38332845467423]
We demonstrate an approach to computing 3D traffic sign positions without knowing the camera focal lengths, principal point, and distortion coefficients a priori.
We achieve an average single journey relative and absolute positioning accuracy of 0.26 m and 1.38 m, respectively.
arXiv Detail & Related papers (2020-07-09T07:03:17Z)