VisionTraj: A Noise-Robust Trajectory Recovery Framework based on
Large-scale Camera Network
- URL: http://arxiv.org/abs/2312.06428v1
- Date: Mon, 11 Dec 2023 14:52:43 GMT
- Title: VisionTraj: A Noise-Robust Trajectory Recovery Framework based on
Large-scale Camera Network
- Authors: Zhishuai Li, Ziyue Li, Xiaoru Hu, Guoqing Du, Yunhao Nie, Feng Zhu,
Lei Bai, Rui Zhao
- Abstract summary: Trajectory recovery based on snapshots from the city-wide multi-camera network facilitates urban mobility sensing and driveway optimization.
This paper proposes VisionTraj, the first learning-based model that reconstructs vehicle trajectories from snapshots recorded by road network cameras.
- Score: 18.99662554949384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Trajectory recovery based on snapshots from a city-wide multi-camera
network facilitates urban mobility sensing and driveway optimization.
State-of-the-art solutions for this vision-based scheme typically incorporate
predefined rules or unsupervised iterative feedback, and they struggle with
several challenges, such as the lack of open-source datasets for training the
whole pipeline and vulnerability to noise in the visual inputs. In response,
this paper proposes VisionTraj, the first learning-based model that
reconstructs vehicle trajectories from snapshots recorded by road network
cameras. Alongside the model, we construct two vision-trajectory datasets that
pair extensive trajectory data with the corresponding visual snapshots,
enabling supervised learning of the vision-trajectory interplay. Building on
the results of off-the-shelf multi-modal vehicle clustering, we first
re-formulate trajectory recovery as a generative task and adopt the canonical
Transformer as the autoregressive backbone. Then, to identify clustering noise
(e.g., false positives) by exploiting the snapshots' spatiotemporal
dependencies, a GCN-based soft-denoising module is applied to the fine- and
coarse-grained Re-ID clusters. Additionally, we harness the rich semantic
information extracted from each tracklet to capture the vehicle's entry and
exit actions during trajectory recovery. The denoising and tracklet components
can also act as plug-and-play modules to boost baselines. Experimental results
on the two hand-crafted datasets show that the proposed VisionTraj achieves up
to a +11.5% improvement over the second-best model.
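As a rough illustration of the formulation described in the abstract, the
following Python sketch (assuming PyTorch) casts trajectory recovery as
autoregressive generation with a canonical Transformer over snapshot
embeddings. The class name SnapshotTrajectoryModel, the tensor shapes, and the
per-snapshot confidence weights standing in for the GCN-based soft-denoising
output are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class SnapshotTrajectoryModel(nn.Module):
    def __init__(self, num_segments, snap_dim=64, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # Project per-snapshot visual/spatiotemporal features into the model space.
        self.snap_proj = nn.Linear(snap_dim, d_model)
        # Road-segment vocabulary plus BOS/EOS tokens for autoregressive decoding.
        self.seg_embed = nn.Embedding(num_segments + 2, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead,
                                          num_encoder_layers=num_layers,
                                          num_decoder_layers=num_layers,
                                          batch_first=True)
        self.head = nn.Linear(d_model, num_segments + 2)

    def forward(self, snapshots, snap_conf, tgt_segments):
        # snapshots:    (B, S, snap_dim)  features of the clustered snapshots
        # snap_conf:    (B, S)            soft-denoising weights in [0, 1]
        # tgt_segments: (B, T)            ground-truth segment ids (teacher forcing)
        memory_in = self.snap_proj(snapshots) * snap_conf.unsqueeze(-1)
        tgt = self.seg_embed(tgt_segments)
        T = tgt_segments.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.transformer(memory_in, tgt, tgt_mask=causal)
        return self.head(out)  # (B, T, num_segments + 2) next-segment logits

# Toy usage: recover a 5-step trajectory over 100 road segments from 3 snapshots.
model = SnapshotTrajectoryModel(num_segments=100)
snaps = torch.randn(2, 3, 64)
conf = torch.sigmoid(torch.randn(2, 3))   # stand-in for GCN denoising scores
tgt = torch.randint(0, 100, (2, 5))
logits = model(snaps, conf, tgt)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 102), tgt.reshape(-1))

In the full VisionTraj pipeline, the confidence weights would instead come from
the GCN-based soft-denoising module over fine- and coarse-grained Re-ID
clusters, and tracklet-derived entry/exit cues would further condition the
decoder.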
Related papers
- DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input [45.04354435388718]
We propose a feed-forward Gaussian Splatting model that reconstructs driving scenes from flexible surround-view input.
We jointly train a pose network, a depth network, and a Gaussian network to predict the primitives that represent the driving scenes.
Our model outperforms existing state-of-the-art feed-forward and scene-optimized reconstruction methods in terms of reconstruction quality.
arXiv Detail & Related papers (2024-09-19T13:16:04Z) - UdeerLID+: Integrating LiDAR, Image, and Relative Depth with Semi-Supervised [12.440461420762265]
Road segmentation is a critical task for autonomous driving systems.
Our work introduces an innovative approach that integrates LiDAR point cloud data, visual image, and relative depth maps.
One of the primary challenges is the scarcity of large-scale, accurately labeled datasets.
arXiv Detail & Related papers (2024-09-10T03:57:30Z) - EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation [17.0226030258296]
Associating driver attention with the driving scene across two fields of view is a hard cross-domain perception problem.
Previous methods typically focus on a single view or map attention to the scene via estimated gaze.
We propose a novel method for end-to-end scene-associated driver attention estimation, called EraW-Net.
arXiv Detail & Related papers (2024-08-16T07:12:47Z) - Leveraging the Power of Data Augmentation for Transformer-based Tracking [64.46371987827312]
We propose two data augmentation methods customized for tracking.
First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples.
Second, we propose a token-level feature mixing augmentation strategy, which makes the model more robust to challenges like background interference.
arXiv Detail & Related papers (2023-09-15T09:18:54Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z) - Backbone is All Your Need: A Simplified Architecture for Visual Object
Tracking [69.08903927311283]
Existing tracking approaches rely on customized sub-modules and need prior knowledge for architecture selection.
This paper presents a simplified tracking architecture (SimTrack) by leveraging a transformer backbone for joint feature extraction and interaction.
Our SimTrack improves the baseline with 2.5%/2.6% AUC gains on LaSOT/TNL2K and gets results competitive with other specialized tracking algorithms without bells and whistles.
arXiv Detail & Related papers (2022-03-10T12:20:58Z) - Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust
Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z) - Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
This is done in a completely label-free manner by exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z) - Transformer Meets Convolution: A Bilateral Awareness Network for
Semantic Segmentation of Very Fine Resolution Urban Scene Images [6.460167724233707]
We propose a bilateral awareness network (BANet) which contains a dependency path and a texture path.
BANet captures the long-range relationships and fine-grained details in VFR images.
Experiments conducted on the three large-scale urban scene image segmentation datasets, i.e., ISPRS Vaihingen dataset, ISPRS Potsdam dataset, and UAVid dataset, demonstrate the effectiveness of BANet.
arXiv Detail & Related papers (2021-06-23T13:57:36Z) - Multi-modal Scene-compliant User Intention Estimation for Navigation [1.9117798322548485]
A framework to generate user intention distributions when operating a mobile vehicle is proposed in this work.
The model learns from past observed trajectories and leverages traversability information derived from the visual surroundings.
Experiments were conducted on a dataset collected with a custom wheelchair model built onto the open-source urban driving simulator CARLA.
arXiv Detail & Related papers (2021-06-13T05:11:33Z)