Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction
from High-Angle Video
- URL: http://arxiv.org/abs/2209.08417v1
- Date: Sat, 17 Sep 2022 22:32:05 GMT
- Title: Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction
from High-Angle Video
- Authors: Tianya T. Zhang, Ph.D., Peter J. Jin, Ph.D., Han Zhou, Benedetto
Piccoli, Ph.D.
- Abstract summary: We develop a model that imposes parity constraints at both pixel and instance levels to generate instance-aware embeddings for vehicle segmentation on STMap.
The designed model is applied to process all public NGSIM US-101 videos to generate complete vehicle trajectories.
- Score: 1.8520147498637294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial-Temporal Map (STMap)-based methods have shown great potential
for processing high-angle videos for vehicle trajectory reconstruction, which can
meet the needs of various data-driven modeling and imitation learning
applications. In this paper, we develop a Spatial-Temporal Deep Embedding (STDE)
model that imposes parity constraints at both the pixel and instance levels to
generate instance-aware embeddings for vehicle stripe segmentation on the STMap.
At the pixel level, each pixel is encoded with its 8-neighbor pixels at different
ranges, and this encoding is subsequently used to guide a neural network to learn
the embedding mechanism. At the instance level, a discriminative loss function is
designed to pull pixels belonging to the same instance closer together and to
push the mean values of different instances far apart in the embedding space. The
output of the spatial-temporal affinity is then optimized by the mutex-watershed
algorithm to obtain the final clustering results. On segmentation metrics, our
model outperforms five other baselines that have been used for STMap processing
and shows robustness under the influence of shadows, static noise, and
overlapping. The designed model is applied to process all public NGSIM US-101
videos to generate complete vehicle trajectories, indicating good scalability and
adaptability. Last but not least, the strengths of the scanline method with STDE
and future directions are discussed. Code, the STMap dataset, and video
trajectories are made publicly available in the online repository. GitHub
Link: shorturl.at/jklT0.
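The instance-level objective described in the abstract (pull pixels of the same instance toward their instance mean, push the means of different instances apart) can be sketched in NumPy. The hinge margins `delta_pull` and `delta_push` and the exact squared-hinge form are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Sketch of an instance-level discriminative loss.
    embeddings: (N, D) pixel embeddings; labels: (N,) instance ids.
    Margins are illustrative hyperparameters."""
    instance_ids = np.unique(labels)
    means = np.stack([embeddings[labels == i].mean(axis=0) for i in instance_ids])

    # Pull term: hinged distance of each pixel to its own instance mean.
    pull = 0.0
    for k, i in enumerate(instance_ids):
        d = np.linalg.norm(embeddings[labels == i] - means[k], axis=1)
        pull += np.mean(np.maximum(d - delta_pull, 0.0) ** 2)
    pull /= len(instance_ids)

    # Push term: hinged distance between every pair of instance means.
    push, pairs = 0.0, 0
    for a in range(len(instance_ids)):
        for b in range(a + 1, len(instance_ids)):
            d = np.linalg.norm(means[a] - means[b])
            push += np.maximum(delta_push - d, 0.0) ** 2
            pairs += 1
    if pairs:
        push /= pairs
    return pull + push
```

Well-separated, tight instances incur zero loss; instances whose means fall within the push margin are penalized, which is what drives instance-aware clustering in the embedding space.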
Related papers
- LMT-Net: Lane Model Transformer Network for Automated HD Mapping from Sparse Vehicle Observations [11.395749549636868]
Lane Model Transformer Network (LMT-Net) is an encoder-decoder neural network architecture that performs polyline encoding and predicts lane pairs and their connectivity.
We evaluate the performance of LMT-Net on an internal dataset that consists of multiple vehicle observations as well as human annotations as Ground Truth (GT).
arXiv Detail & Related papers (2024-09-19T02:14:35Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
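The "distribute features evenly across clusters" step can be sketched with a Sinkhorn-Knopp normalization that alternately rescales rows and columns of a feature-to-cluster score matrix; the temperature `eps` and the iteration count are assumed values, and SIGMA's actual projection and loss are not shown here:

```python
import numpy as np

def sinkhorn(scores, n_iters=50, eps=0.05):
    """Sketch of Sinkhorn-Knopp normalization: turn a (B, K) score matrix
    into a soft assignment whose K cluster columns carry (approximately)
    equal mass, i.e. features are spread evenly across clusters."""
    Q = np.exp(scores / eps)                 # positive (B, K) matrix
    Q /= Q.sum()                             # normalize total mass to 1
    B, K = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True)    # equalize cluster (column) mass
        Q /= K
        Q /= Q.sum(axis=1, keepdims=True)    # equalize sample (row) mass
        Q /= B
    return Q * B                             # each row sums to 1
```

Each returned row is a per-feature distribution over clusters, while column sums converge toward B/K, enforcing the even spread.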
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
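The per-frame matching stage can be sketched as a dot-product correlation between the query point's feature and every spatial feature in each frame, taking the best-scoring cell independently per frame. The function name, shapes, and plain argmax are illustrative; TAPIR's actual matching builds cost volumes and feeds them to a learned head:

```python
import numpy as np

def match_query_per_frame(query_feat, frame_feats):
    """Sketch of the matching stage: independently locate, in every frame,
    the position whose feature correlates best with the query feature.
    query_feat: (D,); frame_feats: (T, H, W, D). Returns (T, 2) (row, col)
    candidate locations, one per frame."""
    T, H, W, D = frame_feats.shape
    # Dot-product correlation between the query and every spatial feature.
    corr = frame_feats.reshape(T, H * W, D) @ query_feat   # (T, H*W)
    idx = corr.argmax(axis=1)                              # best cell per frame
    return np.stack([idx // W, idx % W], axis=1)
```

The refinement stage would then update these candidates (and the query feature) from local correlations around each candidate, which this sketch omits.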
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
- iSDF: Real-Time Neural Signed Distance Fields for Robot Perception [64.80458128766254]
iSDF is a continuous learning system for real-time signed distance field reconstruction.
It produces more accurate reconstructions and better approximations of collision costs and gradients.
arXiv Detail & Related papers (2022-04-05T15:48:39Z)
- Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function [9.414880946870916]
We propose a novel 3D reconstruction and semantic mapping system using LiDAR and camera sensors.
An Adaptive Truncated Signed Distance Function is introduced to describe surfaces implicitly, which can deal with different LiDAR point sparsities.
An optimal image patch selection strategy is proposed to estimate the optimal semantic class for each triangle mesh.
arXiv Detail & Related papers (2022-02-28T15:11:25Z)
- Spatial-Temporal Map Vehicle Trajectory Detection Using Dynamic Mode Decomposition and Res-UNet+ Neural Networks [0.0]
This paper presents a machine-learning-enhanced longitudinal scanline method to extract vehicle trajectories from high-angle traffic cameras.
The Dynamic Mode Decomposition (DMD) method is applied to extract vehicle strands by decomposing the Spatial-Temporal Map (STMap) into the sparse foreground and low-rank background.
A deep neural network named Res-UNet+ was designed for the semantic segmentation task by adapting two prevalent deep learning architectures.
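The sparse-foreground / low-rank-background split can be illustrated with a truncated SVD standing in for the decomposition step. This is a deliberate simplification: the paper uses Dynamic Mode Decomposition, whereas this sketch only shows the low-rank/sparse idea on a time-by-space STMap intensity matrix:

```python
import numpy as np

def split_foreground_background(stmap, rank=1):
    """Illustrative stand-in for the DMD decomposition: approximate the
    low-rank background of an STMap with a truncated SVD and treat the
    residual as the sparse vehicle foreground (the moving strands)."""
    U, s, Vt = np.linalg.svd(stmap, full_matrices=False)
    background = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # low-rank part
    foreground = stmap - background                    # sparse residual
    return foreground, background
```

The residual concentrates the transient vehicle strands, which the segmentation network then labels.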
arXiv Detail & Related papers (2022-01-13T00:49:24Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers by directly translating the image feature map into the object detection result.
Recent transformer-based image recognition models show consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking [79.80401607146987]
Existing object trackers usually learn a bounding-box-based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, equipped with a spatio-appearance memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.