Cyclist Trajectory Forecasts by Incorporation of Multi-View Video
Information
- URL: http://arxiv.org/abs/2106.15991v1
- Date: Wed, 30 Jun 2021 11:34:43 GMT
- Title: Cyclist Trajectory Forecasts by Incorporation of Multi-View Video
Information
- Authors: Stefan Zernetsch and Oliver Trupp and Viktor Kress and Konrad Doll and
Bernhard Sick
- Abstract summary: This article presents a novel approach to incorporate visual cues from video data of a wide-angle stereo camera system mounted at an urban intersection into the forecast of cyclist trajectories.
We extract features from image and optical flow sequences using 3D convolutional neural networks (3D-ConvNets) and combine them with features extracted from the cyclist's past trajectory to forecast future cyclist positions.
- Score: 2.984037222955095
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This article presents a novel approach to incorporate visual cues
from video data of a wide-angle stereo camera system mounted at an urban
intersection into the forecast of cyclist trajectories. We extract features
from image and optical flow (OF) sequences using 3D convolutional neural
networks (3D-ConvNets) and combine them with features extracted from the
cyclist's past trajectory to forecast future cyclist positions. By using this
additional information, we improve positional accuracy by about 7.5% for our
test dataset and by up to 22% for specific motion types compared to a method
based solely on past trajectories. Furthermore, we compare the use of image
sequences to the use of OF sequences as additional information, showing that
OF alone leads to significant improvements in positional accuracy. By training
and testing our methods on a real-world dataset recorded at a heavily
frequented public intersection and evaluating the methods' runtimes, we
demonstrate their applicability in real traffic scenarios. Our code and parts
of our dataset are made publicly available.
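As a rough illustration of the fusion described in the abstract, the sketch below combines 3D-ConvNet features from an optical-flow (or image) sequence with features from the cyclist's past trajectory to regress future positions. This is a minimal PyTorch sketch, not the authors' released code; all module names, layer sizes, and sequence lengths are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of fusing 3D-ConvNet video
# features with past-trajectory features to forecast future cyclist positions.
# Layer sizes, sequence lengths, and names are illustrative assumptions.
import torch
import torch.nn as nn

class TrajectoryForecaster(nn.Module):
    def __init__(self, seq_len_in=20, seq_len_out=25, flow_channels=2):
        super().__init__()
        # 3D-ConvNet over a (C, T, H, W) optical-flow or image sequence
        self.video_net = nn.Sequential(
            nn.Conv3d(flow_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),          # -> (B, 32)
        )
        # Encoder for the past trajectory, given as seq_len_in (x, y) points
        self.traj_net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(seq_len_in * 2, 64), nn.ReLU(),       # -> (B, 64)
        )
        # Fused head regresses seq_len_out future (x, y) positions
        self.head = nn.Sequential(
            nn.Linear(32 + 64, 128), nn.ReLU(),
            nn.Linear(128, seq_len_out * 2),
        )
        self.seq_len_out = seq_len_out

    def forward(self, flow_seq, past_traj):
        # flow_seq: (B, C, T, H, W), past_traj: (B, seq_len_in, 2)
        fused = torch.cat([self.video_net(flow_seq), self.traj_net(past_traj)], dim=1)
        return self.head(fused).view(-1, self.seq_len_out, 2)

# Dummy example: 10 frames of 64x64 optical flow plus 20 past positions
model = TrajectoryForecaster(seq_len_in=20, seq_len_out=25)
pred = model(torch.randn(4, 2, 10, 64, 64), torch.randn(4, 20, 2))
print(pred.shape)  # torch.Size([4, 25, 2])
```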
Related papers
- PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point
Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z)
- Automatic vehicle trajectory data reconstruction at scale [2.010294990327175]
We propose an automatic trajectory data reconciliation method to correct common errors in vision-based vehicle trajectory data.
We show that the reconciled trajectories improve accuracy on all tested input data across a wide range of measures.
arXiv Detail & Related papers (2022-12-15T15:39:55Z)
- Cross-Camera Trajectories Help Person Retrieval in a Camera Network [124.65912458467643]
Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network.
We propose a pedestrian retrieval framework based on cross-camera trajectory generation, which integrates both temporal and spatial information.
To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset.
arXiv Detail & Related papers (2022-04-27T13:10:48Z)
- Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data [19.63193201107591]
7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users.
We develop an approach that uses weakly supervised fine-tuning of 3D object detectors for traffic observation cameras.
Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets.
arXiv Detail & Related papers (2021-10-21T08:26:48Z)
- HighlightMe: Detecting Highlights from Human-Centric Videos [62.265410865423]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Robust 2D/3D Vehicle Parsing in CVIS [54.825777404511605]
We present a novel approach to robustly detect and perceive vehicles in different camera views as part of a cooperative vehicle-infrastructure system (CVIS).
Our formulation is designed for arbitrary camera views and makes no assumptions about intrinsic or extrinsic parameters.
In practice, our approach outperforms SOTA methods on 2D detection, instance segmentation, and 6-DoF pose estimation.
arXiv Detail & Related papers (2021-03-11T03:35:05Z)
- Vehicle Trajectory Prediction in Crowded Highway Scenarios Using Bird Eye View Representations and CNNs [0.0]
This paper describes a novel approach to vehicle trajectory prediction employing graphic representations.
The problem is posed as an image-to-image regression task, training the network to learn the underlying relations between traffic participants (a minimal sketch of this formulation appears after this list).
The model has been tested in highway scenarios with more than 30 vehicles simultaneously in two opposite traffic flow streams.
arXiv Detail & Related papers (2020-08-26T11:15:49Z)
- Crowdsourced 3D Mapping: A Combined Multi-View Geometry and Self-Supervised Learning Approach [10.610403488989428]
We propose a framework that estimates the 3D positions of semantically meaningful landmarks without assuming known camera intrinsics.
We utilize multi-view geometry as well as deep learning based self-calibration, depth, and ego-motion estimation for traffic sign positioning.
We achieve average single-journey relative and absolute positioning accuracies of 39 cm and 1.26 m, respectively.
arXiv Detail & Related papers (2020-07-25T12:10:16Z)
- AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points [92.91569287889203]
We present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction.
To better capture the moving objects in videos, we introduce dynamic points.
We aggregate dynamic points to instance points, which stand for moving objects such as pedestrians in videos.
arXiv Detail & Related papers (2020-07-11T08:43:34Z)
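The bird's-eye-view entry above frames trajectory prediction as image-to-image regression. The sketch below is a hypothetical, minimal version of that formulation, assuming a stack of past BEV occupancy images as input and a single predicted occupancy image as output; the encoder-decoder layout, channel counts, and shapes are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical sketch of trajectory prediction as image-to-image regression:
# a small encoder-decoder CNN maps past bird's-eye-view occupancy images to a
# predicted future occupancy image. All sizes are illustrative only.
import torch
import torch.nn as nn

class BEVTrajectoryRegressor(nn.Module):
    def __init__(self, past_frames=5):
        super().__init__()
        # Encoder downsamples the stacked BEV history
        self.encoder = nn.Sequential(
            nn.Conv2d(past_frames, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder upsamples back to a single occupancy map in [0, 1]
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, bev_history):
        # bev_history: (B, past_frames, H, W) -> predicted occupancy (B, 1, H, W)
        return self.decoder(self.encoder(bev_history))

model = BEVTrajectoryRegressor(past_frames=5)
out = model(torch.rand(2, 5, 128, 128))
print(out.shape)  # torch.Size([2, 1, 128, 128])
```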
This list is automatically generated from the titles and abstracts of the papers on this site.