AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points
- URL: http://arxiv.org/abs/2007.05719v1
- Date: Sat, 11 Jul 2020 08:43:34 GMT
- Title: AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points
- Authors: Yuexin Ma, Xinge Zhu, Xinjing Cheng, Ruigang Yang, Jiming Liu, Dinesh Manocha
- Abstract summary: We present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction.
To better capture the moving objects in videos, we introduce dynamic points.
We aggregate dynamic points into instance points, which stand for moving objects such as pedestrians in videos.
- Score: 92.91569287889203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current methods for trajectory prediction operate in a supervised manner and therefore require vast quantities of ground-truth data for training. In this paper, we present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction that works directly on raw videos. To better capture the moving objects in videos, we introduce dynamic points. We use them to model dynamic motion, applying a forward-backward extractor to maintain temporal consistency and image reconstruction to maintain spatial consistency, all in an unsupervised manner. We then aggregate dynamic points into instance points, which stand for moving objects such as pedestrians in the videos. Finally, we extract trajectories by matching instance points across frames for prediction training. To the best of our knowledge, our method is the first to achieve unsupervised learning of trajectory extraction and prediction. We evaluate its performance on well-known trajectory datasets and show that it is effective for real-world videos and can use raw videos to further improve the performance of existing models.
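The abstract describes a three-stage pipeline: extract dynamic points from raw video, aggregate them into per-object instance points, and match instance points across frames to assemble trajectories. As a rough illustration of the final matching stage only, here is a minimal sketch that assumes the per-frame instance points are already given; the Hungarian assignment, the gating threshold, and all names are our assumptions for illustration, not the paper's published procedure.

```python
# Minimal sketch (assumed, not the paper's code): link per-frame instance
# points into trajectories with Hungarian matching on Euclidean distance.
import numpy as np
from scipy.optimize import linear_sum_assignment

MAX_LINK_DIST = 30.0  # assumed gating threshold, in pixels


def link_instance_points(frames):
    """frames: list of (N_t, 2) arrays of instance-point coordinates.
    Returns a list of trajectories, each a list of (frame_idx, xy)."""
    trajectories = []  # all trajectories, finished and active
    active = []        # indices into `trajectories` still being extended
    for t, pts in enumerate(frames):
        pts = np.asarray(pts, dtype=float).reshape(-1, 2)
        if not active:
            # No active tracks: every point starts a new trajectory.
            for p in pts:
                trajectories.append([(t, p)])
                active.append(len(trajectories) - 1)
            continue
        # Distances between active trajectory heads and current points.
        heads = np.array([trajectories[i][-1][1] for i in active])
        cost = np.linalg.norm(heads[:, None, :] - pts[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)
        matched, next_active = set(), []
        for r, c in zip(rows, cols):
            if cost[r, c] <= MAX_LINK_DIST:  # reject implausible links
                trajectories[active[r]].append((t, pts[c]))
                next_active.append(active[r])
                matched.add(c)
        # Unmatched points start new trajectories.
        for c in range(pts.shape[0]):
            if c not in matched:
                trajectories.append([(t, pts[c])])
                next_active.append(len(trajectories) - 1)
        active = next_active
    return trajectories


if __name__ == "__main__":
    demo = [[(10.0, 10.0), (50.0, 60.0)],
            [(12.0, 11.0), (52.0, 62.0)],
            [(14.0, 13.0)]]
    for traj in link_instance_points(demo):
        print([(t, tuple(p)) for t, p in traj])
```

The upstream stages (learning dynamic points with forward-backward temporal consistency and image-reconstruction spatial consistency) require trained networks and unlabeled video, and are not reproduced here.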
Related papers
- VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions [10.748597086208145]
In this work, we propose a novel method that incorporates visual input from surround-view cameras alongside textual descriptions.
Our method achieves a latency of 53 ms, making it feasible for real-time processing.
Our experiments show that both the visual inputs and the textual descriptions contribute to improvements in trajectory prediction performance.
arXiv Detail & Related papers (2024-07-17T06:39:52Z) - LG-Traj: LLM Guided Pedestrian Trajectory Prediction [9.385936248154987]
We introduce LG-Traj, a novel approach that uses LLMs to generate motion cues present in pedestrians' past/observed trajectories.
These motion cues, along with pedestrian coordinates, facilitate a better understanding of the underlying representation.
Our method employs a transformer-based architecture comprising a motion encoder to model motion patterns and a social decoder to capture social interactions among pedestrians.
arXiv Detail & Related papers (2024-03-12T19:06:23Z) - Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z) - Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z) - Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models [28.304047711166056]
Large-scale pre-trained models have shown promising advances in detecting and segmenting objects in 2D static images in the wild.
This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
In this paper, we re-purpose an open-vocabulary detector, segmenter, and dense optical flow estimator into a model that tracks and segments objects of any category in 2D videos.
arXiv Detail & Related papers (2023-10-10T20:25:30Z) - PreViTS: Contrastive Pretraining with Video Tracking Supervision [53.73237606312024]
PreViTS is a self-supervised learning (SSL) framework that selects clips containing the same object.
PreViTS spatially constrains the frame regions to learn from and trains the model to locate meaningful objects.
We train a momentum contrastive (MoCo) encoder on VGG-Sound and Kinetics-400 datasets with PreViTS.
arXiv Detail & Related papers (2021-12-01T19:49:57Z) - Video Annotation for Visual Tracking via Selection and Refinement [74.08109740917122]
We present a new framework to facilitate bounding box annotations for video sequences.
A temporal assessment network is proposed which is able to capture the temporal coherence of target locations.
A visual-geometry refinement network is also designed to further enhance the selected tracking results.
arXiv Detail & Related papers (2021-08-09T05:56:47Z) - Object Tracking Using Spatio-Temporal Future Prediction [41.33609264685531]
We introduce a learning-based tracking method that takes into account background motion modeling and trajectory prediction.
Our trajectory prediction module predicts the target object's locations in the current and future frames based on the object's past trajectory.
To dynamically switch between the appearance-based tracker and the trajectory prediction, we employ a network that assesses the quality of each tracking prediction, as sketched below.
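As a rough illustration of this switching logic, here is a minimal sketch under assumed interfaces; the assessment threshold and every callable are hypothetical placeholders, not the paper's implementation.

```python
# Hypothetical sketch: switch between an appearance tracker and a
# trajectory predictor based on an assessment score (all names assumed).
def track_frame(frame, history, appearance_tracker, trajectory_predictor,
                assessor, threshold=0.5):
    box = appearance_tracker(frame)          # appearance-based estimate
    if assessor(frame, box) < threshold:     # appearance cue looks unreliable
        box = trajectory_predictor(history)  # fall back to motion prediction
    history.append(box)
    return box
```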
arXiv Detail & Related papers (2020-10-15T09:02:50Z) - DyStaB: Unsupervised Object Segmentation via Dynamic-Static
Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.