Learning Knowledge-Rich Sequential Model for Planar Homography
Estimation in Aerial Video
- URL: http://arxiv.org/abs/2304.02715v1
- Date: Wed, 5 Apr 2023 19:28:58 GMT
- Title: Learning Knowledge-Rich Sequential Model for Planar Homography
Estimation in Aerial Video
- Authors: Pu Li, Xiaobai Liu
- Abstract summary: We develop a sequential estimator that processes a sequence of video frames and estimates their pairwise planar homographic transformations in batches.
We also incorporate a set of spatial-temporal knowledge to regularize the learning of such a sequence-to-sequence model.
Empirical studies suggest that our sequential model achieves significant improvement over alternative image-based methods.
- Score: 12.853493070295457
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents an unsupervised approach that leverages raw aerial videos
to learn to estimate planar homographic transformation between consecutive
video frames. Previous learning-based estimators work on pairs of images to
estimate their planar homographic transformations but suffer from severe
over-fitting issues, especially when applied to aerial videos. To address
this concern, we develop a sequential estimator that directly processes a
sequence of video frames and estimates their pairwise planar homographic
transformations in batches. We also incorporate a set of spatial-temporal
knowledge to regularize the learning of such a sequence-to-sequence model. We
collect a set of challenging aerial videos and compare the proposed method to
the alternative algorithms. Empirical studies suggest that our sequential model
achieves significant improvement over alternative image-based methods and the
knowledge-rich regularization further boosts our system performance. Our code
and dataset can be found at https://github.com/Paul-LiPu/DeepVideoHomography
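For reference, the classical baseline the paper improves on is pairwise planar homography estimation between two frames: a 3x3 matrix H is recovered from at least four point correspondences via the Direct Linear Transform (DLT). The sketch below is a minimal NumPy illustration of that geometry, not the paper's learned sequential model; all function names are illustrative.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 planar homography H with dst ~ H @ src via DLT.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    Each correspondence (x, y) -> (u, v) contributes two rows to A h = 0;
    the solution h is the right singular vector of A with smallest
    singular value.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        A.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the projective scale so H[2,2] == 1

def apply_homography(H, pts):
    """Map (N, 2) points through H in homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In a video setting, applying such an estimator independently to each consecutive frame pair is what the paper calls an image-based method; the proposed sequential model instead processes the whole frame sequence and estimates the pairwise transformations jointly.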
Related papers
- Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models [96.97910688908956]
We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models.
We propose a framework tailored for VSS based on pre-trained image and video diffusion models.
Experiments show that our proposed approach outperforms existing zero-shot image semantic segmentation approaches.
arXiv Detail & Related papers (2024-05-27T08:39:38Z)
- Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z)
- Learning from One Continuous Video Stream [70.30084026960819]
We introduce a framework for online learning from a single continuous video stream.
This poses great challenges given the high correlation between consecutive video frames.
We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation.
arXiv Detail & Related papers (2023-12-01T14:03:30Z)
- Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations [79.87044240860466]
We propose a novel approach that incorporates temporal consistency in dense self-supervised learning.
Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos.
Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images.
arXiv Detail & Related papers (2023-08-22T21:28:58Z)
- PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment [21.98302129015761]
We propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework.
We show that our method PoseDiffusion significantly improves over the classic SfM pipelines.
It is observed that our method can generalize across datasets without further training.
arXiv Detail & Related papers (2023-06-27T17:59:07Z)
- Half-sibling regression meets exoplanet imaging: PSF modeling and subtraction using a flexible, domain knowledge-driven, causal framework [7.025418443146435]
Existing post-processing algorithms do not use all prior domain knowledge that is available about the problem.
We propose a new method that builds on our understanding of the systematic noise and the causal structure of the data-generating process.
Our algorithm provides a better false-positive fraction than PCA-based PSF subtraction, a popular baseline method in the field.
arXiv Detail & Related papers (2022-04-07T13:34:30Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video qualities when filming a screen with digital cameras.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
- Efficient training for future video generation based on hierarchical disentangled representation of latent variables [66.94698064734372]
We propose a novel method for generating future prediction videos with less memory usage than the conventional methods.
We achieve high efficiency by training our method in two stages: (1) image reconstruction to encode video frames into latent variables, and (2) latent variable prediction to generate the future sequence.
Our experiments show that the proposed method can efficiently generate future prediction videos, even for complex datasets that cannot be handled by previous methods.
arXiv Detail & Related papers (2021-06-07T10:43:23Z)
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- Continual Learning of Predictive Models in Video Sequences via Variational Autoencoders [6.698751933050415]
This paper proposes a method for performing continual learning of predictive models that facilitate the inference of future frames in video sequences.
An initial Variational Autoencoder, together with a set of fully connected neural networks, is utilized to respectively learn the appearance of video frames and their dynamics at the latent space level.
arXiv Detail & Related papers (2020-06-02T21:17:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.