HVC-Net: Unifying Homography, Visibility, and Confidence Learning for
Planar Object Tracking
- URL: http://arxiv.org/abs/2209.08924v1
- Date: Mon, 19 Sep 2022 11:11:56 GMT
- Title: HVC-Net: Unifying Homography, Visibility, and Confidence Learning for
Planar Object Tracking
- Authors: Haoxian Zhang, Yonggen Ling
- Abstract summary: We present a unified convolutional neural network (CNN) model that jointly considers homography, visibility, and confidence.
Our approach outperforms the state-of-the-art methods on public POT and TMT datasets.
- Score: 5.236567998857959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust and accurate planar tracking over a whole video sequence is vitally
important for many vision applications. The key to planar object tracking is to
find object correspondences, modeled by homography, between the reference image
and the tracked image. Existing methods tend to produce incorrect
correspondences under appearance variations, camera-object relative motion,
and occlusion. To alleviate this problem, we present a unified convolutional
neural network (CNN) model that jointly considers homography, visibility, and
confidence. First, we introduce correlation blocks that explicitly account for
the local appearance changes and camera-object relative motions as the base of
our model. Second, we jointly learn the homography and visibility that links
camera-object relative motions with occlusions. Third, we propose a confidence
module that actively monitors the estimation quality from the pixel correlation
distributions obtained in correlation blocks. All these modules are plugged
into a Lucas-Kanade (LK) tracking pipeline to obtain both accurate and robust
planar object tracking. Our approach outperforms the state-of-the-art methods
on public POT and TMT datasets. Its superior performance is also verified on a
real-world application, synthesizing high-quality in-video advertisements.
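To make the core idea concrete, the sketch below shows the classical correspondence-plus-homography pipeline that planar trackers build on, using standard OpenCV calls (ORB features, RANSAC-based cv2.findHomography). It is a minimal, hypothetical illustration, not HVC-Net itself: the paper replaces this hand-crafted matching with learned correlation blocks, and learns visibility and confidence rather than relying on RANSAC's inlier mask.
```python
# Minimal sketch of classical homography-based planar tracking (OpenCV).
# Illustrates "object correspondences, modeled by homography" from the
# abstract; it is NOT the HVC-Net model.
import cv2
import numpy as np

def track_planar_object(ref_img, cur_img, ref_corners):
    """Estimate the reference-to-current homography and map the object's
    corners into the current frame. ref_corners: (4, 2) float32 array."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(ref_img, None)
    kp2, des2 = orb.detectAndCompute(cur_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC discards the wrong correspondences that occlusion and
    # appearance change tend to produce; HVC-Net instead learns
    # visibility and confidence to handle these cases.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    corners = cv2.perspectiveTransform(ref_corners.reshape(-1, 1, 2), H)
    return corners.reshape(-1, 2), H, inlier_mask
```
The RANSAC inlier mask here is the classical, geometry-only stand-in for the per-pixel visibility and confidence that the paper learns jointly with the homography.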
Related papers
- UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with
Geometric Topology Guidance [6.577227592760559]
UnsMOT is a novel framework that combines appearance and motion features of objects with geometric information to provide more accurate tracking.
Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2023-09-03T04:58:12Z)
- Spatio-Temporal Relation Learning for Video Anomaly Detection [35.59510027883497]
Anomaly identification is highly dependent on the relationship between the object and the scene.
In this paper, we propose a Spatial-Temporal Relation Learning framework to tackle the video anomaly detection task.
Experiments are conducted on three public datasets, and the superior performance over the state-of-the-art methods demonstrates the effectiveness of our method.
arXiv Detail & Related papers (2022-09-27T02:19:31Z)
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving
Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on the MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Attentive and Contrastive Learning for Joint Depth and Motion Field
Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z) - Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-point estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- "What's This?" -- Learning to Segment Unknown Objects from Manipulation
Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method depends neither on visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)