Keypoints Tracking via Transformer Networks
- URL: http://arxiv.org/abs/2203.12848v1
- Date: Thu, 24 Mar 2022 05:06:46 GMT
- Title: Keypoints Tracking via Transformer Networks
- Authors: Oleksii Nasypanyi, Francois Rameau
- Abstract summary: We propose a pioneering work on sparse keypoint tracking across images using transformer networks.
We study the particular case of real-time and robust keypoint tracking.
Our method consists of two successive stages: a coarse matching followed by a fine localization of the keypoint correspondences.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this thesis, we propose a pioneering work on sparse keypoint tracking across images using transformer networks. While deep learning-based keypoint matching has been widely investigated using graph neural networks, and more recently transformer networks, these methods remain too slow to operate in real time and are particularly sensitive to the poor repeatability of keypoint detectors. To address these shortcomings, we study the particular case of real-time and robust keypoint tracking. Specifically, we propose a novel architecture that ensures fast and robust keypoint tracking between successive images of a video sequence. Our method takes advantage of a recent breakthrough in computer vision, namely visual transformer networks, and consists of two successive stages: a coarse matching followed by a fine localization of the keypoint correspondences. Through various experiments, we demonstrate that our approach achieves competitive results and exhibits high robustness against adverse conditions such as illumination changes, occlusions, and viewpoint differences.
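To make the two-stage pipeline concrete, here is a minimal PyTorch sketch of a coarse-to-fine matcher: the coarse stage selects the best-matching cell of a low-resolution feature map for each keypoint descriptor, and the fine stage refines that match to sub-pixel accuracy via a softmax-weighted expectation over a local window. All names, shapes, and the plain dot-product similarity (standing in for the paper's transformer attention) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical coarse-to-fine keypoint tracking sketch (not the authors' code).
import torch
import torch.nn.functional as F

def coarse_match(kpt_desc, coarse_feats):
    """kpt_desc: (N, C) keypoint descriptors from frame t.
    coarse_feats: (C, Hc, Wc) coarse feature map of frame t+1.
    Returns (N, 2) integer (x, y) coarse cells with the highest similarity."""
    C, Hc, Wc = coarse_feats.shape
    scores = kpt_desc @ coarse_feats.reshape(C, -1)   # (N, Hc*Wc) dot-product similarity
    idx = scores.argmax(dim=1)                        # best coarse cell per keypoint
    return torch.stack((idx % Wc, idx // Wc), dim=1)

def fine_localize(kpt_desc, fine_feats, coarse_xy, stride=8, win=5):
    """Refine each coarse match to a sub-pixel position in frame t+1.
    fine_feats: (C, H, W) full-resolution feature map. For brevity this
    sketch assumes every local window lies fully inside the image."""
    half = win // 2
    ys, xs = torch.meshgrid(torch.arange(win), torch.arange(win), indexing="ij")
    tracks = []
    for d, (cx, cy) in zip(kpt_desc, coarse_xy.tolist()):
        px = cx * stride + stride // 2                # coarse cell center in pixels
        py = cy * stride + stride // 2
        patch = fine_feats[:, py - half:py + half + 1, px - half:px + half + 1]
        corr = (d[:, None, None] * patch).sum(dim=0)  # (win, win) similarity map
        prob = F.softmax(corr.flatten(), dim=0).reshape(win, win)
        dx = (prob * (xs - half)).sum()               # expected sub-pixel offset
        dy = (prob * (ys - half)).sum()
        tracks.append(torch.stack((px + dx, py + dy)))
    return torch.stack(tracks)                        # (N, 2) refined positions
```

In the paper, similarity is computed with visual transformer layers rather than raw dot products, but the coarse-matching-then-fine-localization control flow sketched above is the same.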
Related papers
- Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images [7.451395029642832]
Keypoint detection and matching is a fundamental task in many computer vision problems.
In this work, we focus on the case where matching is hindered by the geometry of the cameras used for image acquisition.
We build on a state-of-the-art approach and derive a self-supervised procedure that enables training an interest point detector and descriptor network.
arXiv Detail & Related papers (2023-06-02T22:39:33Z) - ViT-Calibrator: Decision Stream Calibration for Vision Transformer [49.60474757318486]
We propose a new paradigm dubbed Decision Stream that boosts the performance of general Vision Transformers.
We shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions.
arXiv Detail & Related papers (2023-04-10T02:40:24Z) - Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z) - BTranspose: Bottleneck Transformers for Human Pose Estimation with
Self-Supervised Pre-Training [0.304585143845864]
In this paper, we consider the recently proposed Bottleneck Transformers, which effectively combine CNN and multi-head self-attention (MHSA) layers; a minimal sketch follows this entry.
We consider different backbone architectures and pre-train them using the DINO self-supervised learning method.
Experiments show that our model achieves an AP of 76.4, which is competitive with other methods such as [1] and has fewer network parameters.
arXiv Detail & Related papers (2022-04-21T15:45:05Z) - Self-Supervised Equivariant Learning for Oriented Keypoint Detection [35.94215211409985]
- Self-Supervised Equivariant Learning for Oriented Keypoint Detection [35.94215211409985]
We introduce a self-supervised learning framework using rotation-equivariant CNNs to learn to detect robust oriented keypoints.
We propose a dense orientation alignment loss, computed on image pairs generated by synthetic transformations, to train a histogram-based orientation map.
Our method outperforms previous methods on an image matching benchmark and a camera pose estimation benchmark.
arXiv Detail & Related papers (2022-04-19T02:26:07Z) - Infrared Small-Dim Target Detection with Transformer under Complex
Backgrounds [155.388487263872]
We propose a new infrared small-dim target detection method with the transformer.
We adopt the self-attention mechanism of the transformer to learn interactions among image features over a larger range.
We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z) - Augmented Shortcuts for Vision Transformers [49.70151144700589]
We study the relationship between shortcuts and feature diversity in vision transformer models.
We present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel with the original shortcuts; a minimal sketch follows this entry.
Experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-06-30T09:48:30Z) - A Detector-oblivious Multi-arm Network for Keypoint Matching [14.051194519908455]
- A Detector-oblivious Multi-arm Network for Keypoint Matching [14.051194519908455]
We propose a Multi-Arm Network (MAN) to learn region overlap and depth.
Comprehensive experiments conducted on outdoor and indoor datasets demonstrate that our proposed MAN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-04-02T08:55:04Z) - Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z) - Dynamic Inference: A New Approach Toward Efficient Video Action
Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea that improves efficiency by leveraging the variation in the distinguishability of different videos; a minimal early-exit sketch follows this entry.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)