TopicFM: Robust and Interpretable Feature Matching with Topic-assisted
- URL: http://arxiv.org/abs/2207.00328v1
- Date: Fri, 1 Jul 2022 10:39:14 GMT
- Title: TopicFM: Robust and Interpretable Feature Matching with Topic-assisted
- Authors: Khang Truong Giang, Soohwan Song, Sungho Jo
- Abstract summary: We propose an architecture for image matching which is efficient, robust, and interpretable.
We introduce a novel feature matching module called TopicFM which can roughly organize the same spatial structures across images into a topic.
Our method performs matching only in co-visible regions, which reduces computation.
- Score: 8.314830611853168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finding correspondences across images is an important task in many visual
applications. Recent state-of-the-art methods focus on end-to-end
learning-based architectures designed in a coarse-to-fine manner. They use a
very deep CNN or a multi-block Transformer to learn robust representations, which
requires high computational power. Moreover, these methods learn features without
reasoning about the objects or shapes inside images, and thus lack interpretability.
In this paper, we propose an architecture for image matching which is
efficient, robust, and interpretable. More specifically, we introduce a novel
feature matching module called TopicFM which can roughly organize the same spatial
structures across images into a topic and then augment the features inside each
topic for accurate matching. To infer topics, we first learn global embeddings
of topics and then use a latent-variable model to detect-then-assign image
structures to topics. Our method performs matching only in co-visible
regions, which reduces computation. Extensive experiments on both outdoor and
indoor datasets show that our method outperforms recent methods in terms of
matching performance and computational efficiency. The code is available at
https://github.com/TruongKhang/TopicFM.
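A minimal PyTorch-style sketch of the two steps described in the abstract: soft assignment of coarse features to latent topics via learned global topic embeddings, followed by feature augmentation with a topic-conditioned context. The class name, dimensions, and the exact augmentation formula are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicAugment(nn.Module):
    """Illustrative sketch: assign features to latent topics, then
    augment each feature with the context of its topics."""
    def __init__(self, dim=256, num_topics=16):
        super().__init__()
        self.topics = nn.Parameter(torch.randn(num_topics, dim))  # global topic embeddings
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, feats):                        # feats: (B, N, dim) coarse features
        logits = feats @ self.topics.t()             # (B, N, K) similarity to each topic
        assign = F.softmax(logits, dim=-1)           # soft "detect-then-assign"
        # per-topic context: assignment-weighted average of the features
        ctx = torch.einsum('bnk,bnd->bkd', assign, feats)
        ctx = ctx / (assign.sum(dim=1).unsqueeze(-1) + 1e-6)
        # augment each feature with the context of the topics it belongs to
        feat_ctx = torch.einsum('bnk,bkd->bnd', assign, ctx)
        return self.proj(torch.cat([feats, feat_ctx], dim=-1)), assign
```

The returned assignment map also hints at how matching could be restricted to co-visible regions: features whose topic assignments do not overlap across the two images can be skipped, a step the sketch above omits.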
Related papers
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- TopicFM+: Boosting Accuracy and Efficiency of Topic-Assisted Feature Matching [8.314830611853168]
This study tackles the challenge of image matching in difficult scenarios, such as scenes with significant variations or limited texture.
Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers.
We propose a novel image-matching method that leverages a topic-modeling strategy to capture high-level contexts in images.
arXiv Detail & Related papers (2023-07-02T06:14:07Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown strong performance in natural language processing.
In this paper, we design a novel attention mechanism, derived via a Taylor expansion, whose cost scales linearly with the image resolution; based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
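The exact attention used in T-former is derived in that paper; the sketch below only shows the generic kernelized linear-attention pattern it alludes to, where a positive feature map lets softmax(QK^T)V be approximated at O(N) cost instead of O(N^2). The choice of feature map (elu + 1) and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Generic linear attention: replace softmax(QK^T)V with
    phi(Q) (phi(K)^T V), linear in the sequence length N.
    q, k, v: (B, N, D)."""
    phi_q = F.elu(q) + 1                                # positive feature map (assumed)
    phi_k = F.elu(k) + 1
    kv = torch.einsum('bnd,bne->bde', phi_k, v)         # (B, D, D_v), cost O(N)
    z = 1.0 / (torch.einsum('bnd,bd->bn', phi_q, phi_k.sum(dim=1)) + eps)
    return torch.einsum('bnd,bde,bn->bne', phi_q, kv, z)
```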
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
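As a small illustration of the first point, a constituency parse of a story sentence can be linearized into a bracketed token sequence and fed to a structured text encoder. The parse string and helper below are hypothetical, using NLTK's Tree class only to show the data structure; the paper's actual parser and encoder are not reproduced here.

```python
from nltk import Tree

# A hand-written parse of a toy story caption, purely for illustration.
parse = Tree.fromstring(
    "(S (NP (DT the) (NN cat)) (VP (VBZ sits) (PP (IN on) (NP (DT the) (NN mat)))))"
)

def linearize(tree):
    """Flatten a constituency tree into bracketed tokens, e.g. ['(S', '(NP', 'the', ...]."""
    if isinstance(tree, str):          # leaf word
        return [tree]
    out = [f"({tree.label()}"]
    for child in tree:
        out.extend(linearize(child))
    out.append(")")
    return out

print(linearize(parse))  # structured input tokens for a sequence encoder
```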
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent images in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
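A rough sketch of the general augmentation-based training idea this summary describes: two random augmentations of the same unlabeled image are embedded and pulled together, while embeddings of different images are pushed apart. This is a standard instance-discrimination setup, not AugNet's exact architecture or loss; the transforms, backbone, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms, models

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.ToTensor(),
])

encoder = models.resnet18(weights=None)
encoder.fc = torch.nn.Linear(encoder.fc.in_features, 128)   # low-dimensional embedding head

def similarity_loss(images):
    """images: a list of PIL images from an unlabeled batch."""
    a = torch.stack([augment(im) for im in images])          # view 1
    b = torch.stack([augment(im) for im in images])          # view 2
    za, zb = F.normalize(encoder(a), dim=1), F.normalize(encoder(b), dim=1)
    logits = za @ zb.t() / 0.1                                # cosine similarity / temperature
    targets = torch.arange(len(images))                       # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)
```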
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
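A simplified sketch of the hypercolumn idea this summary refers to: intermediate feature maps from a CNN backbone are selected, upsampled to a common resolution, and concatenated per pixel. The layer-selection policy that Dynamic Hyperpixel Flow learns is replaced here by a fixed list, and all names are illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision import models
from torchvision.models.feature_extraction import create_feature_extractor

# Fixed layer choice for illustration; DHPF instead *learns* which layers to pick per image pair.
layers = {'layer1': 'c2', 'layer2': 'c3', 'layer3': 'c4'}
backbone = create_feature_extractor(models.resnet50(weights=None), return_nodes=layers)

def hypercolumns(image):                          # image: (B, 3, H, W)
    feats = backbone(image)                       # dict of feature maps at several depths
    size = feats['c2'].shape[-2:]                 # upsample everything to the shallowest map
    stacked = [F.interpolate(f, size=size, mode='bilinear', align_corners=False)
               for f in feats.values()]
    return torch.cat(stacked, dim=1)              # per-pixel hypercolumn features
```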
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
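The summary mentions aligning feature distributions at both the image and instance level. A common way to implement such alignment is a domain classifier trained through a gradient-reversal layer, sketched below; this is the generic adversarial-alignment pattern, not iFAN's specific architecture, and all names are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # reverse gradients flowing back into the backbone

class DomainHead(nn.Module):
    """Predicts source vs. target domain from image- or instance-level features."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feats, lam=1.0):
        return self.net(GradReverse.apply(feats, lam))   # (N, 1) domain logits
```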
arXiv Detail & Related papers (2020-03-09T13:27:06Z)
- Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
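To illustrate the "classical solutions with proper settings" point, below is a compact SIFT + ratio test + RANSAC pipeline of the kind such a benchmark evaluates. The OpenCV parameter values are reasonable defaults, not the benchmark's tuned settings.

```python
import cv2
import numpy as np

def match_pair(path1, path2):
    img1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(path2, cv2.IMREAD_GRAYSCALE)

    # Local features: SIFT keypoints and descriptors
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test on 2-nearest-neighbour matches
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.8 * n.distance]

    # Robust estimation: fundamental matrix with RANSAC filters outlier matches
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    return F, int(inliers.sum()) if inliers is not None else 0
```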
arXiv Detail & Related papers (2020-03-03T15:20:57Z)