TopicFM+: Boosting Accuracy and Efficiency of Topic-Assisted Feature
Matching
- URL: http://arxiv.org/abs/2307.00485v1
- Date: Sun, 2 Jul 2023 06:14:07 GMT
- Title: TopicFM+: Boosting Accuracy and Efficiency of Topic-Assisted Feature
Matching
- Authors: Khang Truong Giang, Soohwan Song, Sungho Jo
- Abstract summary: This study tackles the challenge of image matching in difficult scenarios, such as scenes with significant variations or limited texture.
Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers.
We propose a novel image-matching method that leverages a topic-modeling strategy to capture high-level contexts in images.
- Score: 8.314830611853168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study tackles the challenge of image matching in difficult scenarios,
such as scenes with significant variations or limited texture, with a strong
emphasis on computational efficiency. Previous studies have attempted to
address this challenge by encoding global scene contexts using Transformers.
However, these approaches suffer from high computational costs and may not
capture sufficient high-level contextual information, such as structural shapes
or semantic instances. Consequently, the encoded features may lack
discriminative power in challenging scenes. To overcome these limitations, we
propose a novel image-matching method that leverages a topic-modeling strategy
to capture high-level contexts in images. Our method represents each image as a
multinomial distribution over topics, where each topic represents a latent
semantic instance. By incorporating these topics, we can effectively capture
comprehensive context information and obtain discriminative and high-quality
features. Additionally, our method effectively matches features within
corresponding semantic regions by estimating the covisible topics. To enhance
the efficiency of feature matching, we have designed a network with a
pooling-and-merging attention module. This module reduces computation by
employing attention only on fixed-sized topics and small-sized features.
Through extensive experiments, we have demonstrated the superiority of our
method in challenging scenarios. Specifically, our method significantly reduces
computational costs while maintaining higher image-matching accuracy compared
to state-of-the-art methods. The code will be updated soon at
https://github.com/TruongKhang/TopicFM
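
The abstract describes three mechanisms: soft-assigning local features to a fixed set of latent topics (a multinomial distribution per location), enriching features with attention computed against only the K topic tokens (the pooling-and-merging idea), and restricting matching to covisible topics. The PyTorch sketch below illustrates that general recipe under simplifying assumptions; the class name, the soft-assignment scheme, the covisibility gating, and all hyperparameters are illustrative and are not taken from the authors' implementation (see the repository above for the official code).

```python
# Minimal sketch of topic-assisted matching, assuming dense features are already extracted.
import torch
import torch.nn as nn


class TopicAssistedMatcher(nn.Module):
    def __init__(self, dim=256, num_topics=16):
        super().__init__()
        self.topic_embed = nn.Parameter(torch.randn(num_topics, dim))  # latent topic prototypes
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)
        self.merge = nn.Linear(2 * dim, dim)

    def topic_distribution(self, feats):
        # Soft-assign each local feature to the K topics -> multinomial distribution per location.
        # feats: (B, N, D) -> assignments: (B, N, K)
        logits = feats @ self.topic_embed.t() / feats.shape[-1] ** 0.5
        return logits.softmax(dim=-1)

    def pool_topics(self, feats, assign):
        # "Pooling": aggregate the N local features into K fixed-size topic tokens.
        weights = assign / (assign.sum(dim=1, keepdim=True) + 1e-6)   # (B, N, K)
        return weights.transpose(1, 2) @ feats                        # (B, K, D)

    def merge_with_topics(self, feats, topics):
        # "Merging": cross-attention from N features to only K topic tokens (K << N),
        # which is where the computational saving over full attention comes from.
        q = self.to_q(feats)                                 # (B, N, D)
        k, v = self.to_kv(topics).chunk(2, dim=-1)           # (B, K, D) each
        attn = (q @ k.transpose(1, 2) / q.shape[-1] ** 0.5).softmax(dim=-1)
        ctx = attn @ v                                       # (B, N, D)
        return self.merge(torch.cat([feats, ctx], dim=-1))

    def forward(self, feats0, feats1, covis_thresh=0.01):
        assign0 = self.topic_distribution(feats0)            # (B, N0, K)
        assign1 = self.topic_distribution(feats1)            # (B, N1, K)

        # Covisible topics: topics that receive sufficient mass in *both* images.
        mass0, mass1 = assign0.mean(dim=1), assign1.mean(dim=1)          # (B, K)
        covis = ((mass0 > covis_thresh) & (mass1 > covis_thresh)).float()

        # Enrich each image's features with topic context from the other image.
        topics0 = self.pool_topics(feats0, assign0)
        topics1 = self.pool_topics(feats1, assign1)
        feats0 = self.merge_with_topics(feats0, topics1)
        feats1 = self.merge_with_topics(feats1, topics0)

        # Similarity scores, down-weighted for locations outside covisible topics
        # (a crude stand-in for matching only within covisible semantic regions).
        scores = feats0 @ feats1.transpose(1, 2) / feats0.shape[-1] ** 0.5   # (B, N0, N1)
        covis0 = (assign0 * covis.unsqueeze(1)).sum(dim=-1)                  # (B, N0)
        covis1 = (assign1 * covis.unsqueeze(1)).sum(dim=-1)                  # (B, N1)
        scores = scores * covis0.unsqueeze(2) * covis1.unsqueeze(1)
        return scores.softmax(dim=2)   # soft matches from image 0 to image 1


# Example usage on random features.
matcher = TopicAssistedMatcher(dim=256, num_topics=16)
matches = matcher(torch.randn(1, 1200, 256), torch.randn(1, 1000, 256))
print(matches.shape)  # torch.Size([1, 1200, 1000])
```

Because context aggregation attends to K fixed topic tokens rather than to every location of the other image, its cost scales with N·K instead of N0·N1, which is the source of the efficiency gain claimed in the abstract.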
Related papers
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded on a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
- NubbleDrop: A Simple Way to Improve Matching Strategy for Prompted One-Shot Segmentation [2.2559617939136505]
We propose a simple and training-free method to enhance the validity and robustness of the matching strategy.
The core concept involves randomly dropping feature channels (setting them to zero) during the matching process.
This technique mimics discarding pathological nubbles and can be applied seamlessly to other similarity-computation scenarios; a minimal sketch of the channel-dropping idea appears after this list.
arXiv Detail & Related papers (2024-05-19T08:00:38Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Image-Specific Information Suppression and Implicit Local Alignment for Text-based Person Search [61.24539128142504]
Text-based person search (TBPS) is a challenging task that aims to search pedestrian images with the same identity from an image gallery given a query text.
Most existing methods rely on explicitly generated local parts to model fine-grained correspondence between modalities.
We propose an efficient joint Multi-level Alignment Network (MANet) for TBPS, which can learn aligned image/text feature representations between modalities at multiple levels.
arXiv Detail & Related papers (2022-08-30T16:14:18Z)
- TopicFM: Robust and Interpretable Feature Matching with Topic-assisted [8.314830611853168]
We propose an architecture for image matching which is efficient, robust, and interpretable.
We introduce a novel feature matching module called TopicFM, which roughly groups the same spatial structures across images into a topic.
Our method performs matching only in covisible regions to reduce computation.
arXiv Detail & Related papers (2022-07-01T10:39:14Z)
- Bi-level Feature Alignment for Versatile Image Translation and Manipulation [88.5915443957795]
Generative adversarial networks (GANs) have achieved great success in image translation and manipulation.
High-fidelity image generation with faithful style control remains a grand challenge in computer vision.
This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance.
arXiv Detail & Related papers (2021-07-07T05:26:29Z)
- Collaboration among Image and Object Level Features for Image Colourisation [25.60139324272782]
Image colourisation is an ill-posed problem, with multiple correct solutions that depend on the context and object instances present in the input datum.
Previous approaches attacked the problem either by requiring intensive user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features.
We propose a single network, named UCapsNet, that separates image-level features obtained through convolutions from object-level features captured by means of capsules.
Then, through skip connections over different layers, we enforce collaboration between these disentangled factors to produce high-quality and plausible image colourisation.
arXiv Detail & Related papers (2021-01-19T11:48:12Z)
- Multi-layer Feature Aggregation for Deep Scene Parsing Models [19.198074549944568]
In this paper, we explore the effective use of multi-layer feature outputs of the deep parsing networks for spatial-semantic consistency.
The proposed module can auto-select the intermediate visual features to correlate the spatial and semantic information.
Experiments on four public scene parsing datasets prove that the deep parsing network equipped with the proposed feature aggregation module can achieve very promising results.
arXiv Detail & Related papers (2020-11-04T23:07:07Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature-matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
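
As referenced in the NubbleDrop entry above, the channel-dropping trick can be illustrated in a few lines. The sketch below is a hedged reconstruction from the summary only: the function name, drop ratio, and the use of cosine similarity are assumptions, not the authors' implementation.

```python
# Training-free channel dropping before similarity computation (NubbleDrop-style sketch).
import torch


def dropped_channel_similarity(query_feats, gallery_feats, drop_ratio=0.3, generator=None):
    """Cosine similarity computed on a random subset of channels.

    query_feats: (Nq, D), gallery_feats: (Ng, D); returns an (Nq, Ng) similarity matrix.
    """
    d = query_feats.shape[-1]
    keep = torch.rand(d, generator=generator) >= drop_ratio   # randomly kept channels
    q = torch.nn.functional.normalize(query_feats * keep, dim=-1)
    g = torch.nn.functional.normalize(gallery_feats * keep, dim=-1)
    return q @ g.t()


# Example: match 5 query descriptors against 100 gallery descriptors.
sim = dropped_channel_similarity(torch.randn(5, 256), torch.randn(100, 256))
best = sim.argmax(dim=1)   # index of best gallery match per query
```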