XoFTR: Cross-modal Feature Matching Transformer
- URL: http://arxiv.org/abs/2404.09692v1
- Date: Mon, 15 Apr 2024 11:46:24 GMT
- Title: XoFTR: Cross-modal Feature Matching Transformer
- Authors: Önder Tuzcuoğlu, Aybora Köksal, Buğra Sofu, Sinan Kalkan, A. Aydın Alatan,
- Abstract summary: Cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images.
XTRoF incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality differences.
To validate our approach, we collect a comprehensive visible-thermal dataset, and show that our method outperforms existing methods on many benchmarks.
- Score: 7.686047196317477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce, XoFTR, a cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions but present difficulties in matching due to significant texture and intensity differences. Current hand-crafted and learning-based methods for visible-TIR matching fall short in handling viewpoint, scale, and texture diversities. To address this, XoFTR incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality differences. Additionally, we introduce a refined matching pipeline that adjusts for scale discrepancies and enhances match reliability through sub-pixel level refinement. To validate our approach, we collect a comprehensive visible-thermal dataset, and show that our method outperforms existing methods on many benchmarks.
Related papers
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z) - Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR)
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z) - Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement [71.13353154514418]
Low-light image enhancement, particularly in cross-domain tasks such as mapping from the raw domain to the sRGB domain, remains a significant challenge.
We present a novel Mamba scanning mechanism, called RAWMamba, to effectively handle raw images with different CFAs.
We also present a Retinex Decomposition Module (RDM) grounded in Retinex prior, which decouples illumination from reflectance to facilitate more effective denoising and automatic non-linear exposure correction.
arXiv Detail & Related papers (2024-09-11T06:12:03Z) - Inter-Instance Similarity Modeling for Contrastive Learning [22.56316444504397]
We propose a novel image mix method, PatchMix, for contrastive learning in Vision Transformer (ViT)
Compared to the existing sample mix methods, our PatchMix can flexibly and efficiently mix more than two images.
Our proposed method significantly outperforms the previous state-of-the-art on both ImageNet-1K and CIFAR datasets.
arXiv Detail & Related papers (2023-06-21T13:03:47Z) - Breaking Modality Disparity: Harmonized Representation for Infrared and
Visible Image Registration [66.33746403815283]
We propose a scene-adaptive infrared and visible image registration.
We employ homography to simulate the deformation between different planes.
We propose the first ground truth available misaligned infrared and visible image dataset.
arXiv Detail & Related papers (2023-04-12T06:49:56Z) - Unsupervised Misaligned Infrared and Visible Image Fusion via
Cross-Modality Image Generation and Registration [59.02821429555375]
We present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion.
To better fuse the registered infrared images and visible images, we present a feature Interaction Fusion Module (IFM)
arXiv Detail & Related papers (2022-05-24T07:51:57Z) - Towards Homogeneous Modality Learning and Multi-Granularity Information
Exploration for Visible-Infrared Person Re-Identification [16.22986967958162]
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task, which aims to retrieve a set of person images over visible and infrared camera views.
Previous methods attempt to apply generative adversarial network (GAN) to generate the modality-consisitent data.
In this work, we address cross-modality matching problem with Aligned Grayscale Modality (AGM), an unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem.
arXiv Detail & Related papers (2022-04-11T03:03:19Z) - Simultaneous Face Hallucination and Translation for Thermal to Visible
Face Verification using Axial-GAN [74.22129648654783]
We introduce the task of thermal-to-visible face verification from low-resolution thermal images.
We propose Axial-Generative Adversarial Network (Axial-GAN) to synthesize high-resolution visible images for matching.
arXiv Detail & Related papers (2021-04-13T22:34:28Z) - Bayesian Fusion for Infrared and Visible Images [26.64101343489016]
In this paper, a novel Bayesian fusion model is established for infrared and visible images.
We aim at making the fused image satisfy human visual system.
Compared with the previous methods, the novel model can generate better fused images with high-light targets and rich texture details.
arXiv Detail & Related papers (2020-05-12T14:57:19Z) - Pyramidal Edge-maps and Attention based Guided Thermal Super-resolution [28.798966778371145]
Guided super-resolution (GSR) of thermal images using visible range images is challenging because of the difference in the spectral-range between the images.
We propose a novel algorithm for GSR based on pyramidal edge-maps extracted from the visible image.
Our model outperforms the state-of-the-art GSR methods, both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-13T12:11:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.