Related papers: XoFTR: Cross-modal Feature Matching Transformer

URL: http://arxiv.org/abs/2404.09692v1
Date: Mon, 15 Apr 2024 11:46:24 GMT
Title: XoFTR: Cross-modal Feature Matching Transformer
Authors: Önder Tuzcuoğlu, Aybora Köksal, Buğra Sofu, Sinan Kalkan, A. Aydın Alatan
Abstract summary: Cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images. XoFTR incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality differences. To validate our approach, we collect a comprehensive visible-thermal dataset, and show that our method outperforms existing methods on many benchmarks.
Score: 7.686047196317477
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce XoFTR, a cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions but present difficulties in matching due to significant texture and intensity differences. Current hand-crafted and learning-based methods for visible-TIR matching fall short in handling viewpoint, scale, and texture diversities. To address this, XoFTR incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality differences. Additionally, we introduce a refined matching pipeline that adjusts for scale discrepancies and enhances match reliability through sub-pixel level refinement. To validate our approach, we collect a comprehensive visible-thermal dataset, and show that our method outperforms existing methods on many benchmarks.

Related papers

TIR-Diffusion: Diffusion-based Thermal Infrared Image Denoising via Latent and Wavelet Domain Optimization [11.970228442183476]
We propose a diffusion-based TIR image denoising framework. Our method fine-tunes the model via a novel loss function combining latent-space and discrete wavelet transform (DWT) / dual-tree complex wavelet transform (DTCWT) losses. Experiments on benchmark datasets demonstrate superior performance of our approach compared to state-of-the-art denoising methods.
arXiv Detail & Related papers (2025-07-30T06:27:32Z)
Infrared and Visible Image Fusion Based on Implicit Neural Representations [3.8530055385287403]
Infrared and visible light image fusion aims to combine the strengths of both modalities to generate images that are rich in information. This paper proposes an image fusion method based on Implicit Neural Representations (INR), referred to as INRFuse. Experimental results indicate that INRFuse outperforms existing methods in both subjective visual quality and objective evaluation metrics.
arXiv Detail & Related papers (2025-06-20T06:34:19Z)
Image Quality Assessment: Enhancing Perceptual Exploration and Interpretation with Collaborative Feature Refinement and Hausdorff distance [47.01352278293561]
Current full-reference image quality assessment (FR-IQA) methods often fuse features from reference and distorted images. This work introduces a pioneering training-free FR-IQA method that accurately predicts image quality in alignment with the human visual system.
arXiv Detail & Related papers (2024-12-20T12:39:49Z)
Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters. We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR). In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks. We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z)
Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement [71.13353154514418]
Low-light image enhancement, particularly in cross-domain tasks such as mapping from the raw domain to the sRGB domain, remains a significant challenge. We present a novel Mamba scanning mechanism, called RAWMamba, to effectively handle raw images with different CFAs. We also present a Retinex Decomposition Module (RDM) grounded in Retinex prior, which decouples illumination from reflectance to facilitate more effective denoising and automatic non-linear exposure correction.
arXiv Detail & Related papers (2024-09-11T06:12:03Z)
Inter-Instance Similarity Modeling for Contrastive Learning [22.56316444504397]
We propose a novel image mix method, PatchMix, for contrastive learning in Vision Transformer (ViT). Compared to the existing sample mix methods, our PatchMix can flexibly and efficiently mix more than two images. Our proposed method significantly outperforms the previous state-of-the-art on both ImageNet-1K and CIFAR datasets.
arXiv Detail & Related papers (2023-06-21T13:03:47Z)
Breaking Modality Disparity: Harmonized Representation for Infrared and Visible Image Registration [66.33746403815283]
We propose a scene-adaptive infrared and visible image registration method. We employ homography to simulate the deformation between different planes. We propose the first misaligned infrared and visible image dataset with available ground truth.
arXiv Detail & Related papers (2023-04-12T06:49:56Z)
Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration [59.02821429555375]
We present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion. To better fuse the registered infrared images and visible images, we present a feature Interaction Fusion Module (IFM).
arXiv Detail & Related papers (2022-05-24T07:51:57Z)
Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification [16.22986967958162]
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task, which aims to retrieve a set of person images over visible and infrared camera views. Previous methods attempt to apply a generative adversarial network (GAN) to generate modality-consistent data. In this work, we address the cross-modality matching problem with Aligned Grayscale Modality (AGM), a unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem.
arXiv Detail & Related papers (2022-04-11T03:03:19Z)
Simultaneous Face Hallucination and Translation for Thermal to Visible Face Verification using Axial-GAN [74.22129648654783]
We introduce the task of thermal-to-visible face verification from low-resolution thermal images. We propose Axial-Generative Adversarial Network (Axial-GAN) to synthesize high-resolution visible images for matching.
arXiv Detail & Related papers (2021-04-13T22:34:28Z)
Bayesian Fusion for Infrared and Visible Images [26.64101343489016]
In this paper, a novel Bayesian fusion model is established for infrared and visible images. We aim at making the fused image satisfy the human visual system. Compared with the previous methods, the novel model can generate better fused images with high-light targets and rich texture details.
arXiv Detail & Related papers (2020-05-12T14:57:19Z)
Pyramidal Edge-maps and Attention based Guided Thermal Super-resolution [28.798966778371145]
Guided super-resolution (GSR) of thermal images using visible range images is challenging because of the difference in the spectral-range between the images. We propose a novel algorithm for GSR based on pyramidal edge-maps extracted from the visible image. Our model outperforms the state-of-the-art GSR methods, both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-13T12:11:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.