Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques
- URL: http://arxiv.org/abs/2507.22791v1
- Date: Wed, 30 Jul 2025 15:56:36 GMT
- Title: Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques
- Authors: Weide Liu, Wei Zhou, Jun Liu, Ping Hu, Jun Cheng, Jungong Han, Weisi Lin
- Abstract summary: Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and contemporary deep learning approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and emphasizing contemporary deep learning approaches across various modalities, including RGB images, depth images, 3D point clouds, LiDAR scans, medical images, and vision-language interactions. Traditional methods, leveraging detectors like Harris corners and descriptors such as SIFT and ORB, demonstrate robustness under moderate intra-modality variations but struggle with significant modality gaps. Contemporary deep learning-based methods, exemplified by detector-free strategies like CNN-based SuperPoint and transformer-based LoFTR, substantially improve robustness and adaptability across modalities. We highlight modality-aware advancements, such as geometric and depth-specific descriptors for depth images, sparse and dense learning methods for 3D point clouds, attention-enhanced neural networks for LiDAR scans, and specialized solutions like the MIND descriptor for complex medical image matching. Cross-modal applications, particularly in medical image registration and vision-language tasks, underscore the evolution of feature matching to handle increasingly diverse data interactions.
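The classical pipeline the abstract describes (handcrafted descriptors such as SIFT or ORB, followed by nearest-neighbour matching) hinges on how putative correspondences are filtered. A minimal sketch of that matching step, using NumPy and the standard Lowe ratio test plus a mutual-consistency check (the descriptor arrays and function name here are illustrative, not from the survey):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbour matching with Lowe's ratio test and a mutual check.

    desc_a: (N, D) array of descriptors from image A
    desc_b: (M, D) array of descriptors from image B (M >= 2)
    Returns a list of (i, j) index pairs into desc_a / desc_b.
    """
    # Squared Euclidean distance between every pair of descriptors.
    d2 = np.sum((desc_a[:, None, :] - desc_b[None, :, :]) ** 2, axis=-1)
    matches = []
    for i in range(d2.shape[0]):
        order = np.argsort(d2[i])
        best, second = order[0], order[1]
        # Ratio test: keep the match only if the best neighbour is
        # clearly closer than the second best (ambiguity filter).
        if d2[i, best] < (ratio ** 2) * d2[i, second]:
            # Mutual check: i must also be the best match for `best`.
            if np.argmin(d2[:, best]) == i:
                matches.append((int(i), int(best)))
    return matches

# Toy example: two 2-D descriptors per image, permuted between images.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])
print(match_descriptors(a, b))  # [(0, 1), (1, 0)]
```

In practice the same ratio-and-mutual-check filtering is what OpenCV's brute-force matcher applies to ORB or SIFT descriptors; detector-free methods such as LoFTR replace this explicit nearest-neighbour step with learned dense correspondence.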
Related papers
- MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training [62.843316348659165]
Deep learning-based image matching algorithms have dramatically outperformed humans in rapidly and accurately finding large amounts of correspondences. We propose a large-scale pre-training framework that utilizes synthetic cross-modal training signals to train models to recognize and match fundamental structures across images. Our key finding is that the matching model trained with our framework achieves remarkable generalizability across more than eight unseen cross-modality registration tasks.
arXiv Detail & Related papers (2025-01-13T18:37:36Z) - IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs. We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z) - Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations? [55.99654128127689]
Cross-modal contrastive distillation has recently been explored for learning effective 3D representations. Existing methods focus primarily on modality-shared features, neglecting the modality-specific features during the pre-training process. We propose a new framework, namely CMCR, to address these shortcomings.
arXiv Detail & Related papers (2024-12-12T06:09:49Z) - CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization [2.423045468361048]
We introduce CABLD, a novel self-supervised deep learning framework for 3D brain landmark detection in unlabeled scans. We demonstrate the proposed method with the intricate task of MRI-based 3D brain landmark detection. Our framework provides a robust and accurate solution for anatomical landmark detection, reducing the need for extensively annotated datasets.
arXiv Detail & Related papers (2024-11-26T19:56:29Z) - Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework. Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z) - Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain [46.44049019428938]
We introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy. We propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets.
arXiv Detail & Related papers (2024-02-09T05:06:58Z) - Single-subject Multi-contrast MRI Super-resolution via Implicit Neural Representations [9.683341998041634]
Implicit Neural Representations (INR) are proposed to jointly learn two different contrasts as complementary views of a continuous spatial function.
Our model provides realistic super-resolution across different pairs of contrasts in our experiments with three datasets.
arXiv Detail & Related papers (2023-03-27T10:18:42Z) - Generalized Iris Presentation Attack Detection Algorithm under Cross-Database Settings [63.90855798947425]
Presentation attacks pose major challenges to most of the biometric modalities.
We propose a generalized deep learning-based presentation attack detection network, MVANet.
It is inspired by the simplicity and success of hybrid algorithms and the fusion of multiple detection networks.
arXiv Detail & Related papers (2020-10-25T22:42:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.