Image Matching across Wide Baselines: From Paper to Practice
- URL: http://arxiv.org/abs/2003.01587v5
- Date: Thu, 11 Feb 2021 13:50:17 GMT
- Title: Image Matching across Wide Baselines: From Paper to Practice
- Authors: Yuhe Jin and Dmytro Mishkin and Anastasiia Mishchuk and Jiri Matas and
Pascal Fua and Kwang Moo Yi and Eduard Trulls
- Abstract summary: We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a comprehensive benchmark for local features and robust
estimation algorithms, focusing on the downstream task -- the accuracy of the
reconstructed camera pose -- as our primary metric. Our pipeline's modular
structure allows easy integration, configuration, and combination of different
methods and heuristics. This is demonstrated by embedding dozens of popular
algorithms and evaluating them, from seminal works to the cutting edge of
machine learning research. We show that with proper settings, classical
solutions may still outperform the perceived state of the art.
Besides establishing the actual state of the art, the conducted experiments
reveal unexpected properties of Structure from Motion (SfM) pipelines that can
help improve their performance, for both algorithmic and learned methods. Data
and code are available at https://github.com/vcg-uvic/image-matching-benchmark,
providing an easy-to-use and flexible framework for the benchmarking of local
features and robust estimation methods, both alongside and against
top-performing methods. This work provides a basis for the Image Matching
Challenge https://vision.uvic.ca/image-matching-challenge.
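The benchmark's primary metric is the accuracy of the reconstructed camera pose. A minimal sketch of one common way to score a relative pose, assuming the usual convention of angular errors in degrees (the function names here are illustrative, not the benchmark's actual API, and the exact thresholds and aggregation into a final accuracy score are defined by the benchmark itself):

```python
import numpy as np

def rotation_error_deg(R_gt, R_est):
    """Angular error between two rotation matrices, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_deg(t_gt, t_est):
    """Angular error between translation directions.

    Scale is unobservable from two views, so only the direction is scored.
    """
    cos = np.dot(t_gt, t_est) / (np.linalg.norm(t_gt) * np.linalg.norm(t_est))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_error_deg(R_gt, t_gt, R_est, t_est):
    # One common convention: the pose error is the worse of the two
    # angular errors, so a pose is "accurate at threshold T" only if
    # both rotation and translation are within T degrees.
    return max(rotation_error_deg(R_gt, R_est),
               translation_error_deg(t_gt, t_est))
```

Errors of this kind can then be swept over a range of thresholds and averaged into a single accuracy number per method, which is what makes the downstream pose a usable primary metric.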
Related papers
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z) - TopicFM: Robust and Interpretable Feature Matching with Topic-assisted [8.314830611853168]
We propose an architecture for image matching which is efficient, robust, and interpretable.
We introduce a novel feature matching module called TopicFM which can roughly organize same spatial structure across images into a topic.
Our method performs matching only in co-visible regions to reduce computation.
arXiv Detail & Related papers (2022-07-01T10:39:14Z) - A modular software framework for the design and implementation of
ptychography algorithms [55.41644538483948]
We present SciCom, a new ptychography software framework aiming at simulating ptychography datasets and testing state-of-the-art reconstruction algorithms.
Despite its simplicity, the software leverages accelerated processing through the PyTorch interface.
Results are shown on both synthetic and real datasets.
arXiv Detail & Related papers (2022-05-06T16:32:37Z) - Towards a Unified Approach to Homography Estimation Using Image Features
and Pixel Intensities [0.0]
The homography matrix is a key component in various vision-based robotic tasks.
Traditionally, homography estimation algorithms are classified into feature- or intensity-based.
This paper proposes a new hybrid method that unifies both classes into a single nonlinear optimization procedure.
arXiv Detail & Related papers (2022-02-20T02:47:05Z) - Summarize and Search: Learning Consensus-aware Dynamic Convolution for
Co-Saliency Detection [139.10628924049476]
Humans perform co-saliency detection by first summarizing the consensus knowledge in the whole group and then searching corresponding objects in each image.
Previous methods usually lack robustness, scalability, or stability for the first process and simply fuse consensus features with image features for the second process.
We propose a novel consensus-aware dynamic convolution model to explicitly and effectively perform the "summarize and search" process.
arXiv Detail & Related papers (2021-10-01T12:06:42Z) - Video Frame Interpolation via Structure-Motion based Iterative Fusion [19.499969588931414]
We propose a structure-motion based iterative fusion method for video frame interpolation.
Inspired by the observation that audiences have different visual preferences for foreground and background objects, we are the first to propose using saliency masks in the evaluation of video frame interpolation.
arXiv Detail & Related papers (2021-05-11T22:11:17Z) - How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
These results underscore the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z) - Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints [80.60538408386016]
Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry.
We propose an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection.
arXiv Detail & Related papers (2020-07-29T21:41:31Z)
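The classical, non-learned counterpart of such pipelines estimates the relative pose from matched keypoints via the essential matrix. A minimal noise-free sketch using the normalized eight-point algorithm on calibrated correspondences (purely illustrative: `essential_from_matches` is a hypothetical helper, not an API from any of the papers above, and real pipelines wrap this step in RANSAC for outlier rejection):

```python
import numpy as np

def essential_from_matches(x1, x2):
    """Eight-point estimate of the essential matrix E from calibrated
    correspondences x1, x2 of shape (N, 2), with N >= 8.

    Points are assumed to be already normalized by the camera intrinsics,
    i.e. x = (X/Z, Y/Z) in the camera frame.
    """
    a1 = np.hstack([x1, np.ones((len(x1), 1))])
    a2 = np.hstack([x2, np.ones((len(x2), 1))])
    # Each correspondence yields one linear constraint a2^T E a1 = 0;
    # stacking them gives A vec(E) = 0.
    A = np.einsum('ni,nj->nij', a2, a1).reshape(len(x1), 9)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Project onto the essential-matrix manifold: singular values (1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

The learned pipelines discussed above replace parts of this chain (detection, description, matching, outlier rejection) with trainable modules, but the recovered quantity, a rank-two essential matrix encoding relative rotation and translation direction, is the same.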
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.