Related papers: Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks

URL: http://arxiv.org/abs/2408.16445v2
Date: Sun, 15 Sep 2024 19:02:12 GMT
Title: Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks
Authors: Sierra Bonilla, Chiara Di Vece, Rema Daher, Xinwei Ju, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano
Abstract summary: Three-dimensional (3D) reconstruction from two-dimensional images is an active research field in computer vision. Traditionally, parametric techniques have been employed for this task. Recent advancements have seen a shift towards learning-based methods.
Score: 9.388897214344572
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Three-dimensional (3D) reconstruction from two-dimensional images is an active research field in computer vision, with applications ranging from navigation and object tracking to segmentation and three-dimensional modeling. Traditionally, parametric techniques have been employed for this task. However, recent advancements have seen a shift towards learning-based methods. Given the rapid pace of research and the frequent introduction of new image matching methods, it is essential to evaluate them. In this paper, we present a comprehensive evaluation of various image matching methods using a structure-from-motion pipeline. We assess the performance of these methods on both in-domain and out-of-domain datasets, identifying key limitations in both the methods and benchmarks. We also investigate the impact of edge detection as a pre-processing step. Our analysis reveals that image matching for 3D reconstruction remains an open challenge, necessitating careful selection and tuning of models for specific scenarios, while also highlighting mismatches in how metrics currently represent method performance.

Related papers

Deep Learning Reforms Image Matching: A Survey and Outlook [38.104899835728574]
Image matching serves as a cornerstone in computer vision and underpins a wide range of applications. Recent deep learning advances have significantly boosted both robustness and accuracy. This survey adopts a unique perspective by comprehensively reviewing how deep learning has incrementally transformed the classical image matching pipeline.
arXiv Detail & Related papers (2025-06-05T04:25:22Z)
Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image [52.11275397911693]
We propose an end-to-end trainable, cross-category method for reconstructing multiple man-made articulated objects from a single RGBD image. We depart from previous works that rely on learning instance-level latent space, focusing on man-made articulated objects with predefined part counts. Our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation.
arXiv Detail & Related papers (2025-04-04T05:08:04Z)
RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images. We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
Two Approaches to Supervised Image Segmentation [55.616364225463066]
The present work develops comparison experiments between deep learning and multiset neurons approaches. The deep learning approach confirmed its potential for performing image segmentation. The alternative multiset methodology allowed for enhanced accuracy while requiring little computational resources.
arXiv Detail & Related papers (2023-07-19T16:42:52Z)
DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving. We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds. Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
Data-Driven Interpolation for Super-Scarce X-Ray Computed Tomography [1.3535770763481902]
We train shallow neural networks to combine two neighbouring acquisitions into an estimated measurement at an intermediate angle. This yields an enhanced sequence of measurements that can be reconstructed using standard methods. Results are obtained for 2D and 3D imaging, on large biomedical datasets.
arXiv Detail & Related papers (2022-05-16T15:42:41Z)
Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. We follow a retrieval-based strategy and prevent the network from learning object-specific features. Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
End-to-end learning of keypoint detection and matching for relative pose estimation [1.8352113484137624]
We propose a new method for estimating the relative pose between two images. We jointly learn keypoint detection, description extraction, matching and robust pose estimation. We demonstrate our method for the task of visual localization of a query image within a database of images with known pose.
arXiv Detail & Related papers (2021-04-02T15:16:17Z)
Recent Progress in Appearance-based Action Recognition [73.6405863243707]
Action recognition is a task to identify various human actions in a video. Recent appearance-based methods have achieved promising progress towards accurate action recognition.
arXiv Detail & Related papers (2020-11-25T10:18:12Z)
Novel Object Viewpoint Estimation through Reconstruction Alignment [45.16865218423492]
We learn a reconstruct-and-align approach to estimate the viewpoint of a novel object. In particular, we propose learning two networks: the first maps images to a 3D geometry-aware feature bottleneck and is trained via an image-to-image translation loss. At test time, our model finds the relative transformation that best aligns the bottleneck features of our test image to a reference image.
arXiv Detail & Related papers (2020-06-05T17:58:14Z)
HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning [24.13425816781179]
Local feature extraction remains an active research area due to the advances in fields such as SLAM, 3D reconstructions, or AR applications. We propose a method that treats both extractions independently and focuses on their interaction in the learning process. We show improvements over the state of the art in terms of image matching on HPatches and 3D reconstruction quality while keeping on par on camera localisation tasks.
arXiv Detail & Related papers (2020-05-12T13:55:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.