Learning and Crafting for the Wide Multiple Baseline Stereo
- URL: http://arxiv.org/abs/2112.12027v1
- Date: Wed, 22 Dec 2021 16:52:55 GMT
- Title: Learning and Crafting for the Wide Multiple Baseline Stereo
- Authors: Dmytro Mishkin
- Abstract summary: This thesis introduces the wide multiple baseline stereo (WxBS) problem.
WxBS considers the matching of images that differ in more than one image acquisition factor.
A new dataset with the ground truth, evaluation metric and baselines has been introduced.
- Score: 4.7210697296108926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This thesis introduces the wide multiple baseline stereo (WxBS) problem.
WxBS, a generalization of the standard wide baseline stereo problem, considers
the matching of images that simultaneously differ in more than one image
acquisition factor such as viewpoint, illumination, sensor type, or where
object appearance changes significantly, e.g., over time. A new dataset with
the ground truth, evaluation metric and baselines has been introduced.
The thesis presents the following improvements of the WxBS pipeline. (i) A
loss function, called HardNeg, for learning a local image descriptor that
relies on hard negative mining within a mini-batch and on the maximization of
the distance between the closest positive and the closest negative patches.
(ii) The descriptor trained with the HardNeg loss, called HardNet, is compact
and shows state-of-the-art performance in standard matching, patch verification
and retrieval benchmarks. (iii) A method for learning the affine shape,
orientation, and potentially other parameters related to geometric and
appearance properties of local features. (iv) A tentative correspondences
generation strategy which generalizes the standard first to second closest
distance ratio is presented. The selection strategy, which shows performance
superior to the standard method, is applicable to either hand-engineered
descriptors like SIFT, LIOP, and MROGH or deeply learned ones like HardNet. (v) A
feedback loop is introduced for the two-view matching problem, resulting in
the MODS (matching with on-demand view synthesis) algorithm. MODS handles
viewing angle differences even larger than those handled by the previous
state-of-the-art ASIFT algorithm, without a significant increase in
computational cost over "standard" wide and narrow baseline approaches. Last,
but not least, a comprehensive benchmark for local features and robust
estimation algorithms is introduced.
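Item (i) of the abstract can be illustrated with a minimal sketch of a triplet margin loss that mines the hardest negative inside a mini-batch. This is not the thesis code: the function name `hardneg_loss`, the use of NumPy, the Euclidean distance, and the default margin of 1.0 are all assumptions made for illustration. Row `i` of `anchors` and row `i` of `positives` are assumed to be descriptors of the same patch seen in two images; every other row in the batch is treated as a non-matching candidate.

```python
import numpy as np


def hardneg_loss(anchors, positives, margin=1.0):
    """Triplet margin loss with in-batch hardest-negative mining (illustrative sketch).

    The function name and the default margin are assumptions, not taken from the thesis.
    """
    a = np.asarray(anchors, dtype=np.float64)
    p = np.asarray(positives, dtype=np.float64)

    # dist[i, j] is the Euclidean distance between anchor i and positive j.
    diff = a[:, None, :] - p[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Distances of the matching pairs sit on the diagonal.
    pos = np.diag(dist).copy()

    # Put +inf on the diagonal so a matching pair is never selected as a negative.
    masked = dist + np.diag(np.full(len(a), np.inf))

    # Closest non-matching descriptor for pair i, searched along row i and column i.
    hardest_neg = np.minimum(masked.min(axis=1), masked.min(axis=0))

    # Hinge: penalize pairs whose positive is not closer than the hardest negative by `margin`.
    return float(np.maximum(0.0, margin + pos - hardest_neg).mean())
```

Minimizing this quantity pushes each matching pair to be closer than its hardest in-batch negative by at least the margin, which is one way to read "maximization of the distance between the closest positive and the closest negative patches". In a real training setup the same computation would be written in an automatic-differentiation framework so that gradients reach the descriptor network.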
Related papers
- Match and Locate: low-frequency monocular odometry based on deep feature matching [0.65268245109828]
We introduce a novel approach to robotic odometry that requires only a single camera.
The approach is based on matching image features between consecutive frames of the video stream using deep feature matching models.
We evaluate the approach in the AISG-SLA Visual Localisation Challenge and find that, while being computationally efficient and easy to implement, our method shows competitive results.
arXiv Detail & Related papers (2023-11-16T17:32:58Z)
- Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
- PatchMVSNet: Patch-wise Unsupervised Multi-View Stereo for Weakly-Textured Surface Reconstruction [2.9896482273918434]
This paper proposes robust loss functions leveraging constraints beneath multi-view images to alleviate matching ambiguity.
Our strategy can be implemented with arbitrary depth estimation frameworks and can be trained with arbitrary large-scale MVS datasets.
Our method reaches the performance of the state-of-the-art methods on popular benchmarks, like DTU, Tanks and Temples and ETH3D.
arXiv Detail & Related papers (2022-03-04T07:05:23Z)
- Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
First, the local binary pattern is adopted to extract more descriptive features on each selected band, which preserves local structures and subtle changes of a region.
Second, adaptive neighbors assignment is introduced in the construction of the anchor graph to reduce the computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
- Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on par with or superior to the supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z)
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
- ProAlignNet : Unsupervised Learning for Progressively Aligning Noisy Contours [12.791313859673187]
"ProAlignNet" accounts for large scale misalignments and complex transformations between the contour shapes.
It learns by training with a novel loss function which is derived as an upper bound of a proximity-sensitive and local shape-dependent similarity metric.
In two real-world applications, the proposed models consistently outperform state-of-the-art methods.
arXiv Detail & Related papers (2020-05-23T14:56:14Z)
- RANSAC-Flow: generic two-stage image alignment [53.11926395028508]
We show that a simple unsupervised approach performs surprisingly well across a range of tasks.
Despite its simplicity, our method shows competitive results on a range of tasks and datasets.
arXiv Detail & Related papers (2020-04-03T12:37:58Z)
- Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)