Fine-tuning deep learning models for stereo matching using results from
semi-global matching
- URL: http://arxiv.org/abs/2205.14051v1
- Date: Fri, 27 May 2022 15:38:10 GMT
- Title: Fine-tuning deep learning models for stereo matching using results from
semi-global matching
- Authors: Hessah Albanwan, Rongjun Qin
- Abstract summary: Deep learning (DL) methods are widely investigated for stereo image matching tasks due to their reported high accuracies.
With satellite images covering large-scale areas with variances in locations, content, land covers, and spatial patterns, we expect their performances to be impacted.
We propose a fine-tuning method that takes advantage of disparity maps derived from Census-based semi-global-matching (SGM) on target stereo data.
- Score: 1.0152838128195467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) methods are widely investigated for stereo image matching
tasks due to their reported high accuracies. However, their
transferability/generalization capabilities are limited by the instances seen
in the training data. With satellite images covering large-scale areas with
variances in locations, content, land covers, and spatial patterns, we expect
their performances to be impacted. Increasing the number and diversity of
training data is always an option, but with the ground-truth disparity being
limited in remote sensing due to its high cost, it is almost impossible to
obtain the ground-truth for all locations. Knowing that classical stereo
matching methods such as Census-based semi-global-matching (SGM) are widely
adopted to process different types of stereo data, we therefore propose a
fine-tuning method that takes advantage of disparity maps derived from SGM on
target stereo data. Our proposed method adopts a simple scheme that uses the
energy map derived from the SGM algorithm to select high-confidence disparity
measurements, at the same time utilizing the images to limit these selected
disparity measurements to texture-rich regions. Our approach aims to
investigate the possibility of improving the transferability of current DL
methods to unseen target data without having their ground truth as a
requirement. To perform a comprehensive study, we select 20 study-sites around
the world to cover a variety of complexities and densities. We choose
well-established DL methods like geometric and context network (GCNet), pyramid
stereo matching network (PSMNet), and LEAStereo for evaluation. Our results
indicate an improvement in the transferability of the DL methods across
different regions visually and numerically.
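The selection scheme described in the abstract (keep SGM disparities only where the energy map suggests high confidence and the image is texture-rich) could be sketched roughly as below. This is an illustrative sketch, not the authors' implementation: the function names, the quantile-based energy cut, the gradient-magnitude texture proxy, both threshold values, and the reading of low energy as high confidence are all assumptions, since the abstract gives none of these details.

```python
import numpy as np


def select_pseudo_labels(disparity, energy, image,
                         energy_quantile=0.3, texture_threshold=5.0):
    """Return a boolean mask of pixels kept as fine-tuning targets.

    Hypothetical helper: thresholds and criteria are assumptions,
    not values taken from the paper.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    energy = np.asarray(energy, dtype=np.float64)
    image = np.asarray(image, dtype=np.float64)

    # Pixels where SGM produced no disparity (NaN/inf) are never used.
    valid = np.isfinite(disparity)

    # Assumption: a low SGM energy means a confident match, so keep
    # only the lowest-energy fraction of the valid pixels.
    energy_cut = np.quantile(energy[valid], energy_quantile)
    confident = energy <= energy_cut

    # Assumption: gradient magnitude of the grayscale image serves as
    # a simple proxy for "texture-rich".
    grad_rows, grad_cols = np.gradient(image)
    textured = np.hypot(grad_cols, grad_rows) >= texture_threshold

    return valid & confident & textured


def masked_l1(pred, target, mask):
    """Mean absolute disparity error over the selected pixels only."""
    if not mask.any():
        return 0.0
    return float(np.abs(pred - target)[mask].mean())
```

In a fine-tuning loop, a mask like this would restrict the loss to the selected SGM disparities, so that low-confidence or textureless pixels do not contribute to the gradient.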
Related papers
- An evaluation of Deep Learning based stereo dense matching dataset shift
from aerial images and a large scale stereo dataset [2.048226951354646]
We present a method for generating ground-truth disparity maps directly from Light Detection and Ranging (LiDAR) and images.
We evaluate 11 dense matching methods across datasets with diverse scene types, image resolutions, and geometric configurations.
arXiv Detail & Related papers (2024-02-19T20:33:46Z)
- A Comparative Study on Deep-Learning Methods for Dense Image Matching of
Multi-angle and Multi-date Remote Sensing Stereo Images [1.0152838128195467]
This paper provides an evaluation of four deep learning (DL) stereo matching methods through hundreds of multi-date multi-site satellite stereo pairs.
Our experiments show that E2E algorithms can achieve upper limits of geometric accuracies, but may not generalize well to unseen data.
All DL algorithms are robust to geometric configurations of stereo pairs and are less sensitive in comparison to the Census-SGM.
arXiv Detail & Related papers (2022-10-25T14:10:04Z)
- Evaluating the Label Efficiency of Contrastive Self-Supervised Learning
for Multi-Resolution Satellite Imagery [0.0]
Self-supervised learning has been applied in the remote sensing domain to exploit readily-available unlabeled data.
In this paper, we study self-supervised visual representation learning through the lens of label efficiency.
arXiv Detail & Related papers (2022-10-13T06:54:13Z)
- Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly
Supervised Object Detection [54.24966006457756]
We propose a WSOD framework called the Spatial Likelihood Voting with Self-knowledge Distillation Network (SLV-SD Net).
SLV-SD Net converges region proposal localization without bounding box annotations.
Experiments on the PASCAL VOC 2007/2012 and MS-COCO datasets demonstrate the excellent performance of SLV-SD Net.
arXiv Detail & Related papers (2022-04-14T11:56:19Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the
Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust
Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z)
- Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale
Transformer [17.455782652441187]
We propose a semi-supervised network for wide-angle portraits correction.
Our network, named Multi-Scale Swin-Unet (MS-Unet), is built upon the multi-scale swin transformer block (MSTB).
arXiv Detail & Related papers (2021-09-14T09:40:25Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- Improving Deep Stereo Network Generalization with Geometric Priors [93.09496073476275]
Large datasets of diverse real-world scenes with dense ground truth are difficult to obtain.
Many algorithms rely on small real-world datasets of similar scenes or synthetic datasets.
We propose to incorporate prior knowledge of scene geometry into an end-to-end stereo network to help networks generalize better.
arXiv Detail & Related papers (2020-08-25T15:24:02Z)
- X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for
Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.