An evaluation of Deep Learning based stereo dense matching dataset shift
from aerial images and a large scale stereo dataset
- URL: http://arxiv.org/abs/2402.12522v1
- Date: Mon, 19 Feb 2024 20:33:46 GMT
- Title: An evaluation of Deep Learning based stereo dense matching dataset shift
from aerial images and a large scale stereo dataset
- Authors: Teng Wu, Bruno Vallet, Marc Pierrot-Deseilligny, Ewelina Rupnik
- Abstract summary: We present a method for generating ground-truth disparity maps directly from Light Detection and Ranging (LiDAR) and images.
We evaluate 11 dense matching methods across datasets with diverse scene types, image resolutions, and geometric configurations.
- Score: 2.048226951354646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dense matching is crucial for 3D scene reconstruction since it enables the
recovery of scene 3D geometry from image acquisition. Deep Learning (DL)-based
methods have shown effectiveness in the special case of epipolar stereo
disparity estimation in the computer vision community. DL-based methods depend
heavily on the quality and quantity of training datasets. However, generating
ground-truth disparity maps for real scenes remains a challenging task in the
photogrammetry community. To address this challenge, we propose a method for
generating ground-truth disparity maps directly from Light Detection and
Ranging (LiDAR) point clouds and images, producing a large and diverse dataset
from six aerial datasets covering four different areas, two of which are imaged
at different resolutions. We also introduce a LiDAR-to-image co-registration
refinement to the framework that takes special precautions regarding occlusions
and refrains from disparity interpolation to avoid precision loss. We evaluate
11 dense matching methods across datasets with diverse scene types, image
resolutions, and geometric configurations, with a detailed investigation of
dataset shift. GANet performs best when trained and tested on the same data,
while PSMNet is the most robust across different datasets; based on these
results, we propose the best strategy for training with a limited dataset. We
also provide the dataset and trained models; more information can be found at
https://github.com/whuwuteng/Aerial_Stereo_Dataset.
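The abstract describes generating ground-truth disparity directly from LiDAR without interpolation. A minimal sketch of that idea follows, assuming a rectified stereo pair with a pinhole camera model, where disparity relates to depth as d = f * B / Z; the function `lidar_to_disparity`, its nearest-point occlusion handling, and all parameter names are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def lidar_to_disparity(points_xyz, f, baseline, K, img_shape):
    """Project LiDAR points (given in the rectified left-camera frame) into
    the left image and convert each point's depth Z to a disparity f*B/Z.
    Pixels with no LiDAR hit stay NaN: no interpolation is performed, which
    avoids the precision loss the abstract mentions.
    """
    h, w = img_shape
    disp = np.full((h, w), np.nan)  # NaN marks pixels without ground truth
    X, Y, Z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    valid = Z > 0                   # keep only points in front of the camera
    u = np.round(K[0, 0] * X[valid] / Z[valid] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * Y[valid] / Z[valid] + K[1, 2]).astype(int)
    z = Z[valid]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]
    # Crude occlusion handling: write far points first so that the nearest
    # point per pixel is the one that survives.
    order = np.argsort(-z)
    disp[v[order], u[order]] = f * baseline / z[order]
    return disp
```

For example, two points on the same ray at depths 10 m and 5 m map to the same pixel, and only the nearer one (the larger disparity) is kept.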
Related papers
- 3MOS: Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching [6.13702551312774]
We introduce a large-scale Multi-sources,Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching (3MOS)
It consists of 155K optical-SAR image pairs, including SAR data from six commercial satellites, with resolutions ranging from 1.25m to 12.5m.
The data has been classified into eight scenes including urban, rural, plains, hills, mountains, water, desert, and frozen earth.
arXiv Detail & Related papers (2024-04-01T00:31:11Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- SIDAR: Synthetic Image Dataset for Alignment & Restoration [2.9649783577150837]
There is a lack of datasets that provide enough data to train and evaluate end-to-end deep learning models.
Our proposed data augmentation helps to overcome the issue of data scarcity by using 3D rendering.
The resulting dataset can serve as a training and evaluation set for a multitude of tasks involving image alignment and artifact removal.
arXiv Detail & Related papers (2023-05-19T23:32:06Z)
- Unsupervised Domain Adaptation with Histogram-gated Image Translation for Delayered IC Image Analysis [2.720699926154399]
Histogram-gated Image Translation (HGIT) is an unsupervised domain adaptation framework which transforms images from a given source dataset to the domain of a target dataset.
Our method achieves the best performance compared to the reported domain adaptation techniques, and is also reasonably close to the fully supervised benchmark.
arXiv Detail & Related papers (2022-09-27T15:53:22Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Sci-Net: a Scale Invariant Model for Building Detection from Aerial Images [0.0]
We propose a Scale-invariant neural network (Sci-Net) that is able to segment buildings present in aerial images at different spatial resolutions.
Specifically, we modified the U-Net architecture and fused it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations.
arXiv Detail & Related papers (2021-11-12T16:45:20Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- Stereo Matching by Self-supervision of Multiscopic Vision [65.38359887232025]
We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
arXiv Detail & Related papers (2021-04-09T02:58:59Z)
- JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations.
We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation.
arXiv Detail & Related papers (2020-04-07T14:59:35Z)
- Weakly-supervised land classification for coastal zone based on deep convolutional neural networks by incorporating dual-polarimetric characteristics into training dataset [1.125851164829582]
We explore the performance of DCNNs on semantic segmentation using spaceborne polarimetric synthetic aperture radar (PolSAR) datasets.
The semantic segmentation task using PolSAR data can be categorized as weakly supervised learning when the characteristics of SAR data and data annotating procedures are factored in.
Three DCNN models, including SegNet, U-Net, and LinkNet, are implemented next.
arXiv Detail & Related papers (2020-03-30T17:32:49Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely, instead of different views, on depth from focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.