Achieving Domain Robustness in Stereo Matching Networks by Removing
Shortcut Learning
- URL: http://arxiv.org/abs/2106.08486v1
- Date: Tue, 15 Jun 2021 23:22:54 GMT
- Authors: WeiQin Chuah, Ruwan Tennakoon, Alireza Bab-Hadiashar, David Suter
- Abstract summary: We show that learning of features in the synthetic domain is heavily influenced by two "shortcuts" present in the synthetic data.
We show that by removing such shortcuts, we can achieve domain robustness in state-of-the-art stereo matching frameworks.
- Score: 14.497880004212979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based stereo matching and depth estimation networks currently excel
on public benchmarks with impressive results. However, state-of-the-art
networks often fail to generalize from synthetic imagery to more challenging
real data domains. This paper attempts to uncover what is needed to achieve
domain robustness and, in particular, to identify the important ingredients of
generalization success in stereo matching networks by analyzing the effect of
synthetic image learning on real data performance. We provide evidence that
feature learning in the synthetic domain by a stereo matching network is
heavily influenced by two "shortcuts" present in the synthetic data: (1)
identical local statistics (RGB colour features) between matching pixels in the
synthetic stereo images and (2) lack of realism in synthetic textures on 3D
objects simulated in game engines. We show that by removing such shortcuts, we
can achieve domain robustness in state-of-the-art stereo matching frameworks
and produce remarkable performance on multiple realistic datasets, even though
the networks were trained on synthetic data only. Our experimental results
indicate that eliminating shortcuts from the synthetic data is key to achieving
domain-invariant generalization between synthetic and real data domains.
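To make the first shortcut concrete, here is a minimal, illustrative sketch in plain Python. It is not taken from the paper: the helper names (`make_synthetic_pair`, `exact_match_fraction`, `asymmetric_color_jitter`), the jitter ranges, and the choice of independent per-view colour perturbation are all assumptions made for illustration. The toy pair copies each left pixel from the right view at a fixed disparity, so exact RGB equality alone identifies every match. Perturbing the colours of the two views independently is one plausible way to break that cue.

```python
import random


def make_synthetic_pair(height, width, disparity, rng):
    """Build a toy rectified pair where left[y][x] copies right[y][x - disparity]."""
    def random_pixel():
        # Colours stay in [0.2, 0.7] so the jitter below never clips.
        return tuple(rng.uniform(0.2, 0.7) for _ in range(3))

    right = [[random_pixel() for _ in range(width)] for _ in range(height)]
    left = [
        [right[y][x - disparity] if x >= disparity else random_pixel()
         for x in range(width)]
        for y in range(height)
    ]
    return left, right


def exact_match_fraction(left, right, disparity):
    """Fraction of valid left pixels whose RGB equals the matching right pixel."""
    total = hits = 0
    for left_row, right_row in zip(left, right):
        for x in range(disparity, len(left_row)):
            total += 1
            hits += left_row[x] == right_row[x - disparity]
    return hits / total


def asymmetric_color_jitter(left, right, rng):
    """Apply an independently sampled gain and bias to each view (assumed ranges)."""
    views = []
    for image in (left, right):
        gain = rng.uniform(0.8, 1.2)     # hypothetical brightness gain range
        bias = rng.uniform(-0.05, 0.05)  # hypothetical brightness offset range
        views.append([
            [tuple(min(1.0, max(0.0, c * gain + bias)) for c in pixel)
             for pixel in row]
            for row in image
        ])
    return views[0], views[1]


rng = random.Random(0)
left, right = make_synthetic_pair(height=4, width=16, disparity=3, rng=rng)
before = exact_match_fraction(left, right, disparity=3)
aug_left, aug_right = asymmetric_color_jitter(left, right, rng)
after = exact_match_fraction(aug_left, aug_right, disparity=3)
print("exact colour matches before:", before, "after:", after)
```

In the unperturbed pair, every valid pixel matches exactly by construction, which is the kind of cue a network could exploit instead of learning robust features. After the independent perturbation, identical RGB values no longer mark correspondences, so matching has to rely on something other than colour equality.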
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Massively Annotated Datasets for Assessment of Synthetic and Real Data in Face Recognition [0.2775636978045794]
We study the drift between the performance of models trained on real and synthetic datasets.
We conduct studies on the differences between real and synthetic datasets on the attribute set.
Interestingly enough, we have verified that while real samples suffice to explain the synthetic distribution, the opposite could not be further from being true.
arXiv Detail & Related papers (2024-04-23T17:10:49Z)
- Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2).
The dataset is composed of both real and synthetic videos from seven gesture classes.
We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z)
- Domain Adaptation of Synthetic Driving Datasets for Real-World Autonomous Driving [0.11470070927586014]
Networks trained with synthetic data for certain computer vision tasks degrade significantly when tested on real-world data.
In this paper, we propose and evaluate novel ways to improve such approaches.
We propose a novel method to efficiently incorporate semantic supervision into this pair selection, which helps in boosting the performance of the model.
arXiv Detail & Related papers (2023-02-08T15:51:54Z)
- Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
arXiv Detail & Related papers (2022-04-06T02:49:06Z)
- Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective [65.37571681370096]
We propose a simple pixel-wise contrastive learning across the viewpoints.
A stereo selective whitening loss is introduced to better preserve the stereo feature consistency across domains.
Our method achieves superior performance over several state-of-the-art networks.
arXiv Detail & Related papers (2022-03-21T11:21:41Z)
- ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks [14.306250516592305]
We show that learning of feature representations in stereo matching networks is heavily influenced by synthetic data artefacts.
We propose an Information-Theoretic Shortcut Avoidance (ITSA) approach to automatically restrict shortcut-related information from being encoded into the feature representations.
We show that using this method, state-of-the-art stereo matching networks that are trained purely on synthetic data can effectively generalize to challenging and previously unseen real data scenarios.
arXiv Detail & Related papers (2022-01-06T22:03:50Z)
- Fake It Till You Make It: Face analysis in the wild using synthetic data alone [9.081019005437309]
We show that it is possible to perform face-related computer vision in the wild using synthetic data alone.
We describe how to combine a procedurally-generated 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism.
arXiv Detail & Related papers (2021-09-30T13:07:04Z)
- From Synthetic to Real: Image Dehazing Collaborating with Unlabeled Real Data [58.50411487497146]
We propose a novel image dehazing framework collaborating with unlabeled real data.
First, we develop a disentangled image dehazing network (DID-Net), which disentangles the feature representations into three component maps.
Then a disentangled-consistency mean-teacher network (DMT-Net) is employed to collaborate unlabeled real data for boosting single image dehazing.
arXiv Detail & Related papers (2021-08-06T04:00:28Z)
- Attention-based Adversarial Appearance Learning of Augmented Pedestrians [49.25430012369125]
We propose a method to synthesize realistic data for the pedestrian recognition task.
Our approach utilizes an attention mechanism driven by an adversarial loss to learn domain discrepancies.
Our experiments confirm that the proposed adaptation method is robust to such discrepancies and reveals both visual realism and semantic consistency.
arXiv Detail & Related papers (2021-07-06T15:27:00Z)
- Semi-synthesis: A fast way to produce effective datasets for stereo matching [16.602343511350252]
Close-to-real-scene texture rendering is a key factor in boosting stereo matching performance.
We propose semi-synthetic, an effective and fast way to synthesize a large amount of data with close-to-real-scene texture.
With further fine-tuning on the real dataset, we also achieve SOTA performance on Middlebury and competitive results on KITTI and ETH3D datasets.
arXiv Detail & Related papers (2021-01-26T14:34:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.