Related papers: Achieving Domain Robustness in Stereo Matching Networks by Removing Shortcut Learning

URL: http://arxiv.org/abs/2106.08486v1
Date: Tue, 15 Jun 2021 23:22:54 GMT
Title: Achieving Domain Robustness in Stereo Matching Networks by Removing Shortcut Learning
Authors: WeiQin Chuah, Ruwan Tennakoon, Alireza Bab-Hadiashar, David Suter
Abstract summary: We show that learning of features in the synthetic domain is heavily influenced by two "shortcuts" present in the synthetic data. We show that by removing such shortcuts, we can achieve domain robustness in state-of-the-art stereo matching frameworks.
Score: 14.497880004212979
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning-based stereo matching and depth estimation networks currently excel on public benchmarks with impressive results. However, state-of-the-art networks often fail to generalize from synthetic imagery to more challenging real data domains. This paper is an attempt to uncover hidden secrets of achieving domain robustness and, in particular, to discover the important ingredients of generalization success of stereo matching networks by analyzing the effect of synthetic image learning on real data performance. We provide evidence that learning of features in the synthetic domain by a stereo matching network is heavily influenced by two "shortcuts" present in the synthetic data: (1) identical local statistics (RGB colour features) between matching pixels in the synthetic stereo images and (2) lack of realism in synthetic textures on 3D objects simulated in game engines. We will show that by removing such shortcuts, we can achieve domain robustness in state-of-the-art stereo matching frameworks and produce remarkable performance on multiple realistic datasets, despite the fact that the networks were trained on synthetic data only. Our experimental results point to the fact that eliminating shortcuts from the synthetic data is key to achieving domain-invariant generalization between synthetic and real data domains.
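Illustration: shortcut (1) above arises when matching pixels in the left and right synthetic images carry identical RGB values, so a network can match on raw colour alone. One plausible way to break that identity is to perturb the colours of each view independently. The sketch below is hypothetical and is not taken from the paper: the function names, the gain/bias model, and the parameter ranges are all assumptions chosen for illustration.

```python
import random

def jitter_pixel(pixel, gain, bias):
    """Scale and shift one RGB pixel, clamping each channel to [0, 1]."""
    return tuple(min(1.0, max(0.0, c * gain + bias)) for c in pixel)

def asymmetric_color_jitter(left, right, rng, max_gain=0.2, max_bias=0.1):
    """Apply an independently sampled gain/bias to each view (assumed scheme).

    Because the two views draw separate random parameters, corresponding
    pixels no longer share identical colour values after augmentation.
    """
    def jitter(image):
        gain = 1.0 + rng.uniform(-max_gain, max_gain)
        bias = rng.uniform(-max_bias, max_bias)
        return [[jitter_pixel(p, gain, bias) for p in row] for row in image]
    return jitter(left), jitter(right)

# Toy stereo pair: the right view copies the left one, mimicking the
# "identical local statistics" of a perfectly rendered synthetic pair.
rng = random.Random(0)
left = [[(rng.random(), rng.random(), rng.random()) for _ in range(6)]
        for _ in range(4)]
right = [row[:] for row in left]

aug_left, aug_right = asymmetric_color_jitter(left, right, rng)
```

After the call, `aug_left` and `aug_right` keep the same geometry as the inputs but no longer agree colour-for-colour, so a matcher trained on such pairs cannot rely on exact RGB equality.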

Related papers

Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients [8.437109106999443]
Domain randomization through synthesis is a powerful strategy to train networks that are unbiased with respect to the domain of the input images. We introduce Learn2Synth, a novel procedure in which synthesis parameters are learned using a small set of real labeled data. This approach allows the training procedure to benefit from real labeled examples, without ever using these real examples to train the segmentation network.
arXiv Detail & Related papers (2024-11-23T00:52:49Z)
Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns. A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
Massively Annotated Datasets for Assessment of Synthetic and Real Data in Face Recognition [0.2775636978045794]
We study the drift between the performance of models trained on real and synthetic datasets. We conduct studies on the differences between real and synthetic datasets on the attribute set. Interestingly enough, we have verified that while real samples suffice to explain the synthetic distribution, the opposite could not be further from being true.
arXiv Detail & Related papers (2024-04-23T17:10:49Z)
Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2). The dataset is composed of both real and synthetic videos from seven gesture classes. We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z)
Domain Adaptation of Synthetic Driving Datasets for Real-World Autonomous Driving [0.11470070927586014]
Networks trained with synthetic data for certain computer vision tasks degrade significantly when tested on real-world data. In this paper, we propose and evaluate novel ways to improve such approaches. We propose a novel method to efficiently incorporate semantic supervision into this pair selection, which helps boost the performance of the model.
arXiv Detail & Related papers (2023-02-08T15:51:54Z)
Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift. Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
arXiv Detail & Related papers (2022-04-06T02:49:06Z)
Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective [65.37571681370096]
We propose a simple pixel-wise contrastive learning across the viewpoints. A stereo selective whitening loss is introduced to better preserve the stereo feature consistency across domains. Our method achieves superior performance over several state-of-the-art networks.
arXiv Detail & Related papers (2022-03-21T11:21:41Z)
Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Methods typically alleviate on-extensive annotations on real scene scans. We explore how synthetic models rely on real scene categories of synthetic features to a unified feature space. Experiments show that our method achieves the average mAP of 46.08% on the ScanNet S3DIS dataset and 55.49% by learning datasets.
arXiv Detail & Related papers (2022-03-20T13:06:15Z)
ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks [14.306250516592305]
We show that learning of feature representations in stereo matching networks is heavily influenced by synthetic data artefacts. We propose an Information-Theoretic Shortcut Avoidance (ITSA) approach to automatically restrict shortcut-related information from being encoded into the feature representations. We show that using this method, state-of-the-art stereo matching networks that are trained purely on synthetic data can effectively generalize to challenging and previously unseen real data scenarios.
arXiv Detail & Related papers (2022-01-06T22:03:50Z)
Fake It Till You Make It: Face analysis in the wild using synthetic data alone [9.081019005437309]
We show that it is possible to perform face-related computer vision in the wild using synthetic data alone. We describe how to combine a procedurally-generated 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism.
arXiv Detail & Related papers (2021-09-30T13:07:04Z)
From Synthetic to Real: Image Dehazing Collaborating with Unlabeled Real Data [58.50411487497146]
We propose a novel image dehazing framework collaborating with unlabeled real data. First, we develop a disentangled image dehazing network (DID-Net), which disentangles the feature representations into three component maps. Then a disentangled-consistency mean-teacher network (DMT-Net) is employed to collaborate unlabeled real data for boosting single image dehazing.
arXiv Detail & Related papers (2021-08-06T04:00:28Z)
Attention-based Adversarial Appearance Learning of Augmented Pedestrians [49.25430012369125]
We propose a method to synthesize realistic data for the pedestrian recognition task. Our approach utilizes an attention mechanism driven by an adversarial loss to learn domain discrepancies. Our experiments confirm that the proposed adaptation method is robust to such discrepancies and reveals both visual realism and semantic consistency.
arXiv Detail & Related papers (2021-07-06T15:27:00Z)
Semi-synthesis: A fast way to produce effective datasets for stereo matching [16.602343511350252]
Close-to-real-scene texture rendering is a key factor in boosting stereo matching performance. We propose semi-synthetic, an effective and fast way to synthesize a large amount of data with close-to-real-scene texture. With further fine-tuning on the real dataset, we also achieve SOTA performance on Middlebury and competitive results on KITTI and ETH3D datasets.
arXiv Detail & Related papers (2021-01-26T14:34:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.