Autoencoder for Synthetic to Real Generalization: From Simple to More
Complex Scenes
- URL: http://arxiv.org/abs/2204.00386v1
- Date: Fri, 1 Apr 2022 12:23:41 GMT
- Title: Autoencoder for Synthetic to Real Generalization: From Simple to More
Complex Scenes
- Authors: Steve Dias Da Cruz, Bertram Taetz, Thomas Stifter, Didier Stricker
- Abstract summary: We focus on autoencoder architectures and aim at learning latent space representations that are invariant to inductive biases caused by the domain shift between simulated and real images.
We present approaches to increase generalizability and improve the preservation of the semantics to real datasets of increasing visual complexity.
- Score: 13.618797548020462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning on synthetic data and transferring the resulting properties to their
real counterparts is an important challenge for reducing costs and increasing
safety in machine learning. In this work, we focus on autoencoder architectures
and aim at learning latent space representations that are invariant to
inductive biases caused by the domain shift between simulated and real images
showing the same scenario. We train on synthetic images only and present
approaches to increase generalizability and improve the preservation of the
semantics on real datasets of increasing visual complexity. We show that
pre-trained feature extractors (e.g. VGG) can be sufficient for generalization
on images of lower complexity, but additional improvements are required for
visually more complex scenes. To this end, we demonstrate that a new sampling
technique, which matches semantically important parts of the image while
randomizing the other parts, leads to salient feature extraction and to
unimportant parts being neglected. This helps the generalization to real data,
and we further show that our approach outperforms fine-tuned classification models.
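The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each synthetic image comes with a binary mask marking its semantically important region and that a pool of interchangeable backgrounds is available. The names `composite` and `sample_pair`, the mask, and the background pool are illustrative choices and are not taken from the paper. The input and the reconstruction target share the masked content but receive independently drawn backgrounds, so an autoencoder trained on such pairs gains nothing from encoding the randomized parts.

```python
import numpy as np


def composite(foreground, mask, background):
    """Paste the masked foreground onto a background (float arrays in [0, 1])."""
    # Expand the HxW mask to HxWx1 so it broadcasts over the colour channels.
    m = mask[..., None].astype(foreground.dtype)
    return m * foreground + (1.0 - m) * background


def sample_pair(foreground, mask, backgrounds, rng):
    """Return (input, target): same masked content, two different backgrounds.

    Hypothetical helper: the abstract only states that important parts are
    matched and the rest is randomized, not how this is implemented.
    """
    i, j = rng.choice(len(backgrounds), size=2, replace=False)
    x_in = composite(foreground, mask, backgrounds[i])
    x_target = composite(foreground, mask, backgrounds[j])
    return x_in, x_target


# Toy data standing in for rendered synthetic images (assumed shapes).
rng = np.random.default_rng(0)
foreground = rng.random((8, 8, 3))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True  # assumed "semantically important" region
backgrounds = rng.random((4, 8, 8, 3))

x_in, x_target = sample_pair(foreground, mask, backgrounds, rng)
```

In a training loop, `x_in` would be fed to the encoder and the reconstruction loss would be computed against `x_target`; that loss choice is likewise an assumption made here for illustration.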
Related papers
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection [13.840950434728533]
State-of-the-art Synthetic Image Detection (SID) research has led to strong evidence on the advantages of feature extraction from foundation models.
We leverage the image representations extracted by intermediate Transformer blocks of CLIP's image-encoder via a lightweight network.
Our method is compared against the state-of-the-art by evaluating it on 20 test datasets and exhibits an average +10.6% absolute performance improvement.
arXiv Detail & Related papers (2024-02-29T12:18:43Z)
- RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs [63.991802204929485]
Blind face restoration aims at recovering high-quality face images from those with unknown degradations.
Current algorithms mainly introduce priors to complement high-quality details and achieve impressive progress.
We propose RestoreFormer++, which introduces fully-spatial attention mechanisms to model the contextual information and the interplay with the priors.
We show that RestoreFormer++ outperforms state-of-the-art algorithms on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-08-14T16:04:53Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- A Scaling Law for Synthetic-to-Real Transfer: A Measure of Pre-Training [52.93808218720784]
Synthetic-to-real transfer learning is a framework in which we pre-train models with synthetically generated images and ground-truth annotations for real tasks.
Although synthetic images overcome the data scarcity issue, it remains unclear how the fine-tuning performance scales with pre-trained models.
We observe a simple and general scaling law that consistently describes learning curves in various tasks, models, and complexities of synthesized pre-training data.
arXiv Detail & Related papers (2021-08-25T02:29:28Z)
- Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data [17.529045507657944]
We extend the powerful ESRGAN to a practical restoration application (namely, Real-ESRGAN).
A high-order degradation modeling process is introduced to better simulate complex real-world degradations.
We also consider the common ringing and overshoot artifacts in the synthesis process.
arXiv Detail & Related papers (2021-07-22T17:43:24Z)
- On the Transfer of Disentangled Representations in Realistic Settings [44.367245337475445]
We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images.
We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings.
arXiv Detail & Related papers (2020-10-27T16:15:24Z)
- Synthetic Convolutional Features for Improved Semantic Segmentation [139.5772851285601]
We suggest generating intermediate convolutional features and propose the first synthesis approach that is catered to such intermediate convolutional features.
This allows us to generate new features from label masks and include them successfully into the training procedure.
Experimental results and analysis on two challenging datasets, Cityscapes and ADE20K, show that our generated features improve performance on segmentation tasks.
arXiv Detail & Related papers (2020-09-18T14:12:50Z)
- Automated Synthetic-to-Real Generalization [142.41531132965585]
We propose a learning-to-optimize (L2O) strategy to automate the selection of layer-wise learning rates.
We demonstrate that the proposed framework can significantly improve the synthetic-to-real generalization performance without seeing and training on real data.
arXiv Detail & Related papers (2020-07-14T10:57:34Z)