Attention Mechanism for Contrastive Learning in GAN-based Image-to-Image Translation
- URL: http://arxiv.org/abs/2302.12052v1
- Date: Thu, 23 Feb 2023 14:23:23 GMT
- Title: Attention Mechanism for Contrastive Learning in GAN-based Image-to-Image Translation
- Authors: Hanzhen Zhang, Liguo Zhou, Ruining Wang, Alois Knoll
- Abstract summary: We propose a GAN-based model that is capable of generating high-quality images across different domains.
We leverage Contrastive Learning to train the model in a self-supervised way, using real-world sensor images and simulated images from 3D games.
- Score: 3.90801108629495
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Using real road testing to optimize autonomous driving algorithms is
time-consuming and capital-intensive. To solve this problem, we propose a
GAN-based model that is capable of generating high-quality images across
different domains. We further leverage Contrastive Learning to train the model
in a self-supervised way, using real-world sensor imagery together with
simulated images from 3D games. We also apply an Attention Mechanism module
that emphasizes features carrying more information about the source domain,
weighted by an estimated measure of their significance. Finally, the generated
images are used as datasets to train neural networks on a variety of downstream
tasks, verifying that the approach can bridge the gap between the virtual and
real worlds.
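As a rough illustration of how such an objective can fit together, the sketch below combines a PatchNCE-style patch contrastive loss (a common choice for contrastive image translation) with per-patch attention weights; all names, shapes, and the softmax weighting are assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def attention_weighted_patchnce(feat_src, feat_gen, attn_logits, tau=0.07):
    """InfoNCE loss over matched patch features, re-weighted by a per-patch
    attention score. feat_src/feat_gen: (N, D) patch embeddings from source
    and generated images at corresponding locations; attn_logits: (N,)
    unnormalized significance scores from a hypothetical attention module."""
    q = F.normalize(feat_gen, dim=1)          # queries from generated image
    k = F.normalize(feat_src, dim=1)          # keys from source image
    logits = q @ k.t() / tau                  # (N, N) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)  # positives on diagonal
    per_patch = F.cross_entropy(logits, targets, reduction='none')
    attn = torch.softmax(attn_logits, dim=0)  # emphasize informative patches
    return (attn * per_patch).sum()

# Toy usage with random features
loss = attention_weighted_patchnce(torch.randn(64, 256),
                                   torch.randn(64, 256),
                                   torch.randn(64))
```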
Related papers
- Optical Flow Matters: an Empirical Comparative Study on Fusing Monocular Extracted Modalities for Better Steering [37.46760714516923]
This research introduces a new end-to-end method that exploits multimodal information from a single monocular camera to improve steering predictions for self-driving cars.
By focusing on the fusion of RGB imagery with depth completion information or optical flow data, we propose a framework that integrates these modalities through both early and hybrid fusion techniques.
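For concreteness, early fusion is just channel-level concatenation before a shared backbone; a minimal PyTorch sketch with assumed shapes (hybrid fusion would instead merge intermediate features):

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Early fusion: concatenate RGB (3 ch) and optical flow (2 ch) at the
    input and process with a single CNN backbone (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3 + 2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 1)  # steering-angle regression

    def forward(self, rgb, flow):
        x = torch.cat([rgb, flow], dim=1)  # fuse at the pixel level
        return self.head(self.backbone(x))

net = EarlyFusionNet()
angle = net(torch.randn(4, 3, 128, 128), torch.randn(4, 2, 128, 128))
```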
arXiv Detail & Related papers (2024-09-18T09:36:24Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
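A toy version of the overlap measurement, assuming PCA as the dimensionality-reduction step and a nearest-neighbor mixing score as the overlap proxy (both are illustrative choices, not the paper's exact metric):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def domain_overlap(real, synth, dims=2):
    """Project both sets into a shared PCA space and report, for each
    synthetic sample, whether its nearest (non-self) neighbor is real --
    a crude proxy for how well synthetic data covers the target domain."""
    pca = PCA(n_components=dims).fit(np.vstack([real, synth]))
    zr, zs = pca.transform(real), pca.transform(synth)
    labels = np.concatenate([np.zeros(len(zr)), np.ones(len(zs))])
    z = np.vstack([zr, zs])
    nn = NearestNeighbors(n_neighbors=2).fit(z)   # neighbor 0 is the point itself
    _, idx = nn.kneighbors(z[len(zr):])           # query the synthetic points
    return (labels[idx[:, 1]] == 0).mean()        # fraction with a real neighbor

real = np.random.randn(200, 64)
synth = np.random.randn(200, 64) + 0.5            # shifted toy domain
print(domain_overlap(real, synth))
```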
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks [47.07188762367792]
We present ARSim, a framework designed to enhance real multi-view image data with 3D synthetic objects of interest.
We construct a simplified virtual scene using real data and strategically place 3D synthetic assets within it.
The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles.
arXiv Detail & Related papers (2024-03-22T17:49:11Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [63.54342601757723]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- Unsupervised Domain Transfer with Conditional Invertible Neural Networks [83.90291882730925]
We propose a domain transfer approach based on conditional invertible neural networks (cINNs).
Our method inherently guarantees cycle consistency through its invertible architecture, and network training can efficiently be conducted with maximum likelihood.
Our method enables the generation of realistic spectral data and outperforms the state of the art on two downstream classification tasks.
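The cycle-consistency-for-free claim can be sketched with a single conditional affine coupling block: the mapping is invertible by construction, and the tractable log-determinant makes maximum-likelihood training straightforward. Everything below is a toy sketch, not the authors' cINN:

```python
import torch
import torch.nn as nn

class CondAffineCoupling(nn.Module):
    """One conditional affine coupling block: exactly invertible, with a
    closed-form log-determinant for maximum-likelihood training."""
    def __init__(self, dim, cond_dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, c):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, c], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales well-behaved
        return torch.cat([x1, x2 * s.exp() + t], dim=1), s.sum(dim=1)

    def inverse(self, y, c):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, c], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)
        return torch.cat([y1, (y2 - t) * (-s).exp()], dim=1)

layer = CondAffineCoupling(dim=8, cond_dim=4)
x, c = torch.randn(16, 8), torch.randn(16, 4)
z, log_det = layer(x, c)
nll = (0.5 * z.pow(2).sum(dim=1) - log_det).mean()        # standard-normal base
assert torch.allclose(layer.inverse(z, c), x, atol=1e-5)  # exact inversion
```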
arXiv Detail & Related papers (2023-03-17T18:00:27Z)
- Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data [3.9447103367861542]
This paper proposes a generative model of LiDAR range images applicable to the data-level domain transfer.
Motivated by the fact that LiDAR measurement is based on point-by-point range imaging, we train an implicit image representation-based generative adversarial network.
We demonstrate the fidelity and diversity of our model in comparison with the point-based and image-based state-of-the-art generative models.
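The "point-by-point range imaging" measurement model amounts to a spherical projection; a NumPy sketch, where the sensor field-of-view values are assumptions (roughly HDL-64-like), not the paper's configuration:

```python
import numpy as np

def pointcloud_to_range_image(xyz, h=64, w=1024,
                              fov_up=np.radians(3.0), fov_down=np.radians(-25.0)):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) range image."""
    r = np.linalg.norm(xyz, axis=1)
    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])            # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(xyz[:, 2] / np.maximum(r, 1e-8), -1.0, 1.0))
    u = ((0.5 * (1.0 - yaw / np.pi)) * w).astype(int) % w
    v = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(int)
    valid = (v >= 0) & (v < h) & (r > 0)
    img = np.zeros((h, w), dtype=np.float32)
    img[v[valid], u[valid]] = r[valid]                # last-written range wins
    return img

points = np.random.randn(5000, 3) * np.array([20.0, 20.0, 2.0])
range_img = pointcloud_to_range_image(points)
```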
arXiv Detail & Related papers (2022-10-21T06:08:39Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
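The scale signal here comes from stereo geometry: virtual stereo pairs give metric depth via depth = f * B / d, which can then anchor an up-to-scale VO estimate. A minimal sketch, with a median-ratio alignment as an illustrative stand-in for the full VRVO integration:

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_m):
    """Stereo geometry: depth = f * B / d; this is what ties the virtual
    stereo data to an absolute (metric) scale."""
    return focal_px * baseline_m / np.maximum(disp, 1e-6)

def align_vo_scale(vo_depth, metric_depth):
    """Recover the global scale of an up-to-scale VO depth map via a robust
    median ratio against metric depth (illustrative, not the VRVO pipeline)."""
    return np.median(metric_depth / np.maximum(vo_depth, 1e-6))

disp = np.random.uniform(1, 64, size=(192, 640))
metric = disparity_to_depth(disp, focal_px=720.0, baseline_m=0.54)
vo = metric / 17.3                    # toy: VO depth off by an unknown scale
print(align_vo_scale(vo, metric))     # recovers ~17.3
```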
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
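A schematic of the two-branch idea, with a simple gated exchange standing in for the Dual Enhancement Module; the shapes and gating below are assumptions, not CMMPNet itself:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class DualBranchSketch(nn.Module):
    """Two modality-specific encoders (aerial image, rasterized trajectories)
    with a gated cross-modal exchange; the real Dual Enhancement Module is
    more elaborate than this mixing step."""
    def __init__(self, c=32):
        super().__init__()
        self.enc_img, self.enc_traj = conv_block(3, c), conv_block(1, c)
        self.gate_img = nn.Conv2d(c, c, 1)   # how much trajectory info to take
        self.gate_traj = nn.Conv2d(c, c, 1)
        self.head = nn.Conv2d(2 * c, 1, 1)   # road-mask logits

    def forward(self, img, traj):
        fi, ft = self.enc_img(img), self.enc_traj(traj)
        fi = fi + torch.sigmoid(self.gate_img(fi)) * ft   # enhance image branch
        ft = ft + torch.sigmoid(self.gate_traj(ft)) * fi  # enhance traj branch
        return self.head(torch.cat([fi, ft], dim=1))

net = DualBranchSketch()
mask = net(torch.randn(2, 3, 256, 256), torch.randn(2, 1, 256, 256))
```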
arXiv Detail & Related papers (2021-11-30T04:30:10Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
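One common way to learn representations that a sim/real discriminator cannot separate is a gradient reversal layer (DANN-style); shown here as an illustrative mechanism, not necessarily this paper's exact recipe:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, flipped (scaled) gradient on the backward
    pass -- the classic trick for making features domain-invariant."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

feat = torch.randn(8, 128, requires_grad=True)
domain_logits = grad_reverse(feat, lam=0.5).sum()  # stand-in discriminator loss
domain_logits.backward()
print(feat.grad[0, :3])  # gradients arrive negated and scaled by lam
```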
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- Inertial Sensor Data To Image Encoding For Human Action Recognition [0.0]
Convolutional Neural Networks (CNNs) are successful deep learning models in the field of computer vision.
In this paper, we use four spatial-domain methods for transforming inertial sensor data into activity images.
To create a multimodal fusion framework, we make each type of activity image multimodal by convolving it with two spatial-domain filters.
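One plausible reading of the signal-to-image step, sketched with NumPy (the tiling recipe and filter choice are assumptions, not the paper's exact four methods):

```python
import numpy as np
from scipy.ndimage import prewitt

def to_activity_image(window, rows=24):
    """Turn a (T, C) window of accelerometer/gyroscope channels into a 2D
    'activity image' by tiling the channel sequences row-wise."""
    t, c = window.shape
    reps = int(np.ceil(rows / c))
    img = np.tile(window.T, (reps, 1))[:rows]            # (rows, T) signal image
    img = (img - img.min()) / (np.ptp(img) + 1e-8)       # normalize to [0, 1]
    return img

window = np.random.randn(128, 6)             # one 6-axis IMU window
img = to_activity_image(window)
edge_view = prewitt(img, axis=0)             # one spatial-domain filter channel
```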
arXiv Detail & Related papers (2021-05-28T01:22:52Z)
- Sparse Signal Models for Data Augmentation in Deep Learning ATR [0.8999056386710496]
We propose a data augmentation approach to incorporate domain knowledge and improve the generalization power of a data-intensive learning algorithm.
We exploit the sparsity of the scattering centers in the spatial domain and the smoothly-varying structure of the scattering coefficients in the azimuthal domain to solve the ill-posed problem of over-parametrized model fitting.
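The augmentation idea can be sketched as sparse coding over a dictionary of scattering-center atoms, then perturbing the recovered coefficients to synthesize new samples; the dictionary and jitter below are toy assumptions, not the paper's azimuth-aware model:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((256, 512))              # hypothetical dictionary of
D /= np.linalg.norm(D, axis=0)                   # scattering-center atoms

# A sparse test signature built from 5 random atoms
x = D[:, rng.choice(512, 5, replace=False)] @ rng.standard_normal(5)

# Recover a sparse code with OMP, then jitter the coefficients to
# synthesize plausible new training samples.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5).fit(D, x)
code = omp.coef_
augmented = [D @ (code * rng.normal(1.0, 0.1, size=code.shape))
             for _ in range(8)]
```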
arXiv Detail & Related papers (2020-12-16T21:46:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.