Image-to-Image Translation for Autonomous Driving from Coarsely-Aligned
Image Pairs
- URL: http://arxiv.org/abs/2209.11673v1
- Date: Fri, 23 Sep 2022 16:03:18 GMT
- Title: Image-to-Image Translation for Autonomous Driving from Coarsely-Aligned
Image Pairs
- Authors: Youya Xia, Josephine Monica, Wei-Lun Chao, Bharath Hariharan, Kilian Q
Weinberger, Mark Campbell
- Abstract summary: A self-driving car must be able to handle adverse weather conditions to operate safely.
In this paper, we investigate the idea of turning sensor inputs captured in an adverse condition into a benign one.
We show that our coarsely-aligned training scheme leads to a better image translation quality and improved downstream tasks.
- Score: 57.33431586417377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A self-driving car must be able to reliably handle adverse weather conditions
(e.g., snowy) to operate safely. In this paper, we investigate the idea of
turning sensor inputs (i.e., images) captured in an adverse condition into a
benign one (i.e., sunny), upon which the downstream tasks (e.g., semantic
segmentation) can attain high accuracy. Prior work primarily formulates this as
an unpaired image-to-image translation problem due to the lack of paired images
captured under the exact same camera poses and semantic layouts. While
perfectly-aligned images are not available, one can easily obtain
coarsely-paired images. For instance, many people drive the same routes daily
in both good and adverse weather; thus, images captured at close-by GPS
locations can form a pair. Though data from repeated traversals are unlikely to
capture the same foreground objects, we posit that they provide rich contextual
information to supervise the image translation model. To this end, we propose a
novel training objective leveraging coarsely-aligned image pairs. We show that
our coarsely-aligned training scheme leads to a better image translation
quality and improved downstream tasks, such as semantic segmentation, monocular
depth estimation, and visual localization.
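The abstract does not spell out the training objective; as a rough, hedged sketch, a coarsely-aligned objective of this kind could pair a standard adversarial term with a feature-space alignment term that tolerates the pixel-level misalignment between the translated adverse-weather image and its coarsely paired clear-weather image. Everything below (the generator G, discriminator D, VGG-based features, and loss weight) is an assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Placeholder feature extractor (any pretrained backbone would do; the
# paper's exact choice is not stated in the abstract).
vgg_features = vgg16(weights="DEFAULT").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def misalignment_tolerant_loss(fake_clear, coarse_clear):
    """Compare deep features by nearest-neighbour matching rather than
    pixel-to-pixel, so small pose/layout differences between coarsely
    paired images are not penalized (an assumed contextual-loss-style
    term, not the authors' exact objective)."""
    fa = F.normalize(vgg_features(fake_clear).flatten(2), dim=1)    # (B, C, N)
    fb = F.normalize(vgg_features(coarse_clear).flatten(2), dim=1)  # (B, C, M)
    sim = torch.einsum("bcn,bcm->bnm", fa, fb)                      # cosine similarities
    # For each feature of the translated image, find its best match in the
    # coarsely aligned clear image and penalize the remaining distance.
    return (1.0 - sim.max(dim=2).values).mean()

def generator_step(G, D, adverse_img, coarse_clear_img, lambda_align=10.0):
    """One generator update combining an adversarial term with the
    misalignment-tolerant alignment term (the weight is an assumption)."""
    fake_clear = G(adverse_img)
    logits = D(fake_clear)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    align = misalignment_tolerant_loss(fake_clear, coarse_clear_img)
    return adv + lambda_align * align
```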
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
- Towards Pragmatic Semantic Image Synthesis for Urban Scenes [4.36080478413575]
We present a new task: given a dataset with synthetic images and labels and a dataset with unlabeled real images, our goal is to learn a model that can generate images with the content of the input mask and the appearance of real images.
We leverage the synthetic image as a guide to the content of the generated image by penalizing the difference between their high-level features on a patch level.
In contrast to previous works which employ one discriminator that overfits the target domain semantic distribution, we employ a discriminator for the whole image and multiscale discriminators on the image patches.
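A minimal sketch of the patch-level high-level feature penalty described above (the VGG backbone, layer cut-off, and patch size are assumptions; the paper's exact choices are not given in this summary):

```python
import torch.nn.functional as F
from torchvision.models import vgg16

# Assumed high-level feature extractor; the paper's backbone and layers
# are not stated in this summary.
hi_level = vgg16(weights="DEFAULT").features[:23].eval()
for p in hi_level.parameters():
    p.requires_grad_(False)

def patch_feature_penalty(generated, synthetic_guide, patch=4):
    """Penalize the difference between high-level features of the generated
    image and its synthetic guide, pooled over local patches so the penalty
    acts at patch level rather than per feature-map pixel."""
    fg = F.avg_pool2d(hi_level(generated), patch)
    fs = F.avg_pool2d(hi_level(synthetic_guide), patch)
    return F.l1_loss(fg, fs)
```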
arXiv Detail & Related papers (2023-05-16T18:01:12Z)
- Unpaired Translation from Semantic Label Maps to Images by Leveraging Domain-Specific Simulations [11.638139969660266]
We introduce a contrastive learning framework for generating photorealistic images from simulated label maps.
Our proposed method is shown to generate realistic and scene-accurate translations.
arXiv Detail & Related papers (2023-02-21T14:36:18Z)
- Extremal Domain Translation with Neural Optimal Transport [76.38747967445994]
We propose extremal transport (ET), a formalization of the theoretically best possible unpaired translation between a pair of domains.
Inspired by the recent advances in neural optimal transport (OT), we propose a scalable algorithm to approximate ET maps as a limit of partial OT maps.
We test our algorithm on toy examples and on the unpaired image-to-image translation task.
arXiv Detail & Related papers (2023-01-30T13:28:23Z)
- Semi-Supervised Image-to-Image Translation using Latent Space Mapping [37.232496213047845]
We introduce a general framework for semi-supervised image translation.
Our main idea is to learn the translation over the latent feature space instead of the image space.
Thanks to the low dimensional feature space, it is easier to find the desired mapping function.
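A minimal sketch of translating in a low-dimensional latent space rather than the image space (the encoder, mapper, and decoder below are hypothetical placeholders, not the paper's architecture):

```python
import torch
import torch.nn as nn

class LatentSpaceTranslator(nn.Module):
    """Encode the source image, map its latent code to the target domain,
    then decode back to an image; translation happens in the latent space."""
    def __init__(self, encoder: nn.Module, mapper: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder, self.mapper, self.decoder = encoder, mapper, decoder

    def forward(self, x_source: torch.Tensor) -> torch.Tensor:
        z_source = self.encoder(x_source)   # image -> latent code
        z_target = self.mapper(z_source)    # translation in the latent space
        return self.decoder(z_target)       # latent code -> image

# Toy placeholder modules, just to show the wiring.
enc = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
                    nn.Conv2d(64, 128, 4, 2, 1))
mapper = nn.Sequential(nn.Conv2d(128, 128, 3, 1, 1), nn.ReLU(),
                       nn.Conv2d(128, 128, 3, 1, 1))
dec = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
                    nn.ConvTranspose2d(64, 3, 4, 2, 1))
model = LatentSpaceTranslator(enc, mapper, dec)
out = model(torch.randn(1, 3, 128, 128))    # translated image, same resolution
```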
arXiv Detail & Related papers (2022-03-29T05:14:26Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine adversarial reasoning.
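One way to picture the shared representation is a discriminator-side network whose shared encoder feeds separate heads for adversarial real/fake prediction, semantic segmentation, and content reconstruction; the sketch below is an assumption for illustration, not the paper's model:

```python
import torch
import torch.nn as nn

class SharedRepDiscriminator(nn.Module):
    """Hypothetical discriminator with a shared encoder and three heads:
    patchwise real/fake logits, semantic segmentation logits, and a coarse
    content reconstruction."""
    def __init__(self, num_classes: int = 19):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.adv_head = nn.Conv2d(128, 1, 3, 1, 1)            # real/fake logits
        self.seg_head = nn.Conv2d(128, num_classes, 3, 1, 1)  # segmentation logits
        self.rec_head = nn.Sequential(                        # content reconstruction
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),
        )

    def forward(self, x):
        h = self.encoder(x)                # shared latent representation
        return self.adv_head(h), self.seg_head(h), self.rec_head(h)
```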
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source content while translating it to the discriminative style of a target visual domain.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
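A hedged sketch of what an energy-based model in the latent space of a pretrained autoencoder can look like, using standard Langevin dynamics to move a source latent code toward the target domain (the network, step sizes, and step count are assumptions, not the paper's exact sampler):

```python
import torch
import torch.nn as nn

class LatentEBM(nn.Module):
    """Small energy network over latent codes (placeholder architecture)."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.SiLU(),
                                 nn.Linear(256, 1))

    def forward(self, z):
        return self.net(z).squeeze(-1)      # scalar energy per latent code

def translate(encoder, decoder, ebm, x_source, steps=50, step_size=0.1, noise=0.01):
    """Encode the source image, refine its latent code with Langevin dynamics
    under the target-domain energy, then decode the result."""
    z = encoder(x_source).detach().requires_grad_(True)
    for _ in range(steps):
        energy = ebm(z).sum()
        grad, = torch.autograd.grad(energy, z)
        z = (z - 0.5 * step_size * grad
             + noise * torch.randn_like(z)).detach().requires_grad_(True)
    return decoder(z.detach())
```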
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- PREGAN: Pose Randomization and Estimation for Weakly Paired Image Style Translation [11.623477199795037]
We propose a weakly-paired setting for style translation, where the content of the two images is aligned up to errors in pose.
PREGAN is validated on both simulated and real-world data to show its effectiveness.
arXiv Detail & Related papers (2020-10-31T16:11:11Z)
- Contrastive Learning for Unpaired Image-to-Image Translation [64.47477071705866]
In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain.
We propose a framework based on contrastive learning to maximize mutual information between the two.
We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time.
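The patchwise mutual-information idea can be illustrated with a standard InfoNCE loss in which the feature at the same spatial location of the input is the positive and all other locations are negatives (a sketch only; the actual method's feature extraction, projection heads, and patch sampling differ in detail):

```python
import torch
import torch.nn.functional as F

def patch_infonce(feat_out, feat_in, temperature=0.07):
    """InfoNCE between output-patch and input-patch features: the feature at
    the same spatial location is the positive, all other locations in the
    same image serve as negatives."""
    # feat_out, feat_in: (B, C, H, W) features of translated and input image
    B, C, H, W = feat_out.shape
    q = F.normalize(feat_out.flatten(2).transpose(1, 2), dim=-1)  # (B, N, C)
    k = F.normalize(feat_in.flatten(2).transpose(1, 2), dim=-1)   # (B, N, C)
    logits = torch.bmm(q, k.transpose(1, 2)) / temperature        # (B, N, N)
    targets = torch.arange(H * W, device=logits.device).expand(B, -1)
    return F.cross_entropy(logits.reshape(B * H * W, H * W),
                           targets.reshape(-1))
```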
arXiv Detail & Related papers (2020-07-30T17:59:58Z)
- A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View [0.0]
Distances can be more easily estimated when the camera perspective is transformed to a bird's eye view (BEV).
This paper describes a methodology to obtain a corrected 360° BEV image given images from multiple vehicle-mounted cameras.
The neural network approach does not rely on manually labeled data, but is trained on a synthetic dataset in such a way that it generalizes well to real-world data.
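As a hedged illustration of the multi-camera-to-BEV idea (the encoder, fusion by concatenation, and output resolution below are assumptions, not the paper's architecture), one could encode each camera image, fuse the per-camera features, and decode a semantically segmented BEV map:

```python
import torch
import torch.nn as nn

class MultiCamToBEV(nn.Module):
    """Hypothetical sketch: encode each vehicle-mounted camera image, fuse the
    per-camera features, and decode a semantically segmented bird's eye view."""
    def __init__(self, num_cameras: int = 4, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),                 # fixed 8x8 feature grid per camera
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64 * num_cameras, 64, 4, 2, 1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),                # 16 -> 32
            nn.ConvTranspose2d(32, num_classes, 4, 2, 1),                  # 32 -> 64
        )

    def forward(self, cams):                          # cams: (B, num_cameras, 3, H, W)
        feats = [self.encoder(cams[:, i]) for i in range(cams.shape[1])]
        fused = torch.cat(feats, dim=1)               # concatenate camera features
        return self.decoder(fused)                    # (B, num_classes, 64, 64) BEV logits
```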
arXiv Detail & Related papers (2020-05-08T14:54:13Z)