Structural-analogy from a Single Image Pair
- URL: http://arxiv.org/abs/2004.02222v3
- Date: Wed, 6 Jan 2021 16:57:44 GMT
- Title: Structural-analogy from a Single Image Pair
- Authors: Sagie Benaim, Ron Mokady, Amit Bermano, Daniel Cohen-Or, Lior Wolf
- Abstract summary: In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B.
We generate an image that keeps the appearance and style of B, but has a structural arrangement that corresponds to A.
Our method can be used to generate high quality imagery in other conditional generation tasks utilizing images A and B only.
- Score: 118.61885732829117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of unsupervised image-to-image translation has seen substantial
advancements in recent years through the use of deep neural networks.
Typically, the proposed solutions learn the characterizing distribution of two
large, unpaired collections of images, and are able to alter the appearance of
a given image, while keeping its geometry intact. In this paper, we explore the
capabilities of neural networks to understand image structure given only a
single pair of images, A and B. We seek to generate images that are
structurally aligned: that is, to generate an image that keeps the appearance
and style of B, but has a structural arrangement that corresponds to A. The key
idea is to map between image patches at different scales. This enables
controlling the granularity at which analogies are produced, which determines
the conceptual distinction between style and content. In addition to structural
alignment, our method can be used to generate high quality imagery in other
conditional generation tasks utilizing images A and B only: guided image
synthesis, style and texture transfer, text translation as well as video
translation. Our code and additional results are available in
https://github.com/rmokady/structural-analogy/.
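The abstract's key idea, mapping between image patches of A and B at a chosen scale, can be illustrated with a minimal nearest-neighbor sketch. This is not the paper's GAN-based method, only a toy analogue: each patch of A is replaced by the closest-matching patch of B, and the patch size controls the granularity at which "style" is borrowed from B while A's layout is kept. All function names and the grayscale NumPy-array setup are illustrative assumptions.

```python
# Toy patch-level analogy: rebuild A's spatial layout using B's patches.
# Assumes grayscale images as 2-D NumPy float arrays.
import numpy as np

def extract_patches(img, size, stride):
    """Collect all size x size patches of img at the given stride."""
    patches = []
    h, w = img.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return np.stack(patches)

def patch_analogy(a, b, size=4):
    """For each non-overlapping patch of A, substitute the nearest
    (L2 distance) patch of B. Larger `size` preserves more of B's
    local appearance; smaller `size` follows A's structure more finely."""
    out = np.zeros_like(a, dtype=float)
    b_patches = extract_patches(b, size, 1)
    flat_b = b_patches.reshape(len(b_patches), -1)
    h, w = a.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            q = a[y:y + size, x:x + size].reshape(1, -1)
            idx = np.argmin(((flat_b - q) ** 2).sum(axis=1))
            out[y:y + size, x:x + size] = b_patches[idx]
    return out
```

The paper instead learns this mapping adversarially and in a coarse-to-fine pyramid, but the patch-size knob above mirrors how granularity determines the style/content split.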
Related papers
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial
Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
For more realistic and diverse image generation, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- Describing Sets of Images with Textual-PCA [89.46499914148993]
We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set.
Our procedure is analogous to Principal Component Analysis, in which the role of projection vectors is replaced with generated phrases.
arXiv Detail & Related papers (2022-10-21T17:10:49Z)
- Review Neural Networks about Image Transformation Based on IGC Learning Framework with Annotated Information [13.317099281011515]
In Computer Vision (CV), many problems can be regarded as the image transformation task, e.g., semantic segmentation and style transfer.
Some surveys review only style transfer or image-to-image translation, both of which are just branches of image transformation.
This paper proposes a novel learning framework including Independent learning, Guided learning, and Cooperative learning.
arXiv Detail & Related papers (2022-06-21T07:27:47Z)
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs [8.26410341981427]
We study how to ensure that generated samples are believable, realistic or natural.
We present a novel algorithm which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture.
arXiv Detail & Related papers (2022-02-25T20:00:33Z)
- Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid [102.24539566851809]
Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task.
Recent image inpainting models have made significant progress in generating vivid visual details, but they can still lead to texture blurring or structural distortions.
We propose the Semantic Pyramid Network (SPN) motivated by the idea that learning multi-scale semantic priors can greatly benefit the recovery of locally missing content in images.
arXiv Detail & Related papers (2021-12-08T04:33:33Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning [26.496357517937614]
Existing image captioning methods focus only on the relationships between objects or instances within a single image.
We propose Dual Graph Convolutional Networks (Dual-GCN) with transformer and curriculum learning for image captioning.
arXiv Detail & Related papers (2021-08-05T04:57:06Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.