RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
- URL: http://arxiv.org/abs/2007.08513v1
- Date: Thu, 16 Jul 2020 17:59:04 GMT
- Title: RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
- Authors: Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang
- Abstract summary: We propose a differentiable retrieval module to synthesize images from scene description with retrieved patches as reference.
We conduct extensive quantitative and qualitative experiments to demonstrate that the proposed method can generate realistic and diverse images.
- Score: 76.87013602243053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image generation from a scene description is a cornerstone technique for
controlled generation, benefiting applications such as content
creation and image editing. In this work, we aim to synthesize images from
scene description with retrieved patches as reference. We propose a
differentiable retrieval module. With the differentiable retrieval module, we
can (1) make the entire pipeline end-to-end trainable, enabling the learning of
better feature embedding for retrieval; (2) encourage the selection of mutually
compatible patches with additional objective functions. We conduct extensive
quantitative and qualitative experiments to demonstrate that the proposed
method can generate realistic and diverse images, where the retrieved patches
are reasonable and mutually compatible.
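The key difficulty the abstract alludes to is that selecting discrete patches is not differentiable. A minimal NumPy sketch of one standard relaxation for this, the Gumbel-softmax trick, is shown below; the function names, embedding shapes, and temperature value are illustrative assumptions, not taken from the paper, and a real pipeline would use an autodiff framework so gradients flow through the soft selection weights into the embedding networks.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample a relaxed (soft) one-hot vector from a categorical distribution."""
    rng = np.random.default_rng() if rng is None else rng
    # Adding Gumbel(0, 1) noise makes the argmax a sample from softmax(logits)
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    scores = (logits + gumbel) / tau
    # Subtract the row max for numerical stability before exponentiating
    y = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

def retrieve_patch(query_emb, patch_embs, tau=0.5, rng=None):
    """Softly select one candidate patch per query embedding."""
    # Similarity of each query to every candidate patch: shape (Q, P)
    logits = query_emb @ patch_embs.T
    weights = gumbel_softmax(logits, tau=tau, rng=rng)
    # Convex combination of patch embeddings; near one-hot for small tau
    return weights @ patch_embs, weights
```

As tau shrinks toward zero the selection weights approach a hard one-hot choice, while for moderate tau the relaxation stays smooth enough to train the retrieval embeddings end to end, which is the property the abstract's point (1) relies on.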
Related papers
- Zero-shot Image Editing with Reference Imitation [50.75310094611476]
We present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently.
We propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame.
We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives.
arXiv Detail & Related papers (2024-06-11T17:59:51Z)
- Robust Multi-Modal Image Stitching for Improved Scene Understanding [2.0476854378186102]
We devise a comprehensive image-stitching pipeline built on OpenCV's stitching module.
Our approach integrates feature-based matching, transformation estimation, and blending techniques to produce high-quality panoramic views.
arXiv Detail & Related papers (2023-12-28T13:24:48Z)
- Materialistic: Selecting Similar Materials in Images [30.85562156542794]
We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area.
Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images.
We demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials.
arXiv Detail & Related papers (2023-05-22T17:50:48Z)
- SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z)
- Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis [68.1281982092765]
We propose a novel normalization module, termed as REtrieval-based Spatially AdaptIve normaLization (RESAIL)
RESAIL provides pixel level fine-grained guidance to the normalization architecture.
Experiments on several challenging datasets show that RESAIL performs favorably against state-of-the-art methods in terms of quantitative metrics, visual quality, and subjective evaluation.
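RESAIL belongs to the spatially adaptive normalization family (e.g. SPADE), where per-pixel modulation parameters are predicted from guidance rather than learned globally. A rough NumPy sketch of that general mechanism follows; the learned conv nets that map guidance to (gamma, beta) are replaced with identity placeholders, so the shapes and names here are illustrative assumptions, not RESAIL's actual architecture.

```python
import numpy as np

def spatially_adaptive_norm(x, guidance, eps=1e-5):
    """Normalize features per channel, then modulate them pixel-wise.

    x: (C, H, W) feature map; guidance: (C, H, W) modulation source.
    """
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / np.sqrt(var + eps)
    # In SPADE-style methods, gamma and beta are predicted by small conv nets
    # from the guidance (a segmentation map, or retrieved patches in RESAIL);
    # identity placeholders stand in for those learned predictors here.
    gamma, beta = guidance, np.zeros_like(guidance)
    return x_norm * (1.0 + gamma) + beta
```

The point of the construction is that the modulation varies per spatial location, which is what lets retrieved-patch guidance provide the pixel-level fine-grained control described above.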
arXiv Detail & Related papers (2022-04-06T14:21:39Z)
- RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network [19.017377597937617]
We study the compositional learning of images and texts for image retrieval.
We introduce a novel method that combines the graph convolutional network (GCN) with existing composition methods.
arXiv Detail & Related papers (2021-04-07T09:41:52Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
A StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
A visual-linguistic similarity module learns text-image matching by mapping images and text into a common embedding space.
Instance-level optimization preserves identity during manipulation.
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
- Region-adaptive Texture Enhancement for Detailed Person Image Synthesis [86.69934638569815]
RATE-Net is a novel framework for synthesizing person images with sharp texture details.
The proposed framework leverages an additional texture enhancing module to extract appearance information from the source image.
Experiments on the DeepFashion benchmark demonstrate the superiority of our framework over existing networks.
arXiv Detail & Related papers (2020-05-26T02:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.