DRAN: Detailed Region-Adaptive Normalization for Conditional Image
Synthesis
- URL: http://arxiv.org/abs/2109.14525v4
- Date: Mon, 26 Jun 2023 11:16:21 GMT
- Title: DRAN: Detailed Region-Adaptive Normalization for Conditional Image
Synthesis
- Authors: Yueming Lyu, Peibin Chen, Jingna Sun, Bo Peng, Xu Wang, Jing Dong
- Abstract summary: We propose a novel normalization module, named Detailed Region-Adaptive Normalization (DRAN).
It adaptively learns both fine-grained and coarse-grained style representations.
We collect a new makeup dataset (Makeup-Complex dataset) that contains a wide range of complex makeup styles.
- Score: 25.936764522125703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, conditional image synthesis has attracted growing attention
due to its controllability in the image generation process. Although recent
works have achieved realistic results, most of them have difficulty handling
fine-grained styles with subtle details. To address this problem, we propose a
novel normalization module named Detailed Region-Adaptive Normalization (DRAN),
which adaptively learns both fine-grained and coarse-grained style
representations. Specifically, we first introduce a multi-level structure,
Spatiality-aware Pyramid Pooling, to guide the model to learn coarse-to-fine
features. Then, to fuse these levels, we propose Dynamic Gating, which
adaptively weights the different granularities of style according to the
spatial region. Finally, we collect a new makeup
dataset (Makeup-Complex dataset) that contains a wide range of complex makeup
styles with diverse poses and expressions. To evaluate the effectiveness and
show the general use of our method, we conduct a set of experiments on makeup
transfer and semantic image synthesis. Quantitative and qualitative experiments
show that equipped with DRAN, simple baseline models are able to achieve
promising improvements in complex style transfer and detailed texture
synthesis. Both the code and the proposed dataset will be available at
https://github.com/Yueming6568/DRAN-makeup.git.
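For readers who want a concrete picture of the two components named in the abstract, here is a minimal PyTorch sketch of how Spatiality-aware Pyramid Pooling and Dynamic Gating could be wired into a region-adaptive normalization layer. This is not the authors' released code (see the repository linked above); the pooling scales, the softmax gate, and the SPADE-style scale/shift modulation are illustrative assumptions.

```python
# Hypothetical sketch of DRAN's two named components. Pooling scales,
# gating design, and the modulation rule are illustrative assumptions,
# not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialityAwarePyramidPooling(nn.Module):
    """Extracts style features at several spatial scales (coarse to fine)."""

    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in scales
        )

    def forward(self, style_feat):
        h, w = style_feat.shape[-2:]
        levels = []
        for scale, conv in zip(self.scales, self.convs):
            pooled = conv(F.adaptive_avg_pool2d(style_feat, scale))
            # Upsample each level back so all levels align spatially.
            levels.append(F.interpolate(pooled, size=(h, w), mode="nearest"))
        return levels  # list of (B, C, H, W) tensors, coarse to fine


class DynamicGating(nn.Module):
    """Predicts per-pixel weights that pick a style granularity per region."""

    def __init__(self, channels, num_levels):
        super().__init__()
        self.gate = nn.Conv2d(channels * num_levels, num_levels, 3, padding=1)

    def forward(self, levels):
        weights = torch.softmax(self.gate(torch.cat(levels, dim=1)), dim=1)
        # Weighted sum over levels; weights[:, i] selects level i per pixel.
        return sum(weights[:, i : i + 1] * lvl for i, lvl in enumerate(levels))


class DRANBlock(nn.Module):
    """Modulates normalized content features with the fused style map."""

    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.pyramid = SpatialityAwarePyramidPooling(channels, scales)
        self.gating = DynamicGating(channels, len(scales))
        self.to_gamma = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_beta = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, content_feat, style_feat):
        fused = self.gating(self.pyramid(style_feat))
        gamma, beta = self.to_gamma(fused), self.to_beta(fused)
        return self.norm(content_feat) * (1 + gamma) + beta


if __name__ == "__main__":
    block = DRANBlock(channels=64)
    content = torch.randn(1, 64, 32, 32)
    style = torch.randn(1, 64, 32, 32)
    print(block(content, style).shape)  # torch.Size([1, 64, 32, 32])
```

The softmax gate makes the fusion a per-pixel convex combination of the pyramid levels, which is one simple way to realize "adaptively fuse different levels of styles according to different spatial regions"; the published module may differ in detail.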
Related papers
- Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation [6.479933058008389]
Style-Extracting Diffusion Models generate images with unseen characteristics beneficial for downstream tasks.
In this work, we show the capability of our method on a natural image dataset as a proof-of-concept.
We verify the added value of the generated images by showing improved segmentation results and lower performance variability between patients.
arXiv Detail & Related papers (2024-03-21T14:36:59Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- Style Generation: Image Synthesis based on Coarsely Matched Texts [10.939482612568433]
We introduce a novel task called text-based style generation and propose a two-stage generative adversarial network.
The first stage generates the overall image style with a sentence feature, and the second stage refines the generated style with a synthetic feature.
The practical potential of our work is demonstrated by various applications such as text-image alignment and story visualization.
arXiv Detail & Related papers (2023-09-08T21:51:11Z)
- Intra- & Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization [21.591831983223997]
We propose an exemplar-based style synthesis pipeline to improve domain generalization in semantic segmentation.
Our method is based on a novel masked noise encoder for StyleGAN2 inversion.
We achieve up to 12.4% mIoU improvement on driving-scene semantic segmentation under different types of data shifts.
arXiv Detail & Related papers (2023-07-02T19:56:43Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z)
- Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis [68.1281982092765]
We propose a novel normalization module, termed REtrieval-based Spatially AdaptIve normaLization (RESAIL).
RESAIL provides pixel-level fine-grained guidance to the normalization architecture.
Experiments on several challenging datasets show that RESAIL performs favorably against state-of-the-art methods in terms of quantitative metrics, visual quality, and subjective evaluation.
arXiv Detail & Related papers (2022-04-06T14:21:39Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
A StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
A visual-linguistic similarity module learns text-image matching by mapping images and text into a common embedding space.
Instance-level optimization preserves identity during manipulation.
arXiv Detail & Related papers (2020-12-06T16:20:19Z)