PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
- URL: http://arxiv.org/abs/2403.01852v1
- Date: Mon, 4 Mar 2024 09:03:16 GMT
- Title: PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
- Authors: Zhengyao Lv and Yuxiang Wei and Wangmeng Zuo and Kwan-Yee K. Wong
- Abstract summary: Synthesizing high-quality images with consistent semantics and layout remains a challenge.
We propose the adaPtive LAyout-semantiC fusion modulE (PLACE) that harnesses pre-trained models to alleviate the aforementioned issues.
Our approach performs favorably in terms of visual quality, semantic consistency, and layout alignment.
- Score: 62.29033292210752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in large-scale pre-trained text-to-image models have led
to remarkable progress in semantic image synthesis. Nevertheless, synthesizing
high-quality images with consistent semantics and layout remains a challenge.
In this paper, we propose the adaPtive LAyout-semantiC fusion modulE (PLACE)
that harnesses pre-trained models to alleviate the aforementioned issues.
Specifically, we first employ the layout control map to faithfully represent
layouts in the feature space. Subsequently, we combine the layout and semantic
features in a timestep-adaptive manner to synthesize images with realistic
details. During fine-tuning, we propose the Semantic Alignment (SA) loss to
further enhance layout alignment. Additionally, we introduce the Layout-Free
Prior Preservation (LFP) loss, which leverages unlabeled data to maintain the
priors of pre-trained models, thereby improving the visual quality and semantic
consistency of synthesized images. Extensive experiments demonstrate that our
approach performs favorably in terms of visual quality, semantic consistency,
and layout alignment. The source code and model are available at
https://github.com/cszy98/PLACE/tree/main.
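The timestep-adaptive combination of layout and semantic features described in the abstract can be pictured with a small sketch. The abstract does not give the fusion rule, so everything below is an assumption made for illustration: the function names `fusion_weight` and `fuse`, the linear schedule, and the choice to weight layout features more heavily at noisier timesteps are all hypothetical and are not taken from the PLACE code.

```python
def fusion_weight(t, num_steps):
    """Hypothetical schedule: weight given to layout features at timestep t.

    Assumes t runs from num_steps - 1 (noisiest) down to 0 (cleanest) and
    uses a linear ramp. This is an illustrative choice, not the paper's rule.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    if not 0 <= t < num_steps:
        raise ValueError("t must lie in [0, num_steps)")
    return t / (num_steps - 1)


def fuse(layout_feat, semantic_feat, t, num_steps):
    """Convex blend of two equal-length feature vectors at timestep t."""
    if len(layout_feat) != len(semantic_feat):
        raise ValueError("feature vectors must have the same length")
    w = fusion_weight(t, num_steps)
    # w weights the layout features; (1 - w) weights the semantic features.
    return [w * l + (1.0 - w) * s for l, s in zip(layout_feat, semantic_feat)]


if __name__ == "__main__":
    layout = [1.0, 1.0]
    semantic = [0.0, 0.0]
    for t in (999, 500, 0):
        print(t, fuse(layout, semantic, t, num_steps=1000))
```

Under these assumptions the blend relies entirely on layout features at the noisiest step and entirely on semantic features at the final step. A learned or per-layer weight could replace the linear ramp without changing the structure of `fuse`.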
Related papers
- Label-free Neural Semantic Image Synthesis [12.194020204848492]
We introduce the concept of neural semantic image synthesis, which uses neural layouts extracted from pre-trained foundation models as conditioning.
We experimentally show that images synthesized via neural semantic image synthesis achieve similar or superior pixel-level alignment of semantic classes.
We show that images generated by neural layout conditioning can effectively augment real data for training various perception tasks.
arXiv Detail & Related papers (2024-07-01T20:30:23Z)
- Spatial-Aware Latent Initialization for Controllable Image Generation [9.23227552726271]
Text-to-image diffusion models have demonstrated impressive ability to generate high-quality images conditioned on the textual input.
Previous research has primarily focused on aligning cross-attention maps with layout conditions.
We propose leveraging a spatial-aware initialization noise during the denoising process to achieve better layout control.
arXiv Detail & Related papers (2024-01-29T13:42:01Z)
- LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis [24.925757148750684]
We propose a training-free approach for layout-to-image synthesis that excels in producing high-quality images aligned with both textual prompts and layout instructions.
LoCo seamlessly integrates into existing text-to-image and layout-to-image models, enhancing their performance in spatial control and addressing semantic failures observed in prior methods.
arXiv Detail & Related papers (2023-11-21T04:28:12Z)
- LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
- RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment [112.45442468794658]
We propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff.
In the coarse semantic re-alignment phase, a novel caption reward is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt.
The fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view.
arXiv Detail & Related papers (2023-05-31T06:59:21Z)
- High-Fidelity Guided Image Synthesis with Latent Diffusion Models [50.39294302741698]
Human user study results show that the proposed approach outperforms the previous state-of-the-art by over 85.32% on the overall user satisfaction scores.
arXiv Detail & Related papers (2022-11-30T15:43:20Z)
- Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis [68.1281982092765]
We propose a novel normalization module, termed REtrieval-based Spatially AdaptIve normaLization (RESAIL).
RESAIL provides pixel-level, fine-grained guidance to the normalization architecture.
Experiments on several challenging datasets show that RESAIL performs favorably against state-of-the-art methods in terms of quantitative metrics, visual quality, and subjective evaluation.
arXiv Detail & Related papers (2022-04-06T14:21:39Z)
- Semantic-shape Adaptive Feature Modulation for Semantic Image Synthesis [71.56830815617553]
A fine-grained, part-level semantic layout benefits the generation of object details.
A Shape-aware Position Descriptor (SPD) is proposed to describe each pixel's positional feature.
A Semantic-shape Adaptive Feature Modulation (SAFM) block is proposed to combine the given semantic map and our positional features.
arXiv Detail & Related papers (2022-03-31T09:06:04Z)