LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation
- URL: http://arxiv.org/abs/2411.15252v1
- Date: Fri, 22 Nov 2024 08:44:39 GMT
- Title: LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation
- Authors: Fan Deng, Yaguang Wu, Xinyang Yu, Xiangjun Huang, Jian Yang, Guangyu Yan, Qiang Xu
- Abstract summary: We present LocRef-Diffusion, a tuning-free model capable of personalized customization of multiple instances' appearance and position within an image.
To enhance the precision of instance placement, we introduce a Layout-net, which controls instance generation locations.
To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features.
- Score: 17.169772329737913
- Abstract: Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images. However, the challenge of personalized, controllable generation of instances within these images remains an area in need of further development. In this paper, we present LocRef-Diffusion, a novel, tuning-free model capable of personalized customization of multiple instances' appearance and position within an image. To enhance the precision of instance placement, we introduce a Layout-net, which controls instance generation locations by leveraging both explicit instance layout information and an instance region cross-attention module. To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features and integrates them into the diffusion model through cross-attention mechanisms. We conducted extensive experiments on the COCO and OpenImages datasets, and the results demonstrate that our proposed method achieves state-of-the-art performance in layout and appearance guided generation.
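The abstract describes two conditioning paths: a Layout-net that confines each instance's influence to its layout region, and an appearance-net whose reference features enter the diffusion model through cross-attention. No code accompanies this listing; the PyTorch sketch below only illustrates what an instance-region cross-attention layer of this kind might look like. All names (`InstanceRegionCrossAttention`, `appearance_tokens`, `region_mask`), shapes, and the choice of reference encoder are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: each reference instance's appearance tokens may
# influence only the latent positions inside that instance's layout box.
# Names, shapes, and the encoder choice are assumptions, not the paper's code.
import torch
import torch.nn as nn


class InstanceRegionCrossAttention(nn.Module):
    """Cross-attention from U-Net latents to per-instance appearance tokens,
    masked so each instance only affects its own layout region."""

    def __init__(self, latent_dim: int, appearance_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=latent_dim,
            num_heads=num_heads,
            kdim=appearance_dim,
            vdim=appearance_dim,
            batch_first=True,
        )

    def forward(self, latents, appearance_tokens, region_mask):
        # latents:           (B, H*W, latent_dim)  flattened U-Net feature map
        # appearance_tokens: (B, N*T, appearance_dim)  T tokens per instance,
        #                    N instances, e.g. from an image encoder such as CLIP
        # region_mask:       (B, H*W, N*T) bool, True where the latent position
        #                    lies inside the corresponding instance's box
        # (Assumes every latent position is covered by at least one instance;
        #  uncovered positions would need a background token to avoid a fully
        #  masked attention row.)
        attn_mask = ~region_mask.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(
            query=latents,
            key=appearance_tokens,
            value=appearance_tokens,
            attn_mask=attn_mask,
        )
        return latents + out  # residual injection into the diffusion backbone
```

In a full model, a layer like this would plausibly be interleaved with the backbone's existing text cross-attention blocks, with the mask derived from the per-instance bounding boxes supplied as layout input.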
Related papers
- EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks.
The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm.
We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z)
- Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment [40.112548587906005]
We present Refine-by-Align, a first-of-its-kind model that employs a diffusion-based framework to address this challenge.
We show that our pipeline greatly pushes the boundary of fine details in the image synthesis models.
arXiv Detail & Related papers (2024-11-30T01:26:04Z)
- Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning [40.06403155373455]
We propose a novel reinforcement learning framework for personalized text-to-image generation.
Our proposed approach outperforms existing state-of-the-art methods by a large margin in visual fidelity while maintaining text alignment.
arXiv Detail & Related papers (2024-07-09T08:11:53Z)
- LCM-Lookahead for Encoder-based Text-to-Image Personalization [82.56471486184252]
We explore the potential of using shortcut-mechanisms to guide the personalization of text-to-image models.
We focus on encoder-based personalization approaches, and demonstrate that by tuning them with a lookahead identity loss, we can achieve higher identity fidelity.
arXiv Detail & Related papers (2024-04-04T17:43:06Z)
- GazeFusion: Saliency-Guided Image Generation [50.37783903347613]
Diffusion models offer unprecedented image generation power given just a text prompt.
We present a saliency-guided framework to incorporate the data priors of human visual attention mechanisms into the generation process.
arXiv Detail & Related papers (2024-03-16T21:01:35Z)
- SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z)
- Diffusion Self-Guidance for Controllable Image Generation [106.59989386924136]
Self-guidance provides greater control over generated images by guiding the internal representations of diffusion models.
We show how a simple set of properties can be composed to perform challenging image manipulations.
We also show that self-guidance can be used to edit real images.
arXiv Detail & Related papers (2023-06-01T17:59:56Z)
- ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z)
- LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation [24.694298869398033]
Our method trains efficiently and generates images with both high perceptual quality and layout alignment.
It significantly outperforms 10 other generative models based on GANs, VQ-VAE, and diffusion models.
arXiv Detail & Related papers (2023-02-16T14:20:25Z)
- Paint by Example: Exemplar-based Image Editing with Diffusion Models [35.84464684227222]
In this paper, we investigate exemplar-guided image editing for more precise control.
We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar.
We demonstrate that our method achieves an impressive performance and enables controllable editing on in-the-wild images with high fidelity.
arXiv Detail & Related papers (2022-11-23T18:59:52Z)