LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation
- URL: http://arxiv.org/abs/2411.15252v1
- Date: Fri, 22 Nov 2024 08:44:39 GMT
- Title: LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation
- Authors: Fan Deng, Yaguang Wu, Xinyang Yu, Xiangjun Huang, Jian Yang, Guangyu Yan, Qiang Xu
- Abstract summary: We present LocRef-Diffusion, a tuning-free model capable of personalized customization of multiple instances' appearance and position within an image.
To enhance the precision of instance placement, we introduce a Layout-net, which controls instance generation locations.
To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features.
- Score: 17.169772329737913
- Abstract: Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images. However, the challenge of personalized, controllable generation of instances within these images remains an area in need of further development. In this paper, we present LocRef-Diffusion, a novel, tuning-free model capable of personalized customization of multiple instances' appearance and position within an image. To enhance the precision of instance placement, we introduce a Layout-net, which controls instance generation locations by leveraging both explicit instance layout information and an instance region cross-attention module. To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features and integrates them into the diffusion model through cross-attention mechanisms. We conducted extensive experiments on the COCO and OpenImages datasets, and the results demonstrate that our proposed method achieves state-of-the-art performance in layout and appearance guided generation.
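The abstract describes two conditioning paths: a Layout-net that confines each instance's influence to its layout region, and an appearance-net whose reference features enter the diffusion model through cross-attention. No code accompanies this listing; the PyTorch sketch below only illustrates what an instance-region cross-attention layer of this kind might look like. All names (`InstanceRegionCrossAttention`, `appearance_tokens`, `region_mask`), shapes, and the choice of reference encoder are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: each reference instance's appearance tokens may
# influence only the latent positions inside that instance's layout box.
# Names, shapes, and the encoder choice are assumptions, not the paper's code.
import torch
import torch.nn as nn


class InstanceRegionCrossAttention(nn.Module):
    """Cross-attention from U-Net latents to per-instance appearance tokens,
    masked so each instance only affects its own layout region."""

    def __init__(self, latent_dim: int, appearance_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=latent_dim,
            num_heads=num_heads,
            kdim=appearance_dim,
            vdim=appearance_dim,
            batch_first=True,
        )

    def forward(self, latents, appearance_tokens, region_mask):
        # latents:           (B, H*W, latent_dim)  flattened U-Net feature map
        # appearance_tokens: (B, N*T, appearance_dim)  T tokens per instance,
        #                    N instances, e.g. from an image encoder such as CLIP
        # region_mask:       (B, H*W, N*T) bool, True where the latent position
        #                    lies inside the corresponding instance's box
        # (Assumes every latent position is covered by at least one instance;
        #  uncovered positions would need a background token to avoid a fully
        #  masked attention row.)
        attn_mask = ~region_mask.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(
            query=latents,
            key=appearance_tokens,
            value=appearance_tokens,
            attn_mask=attn_mask,
        )
        return latents + out  # residual injection into the diffusion backbone
```

In a full model, a layer like this would plausibly be interleaved with the backbone's existing text cross-attention blocks, with the mask derived from the per-instance bounding boxes supplied as layout input.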
Related papers
- EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks.
The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm.
We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z)
- Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment [40.112548587906005]
We present Refine-by-Align, a first-of-its-kind model that employs a diffusion-based framework to address this challenge.
We show that our pipeline greatly pushes the boundary of fine details in the image synthesis models.
arXiv Detail & Related papers (2024-11-30T01:26:04Z)
- Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning [40.06403155373455]
We propose a novel reinforcement learning framework for personalized text-to-image generation.
Our proposed approach outperforms existing state-of-the-art methods by a large margin in visual fidelity while maintaining text alignment.
arXiv Detail & Related papers (2024-07-09T08:11:53Z)
- LCM-Lookahead for Encoder-based Text-to-Image Personalization [82.56471486184252]
We explore the potential of using shortcut-mechanisms to guide the personalization of text-to-image models.
We focus on encoder-based personalization approaches, and demonstrate that by tuning them with a lookahead identity loss, we can achieve higher identity fidelity.
arXiv Detail & Related papers (2024-04-04T17:43:06Z)
- GazeFusion: Saliency-Guided Image Generation [50.37783903347613]
Diffusion models offer unprecedented image generation power given just a text prompt.
We present a saliency-guided framework to incorporate the data priors of human visual attention mechanisms into the generation process.
arXiv Detail & Related papers (2024-03-16T21:01:35Z)
- SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z)
- Diffusion Self-Guidance for Controllable Image Generation [106.59989386924136]
Self-guidance provides greater control over generated images by guiding the internal representations of diffusion models.
We show how a simple set of properties can be composed to perform challenging image manipulations.
We also show that self-guidance can be used to edit real images.
arXiv Detail & Related papers (2023-06-01T17:59:56Z)
- ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z)
- LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation [24.694298869398033]
Our method trains efficiently and generates images with both high perceptual quality and layout alignment.
It significantly outperforms 10 other generative models based on GANs, VQ-VAE, and diffusion models.
arXiv Detail & Related papers (2023-02-16T14:20:25Z)
- Paint by Example: Exemplar-based Image Editing with Diffusion Models [35.84464684227222]
In this paper, we investigate exemplar-guided image editing for more precise control.
We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar.
We demonstrate that our method achieves an impressive performance and enables controllable editing on in-the-wild images with high fidelity.
arXiv Detail & Related papers (2022-11-23T18:59:52Z)