IP-Composer: Semantic Composition of Visual Concepts
- URL: http://arxiv.org/abs/2502.13951v1
- Date: Wed, 19 Feb 2025 18:49:31 GMT
- Title: IP-Composer: Semantic Composition of Visual Concepts
- Authors: Sara Dorfman, Dana Cohen-Bar, Rinon Gal, Daniel Cohen-Or
- Abstract summary: We present IP-Composer, a training-free approach for compositional image generation.
Our method builds on IP-Adapter, which synthesizes novel images conditioned on an input image's CLIP embedding.
We extend this approach to multiple visual inputs by crafting composite embeddings, stitched from the projections of multiple input images onto concept-specific CLIP-subspaces identified through text.
- Score: 49.18472621931207
- License:
- Abstract: Content creators often draw inspiration from multiple visual sources, combining distinct elements to craft new compositions. Modern computational approaches now aim to emulate this fundamental creative process. Although recent diffusion models excel at text-guided compositional synthesis, text as a medium often lacks precise control over visual details. Image-based composition approaches can capture more nuanced features, but existing methods are typically limited in the range of concepts they can capture, and require expensive training procedures or specialized data. We present IP-Composer, a novel training-free approach for compositional image generation that leverages multiple image references simultaneously, while using natural language to describe the concept to be extracted from each image. Our method builds on IP-Adapter, which synthesizes novel images conditioned on an input image's CLIP embedding. We extend this approach to multiple visual inputs by crafting composite embeddings, stitched from the projections of multiple input images onto concept-specific CLIP-subspaces identified through text. Through comprehensive evaluation, we show that our approach enables more precise control over a larger range of visual concept compositions.
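The abstract describes composite embeddings stitched from projections of multiple input images onto concept-specific CLIP subspaces identified through text. Below is a minimal sketch of that idea, not the authors' implementation: it assumes the concept subspace is spanned by the top singular directions of CLIP text embeddings for textual variations of the concept, and it omits the actual CLIP encoders and the IP-Adapter conditioning step.

```python
# Sketch only: illustrates projecting CLIP image embeddings onto a
# text-derived concept subspace and stitching a composite embedding.
# Embeddings are assumed to be precomputed numpy vectors.
import numpy as np

def concept_subspace(text_embeds: np.ndarray, rank: int) -> np.ndarray:
    """Span a concept-specific subspace from CLIP text embeddings of many
    textual variations of the concept. Returns an orthonormal basis of
    shape (rank, dim)."""
    # Top right-singular vectors approximate the directions along which
    # the concept varies in CLIP space.
    _, _, vT = np.linalg.svd(text_embeds, full_matrices=False)
    return vT[:rank]

def project(embed: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project a CLIP embedding onto the subspace spanned by `basis`."""
    return basis.T @ (basis @ embed)

def composite_embedding(base: np.ndarray,
                        concept_refs: list[tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Start from the base image embedding and, for each
    (reference_embedding, concept_basis) pair, swap the component lying in
    that concept subspace for the reference image's component."""
    out = base.copy()
    for ref, basis in concept_refs:
        out = out - project(out, basis) + project(ref, basis)
    return out
```

In this sketch, the composite embedding would stand in for the single image's CLIP embedding that IP-Adapter normally conditions on; the subspace rank and the set of concept texts are assumed hyperparameters, not values taken from the paper.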
Related papers
- FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition [49.2208591663092]
FreeCustom is a tuning-free method for generating customized images of multi-concept compositions based on reference concepts.
We introduce a new multi-reference self-attention (MRSA) mechanism and a weighted mask strategy.
Our method outperforms or performs on par with other training-based methods in terms of multi-concept composition and single-concept customization.
arXiv Detail & Related papers (2024-05-22T17:53:38Z) - Gen4Gen: Generative Data Pipeline for Generative Multi-Concept
Composition [47.07564907486087]
Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts.
This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models.
arXiv Detail & Related papers (2024-02-23T18:55:09Z) - Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled within a single image illustration or across several, remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z) - Textual Localization: Decomposing Multi-concept Images for
Subject-Driven Text-to-Image Generation [5.107886283951882]
We introduce a localized text-to-image model to handle multi-concept input images.
Our method incorporates a novel cross-attention guidance to decompose multiple concepts.
Notably, our method generates cross-attention maps consistent with the target concept in the generated images.
arXiv Detail & Related papers (2024-02-15T14:19:42Z) - Decoupled Textual Embeddings for Customized Image Generation [62.98933630971543]
Customized text-to-image generation aims to learn user-specified concepts with a few images.
Existing methods usually suffer from overfitting and entangle subject-unrelated information with the learned concept.
We propose DETEX, a novel approach that learns disentangled concept embeddings for flexible, customized text-to-image generation.
arXiv Detail & Related papers (2023-12-19T03:32:10Z) - Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)