CustAny: Customizing Anything from A Single Example
- URL: http://arxiv.org/abs/2406.11643v4
- Date: Fri, 22 Nov 2024 09:31:14 GMT
- Title: CustAny: Customizing Anything from A Single Example
- Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Mengtian Li, Jiangning Zhang, Chengjie Wang, Yanwei Fu
- Abstract summary: We present a novel pipeline for constructing a large dataset of general objects and use it to build the Multi-Category ID-Consistent (MC-IDC) dataset, featuring 315k text-image samples across 10k categories.
With the help of MC-IDC, we introduce Customizing Anything (CustAny), a zero-shot framework that maintains ID fidelity and supports flexible text editing for general objects.
Our contributions include a large-scale dataset, the CustAny framework, and novel ID processing to advance this field.
- Score: 73.90939022698399
- Abstract: Recent advances in diffusion-based text-to-image models have simplified the creation of high-fidelity images, but preserving the identity (ID) of specific elements, like a personal dog, is still challenging. Object customization, using reference images and textual descriptions, is key to addressing this issue. Current object customization methods are either object-specific, requiring extensive fine-tuning, or object-agnostic, offering zero-shot customization but limited to specialized domains. The primary obstacle to extending zero-shot object customization from specific domains to the general domain is establishing a large-scale general ID dataset for model pre-training, which is time-consuming and labor-intensive. In this paper, we propose a novel pipeline to construct a large dataset of general objects and build the Multi-Category ID-Consistent (MC-IDC) dataset, featuring 315k text-image samples across 10k categories. With the help of MC-IDC, we introduce Customizing Anything (CustAny), a zero-shot framework that maintains ID fidelity and supports flexible text editing for general objects. CustAny features three key components: a general ID extraction module, a dual-level ID injection module, and an ID-aware decoupling module, allowing it to customize any object from a single reference image and text prompt. Experiments demonstrate that CustAny outperforms existing methods in both general object customization and specialized domains like human customization and virtual try-on. Our contributions include a large-scale dataset, the CustAny framework, and novel ID processing to advance this field. Code and dataset will be released soon at https://github.com/LingjieKong-fdu/CustAny.
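The abstract names three components but gives no implementation detail. Below is a minimal PyTorch-style sketch of how the first two (general ID extraction and dual-level ID injection) could plausibly be wired; every module name, shape, and design choice is an illustrative assumption, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class GeneralIDExtractor(nn.Module):
    """Maps features of one reference image to a set of ID tokens
    (assumed: a frozen image encoder upstream, learnable projection here)."""
    def __init__(self, feat_dim=1024, id_dim=768, num_tokens=16):
        super().__init__()
        self.num_tokens, self.id_dim = num_tokens, id_dim
        self.proj = nn.Linear(feat_dim, num_tokens * id_dim)

    def forward(self, image_feats):                  # (B, feat_dim)
        tokens = self.proj(image_feats)              # (B, num_tokens * id_dim)
        return tokens.view(-1, self.num_tokens, self.id_dim)

class DualLevelIDInjector(nn.Module):
    """Injects ID tokens at two levels (assumed: fine-grained cross-attention
    plus a coarse global embedding added to every position)."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_proj = nn.Linear(dim, dim)

    def forward(self, hidden_states, id_tokens):     # (B, L, dim), (B, T, dim)
        fine, _ = self.cross_attn(hidden_states, id_tokens, id_tokens)
        coarse = self.global_proj(id_tokens.mean(dim=1, keepdim=True))
        return hidden_states + fine + coarse

# The ID-aware decoupling module (separating identity from text-editable
# attributes) would sit alongside these blocks; it is omitted because the
# abstract gives no detail to ground a sketch on.
```

In a full pipeline, one such injector per denoiser block would let a single reference image steer identity while the text prompt controls the rest of the scene.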
Related papers
- AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation [14.68987039472664]
We propose AnyStory, a unified approach for personalized subject generation.
AnyStory achieves high-fidelity personalization not only for single subjects but also for multiple subjects, without sacrificing subject fidelity.
arXiv Detail & Related papers (2025-01-16T12:28:39Z)
- DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting [63.01425442236011]
We present DreamMix, a diffusion-based generative model adept at inserting target objects into scenes at user-specified locations.
We propose an Attribute Decoupling Mechanism (ADM) and a Textual Attribute Substitution (TAS) module to improve the diversity and discriminative capability of the text-based attribute guidance.
arXiv Detail & Related papers (2024-11-26T08:44:47Z)
- UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization [10.760799194716922]
UniPortrait is an innovative human image personalization framework that unifies single- and multi-ID customization.
UniPortrait consists of only two plug-and-play modules: an ID embedding module and an ID routing module (sketched below).
arXiv Detail & Related papers (2024-08-12T06:27:29Z)
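A minimal sketch of one plausible ID routing scheme, in which each spatial location softly selects among the identity embeddings; UniPortrait's actual mechanism may differ, and all names and shapes here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def route_ids(spatial_feats, id_embeds):
    """spatial_feats: (B, L, D) denoiser features; id_embeds: (B, N, D),
    one embedding per reference identity (hypothetical shapes)."""
    # Similarity of every spatial location to every identity embedding.
    logits = torch.einsum("bld,bnd->bln", spatial_feats, id_embeds)
    routes = F.softmax(logits, dim=-1)       # soft per-location assignment
    routed = torch.einsum("bln,bnd->bld", routes, id_embeds)
    return spatial_feats + routed            # inject the routed ID features
```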
- Customizing Text-to-Image Diffusion with Object Viewpoint Control [53.621518249820745]
We introduce a new task -- enabling explicit control of the object viewpoint in the customization of text-to-image diffusion models.
This allows us to modify the custom object's properties and generate it in various background scenes via text prompts.
We propose to condition the diffusion process on the 3D object features rendered from the target viewpoint (sketched below).
arXiv Detail & Related papers (2024-04-18T16:59:51Z)
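A minimal sketch of the conditioning half of that idea, assuming an upstream renderer (e.g. a NeRF-like model) produces per-location object features; every name and shape is an illustrative assumption, not the paper's design.

```python
import torch.nn as nn

class ViewpointConditioner(nn.Module):
    """Projects object features rendered from a target camera pose into
    conditioning tokens for a denoiser's cross-attention (hypothetical)."""
    def __init__(self, render_dim=256, cond_dim=768):
        super().__init__()
        self.proj = nn.Linear(render_dim, cond_dim)

    def forward(self, rendered_feats):        # (B, L, render_dim)
        # rendered_feats: features of the custom object rendered from the
        # desired viewpoint; the renderer itself is assumed upstream.
        return self.proj(rendered_feats)      # (B, L, cond_dim) tokens
```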
- LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts [60.54912319612113]
Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts.
We present a novel approach leveraging Large Language Models (LLMs) to extract critical components from text prompts (sketched below).
Our evaluation on complex prompts featuring multiple objects demonstrates a substantial improvement in recall compared to baseline diffusion models.
arXiv Detail & Related papers (2023-10-16T17:57:37Z)
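A minimal sketch of such an LLM-based decomposition step; the JSON schema and the `call_llm` placeholder are assumptions for illustration, not the paper's interface.

```python
import json

def build_blueprint(prompt: str, call_llm) -> dict:
    """call_llm: any str -> str function backed by an LLM of your choice."""
    instruction = (
        "Extract every object from the prompt below. Return JSON with keys "
        '"objects" (a list of {"name": ..., "description": ..., "bbox": ...}) '
        'and "background".\n\nPrompt: ' + prompt
    )
    raw = call_llm(instruction)
    return json.loads(raw)  # assumes the LLM returned valid JSON
```

The resulting blueprint could then drive a layout-aware diffusion pass, one object at a time.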
- Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network [1.8899300124593648]
We propose a Conditional Cross-Attention Network that induces disentangled multi-space embeddings for various specific attributes with only a single backbone (sketched below).
Our proposed method achieved consistent state-of-the-art performance on the FashionAI, DARN, DeepFashion, and Zappos50K benchmark datasets.
arXiv Detail & Related papers (2023-07-25T04:48:03Z)
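A minimal sketch of what such a conditional cross-attention head could look like: one backbone feature map, one learned query per attribute (e.g. color, sleeve length), each query attending to the features to produce an attribute-specific embedding. The attribute set, dimensions, and wiring are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class ConditionalCrossAttention(nn.Module):
    def __init__(self, num_attrs=8, dim=512, heads=8):
        super().__init__()
        self.attr_queries = nn.Embedding(num_attrs, dim)  # one query per attribute
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, backbone_feats, attr_idx):
        # backbone_feats: (B, L, dim) from the shared backbone;
        # attr_idx: (B,) long tensor selecting the attribute condition.
        q = self.attr_queries(attr_idx).unsqueeze(1)      # (B, 1, dim)
        out, _ = self.attn(q, backbone_feats, backbone_feats)
        return out.squeeze(1)                             # (B, dim) embedding
```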
- Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning [6.288699905490906]
We propose Subject-Diffusion, a novel open-domain personalized image generation model.
Our method outperforms other SOTA frameworks in single-subject, multi-subject, and human customized image generation.
arXiv Detail & Related papers (2023-07-21T08:09:47Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing the full image with high-resolution crops that provide finer-grained detail (a toy version of this fusion is sketched below).
With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
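A toy sketch of crop/full-image fusion for high-resolution segmentation, using a fixed average where the actual model learns the fusion; the boxes, shapes, and averaging scheme are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def fuse_logits(full_logits, crop_logits, crop_boxes):
    """full_logits: (C, H, W) full-image mask logits; crop_logits: list of
    (C, h, w) per-crop logits; crop_boxes: list of (y0, x0, y1, x1)."""
    canvas = full_logits.clone()
    weight = torch.ones_like(full_logits)
    for logits, (y0, x0, y1, x1) in zip(crop_logits, crop_boxes):
        # Resize each crop's prediction back to its footprint in the image.
        resized = F.interpolate(
            logits.unsqueeze(0), size=(y1 - y0, x1 - x0), mode="bilinear",
            align_corners=False,
        ).squeeze(0)
        canvas[:, y0:y1, x0:x1] += resized
        weight[:, y0:y1, x0:x1] += 1
    return canvas / weight  # averaged logits; a learned fusion would replace this
```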
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.