InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
- URL: http://arxiv.org/abs/2503.16418v1
- Date: Thu, 20 Mar 2025 17:59:34 GMT
- Title: InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
- Authors: Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu
- Abstract summary: InfU addresses issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. A multi-stage training strategy, including pretraining and supervised fine-tuning, improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting.
- Score: 19.869955517856273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
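The abstract names the core mechanism but no implementation: InfuseNet injects identity features into the DiT base model via residual connections. The sketch below is a minimal, hypothetical PyTorch rendering of that general idea; the module name, dimensions, projection design, and zero initialization are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class IdentityResidualInjector(nn.Module):
    """Hypothetical sketch: add projected identity features to a DiT
    block's hidden states through a residual branch."""

    def __init__(self, id_dim: int, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(id_dim, hidden_dim)
        # Zero init so the frozen base model's behavior is unchanged at
        # the start of training (a common trick; assumed, not stated).
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, hidden: torch.Tensor, id_feat: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, tokens, hidden_dim); id_feat: (batch, id_dim)
        return hidden + self.proj(id_feat).unsqueeze(1)

# Usage: wrap each DiT block's token states with the identity residual.
injector = IdentityResidualInjector(id_dim=512, hidden_dim=1024)
hidden = torch.randn(2, 256, 1024)   # token states inside a DiT block
id_feat = torch.randn(2, 512)        # face-identity embedding
hidden = injector(hidden, id_feat)   # identity-conditioned states
```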
Related papers
- Robust ID-Specific Face Restoration via Alignment Learning [18.869593414569206]
We present Robust ID-Specific Face Restoration (RIDFR), a novel ID-specific face restoration framework based on diffusion models. RIDFR incorporates Alignment Learning, which aligns the restoration results from multiple references with the same identity in order to suppress the interference of ID-irrelevant face semantics. Experiments demonstrate that our framework outperforms state-of-the-art methods, reconstructing high-quality ID-specific results with high identity fidelity and strong robustness.
arXiv Detail & Related papers (2025-07-15T03:16:12Z)
- Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion [35.67333978414322]
We propose a novel framework that improves the separation of identity-related and identity-unrelated features. Our framework consists of two key components: an Implicit-Explicit foreground-background Decoupling Module and a Feature Fusion Module.
arXiv Detail & Related papers (2025-05-28T13:40:46Z)
- Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation [54.588082888166504]
We present Mogao, a unified framework that enables interleaved multi-modal generation through a causal approach. Mogao integrates a set of key technical improvements in architecture design, including a deep-fusion design, dual vision encoders, interleaved rotary position embeddings, and multi-modal classifier-free guidance. Experiments show that Mogao not only achieves state-of-the-art performance in multi-modal understanding and text-to-image generation, but also excels in producing high-quality, coherent interleaved outputs.
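Multi-modal classifier-free guidance is only named in the summary. A common two-condition formulation (an assumption here, not Mogao's published equations) composes text and image guidance from three model evaluations, as in this hypothetical sketch:

```python
import torch

def multimodal_cfg(eps_uncond, eps_text, eps_text_image,
                   s_text: float = 5.0, s_image: float = 1.5):
    """Hypothetical two-condition classifier-free guidance.

    Combines an unconditional prediction, a text-conditioned prediction,
    and a text+image-conditioned prediction; the guidance scales are
    illustrative, not values from the paper.
    """
    return (eps_uncond
            + s_text * (eps_text - eps_uncond)
            + s_image * (eps_text_image - eps_text))

# Usage with dummy noise predictions of matching shape:
e_u, e_t, e_ti = (torch.randn(1, 4, 64, 64) for _ in range(3))
eps = multimodal_cfg(e_u, e_t, e_ti)
```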
arXiv Detail & Related papers (2025-05-08T17:58:57Z)
- ID-Booth: Identity-consistent Face Generation with Diffusion Models [10.042492056152232]
We present a novel generative diffusion-based framework called ID-Booth.
The framework enables identity-consistent image generation while retaining the synthesis capabilities of pretrained diffusion models.
Our method facilitates better intra-identity consistency and inter-identity separability than competing methods, while achieving higher image diversity.
arXiv Detail & Related papers (2025-04-10T02:20:18Z)
- Unified Autoregressive Visual Generation and Understanding with Continuous Tokens [52.21981295470491]
We present UniFluid, a unified autoregressive framework for joint visual generation and understanding. Our unified autoregressive architecture processes multimodal image and text inputs, generating discrete tokens for text and continuous tokens for images. We find that, although there is an inherent trade-off between the image generation and understanding tasks, a carefully tuned training recipe enables them to improve each other.
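The split between discrete text tokens and continuous image tokens suggests a training objective with two heads: cross-entropy for text and a regression-style loss for the continuous tokens. The sketch below illustrates that generic pattern under stated assumptions; UniFluid's actual heads, losses, and weighting are not specified here.

```python
import torch
import torch.nn.functional as F

def mixed_token_loss(text_logits, text_targets, image_pred, image_targets):
    """Hypothetical joint loss for a unified autoregressive model:
    cross-entropy over a text vocabulary plus regression on continuous
    image tokens. The 0.5 weighting is illustrative only."""
    text_loss = F.cross_entropy(
        text_logits.flatten(0, 1),   # (batch*seq, vocab)
        text_targets.flatten())      # (batch*seq,)
    image_loss = F.mse_loss(image_pred, image_targets)
    return text_loss + 0.5 * image_loss

# Usage with dummy tensors:
logits = torch.randn(2, 16, 32000)         # text logits
targets = torch.randint(0, 32000, (2, 16)) # text token ids
img_pred = torch.randn(2, 64, 256)         # continuous image tokens
img_tgt = torch.randn(2, 64, 256)
loss = mixed_token_loss(logits, targets, img_pred, img_tgt)
```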
arXiv Detail & Related papers (2025-03-17T17:58:30Z)
- Learning a Unified Degradation-aware Representation Model for Multi-modal Image Fusion [13.949209965987308]
All-in-One Degradation-Aware Fusion Models (ADFMs) address complex scenes by mitigating degradations from source images and generating high-quality fused images. Mainstream ADFMs often rely on highly synthetic multi-modal multi-quality images for supervision, limiting their effectiveness in cross-modal and rare degradation scenarios. We present LURE, a degradation-aware, Learning-driven Unified Representation model for infrared and visible image fusion.
arXiv Detail & Related papers (2025-03-10T08:16:36Z)
- Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond [74.96466744512992]
The essence of image fusion is to integrate complementary information from source images.
DeFusion++ produces versatile fused representations that can enhance the quality of image fusion and the effectiveness of downstream high-level vision tasks.
arXiv Detail & Related papers (2024-10-16T06:28:49Z)
- Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z)
- Diffusion-Based Hierarchical Image Steganography [60.69791384893602]
Hierarchical Image Steganography is a novel method that enhances the security and capacity of embedding multiple images into a single container.
It exploits the robustness of the Diffusion Model alongside the reversibility of the Flow Model.
The innovative structure can autonomously generate a container image, thereby securely and efficiently concealing multiple images and text.
arXiv Detail & Related papers (2024-05-19T11:29:52Z)
- DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation [84.0586749616249]
This paper presents DiffFAE, a one-stage and highly-efficient diffusion-based framework tailored for high-fidelity Facial Appearance Editing.
For high-fidelity query attribute transfer, we adopt Space-sensitive Physical Customization (SPC), which ensures fidelity and generalization ability.
To preserve source attributes, we introduce Region-responsive Semantic Composition (RSC).
This module is guided to learn decoupled source-regarding features, thereby better preserving the identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background.
arXiv Detail & Related papers (2024-03-26T12:53:10Z)
- MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration [7.087475633143941]
MM-Diff is a tuning-free image personalization framework capable of generating high-fidelity images of both single and multiple subjects in seconds.
MM-Diff employs a vision encoder to transform the input image into CLS and patch embeddings.
The CLS embeddings are used both to augment the text embeddings and, together with the patch embeddings, to derive a small number of detail-rich subject embeddings.
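As a rough illustration of that pipeline, the hypothetical sketch below takes CLS and patch embeddings from a CLIP-style vision encoder and distills them into a handful of subject embeddings via learned queries; the projection design, query mechanism, and all dimensions are assumptions, not MM-Diff's architecture.

```python
import torch
import torch.nn as nn

class SubjectEmbedder(nn.Module):
    """Hypothetical sketch: augment text embeddings with the CLS token
    and pool CLS + patch embeddings into a few subject embeddings."""

    def __init__(self, vis_dim: int = 768, text_dim: int = 768, n_subject: int = 8):
        super().__init__()
        self.cls_to_text = nn.Linear(vis_dim, text_dim)  # CLS -> extra text token
        self.queries = nn.Parameter(torch.randn(n_subject, vis_dim))
        self.attn = nn.MultiheadAttention(vis_dim, num_heads=8, batch_first=True)

    def forward(self, cls_emb, patch_emb, text_emb):
        # cls_emb: (B, vis_dim); patch_emb: (B, P, vis_dim); text_emb: (B, T, text_dim)
        text_aug = torch.cat([text_emb, self.cls_to_text(cls_emb)[:, None]], dim=1)
        q = self.queries.expand(cls_emb.size(0), -1, -1)
        kv = torch.cat([cls_emb[:, None], patch_emb], dim=1)
        subject_emb, _ = self.attn(q, kv, kv)  # (B, n_subject, vis_dim)
        return text_aug, subject_emb

# Usage with dummy encoder outputs (CLIP ViT-like shapes assumed):
emb = SubjectEmbedder()
text_aug, subj = emb(torch.randn(2, 768), torch.randn(2, 196, 768),
                     torch.randn(2, 77, 768))
```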
arXiv Detail & Related papers (2024-03-22T09:32:31Z)
- Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
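The AdaIN-mean operation is only named in the summary. Assuming it denotes an AdaIN variant that aligns per-channel means only (an interpretation, not the paper's definition), a minimal sketch looks like this:

```python
import torch

def adain_mean(content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
    """Hypothetical AdaIN-mean: shift content features so their
    per-channel mean matches the style features', leaving variance
    untouched. Tensors are (batch, tokens, channels)."""
    c_mean = content.mean(dim=1, keepdim=True)
    s_mean = style.mean(dim=1, keepdim=True)
    return content - c_mean + s_mean

# Usage: merge identity-stream features into the semantic stream.
semantic = torch.randn(2, 77, 768)
identity = torch.randn(2, 77, 768)
merged = adain_mean(semantic, identity)
```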
arXiv Detail & Related papers (2024-03-18T13:39:53Z)
- Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion [60.221404321514086]
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels.
This paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions.
arXiv Detail & Related papers (2023-09-03T08:07:26Z)
- FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping [62.38898610210771]
We present a new single-stage method for subject face swapping and identity transfer, named FaceDancer.
We make two major contributions: Adaptive Feature Fusion Attention (AFFA) and Interpreted Feature Similarity Regularization (IFSR).
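Adaptive Feature Fusion Attention is only named here; a common realization of adaptive fusion is a learned gate that blends two feature maps. The sketch below is a hypothetical illustration of that generic pattern, not FaceDancer's published module:

```python
import torch
import torch.nn as nn

class GatedFeatureFusion(nn.Module):
    """Hypothetical gated fusion: predict a per-pixel mask from the
    concatenated inputs and blend identity and attribute features."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid())

    def forward(self, id_feat: torch.Tensor, attr_feat: torch.Tensor) -> torch.Tensor:
        m = self.gate(torch.cat([id_feat, attr_feat], dim=1))  # (B, C, H, W)
        return m * id_feat + (1.0 - m) * attr_feat

# Usage with dummy feature maps:
fuse = GatedFeatureFusion(channels=64)
out = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```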
arXiv Detail & Related papers (2022-10-19T11:31:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.