UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
- URL: http://arxiv.org/abs/2509.06818v1
- Date: Mon, 08 Sep 2025 15:54:55 GMT
- Title: UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
- Authors: Yufeng Cheng, Wenxu Wu, Shaojin Wu, Mengqi Huang, Fei Ding, Qian He
- Abstract summary: We present UMO, a framework designed to maintain high-fidelity identity preservation and alleviate identity confusion with scalability. With a "multi-to-multi matching" paradigm, UMO reformulates multi-identity generation as a global assignment optimization problem. We develop a scalable customization dataset with multi-reference images, consisting of both synthesised and real parts.
- Score: 15.094319754425468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in image customization exhibit a wide range of application prospects due to stronger customization capabilities. However, since humans are especially sensitive to faces, a significant challenge remains in preserving consistent identity while avoiding identity confusion with multi-reference images, which limits the identity scalability of customization models. To address this, we present UMO, a Unified Multi-identity Optimization framework designed to maintain high-fidelity identity preservation and alleviate identity confusion with scalability. With a "multi-to-multi matching" paradigm, UMO reformulates multi-identity generation as a global assignment optimization problem and generally improves the multi-identity consistency of existing image customization methods through reinforcement learning on diffusion models. To facilitate the training of UMO, we develop a scalable customization dataset with multi-reference images, consisting of both synthesised and real parts. Additionally, we propose a new metric to measure identity confusion. Extensive experiments demonstrate that UMO not only improves identity consistency significantly, but also reduces identity confusion on several image customization methods, setting a new state-of-the-art among open-source methods along the dimension of identity preservation. Code and model: https://github.com/bytedance/UMO
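The abstract frames multi-identity generation as a global assignment optimization problem but does not spell out the matching reward itself. Below is a minimal, hypothetical sketch of one such reward, assuming off-the-shelf L2-normalized face embeddings and a Hungarian solver; the names `matching_reward`, `gen_embeds`, and `ref_embeds` are illustrative and not taken from the UMO codebase.

```python
# Hypothetical "multi-to-multi matching" reward: a global one-to-one
# assignment between faces detected in the generated image and the
# reference identities. Illustrative only, not the UMO implementation.
import numpy as np
from scipy.optimize import linear_sum_assignment


def matching_reward(gen_embeds: np.ndarray, ref_embeds: np.ndarray) -> float:
    """Mean cosine similarity under the optimal one-to-one assignment.

    gen_embeds: (G, D) L2-normalized embeddings of faces in the generated image.
    ref_embeds: (R, D) L2-normalized embeddings of the reference identities.
    """
    sim = gen_embeds @ ref_embeds.T            # (G, R) pairwise cosine similarity
    rows, cols = linear_sum_assignment(-sim)   # negate: the solver minimizes cost
    return float(sim[rows, cols].mean())
```

Because the assignment is one-to-one, two reference identities can never both be credited against the same generated face, which is exactly the failure mode ("identity confusion") such a reward would penalize.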
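The identity-confusion metric is likewise introduced but not defined in the abstract. One plausible proxy, sketched under the same assumptions, counts how often a generated face is nearest to a reference other than the one the optimal assignment pairs it with; the paper's actual definition may differ.

```python
# Hypothetical identity-confusion proxy; not the paper's actual metric.
import numpy as np
from scipy.optimize import linear_sum_assignment


def confusion_rate(gen_embeds: np.ndarray, ref_embeds: np.ndarray) -> float:
    """Fraction of assigned faces whose nearest reference differs from the
    reference chosen by the optimal one-to-one assignment."""
    sim = gen_embeds @ ref_embeds.T             # (G, R) cosine similarities
    rows, cols = linear_sum_assignment(-sim)    # optimal global pairing
    confused = sum(1 for g, r in zip(rows, cols) if int(sim[g].argmax()) != r)
    return confused / max(len(rows), 1)
```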
Related papers
- Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement [54.199726425201895]
Large multimodal editing models have demonstrated powerful editing capabilities across diverse tasks. However, current facial ID preservation methods struggle to achieve consistent restoration of both facial identity and edited element IP. We propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration.
arXiv Detail & Related papers (2026-02-21T08:24:42Z) - Towards Generalized Multi-Image Editing for Unified Multimodal Models [56.620038824933566]
Unified Multimodal Models (UMMs) integrate multimodal understanding and generation. However, UMMs are limited in maintaining visual consistency and disambiguating visual cues when referencing details across multiple input images. We propose a scalable multi-image editing framework for UMMs that explicitly distinguishes image identities and generalizes to variable input counts.
arXiv Detail & Related papers (2026-01-09T06:42:49Z) - A Training-Free Approach for Multi-ID Customization via Attention Adjustment and Spatial Control [7.810140287905315]
Multi-ID customization is much more difficult and poses two major challenges. It often encounters the copy-paste issue during inference, leading to lower quality. We present an ID-decoupled cross-attention mechanism, injecting distinct ID embeddings into their corresponding image regions (a toy sketch of such region-masked attention appears after this list).
arXiv Detail & Related papers (2025-11-25T15:28:10Z) - WithAnyone: Towards Controllable and ID Consistent Image Generation [83.55786496542062]
Identity-consistent generation has become an important focus in text-to-image research. We develop a large-scale paired dataset tailored for multi-person scenarios. We propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity (a generic formulation of such a loss is sketched after this list).
arXiv Detail & Related papers (2025-10-16T17:59:54Z) - MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion [24.513096225720854]
We introduce a novel task, multi-view customization, which aims to jointly achieve multi-view pose control and customization. We propose MVCustom, a novel diffusion-based framework explicitly designed to achieve both multi-view consistency and customization fidelity.
arXiv Detail & Related papers (2025-10-15T16:00:26Z) - ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models [49.09606704563898]
Person re-identification is a crucial task in computer vision, aiming to recognize individuals across non-overlapping camera views. We propose ChatReID, a novel framework that shifts the focus towards a text-side-dominated retrieval paradigm, enabling flexible and interactive re-identification. We introduce a hierarchical progressive tuning strategy, which endows Re-ID ability through three stages of tuning, i.e., from person attribute understanding to fine-grained image retrieval and to multi-modal task reasoning.
arXiv Detail & Related papers (2025-02-27T10:34:14Z) - Omni-ID: Holistic Identity Representation Designed for Generative Tasks [75.29174595706533]
Omni-ID encodes holistic information about an individual's appearance across diverse expressions. It consolidates information from a varied number of unstructured input images into a structured representation. It demonstrates substantial improvements over conventional representations across various generative tasks.
arXiv Detail & Related papers (2024-12-12T19:21:20Z) - Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models [66.05234562835136]
We present MuDI, a novel framework that enables multi-subject personalization.
Our main idea is to utilize segmented subjects produced by a foundation segmentation model.
Experimental results show that our MuDI can produce high-quality personalized images without identity mixing.
arXiv Detail & Related papers (2024-04-05T17:45:22Z) - IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [31.762112403595612]
IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image.
During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
arXiv Detail & Related papers (2024-03-20T12:13:04Z) - Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
arXiv Detail & Related papers (2024-03-18T13:39:53Z) - Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning based approaches in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z) - X-ReID: Cross-Instance Transformer for Identity-Level Person Re-Identification [53.047542904329866]
Cross Intra-Identity Instances module (IntraX) fuses different intra-identity instances to transfer Identity-Level knowledge.
Cross Inter-Identity Instances module (InterX) involves hard positive and hard negative instances to improve the attention response to the same identity.
arXiv Detail & Related papers (2023-02-04T03:16:18Z)
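As a companion to the ID-decoupled cross-attention mentioned in the "A Training-Free Approach for Multi-ID Customization" entry above, the toy PyTorch sketch below shows one way region-masked cross-attention can confine each reference identity to its own image region; all names are illustrative, and this is not the cited paper's implementation.

```python
# Toy region-masked ("ID-decoupled") cross-attention: each identity
# embedding may only influence the image tokens inside its own region.
# Illustrative only, not the cited paper's code.
import torch


def id_decoupled_cross_attention(
    img_tokens: torch.Tensor,    # (N, D) image/latent tokens
    id_embeds: torch.Tensor,     # (K, D) one embedding per reference identity
    region_masks: torch.Tensor,  # (K, N) bool: token n belongs to identity k
) -> torch.Tensor:
    d = img_tokens.shape[-1]
    logits = img_tokens @ id_embeds.T / d**0.5     # (N, K) attention logits
    logits = logits.masked_fill(~region_masks.T, float("-inf"))
    weights = torch.softmax(logits, dim=-1)        # per-token over identities
    weights = torch.nan_to_num(weights, nan=0.0)   # tokens in no region: no update
    return img_tokens + weights @ id_embeds        # residual injection
```

Masking before the softmax means identity k receives zero attention weight from tokens outside its region, which is one way to keep two reference IDs from blending into the same face.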
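Likewise, the contrastive identity loss mentioned in the WithAnyone entry is not specified in the snippet above. A generic InfoNCE-style formulation over row-paired (generated face, reference) embeddings, shown below, is one common way to encode it; the actual loss in that paper may be formulated differently.

```python
# Generic InfoNCE-style identity loss: row i of `gen` should retrieve row i
# of `ref` and no other. Illustrative only, not WithAnyone's actual loss.
import torch
import torch.nn.functional as F


def contrastive_identity_loss(
    gen: torch.Tensor,  # (B, D) L2-normalized generated-face embeddings
    ref: torch.Tensor,  # (B, D) L2-normalized reference embeddings, paired by row
    temperature: float = 0.07,
) -> torch.Tensor:
    logits = gen @ ref.T / temperature               # (B, B) similarity logits
    labels = torch.arange(gen.shape[0], device=gen.device)
    # Symmetric: generated -> reference and reference -> generated retrieval.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.T, labels))
```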