Fugu-MT Paper Translation (Abstract): WithAnyone: Towards Controllable and ID Consistent Image Generation

Paper abstract: WithAnyone: Towards Controllable and ID Consistent Image Generation

arxiv url: http://arxiv.org/abs/2510.14975v1
Date: Thu, 16 Oct 2025 17:59:54 GMT
Status: Translation complete
System update date: 2025-10-17 21:15:15.008752
Title: WithAnyone: Towards Controllable and ID Consistent Image Generation
Title (reference translation): WithAnyone: Toward Controllable and ID-Consistent Image Generation
Authors: Hengyuan Xu, Wei Cheng, Peng Xing, Yixiao Fang, Shuhan Wu, Rui Wang, Xianfang Zeng, Daxin Jiang, Gang Yu, Xingjun Ma, Yu-Gang Jiang
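Abstract summary: Identity-consistent generation has become an important focus in text-to-image research. This paper builds a large-scale paired dataset tailored for multi-person scenarios and proposes a new training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity.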
Reference score (independently computed attention level): 83.55786496542062
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Identity-consistent generation has become an important focus in text-to-image research, with recent models achieving notable success in producing images aligned with a reference identity. Yet, the scarcity of large-scale paired datasets containing multiple images of the same individual forces most approaches to adopt reconstruction-based training. This reliance often leads to a failure mode we term copy-paste, where the model directly replicates the reference face rather than preserving identity across natural variations in pose, expression, or lighting. Such over-similarity undermines controllability and limits the expressive power of generation. To address these limitations, we (1) construct a large-scale paired dataset MultiID-2M, tailored for multi-person scenarios, providing diverse references for each identity; (2) introduce a benchmark that quantifies both copy-paste artifacts and the trade-off between identity fidelity and variation; and (3) propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity. These contributions culminate in WithAnyone, a diffusion-based model that effectively mitigates copy-paste while preserving high identity similarity. Extensive qualitative and quantitative experiments demonstrate that WithAnyone significantly reduces copy-paste artifacts, improves controllability over pose and expression, and maintains strong perceptual quality. User studies further validate that our method achieves high identity fidelity while enabling expressive controllable generation.
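Abstract (reference translation): Identity-consistent generation has become an important focus in text-to-image research, and recent models have achieved notable success in generating images aligned with a reference identity. However, the scarcity of large-scale paired datasets containing multiple images of the same individual forces most approaches to adopt reconstruction-based training. This reliance often leads to a failure mode called copy-paste, in which the model directly replicates the reference face rather than preserving identity across natural variations in pose, expression, and lighting. Such over-similarity undermines controllability and limits the expressive power of generation. To address these limitations, we (1) construct MultiID-2M, a large-scale paired dataset tailored for multi-person scenarios; (2) introduce a benchmark that quantifies both copy-paste artifacts and the trade-off between identity fidelity and variation; and (3) propose a new training paradigm that leverages paired data to balance fidelity with diversity. These contributions culminate in WithAnyone, a diffusion-based model that effectively mitigates copy-paste while preserving high identity similarity. Extensive qualitative and quantitative experiments show that WithAnyone significantly reduces copy-paste artifacts, improves controllability over pose and expression, and maintains strong perceptual quality. User studies further validate that the method achieves high identity fidelity while enabling expressive, controllable generation.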

Related papers