Fugu-MT Paper Translation (Abstract): MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

Paper abstract: MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

arxiv url: http://arxiv.org/abs/2509.01977v1
Date: Tue, 02 Sep 2025 05:40:07 GMT
Status: Translation complete
Last updated in system: 2025-09-04 15:17:03.91802
Title: MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
Title (reference translation): MOSAIC: Correspondence-Aware Multi-Subject Personalized Generation
Authors: Dong She, Siming Fu, Mushui Liu, Qiaoqiao Jin, Hualiang Wang, Mu Liu, Jidong Jiang,
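Abstract summary: We present MOSAIC, a representation-centric framework that rethinks multi-subject generation. Our key insight is that multi-subject generation requires precise semantic alignment at the representation level. We propose the semantic correspondence attention loss to enforce precise point-to-point semantic alignment.
Reference score (independently computed attention score): 13.100620283631557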
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-subject personalized generation presents unique challenges in maintaining identity fidelity and semantic coherence when synthesizing images conditioned on multiple reference subjects. Existing methods often suffer from identity blending and attribute leakage due to inadequate modeling of how different subjects should interact within shared representation spaces. We present MOSAIC, a representation-centric framework that rethinks multi-subject generation through explicit semantic correspondence and orthogonal feature disentanglement. Our key insight is that multi-subject generation requires precise semantic alignment at the representation level - knowing exactly which regions in the generated image should attend to which parts of each reference. To enable this, we introduce SemAlign-MS, a meticulously annotated dataset providing fine-grained semantic correspondences between multiple reference subjects and target images, previously unavailable in this domain. Building on this foundation, we propose the semantic correspondence attention loss to enforce precise point-to-point semantic alignment, ensuring high consistency from each reference to its designated regions. Furthermore, we develop the multi-reference disentanglement loss to push different subjects into orthogonal attention subspaces, preventing feature interference while preserving individual identity characteristics. Extensive experiments demonstrate that MOSAIC achieves state-of-the-art performance on multiple benchmarks. Notably, while existing methods typically degrade beyond 3 subjects, MOSAIC maintains high fidelity with 4+ reference subjects, opening new possibilities for complex multi-subject synthesis applications.
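The abstract names two losses but gives no formulas. The following is a minimal NumPy sketch of what such losses could look like, not the authors' implementation. It assumes (hypothetically) that the correspondence loss is a cross-entropy over annotated target-to-reference point pairs, and that the disentanglement loss penalizes squared cosine similarity between per-reference attention maps. The function names, argument shapes, and the small `eps` constant are all illustrative assumptions.

```python
import numpy as np

EPS = 1e-8  # numerical guard; an assumption of this sketch


def correspondence_attention_loss(attn, pairs):
    """Hypothetical point-to-point alignment loss.

    attn:  (num_target_tokens, num_ref_tokens) array whose rows sum to 1,
           i.e. how each target token distributes attention over a reference.
    pairs: list of (target_idx, ref_idx) annotated correspondences.
    Returns the mean negative log attention on the annotated matches.
    """
    t_idx = np.array([p[0] for p in pairs])
    r_idx = np.array([p[1] for p in pairs])
    return float(-np.log(attn[t_idx, r_idx] + EPS).mean())


def disentanglement_loss(attn_per_ref):
    """Hypothetical orthogonality penalty between references.

    attn_per_ref: list of (num_target_tokens,) arrays, one per reference,
                  giving how strongly each target token attends to it.
                  Needs at least two references.
    Returns the mean squared cosine similarity over distinct reference pairs;
    0 means the attention maps are mutually orthogonal.
    """
    maps = np.stack(attn_per_ref)
    maps = maps / (np.linalg.norm(maps, axis=1, keepdims=True) + EPS)
    gram = maps @ maps.T
    off_diag = gram[~np.eye(len(attn_per_ref), dtype=bool)]
    return float((off_diag ** 2).mean())


# Attention that lands exactly on the annotated points gives a near-zero loss;
# uniform attention is penalized.
perfect = np.eye(3)
uniform = np.full((3, 3), 1.0 / 3.0)
pairs = [(0, 0), (1, 1)]
loss_perfect = correspondence_attention_loss(perfect, pairs)
loss_uniform = correspondence_attention_loss(uniform, pairs)

# References attending to disjoint regions are orthogonal; overlapping ones are not.
disjoint = disentanglement_loss([np.array([1.0, 0, 0, 0]), np.array([0, 0, 1.0, 0])])
overlap = disentanglement_loss([np.array([1.0, 1.0, 0, 0]), np.array([1.0, 1.0, 0, 0])])
```

Under these assumptions, both terms would be added to the usual generation objective with weighting coefficients; the abstract does not state the weights or the layers at which attention is read.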


Related papers