Fugu-MT Paper Translation (Abstract): Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization

Paper abstract: Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization

arxiv url: http://arxiv.org/abs/2606.02129v1
Date: Mon, 01 Jun 2026 11:57:10 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:31.974787
Title: Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization
Title (reference translation): Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization
Authors: Liyuan Ma, Xueji Fang, Guo-Jun Qi
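Abstract summary: Image customization learns target subjects from reference concept images and generates conditioned images per text prompt. Prevailing methods adopt fine-tuning to pack diverse concept attributes into a unified latent embedding. This paper proposes Equilibrated Diffusion, a frequency-driven method that disentangles entangled concept features for balanced customization and consistent text-visual matching.
Reference score (independently computed attention score): 31.67012394425792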
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image customization learns target subjects from reference concept images and generates conditioned images per text prompts, mainly modifying styles or backgrounds. Prevailing methods adopt fine-tuning to pack diverse concept attributes into a unified latent embedding, yet entangled attributes hinder elimination of irrelevant disturbances from style and background. To address this issue, we propose Equilibrated Diffusion, a frequency-driven approach that disentangles tangled concept features for balanced customization and consistent text-visual matching. Unlike conventional methods learning full concepts with shared embeddings and unified tuning, our work utilizes the inherent link between image frequency components and semantics: low frequencies represent subject content and high frequencies correspond to styles. We decompose concepts in frequency space and optimize each embedding independently. This separate optimization enables the denoiser to capture style detached from subject identity and generalize better to unseen stylistic prompts. Merging multi-frequency embeddings preserves the model's original spatial customization ability. We further deploy mask-guided diffusion to restrict irrelevant background changes and boost text alignment. Residual Reference Attention (RRA) is inserted into spatial attention to retain subject structure and identity consistency. Experiments prove Equilibrated Diffusion exceeds mainstream baselines on subject fidelity and text adherence, verifying our method's superiority.
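Abstract (reference translation): Image customization learns target subjects from reference concept images and generates conditioned images per text prompt, mainly by modifying styles or backgrounds. Prevailing methods adopt fine-tuning to pack diverse concept attributes into a unified latent embedding, but the entangled attributes make it hard to eliminate irrelevant disturbances from style and background. To address this issue, we propose Equilibrated Diffusion, a frequency-driven approach that disentangles entangled concept features for balanced customization and consistent text-visual matching. Unlike conventional methods that learn full concepts with shared embeddings and unified tuning, our work uses the inherent link between image frequency components and semantics: low frequencies represent subject content, and high frequencies correspond to styles. We decompose concepts in frequency space and optimize each embedding independently. This separate optimization lets the denoiser capture style detached from subject identity and generalize better to unseen stylistic prompts. Merging the multi-frequency embeddings preserves the model's original spatial customization ability. We further deploy mask-guided diffusion to restrict irrelevant background changes and improve text alignment. Residual Reference Attention (RRA) is inserted into spatial attention to retain subject structure and identity consistency. Experiments show that Equilibrated Diffusion exceeds mainstream baselines on subject fidelity and text adherence, verifying the method's superiority.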
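The abstract links low image frequencies to subject content and high frequencies to style, but it does not say which transform or cutoff the authors use. The following is a minimal sketch of one common way to split an image into the two bands, assuming a 2-D FFT with a radial cutoff. The function name `split_frequencies`, the `cutoff` parameter, and the random test image are all hypothetical and are not taken from the paper.

```python
import numpy as np


def split_frequencies(image: np.ndarray, cutoff: float = 0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `cutoff` is an assumed radius in cycles per pixel; it is an
    illustrative choice, not a value from the paper.
    """
    h, w = image.shape
    # Frequency coordinates of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)

    # Bins inside the radius form the low band; the rest form the high band.
    low_mask = radius <= cutoff

    spectrum = np.fft.fft2(image)
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


# Hypothetical input: a random 64x64 image standing in for a concept image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
low, high = split_frequencies(image)
```

Because the two masks are complementary, `low + high` reconstructs the input up to floating-point error, so the split loses no information. In a pipeline like the one the abstract describes, each band would presumably supervise its own embedding, though how the paper does this is not stated here.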

Related papers