Fugu-MT Paper Translation (Summary): CusEnhancer: A Zero-Shot Scene and Controllability Enhancement Method for Photo Customization via ResInversion

Paper summary: CusEnhancer: A Zero-Shot Scene and Controllability Enhancement Method for Photo Customization via ResInversion

arxiv url: http://arxiv.org/abs/2509.20775v1
Date: Thu, 25 Sep 2025 06:00:34 GMT
Status: Translation complete
Last updated (in system): 2025-09-26 20:58:12.716152
Title: CusEnhancer: A Zero-Shot Scene and Controllability Enhancement Method for Photo Customization via ResInversion
Title (reference translation): CusEnhancer: A zero-shot scene and controllability enhancement method for photo customization via ResInversion
Authors: Maoye Ren, Praneetha Vaddamanu, Jianjin Xu, Fernando De la Torre Frade,
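Abstract summary: We introduce CustomEnhancer, a new framework that extends existing identity customization models. Our pipeline enables comprehensive training-free control over the generation process of personalized models.
Reference score (in-house attention metric): 45.07652341517572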
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, remarkable progress has been made in synthesizing realistic human photos using text-to-image diffusion models. However, current approaches suffer from degraded scenes, insufficient control, and suboptimal perceptual identity. We introduce CustomEnhancer, a novel framework to augment existing identity customization models. CustomEnhancer is a zero-shot enhancement pipeline that leverages face-swapping techniques and a pretrained diffusion model to obtain additional representations in a zero-shot manner for encoding into personalized models. Through our proposed triple-flow fused PerGeneration approach, which identifies and combines two compatible counter-directional latent spaces to manipulate a pivotal space of the personalized model, we unify the generation and reconstruction processes, realizing generation from three flows. Our pipeline also enables comprehensive training-free control over the generation process of personalized models, offering precisely controlled personalization and eliminating the need to retrain controllers for each model. In addition, to address the high time complexity of null-text inversion (NTI), we introduce ResInversion, a novel inversion method that performs noise rectification via a pre-diffusion mechanism, reducing inversion time by a factor of 129. Experiments demonstrate that CustomEnhancer reaches SOTA results in scene diversity, identity fidelity, and training-free control, while also showing the efficiency of ResInversion over NTI. The code will be made publicly available upon paper acceptance.
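Abstract (reference translation): In recent years, progress in synthesizing realistic human photos with text-to-image diffusion models has been remarkable. However, current approaches face degraded scenes, insufficient control, and suboptimal perceptual identity. We introduce CustomEnhancer, a new framework that extends existing identity customization models. CustomEnhancer is a zero-shot enhancement pipeline that leverages face-swapping techniques and a pretrained diffusion model to obtain additional representations, in a zero-shot manner, for encoding into personalized models. The proposed method identifies and combines two compatible counter-directional latent spaces to manipulate a pivotal space of the personalized model, unifying the generation and reconstruction processes and realizing generation from three flows. Our pipeline also enables comprehensive training-free control over the generation process of personalized models, controlling their personalization precisely and removing the need to retrain controllers for each model. Furthermore, to address the high complexity of null-text inversion (NTI), we introduce ResInversion, a new inversion method that performs noise rectification via a pre-diffusion mechanism, reducing inversion time by a factor of 129. In experiments, CustomEnhancer reaches SOTA results in scene diversity, identity fidelity, and training-free control. The code will be released upon acceptance of the paper.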
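The abstract positions ResInversion against null-text inversion (NTI), whose pivot trajectory comes from plain DDIM inversion. The abstract does not describe ResInversion's "pre-diffusion" noise rectification in enough detail to reproduce, so the minimal sketch below shows only the underlying deterministic DDIM inversion and why it can fail to reconstruct an input. It is not the authors' method, nor NTI itself. Assumptions: a linear beta schedule (1e-4 to 0.02 over 1000 training steps), 50 inference steps, and two toy noise predictors standing in for a UNet. All function names (`make_alpha_bars`, `ddim_step`, `ddim_invert`, `ddim_sample`, `eps_const`, `eps_dep`) are hypothetical.

```python
import numpy as np

def make_alpha_bars(num_steps=50, num_train_steps=1000):
    """Cumulative alpha products (alpha-bar) at num_steps evenly spaced
    timesteps, prefixed with 1.0 for the clean latent. Assumes a linear
    DDPM beta schedule from 1e-4 to 0.02 (an illustrative choice)."""
    betas = np.linspace(1e-4, 0.02, num_train_steps)
    alphas_cumprod = np.cumprod(1.0 - betas)
    timesteps = np.linspace(0, num_train_steps - 1, num_steps).astype(int)
    return np.concatenate([[1.0], alphas_cumprod[timesteps]])

def ddim_step(x, eps, ab_from, ab_to):
    """Deterministic (eta = 0) DDIM move from noise level ab_from to ab_to."""
    x0_pred = (x - np.sqrt(1.0 - ab_from) * eps) / np.sqrt(ab_from)
    return np.sqrt(ab_to) * x0_pred + np.sqrt(1.0 - ab_to) * eps

def ddim_invert(x0, eps_fn, alpha_bars):
    """Run DDIM backwards (clean -> noisy) and return the full trajectory.
    The noise is predicted at the current, less noisy point: the usual
    approximation eps(x_t, t) ~= eps(x_{t-1}, t)."""
    traj = [x0]
    x = x0
    for i in range(1, len(alpha_bars)):
        x = ddim_step(x, eps_fn(x, i), alpha_bars[i - 1], alpha_bars[i])
        traj.append(x)
    return traj

def ddim_sample(x_T, eps_fn, alpha_bars):
    """Ordinary deterministic DDIM sampling (noisy -> clean)."""
    x = x_T
    for i in range(len(alpha_bars) - 1, 0, -1):
        x = ddim_step(x, eps_fn(x, i), alpha_bars[i], alpha_bars[i - 1])
    return x

# Toy stand-ins for a UNet noise predictor (hypothetical).
rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)
const_eps = rng.standard_normal(16)

def eps_const(x, i):
    return const_eps  # ignores its input, so inversion is exact

def eps_dep(x, i):
    return 0.5 * x  # depends on its input, so inversion drifts

alpha_bars = make_alpha_bars()
traj = ddim_invert(x0, eps_const, alpha_bars)
err_const = np.abs(ddim_sample(traj[-1], eps_const, alpha_bars) - x0).max()
err_dep = np.abs(ddim_sample(ddim_invert(x0, eps_dep, alpha_bars)[-1],
                             eps_dep, alpha_bars) - x0).max()
print(f"reconstruction error, constant predictor: {err_const:.2e}")
print(f"reconstruction error, input-dependent predictor: {err_dep:.2e}")
```

With the constant predictor, the round trip recovers `x0` up to floating-point error. With the input-dependent predictor, it does not, because inversion evaluates the predictor at the less noisy point instead of the point it is moving to. NTI corrects this kind of mismatch under classifier-free guidance by optimizing the null-text embedding at every timestep so that guided sampling tracks the inverted pivot trajectory. That per-timestep optimization is the cost the abstract says ResInversion reduces. How its pre-diffusion rectification achieves this is not stated in the abstract.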
