Fugu-MT Paper Translation (Abstract): Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing

Paper abstract: Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing

arxiv url: http://arxiv.org/abs/2504.13490v1
Date: Fri, 18 Apr 2025 05:59:01 GMT
Status: Translation complete
Last updated in system: 2025-04-28 18:58:03.589965
Title: Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing
Title (reference translation): Early Zero-Shot Candidate Selection for Instruction-Guided Image Editing
Authors: Joowon Kim, Ziseok Lee, Donghyeon Cho, Sanghyun Jo, Yeonsung Jung, Kyungsu Kim, Eunho Yang
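Abstract summary: ELECT (Early-timestep Latent Evaluation for Candidate Selection) is a framework that selects reliable seeds by estimating background mismatches at early diffusion timesteps. It ranks seed candidates by a background inconsistency score and filters out unsuitable samples early based on background consistency, while preserving editability. Experiments show that ELECT reduces computational cost (by 41% on average), improves background consistency and instruction adherence, and achieves a success rate of about 40% on previously failed cases, without any external supervision or training.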
Reference score (independently computed attention score): 32.56049667145546
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite recent advances in diffusion models, achieving reliable image generation and editing remains challenging due to the inherent diversity induced by stochastic noise in the sampling process. Instruction-guided image editing with diffusion models offers user-friendly capabilities, yet editing failures, such as background distortion, frequently occur. Users often resort to trial and error, adjusting seeds or prompts to achieve satisfactory results, which is inefficient. While seed selection methods exist for Text-to-Image (T2I) generation, they depend on external verifiers, limiting applicability, and evaluating multiple seeds increases computational complexity. To address this, we first establish a multiple-seed-based image editing baseline using background consistency scores, achieving Best-of-N performance without supervision. Building on this, we introduce ELECT (Early-timestep Latent Evaluation for Candidate Selection), a zero-shot framework that selects reliable seeds by estimating background mismatches at early diffusion timesteps, identifying the seed that retains the background while modifying only the foreground. ELECT ranks seed candidates by a background inconsistency score, filtering unsuitable samples early based on background consistency while preserving editability. Beyond standalone seed selection, ELECT integrates into instruction-guided editing pipelines and extends to Multimodal Large-Language Models (MLLMs) for joint seed and prompt selection, further improving results when seed selection alone is insufficient. Experiments show that ELECT reduces computational costs (by 41 percent on average and up to 61 percent) while improving background consistency and instruction adherence, achieving around 40 percent success rates in previously failed cases - without any external supervision or training.
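Abstract (reference translation): Despite recent progress in diffusion models, reliable image generation and editing remain difficult because of the inherent diversity caused by stochastic noise in the sampling process. Instruction-guided image editing with diffusion models offers user-friendly functionality, but editing failures such as background distortion occur frequently. Users often fall back on trial and error, adjusting seeds or prompts until the result is satisfactory, which is inefficient. Seed selection methods exist for Text-to-Image (T2I) generation, but they rely on external verifiers, which limits their applicability, and evaluating multiple seeds increases computational complexity. We therefore first build a multi-seed image editing baseline that uses background consistency scores and achieves Best-of-N performance without supervision. ELECT (Early-timestep Latent Evaluation for Candidate Selection) is a zero-shot framework that selects reliable seeds by estimating background mismatches at early diffusion timesteps, identifying the seed that preserves the background while modifying only the foreground. ELECT ranks seed candidates by a background inconsistency score and filters out unsuitable samples early based on background consistency, while preserving editability. Beyond standalone seed selection, ELECT integrates into instruction-guided editing pipelines and extends to Multimodal Large-Language Models (MLLMs) for joint seed and prompt selection, improving results when seed selection alone is insufficient. Experiments show that ELECT reduces computational cost (by 41% on average and up to 61%) while improving background consistency and instruction adherence, achieving a success rate of about 40% on previously failed cases.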
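
The seed-ranking idea described in the abstract can be illustrated with a small toy sketch. The abstract does not define the background inconsistency score, so everything below is an assumption made for illustration: the score is taken to be the mean squared difference between the source image and each seed's early-timestep estimate over background pixels, the background mask is assumed to be given, and the names `background_inconsistency` and `select_seed` are hypothetical. Random arrays stand in for real diffusion outputs.

```python
import numpy as np


def background_inconsistency(source, candidate, background_mask):
    """Assumed score: mean squared difference over background pixels only."""
    per_pixel = ((candidate - source) ** 2).mean(axis=-1)  # average over channels
    return float(per_pixel[background_mask].mean())


def select_seed(source, early_estimates, background_mask):
    """Rank seeds by the assumed score; the lowest score is kept."""
    scores = {
        seed: background_inconsistency(source, estimate, background_mask)
        for seed, estimate in early_estimates.items()
    }
    best_seed = min(scores, key=scores.get)
    return best_seed, scores


# Toy data: an 8x8 RGB "source image" with a 4x4 foreground region in the centre.
source = np.random.default_rng(0).random((8, 8, 3))
background_mask = np.ones((8, 8), dtype=bool)
background_mask[2:6, 2:6] = False  # False marks the foreground to be edited

# Hypothetical early-timestep estimates for three seeds. Each seed perturbs the
# background by a different amount and overwrites the foreground with an "edit".
early_estimates = {}
for seed, noise_level in [(0, 0.30), (1, 0.05), (2, 0.15)]:
    rng = np.random.default_rng(seed + 1)
    estimate = source + noise_level * rng.standard_normal(source.shape)
    estimate[2:6, 2:6] = rng.random((4, 4, 3))  # foreground change is not penalised
    early_estimates[seed] = estimate

best_seed, scores = select_seed(source, early_estimates, background_mask)
print(best_seed, scores)
```

In this toy setup the seed with the smallest background perturbation is selected, and changes inside the foreground region do not affect the ranking. In a real pipeline only the retained seed would be denoised to completion, which is the kind of saving the abstract reports; how ELECT actually computes its score and mask is not specified in the abstract.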

Related papers