Fugu-MT Paper Translation (Summary): The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering

Paper overview: The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering

arxiv url: http://arxiv.org/abs/2603.17998v1
Date: Wed, 18 Mar 2026 17:57:53 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.872788
Title: The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering
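Title (reference translation): Effectiveness of Text Embedding Interpolation in Continuous Image Steering
Authors: Yigit Ekin, Yossi Gandelsman
Abstract summary: We propose a training-free framework for continuous, controllable test-time image editing in text-conditioned generative models. Simple steering in the text-embedding space is sufficient to achieve smooth edit control. Our approach is comparable to training-based alternatives and outperforms other training-free methods.
Reference score (attention score computed in-house): 18.29130390175963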
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a training-free framework for continuous and controllable image editing at test time for text-conditioned generative models. In contrast to prior approaches that rely on additional training or manual user intervention, we find that a simple steering in the text-embedding space is sufficient to produce smooth edit control. Given a target concept (e.g., enhancing photorealism or changing facial expression), we use a large language model to automatically construct a small set of debiased contrastive prompt pairs, from which we compute a steering vector in the generator's text-encoder space. We then add this vector directly to the input prompt representation to control generation along the desired semantic axis. To obtain a continuous control, we propose an elastic range search procedure that automatically identifies an effective interval of steering magnitudes, avoiding both under-steering (no-edit) and over-steering (changing other attributes). Adding the scaled versions of the same vector within this interval yields smooth and continuous edits. Since our method modifies only textual representations, it naturally generalizes across text-conditioned modalities, including image and video generation. To quantify the steering continuity, we introduce a new evaluation metric that measures the uniformity of semantic change across edit strengths. We compare the continuous editing behavior across methods and find that, despite its simplicity and lightweight design, our approach is comparable to training-based alternatives, outperforming other training-free methods.
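The core steering step described in the abstract (derive a direction from contrastive prompt pairs, then add scaled copies of it to the prompt representation) can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: it assumes each prompt has already been encoded into a single fixed-length vector and that the steering vector is the mean of (positive minus negative) embedding differences. Real text encoders usually emit one vector per token, and the abstract does not say how the paper pools tokens or performs debiasing. The helper names `steering_vector`, `steer`, and `sweep` are hypothetical, and the toy vectors stand in for encoder outputs.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def steering_vector(pairs: Sequence[Tuple[Vector, Vector]]) -> Vector:
    """Assumed construction: mean of (positive - negative) pair embeddings."""
    if not pairs:
        raise ValueError("need at least one contrastive pair")
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for pos, neg in pairs:
        if len(pos) != dim or len(neg) != dim:
            raise ValueError("all embeddings must share one dimension")
        for i in range(dim):
            total[i] += pos[i] - neg[i]
    return [t / len(pairs) for t in total]


def steer(prompt_emb: Vector, direction: Vector, alpha: float) -> Vector:
    """Add the steering direction, scaled by alpha, to a prompt embedding."""
    if len(prompt_emb) != len(direction):
        raise ValueError("dimension mismatch")
    return [p + alpha * d for p, d in zip(prompt_emb, direction)]


def sweep(prompt_emb: Vector, direction: Vector,
          lo: float, hi: float, n: int) -> List[Vector]:
    """Steered embeddings at n evenly spaced strengths in [lo, hi]."""
    if n < 2:
        raise ValueError("n must be at least 2")
    alphas = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return [steer(prompt_emb, direction, a) for a in alphas]


# Toy 3-d "embeddings" standing in for text-encoder outputs.
pairs = [([1.0, 2.0, 0.0], [0.0, 2.0, 0.0]),
         ([3.0, 1.0, 1.0], [2.0, 1.0, 1.0])]
v = steering_vector(pairs)  # [1.0, 0.0, 0.0]
edits = sweep([0.5, 0.5, 0.5], v, 0.0, 2.0, 5)
```

Because only the prompt representation changes, each element of `edits` would be passed to an unchanged generator in place of the original prompt embedding; reusing one direction at different scales is what makes the edit strength a single continuous knob.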
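Abstract (reference translation): We propose a training-free framework for continuous and controllable test-time image editing in text-conditioned generative models. In contrast to prior approaches that rely on additional training or manual intervention, simple steering in the text-embedding space is sufficient to achieve smooth edit control. Given a target concept (e.g., enhancing photorealism or changing facial expression), we use a large language model to automatically construct a small set of debiased contrastive prompt pairs, from which we compute a steering vector in the generator's text-encoder space. We then add this vector directly to the input prompt representation to control generation along the desired semantic axis. We propose an elastic range search procedure that automatically identifies an effective interval of steering magnitudes, avoiding both under-steering (no edit) and over-steering (changing other attributes). Adding scaled versions of the same vector within this interval yields smooth, continuous edits. Because our method modifies only textual representations, it naturally generalizes across text-conditioned modalities, including image and video generation. To quantify steering continuity, we introduce a new evaluation metric that measures the uniformity of semantic change across edit strengths. Comparing continuous editing behavior across methods, we find that, despite its simplicity and lightweight design, our approach is comparable to training-based alternatives and outperforms other training-free methods.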
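The abstract describes the elastic range search and the continuity metric only at a high level. The sketch below illustrates one way such components could work and is explicitly not the paper's procedure or metric. It assumes two hypothetical scoring callbacks, `edit_score(alpha)` (how strongly the target concept appears) and `drift(alpha)` (how much unrelated attributes change), both non-decreasing in alpha and below their thresholds at alpha = 0. Each interval boundary is bracketed by doubling the search range and then refined by bisection. Uniformity is scored, as an assumption, by 1 / (1 + coefficient of variation) of the per-step score increments, so a perfectly even sweep scores 1.0.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Predicate = Callable[[float], bool]


def _first_true(pred: Predicate, step: float, max_alpha: float, iters: int) -> float:
    """Locate where pred flips from False to True, assuming pred(0) is False
    and pred is monotone in alpha. Doubles the bracket, then bisects."""
    lo, hi = 0.0, step
    while not pred(hi):
        lo, hi = hi, hi * 2  # grow the search interval "elastically"
        if hi > max_alpha:
            raise ValueError("no boundary found below max_alpha")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi


def elastic_range(edit_score: Callable[[float], float],
                  drift: Callable[[float], float],
                  min_edit: float = 0.1,
                  max_drift: float = 0.5,
                  step: float = 0.25,
                  max_alpha: float = 64.0,
                  iters: int = 40) -> Tuple[float, float]:
    """Return an approximate (lo, hi) interval of steering strengths."""
    # Lower end: weakest strength whose edit is noticeable (avoids under-steering).
    lo = _first_true(lambda a: edit_score(a) >= min_edit, step, max_alpha, iters)
    # Upper end: strength at which unrelated attributes start to drift too far
    # (avoids over-steering).
    hi = _first_true(lambda a: drift(a) > max_drift, step, max_alpha, iters)
    if hi <= lo:
        raise ValueError("no usable steering interval")
    return lo, hi


def uniformity(scores: Sequence[float]) -> float:
    """Score how evenly a semantic score changes across evenly spaced strengths.
    Returns 1.0 for perfectly equal increments; lower values mean jumpier edits."""
    if len(scores) < 3:
        raise ValueError("need at least three strengths")
    steps: List[float] = [b - a for a, b in zip(scores, scores[1:])]
    mean_step = statistics.fmean(steps)
    if mean_step == 0:
        raise ValueError("scores do not change across strengths")
    cv = statistics.pstdev(steps) / abs(mean_step)
    return 1.0 / (1.0 + cv)
```

For instance, with toy callbacks `edit_score = lambda a: min(1.0, a / 4)` and `drift = lambda a: max(0.0, (a - 3) / 2)`, `elastic_range` returns approximately (0.4, 4.0). Scaled steering vectors with alpha inside that interval would then be generated, and `uniformity` applied to their semantic scores: an even sweep such as `[0.0, 0.25, 0.5, 0.75, 1.0]` scores 1.0, while a step-like sweep scores lower.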

Related papers