Fugu-MT Paper Translation (Summary): Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing

Paper summary: Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing

arxiv url: http://arxiv.org/abs/2510.08532v1
Date: Thu, 09 Oct 2025 17:51:03 GMT
Status: Translation complete
Last updated in system: 2025-10-10 17:54:15.280181
Title: Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
Title (reference translation): Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
Authors: Rishubh Parihar, Or Patashnik, Daniil Ostashev, R. Venkatesh Babu, Daniel Cohen-Or, Kuan-Chieh Wang
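Abstract summary: Kontinuous Kontext is an instruction-driven editing model that provides a new dimension of control over edit strength. A lightweight projector network maps the input scalar and the edit instruction to coefficients in the model's modulation space. To train the model, a diverse dataset of image-edit-instruction-strength quadruplets is synthesized using existing generative models.
Reference score (attention score computed by this site): 76.44219733285898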
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Instruction-based image editing offers a powerful and intuitive way to manipulate images through natural language. Yet, relying solely on text instructions limits fine-grained control over the extent of edits. We introduce Kontinuous Kontext, an instruction-driven editing model that provides a new dimension of control over edit strength, enabling users to adjust edits gradually from no change to a fully realized result in a smooth and continuous manner. Kontinuous Kontext extends a state-of-the-art image editing model to accept an additional input, a scalar edit strength which is then paired with the edit instruction, enabling explicit control over the extent of the edit. To inject this scalar information, we train a lightweight projector network that maps the input scalar and the edit instruction to coefficients in the model's modulation space. For training our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models, followed by a filtering stage to ensure quality and consistency. Kontinuous Kontext provides a unified approach for fine-grained control over edit strength for instruction driven editing from subtle to strong across diverse operations such as stylization, attribute, material, background, and shape changes, without requiring attribute-specific training.
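Abstract (reference translation): Instruction-based image editing offers a powerful and intuitive way to manipulate images with natural language. However, relying on text instructions alone limits fine-grained control over the extent of an edit. Kontinuous Kontext is an instruction-driven editing model that adds a new dimension of control over edit strength, letting users adjust an edit gradually, smoothly and continuously, from no change to a fully realized result. Kontinuous Kontext extends a state-of-the-art image editing model to take an additional input, a scalar edit strength paired with the edit instruction, which enables explicit control over the extent of the edit. To inject this scalar information, a lightweight projector network is trained to map the input scalar and the edit instruction to coefficients in the model's modulation space. To train the model, a diverse dataset of image-edit-instruction-strength quadruplets is synthesized using existing generative models, followed by a filtering stage to ensure quality and consistency. Kontinuous Kontext offers a unified approach to fine-grained control over edit strength in instruction-driven editing, from subtle to strong, across diverse operations such as stylization, attribute, material, background, and shape changes, without requiring attribute-specific training.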
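The abstract describes a lightweight projector that maps the scalar edit strength and the edit instruction to coefficients in the model's modulation space. The following is a minimal sketch of that idea only, not the authors' implementation. Every name in it (`StrengthProjector`, `strength_features`, `modulate`), the sinusoidal encoding of the scalar, the layer sizes, and the pooled instruction embedding are assumptions made for illustration. Scaling the output by the strength, so that strength 0 leaves the modulation unchanged, is also a choice of this sketch rather than something the abstract states.

```python
import math
import random


def make_linear(n_in, n_out, rng, scale=0.1):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply a dense layer to the vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def strength_features(s, n_freqs=4):
    """Hypothetical sinusoidal encoding of the scalar strength s in [0, 1]."""
    feats = [s]
    for k in range(n_freqs):
        feats.append(math.sin((2 ** k) * math.pi * s))
        feats.append(math.cos((2 ** k) * math.pi * s))
    return feats


class StrengthProjector:
    """Toy two-layer MLP: (strength, instruction embedding) -> modulation offset."""

    def __init__(self, instr_dim, mod_dim, hidden=32, n_freqs=4, seed=0):
        rng = random.Random(seed)
        self.n_freqs = n_freqs
        in_dim = instr_dim + 1 + 2 * n_freqs
        self.fc1 = make_linear(in_dim, hidden, rng)
        self.fc2 = make_linear(hidden, mod_dim, rng)

    def __call__(self, strength, instr_emb):
        x = strength_features(strength, self.n_freqs) + list(instr_emb)
        h = [math.tanh(v) for v in linear(self.fc1, x)]
        out = linear(self.fc2, h)
        # Assumption of this sketch: scale by the strength so that
        # strength 0 yields a zero offset (no change to the modulation).
        return [strength * v for v in out]


def modulate(base_mod, delta):
    """Add the projected offset to the base modulation coefficients."""
    return [b + d for b, d in zip(base_mod, delta)]


if __name__ == "__main__":
    projector = StrengthProjector(instr_dim=8, mod_dim=6)
    instruction_embedding = [0.1 * i for i in range(8)]  # stand-in for a pooled text embedding
    base = [1.0] * 6                                     # stand-in for the model's own coefficients
    for s in (0.0, 0.5, 1.0):
        print(s, modulate(base, projector(s, instruction_embedding)))
```

In a real system the weights would be trained on the image-edit-instruction-strength quadruplets mentioned in the abstract; here they are random, so the printed numbers only show that the offset varies with the strength and vanishes at 0.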

Related papers

SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder [52.754326452329956]
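We propose a method for disentangled and continuous editing through token-level manipulation of text embeddings. Edits are made by manipulating the embeddings along carefully chosen directions that control the strength of the target attribute. The method operates directly on the text embeddings without modifying the diffusion process, which makes it model-agnostic and broadly applicable to image backbones.
Paper reference translation (metadata) (2025-10-06T17:51:04Z)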
Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent [38.61468007698179]
We propose DescriptiveEdit, a descriptive-prompt-based editing framework. The core idea is to reframe instruction-based image editing as reference-image-based text-to-image generation.
Paper reference translation (metadata) (2025-08-28T07:45:08Z)
InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow [19.972879378697215]
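We propose InstantEdit, a fast text-guided image editing method based on the RectifiedFlow framework. The method exploits the straight sampling trajectories of RectifiedFlow by introducing a specialized inversion strategy called PerRFI. We also propose Inversion Latent Injection, a novel regeneration method that effectively reuses the latent information obtained during inversion to facilitate more coherent and detailed regeneration.
Paper reference translation (metadata) (2025-08-08T05:38:17Z)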
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
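We present AnyEdit, a comprehensive multimodal instruction editing dataset. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, an adaptive editing process, and automated selection of editing results. Experiments on three benchmark datasets show that AnyEdit consistently improves the performance of diffusion-based editing models.
Paper reference translation (metadata) (2024-11-24T07:02:56Z)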
ControlEdit: A MultiModal Local Clothing Image Editing Method [3.6604114810930946]
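Multimodal clothing image editing refers to the precise adjustment and modification of clothing images using textual descriptions and visual images as control conditions. We propose ControlEdit, a novel image editing method that transfers clothing image editing to multimodal-guided local inpainting of clothing images.
Paper reference translation (metadata) (2024-09-23T05:34:59Z)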
Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
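We propose an inference-time editing optimisation that accommodates multiple types of editing instruction. By adjusting the influence of each loss function, flexible editing solutions can be built to suit user preferences. The method is evaluated with text, pose, and scribble editing conditions, highlighting its ability to perform complex edits.
Paper reference translation (metadata) (2023-11-28T15:31:11Z)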
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
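LEDITS is a lightweight approach to real-image editing that integrates the Edit Friendly DDPM inversion technique with Semantic Guidance. The approach achieves versatile edits, from subtle to extensive, as well as changes in composition and style, while requiring neither optimization nor extensions to the architecture.
Paper reference translation (metadata) (2023-07-02T09:11:09Z)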

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

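This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).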