Fugu-MT Paper Translation (Abstract): Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation

Paper summary: Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation

arxiv url: http://arxiv.org/abs/2510.13084v1
Date: Wed, 15 Oct 2025 01:55:32 GMT
Status: Translation complete
Last updated in system: 2025-10-16 20:13:28.471217
Title: Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation
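Authors: Yi Zuo, Zitao Wang, Lingling Li, Xu Liu, Fang Liu, Licheng Jiao
Abstract summary: Edit-Your-Interest is a text-driven, zero-shot video editing method. It reduces computational overhead compared with full-sequence temporal modeling approaches. It outperforms state-of-the-art methods in both efficiency and visual fidelity.
Reference score (attention score computed independently by this site): 53.05471174430247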
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image (T2I) diffusion models have recently demonstrated significant progress in video editing. However, existing video editing methods are severely limited by their high computational overhead and memory consumption. Furthermore, these approaches often sacrifice visual fidelity, leading to undesirable temporal inconsistencies and artifacts such as blurring and pronounced mosaic-like patterns. We propose Edit-Your-Interest, a lightweight, text-driven, zero-shot video editing method. Edit-Your-Interest introduces a spatio-temporal feature memory to cache features from previous frames, significantly reducing computational overhead compared to full-sequence spatio-temporal modeling approaches. Specifically, we first introduce a Spatio-Temporal Feature Memory bank (SFM), which is designed to efficiently cache and retain the crucial image tokens processed by spatial attention. Second, we propose the Feature Most-Similar Propagation (FMP) method. FMP propagates the most relevant tokens from previous frames to subsequent ones, preserving temporal consistency. Finally, we introduce an SFM update algorithm that continuously refreshes the cached features, ensuring their long-term relevance and effectiveness throughout the video sequence. Furthermore, we leverage cross-attention maps to automatically extract masks for the instances of interest. These masks are seamlessly integrated into the diffusion denoising process, enabling fine-grained control over target objects and allowing Edit-Your-Interest to perform highly accurate edits while robustly preserving the background integrity. Extensive experiments decisively demonstrate that the proposed Edit-Your-Interest outperforms state-of-the-art methods in both efficiency and visual fidelity, validating its superior effectiveness and practicality.
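The abstract names three mechanisms (a feature memory that caches tokens from earlier frames, most-similar token propagation, and mask-guided editing) but does not give their exact formulation. The following is a minimal illustrative sketch, not the authors' implementation. It assumes cosine similarity as the relevance measure, a fixed-capacity first-in-first-out memory update, and a binary mask; all function names (`most_similar_propagate`, `update_memory`, `masked_blend`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar_propagate(current_tokens, memory):
    """For each token of the current frame, pick the cached token that is
    most similar to it (assumption: similarity = cosine)."""
    return [max(memory, key=lambda m: cosine(tok, m)) for tok in current_tokens]


def update_memory(memory, new_tokens, capacity):
    """Refresh the cache with tokens from the latest frame, keeping only the
    most recent `capacity` entries (assumption: FIFO eviction)."""
    return (memory + new_tokens)[-capacity:]


def masked_blend(edited, original, mask):
    """Keep edited values where the mask is 1 (object of interest) and
    original values where it is 0 (background)."""
    return [e if m else o for e, o, m in zip(edited, original, mask)]


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0]]      # tokens cached from earlier frames
    current = [[0.9, 0.1], [0.2, 0.8]]     # tokens of the frame being edited
    print(most_similar_propagate(current, memory))
    memory = update_memory(memory, current, capacity=3)
    print(memory)
    print(masked_blend(["edit0", "edit1"], ["orig0", "orig1"], [1, 0]))
```

In this sketch each frame looks up only a small cache instead of attending over every frame in the sequence, which is the kind of saving the abstract attributes to avoiding full-sequence spatio-temporal modeling.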


Related papers