Fugu-MT Paper Translation (Abstract): EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

Paper summary: EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

arxiv url: http://arxiv.org/abs/2603.19224v1
Date: Thu, 19 Mar 2026 17:59:22 GMT
Status: Translation complete
System update date: 2026-03-20 17:19:06.331094
Title: EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
Authors: Yang Fu, Yike Zheng, Ziyun Dai, Henghui Ding
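Abstract summary: Video object removal aims to eliminate dynamic target objects and their visual effects, such as deformation, shadows, and reflections, while restoring seamless backgrounds. Recent diffusion-based video inpainting and object removal methods can remove the objects but often struggle to erase these effects and to synthesize coherent backgrounds. The paper introduces VOR (Video Object Removal), a large-scale dataset that provides diverse paired videos. It proposes EffectErase, an effect-aware video object removal method that treats video object insertion as the inverse auxiliary task within a reciprocal learning scheme.
Reference score (independently computed attention score): 50.43992550991499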
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video object removal aims to eliminate dynamic target objects and their visual effects, such as deformation, shadows, and reflections, while restoring seamless backgrounds. Recent diffusion-based video inpainting and object removal methods can remove the objects but often struggle to erase these effects and to synthesize coherent backgrounds. Beyond method limitations, progress is further hampered by the lack of a comprehensive dataset that systematically captures common object effects across varied environments for training and evaluation. To address this, we introduce VOR (Video Object Removal), a large-scale dataset that provides diverse paired videos, each consisting of one video where the target object is present with its effects and a counterpart where the object and effects are absent, with corresponding object masks. VOR contains 60K high-quality video pairs from captured and synthetic sources, covers five effect types, and spans a wide range of object categories as well as complex, dynamic multi-object scenes. Building on VOR, we propose EffectErase, an effect-aware video object removal method that treats video object insertion as the inverse auxiliary task within a reciprocal learning scheme. The model includes task-aware region guidance that focuses learning on affected areas and enables flexible task switching. It also includes an insertion-removal consistency objective that encourages complementary behaviors and shared localization of effect regions and structural cues. Trained on VOR, EffectErase achieves superior performance in extensive experiments, delivering high-quality video object effect erasing across diverse scenarios.
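
Illustrative note: the paired-video layout described in the abstract (a video with the object and its effects, a counterpart without them, and object masks) can be sketched as a small data structure. The sketch below is an assumption for illustration only, not the authors' data format or method: the class name `PairedSample`, its field names, the helper `effect_region`, and the threshold value are all hypothetical. It shows one reason such pairs may be useful: pixels that differ between the two videos but fall outside the object mask are candidate effect pixels (for example, a shadow).

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[float]]  # one grayscale frame: rows of intensities in [0, 1]
Mask = List[List[bool]]    # True where the target object itself is located


@dataclass
class PairedSample:
    """Hypothetical container for one paired sample (field names are assumptions)."""
    with_object: List[Frame]     # frames where the object and its effects are present
    without_object: List[Frame]  # counterpart frames without the object and effects
    object_masks: List[Mask]     # per-frame object masks


def effect_region(sample: PairedSample, threshold: float = 0.05) -> List[Mask]:
    """Mark pixels that differ between the two videos but lie outside the object mask."""
    regions = []
    for fg, bg, mask in zip(sample.with_object, sample.without_object, sample.object_masks):
        regions.append([
            [(abs(p - q) > threshold) and not m
             for p, q, m in zip(row_fg, row_bg, row_m)]
            for row_fg, row_bg, row_m in zip(fg, bg, mask)
        ])
    return regions


# Toy one-frame example: the object sits at pixel (0, 0), its shadow darkens pixel (1, 0).
sample = PairedSample(
    with_object=[[[0.1, 0.8, 0.8], [0.5, 0.8, 0.8]]],
    without_object=[[[0.8, 0.8, 0.8], [0.8, 0.8, 0.8]]],
    object_masks=[[[True, False, False], [False, False, False]]],
)
regions = effect_region(sample)
```

In this toy case only the shadow pixel at (1, 0) is marked, since the object pixel is excluded by its mask and the remaining pixels are identical in both videos. How EffectErase itself localizes effect regions is not specified in the abstract.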

Related papers