Fugu-MT paper translation (summary): AlbedoEdit: Unified Instance-Level Video Editing with Albedo Guidance

Paper summary: AlbedoEdit: Unified Instance-Level Video Editing with Albedo Guidance

arxiv url: http://arxiv.org/abs/2606.01362v1
Date: Sun, 31 May 2026 17:33:14 GMT
Status: translation complete
Last updated in system: 2026-06-02 21:34:29.656426
Title: AlbedoEdit: Unified Instance-Level Video Editing with Albedo Guidance
Authors: Xilong Zhou, Bao-Huy Nguyen, Zheng Zeng, Jacob Munkberg, Jon Hasselgren, Thomas Leimkühler, Nima Kalantari, Miloš Hašan, Christian Theobalt
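Abstract summary: Video generative models have made remarkable progress in synthesizing video sequences. Fine-grained instance-level video editing, including object insertion, object removal, and texture editing, has emerged as a prominent yet challenging problem. This paper proposes AlbedoEdit, a unified video editing framework that jointly supports object insertion, object removal, and texture editing.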
Reference score (independently computed attention score): 42.3107762497025
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video generative models have achieved remarkable progress in synthesizing photorealistic video sequences. However, enabling broader and more creative downstream applications requires fine-grained instance-level video editing, including object insertion, object removal, and texture editing, which has emerged as a prominent yet challenging problem. Existing approaches either propose unified generative frameworks with only coarse semantic control, or design task-specific frameworks for individual editing tasks, limiting their flexibility and applicability across diverse real-world scenarios. To address these limitations, we propose AlbedoEdit, a unified generative video editing framework that jointly supports object insertion, object removal, and texture editing. Our key insight is that the intrinsic albedo map, which is invariant to lighting and contains no specularity, shadowing and inter-reflection effects, provides an effective and user-friendly mechanism for specifying fine-grained appearance edits. Built upon video foundation models, AlbedoEdit is fine-tuned to translate source RGB videos into edited RGB videos, conditioned on a user-edited first-frame albedo. Trained on a new paired synthetic dataset covering all three editing tasks, AlbedoEdit implicitly learns to harmonize edited contents and simulate complex real-world visual effects triggered by editing operations, including specular highlights, soft shadows, and mirror reflections. AlbedoEdit demonstrates superior performance over state-of-the-art video editing approaches, both qualitatively and quantitatively. Project webpage is https://vcai.mpi-inf.mpg.de/projects/AlbedoEdit/.
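The abstract's point that albedo is a lighting-independent place to specify edits can be illustrated with a toy intrinsic-image model. The sketch below is not AlbedoEdit: it assumes a simple multiplicative model (pixel = albedo × shading), and the helper names `render` and `edit_albedo` are hypothetical. The paper instead fine-tunes a video foundation model that learns effects such as highlights, shadows, and reflections implicitly.

```python
# Toy intrinsic-image model (an assumption for illustration, not AlbedoEdit itself):
# each pixel is rgb = albedo * shading, with one scalar shading value per pixel.

def render(albedo, shading):
    """Multiply per-pixel RGB albedo by per-pixel scalar shading."""
    return [
        [tuple(round(c * s, 6) for c in px) for px, s in zip(a_row, s_row)]
        for a_row, s_row in zip(albedo, shading)
    ]

def edit_albedo(albedo, mask, new_color):
    """Replace the albedo with new_color wherever mask is True (a texture edit)."""
    return [
        [new_color if m else px for px, m in zip(a_row, m_row)]
        for a_row, m_row in zip(albedo, mask)
    ]

# 2x2 scene: a uniformly red surface whose right column lies in shadow.
albedo = [[(0.8, 0.1, 0.1), (0.8, 0.1, 0.1)],
          [(0.8, 0.1, 0.1), (0.8, 0.1, 0.1)]]
shading = [[1.0, 0.5],
           [1.0, 0.5]]

# Recolor the top row to blue in albedo space, then re-apply the same shading.
mask = [[True, True], [False, False]]
edited = edit_albedo(albedo, mask, (0.1, 0.1, 0.8))
image = render(edited, shading)
```

Because the edit is specified on the albedo, the user never paints the shadow: the recolored pixel in the shadowed column comes out darker after shading is re-applied, while unedited pixels keep their original appearance.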


Related papers