Fugu-MT Paper Translation (Abstract): DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Paper overview: DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

arxiv url: http://arxiv.org/abs/2310.10624v1
Date: Mon, 16 Oct 2023 17:48:10 GMT
Status: Translation complete
System update date: 2023-10-17 12:28:03.265809
Title: DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Title (reference translation): DynVideo-E: Harnessing Dynamic NeRF for Human-Centric Video Editing with Large-Scale Motion and View Changes
Authors: Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, Yuchao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou
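Abstract summary: We introduce dynamic Neural Radiance Fields (NeRF) as a human-centric video representation that eases the video editing problem into a 3D space editing task. Our method, dubbed DynVideo-E, significantly outperforms SOTA approaches on two challenging datasets by a margin of 50% to 95% in human preference.
Reference score (independently computed attention measure): 48.086102360155856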
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite remarkable research advances in diffusion-based video editing, existing methods are limited to short-length videos due to the contradiction between long-range consistency and frame-wise editing. Recent approaches attempt to tackle this challenge by introducing video-2D representations to degrade video editing to image editing. However, they encounter significant difficulties in handling large-scale motion- and view-change videos especially for human-centric videos. This motivates us to introduce the dynamic Neural Radiance Fields (NeRF) as the human-centric video representation to ease the video editing problem to a 3D space editing task. As such, editing can be performed in the 3D spaces and propagated to the entire video via the deformation field. To provide finer and direct controllable editing, we propose the image-based 3D space editing pipeline with a set of effective designs. These include multi-view multi-pose Score Distillation Sampling (SDS) from both 2D personalized diffusion priors and 3D diffusion priors, reconstruction losses on the reference image, text-guided local parts super-resolution, and style transfer for 3D background space. Extensive experiments demonstrate that our method, dubbed as DynVideo-E, significantly outperforms SOTA approaches on two challenging datasets by a large margin of 50% ~ 95% in terms of human preference. Compelling video comparisons are provided in the project page https://showlab.github.io/DynVideo-E/. Our code and data will be released to the community.
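Abstract (reference translation): Despite remarkable progress in diffusion-based video editing, existing methods are limited to short videos because long-range consistency conflicts with frame-wise editing. Recent approaches address this by introducing video-2D representations that reduce video editing to image editing. However, they struggle with videos that contain large-scale motion and view changes, especially human-centric videos. This motivates us to introduce dynamic Neural Radiance Fields (NeRF) as the human-centric video representation, which eases the video editing problem into a 3D space editing task. Edits can then be performed in 3D space and propagated to the entire video through the deformation field. To provide finer and more directly controllable editing, we propose an image-based 3D space editing pipeline with a set of effective designs: multi-view multi-pose Score Distillation Sampling (SDS) from both 2D personalized diffusion priors and 3D diffusion priors, reconstruction losses on the reference image, text-guided super-resolution of local parts, and style transfer for the 3D background space. Extensive experiments show that our method, dubbed DynVideo-E, significantly outperforms SOTA approaches on two challenging datasets by a margin of 50% to 95% in human preference. Compelling video comparisons are available on the project page https://showlab.github.io/DynVideo-E/. Our code and data will be released to the community.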
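The abstract states that edits made in 3D space are propagated to the entire video via the deformation field. The toy sketch below illustrates only that general idea and is not the authors' implementation: the functions `canonical_color`, `deform_to_canonical`, and `render_frame` are hypothetical names, the "NeRF" is a hand-written color lookup, and the deformation field is assumed to be a simple translation along x.

```python
import numpy as np

def canonical_color(points, edited=False):
    """Toy stand-in for a canonical-space radiance field: one RGB value per 3D point."""
    colors = np.tile(np.array([0.5, 0.5, 0.5]), (len(points), 1))
    if edited:
        # Hypothetical "edit": paint every canonical point with x > 0 red.
        colors[points[:, 0] > 0] = np.array([1.0, 0.0, 0.0])
    return colors

def deform_to_canonical(points, t):
    """Toy deformation field (assumption): frame t is the canonical scene shifted by t along x."""
    return points - np.array([t, 0.0, 0.0])

def render_frame(points, t, edited):
    """Color observed-space points at time t by looking them up in canonical space."""
    return canonical_color(deform_to_canonical(points, t), edited=edited)

# Two points on the moving subject, expressed in canonical coordinates.
canonical_pts = np.array([[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]])

# In frame t the same points appear shifted by t; the edit is applied once in canonical space.
frames = [
    render_frame(canonical_pts + np.array([t, 0.0, 0.0]), t, edited=True)
    for t in (0.0, 1.0, 2.0)
]

# Every frame shows the same edited colors for the same surface points.
consistent = all(np.array_equal(frame, frames[0]) for frame in frames)
```

Because each frame queries the single edited canonical space, the edit stays consistent across time without any per-frame editing, which is the property the abstract attributes to the dynamic NeRF representation.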
