Fugu-MT Paper Translation (Abstract): UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction

Paper overview: UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction

arxiv url: http://arxiv.org/abs/2510.01669v2
Date: Fri, 03 Oct 2025 04:06:10 GMT
Status: Translation complete
System update date: 2025-10-06 12:05:48.07451
Title: UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction
Title (reference translation): UniVerse: Unleashing the scene prior of video diffusion models for robust radiance field reconstruction
Authors: Jin Cao, Hongrui Wu, Ziyong Feng, Hujun Bao, Xiaowei Zhou, Sida Peng
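Abstract summary: We introduce UniVerse, a unified framework for robust reconstruction based on a video diffusion model. Specifically, UniVerse first converts inconsistent images into initial videos, then uses a specially designed video diffusion model to restore them into consistent images. Experiments on both synthetic and real-world datasets demonstrate the strong generalization capability and superior performance of the method in robust reconstruction.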
Reference score (independently computed attention score): 73.29048162438797
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper tackles the challenge of robust reconstruction, i.e., the task of reconstructing a 3D scene from a set of inconsistent multi-view images. Some recent works have attempted to simultaneously remove image inconsistencies and perform reconstruction by integrating image degradation modeling into neural 3D scene representations. However, these methods rely heavily on dense observations for robustly optimizing model parameters. To address this issue, we propose to decouple robust reconstruction into two subtasks: restoration and reconstruction, which naturally simplifies the optimization process. To this end, we introduce UniVerse, a unified framework for robust reconstruction based on a video diffusion model. Specifically, UniVerse first converts inconsistent images into initial videos, then uses a specially designed video diffusion model to restore them into consistent images, and finally reconstructs the 3D scenes from these restored images. Compared with case-by-case per-view degradation modeling, the diffusion model learns a general scene prior from large-scale data, making it applicable to diverse image inconsistencies. Extensive experiments on both synthetic and real-world datasets demonstrate the strong generalization capability and superior performance of our method in robust reconstruction. Moreover, UniVerse can control the style of the reconstructed 3D scene. Project page: https://jin-cao-tma.github.io/UniVerse.github.io/
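Abstract (reference translation): This paper addresses the challenge of robust reconstruction, that is, the task of reconstructing a 3D scene from inconsistent multi-view images. Several recent works attempt to remove image inconsistencies and perform reconstruction at the same time by integrating image degradation modeling into neural 3D scene representations. However, these methods depend heavily on dense observations to optimize model parameters robustly. To address this issue, we propose to decouple robust reconstruction into two subtasks, restoration and reconstruction. To this end, we introduce UniVerse, a unified framework for robust reconstruction based on a video diffusion model. Specifically, UniVerse first converts inconsistent images into initial videos, then uses a specially designed video diffusion model to restore them into consistent images, and finally reconstructs the 3D scene from the restored images. Compared with case-by-case, per-view degradation modeling, the diffusion model learns a general scene prior from large-scale data, which makes it applicable to diverse image inconsistencies. Extensive experiments on both synthetic and real-world datasets demonstrate the strong generalization capability and superior performance of the method in robust reconstruction. In addition, UniVerse can control the style of the reconstructed 3D scene. Project page: https://jin-cao-tma.github.io/UniVerse.github.io/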
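
The three stages named in the abstract (convert images to an initial video, restore them, then reconstruct) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `RobustReconstructionPipeline` and the helpers `match_brightness` and `summarize_scene` are hypothetical names invented for illustration. In particular, `match_brightness` is only a toy stand-in for the video diffusion model, and `summarize_scene` is a placeholder for an actual radiance field fit. The sketch only shows how decoupling lets each stage be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


@dataclass
class RobustReconstructionPipeline:
    """Hypothetical decoupled pipeline: images, initial video, restored images, scene."""
    to_initial_video: Callable[[Sequence[Image]], List[Image]]
    restore: Callable[[List[Image]], List[Image]]
    reconstruct: Callable[[List[Image]], dict]

    def run(self, images: Sequence[Image]) -> dict:
        video = self.to_initial_video(images)  # stage 1: treat the views as frames
        restored = self.restore(video)         # stage 2: remove inconsistencies
        return self.reconstruct(restored)      # stage 3: build the scene


def frame_mean(frame: Image) -> float:
    """Average pixel intensity of one frame."""
    return sum(map(sum, frame)) / (len(frame) * len(frame[0]))


def match_brightness(frames: List[Image]) -> List[Image]:
    """Toy stand-in for restoration (ASSUMPTION, not a diffusion model):
    rescale every frame so that all frames share the same mean brightness."""
    means = [frame_mean(f) for f in frames]
    target = sum(means) / len(means)
    restored = []
    for frame, mean in zip(frames, means):
        scale = target / mean if mean else 1.0  # leave all-black frames unchanged
        restored.append([[p * scale for p in row] for row in frame])
    return restored


def summarize_scene(frames: List[Image]) -> dict:
    """Placeholder for 3D reconstruction (ASSUMPTION): just collects the frames."""
    return {"num_views": len(frames), "frames": frames}


pipeline = RobustReconstructionPipeline(list, match_brightness, summarize_scene)
# Two single-row "views" of the same scene with inconsistent exposure.
scene = pipeline.run([[[0.2, 0.4]], [[0.6, 1.0]]])
```

Because restoration and reconstruction only communicate through a list of images, either stage could be replaced (for example with a learned restorer) without touching the other, which is the simplification the abstract attributes to decoupling.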

Related papers