Fugu-MT Paper Translation (Abstract): ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

Paper abstract: ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

arXiv URL: http://arxiv.org/abs/2603.09819v1
Date: Tue, 10 Mar 2026 15:44:48 GMT
Status: Translation complete
Last updated in system: 2026-03-11 15:25:24.434174
Title: ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation
Title (reference translation): ConfCtrl: Achieving Precise Camera Control in Video Diffusion through Confidence-Aware Interpolation
Authors: Liudi Yang, George Eskandar, Fengyi Shen, Mohammad Altillawi, Yang Bai, Chi Zhang, Ziyuan Liu, Abhinav Valada
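Abstract summary: ConfCtrl is a confidence-aware video interpolation framework that enables camera-guided diffusion models to follow prescribed poses while completing unseen regions. Experiments show that ConfCtrl produces geometrically consistent and visually plausible novel views, effectively reconstructing occluded regions under large viewpoint changes.
Reference score (independently computed attention score): 24.89894187462497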
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We address the challenge of novel view synthesis from only two input images under large viewpoint changes. Existing regression-based methods lack the capacity to reconstruct unseen regions, while camera-guided diffusion models often deviate from intended trajectories due to noisy point cloud projections or insufficient conditioning from camera poses. To address these issues, we propose ConfCtrl, a confidence-aware video interpolation framework that enables diffusion models to follow prescribed camera poses while completing unseen regions. ConfCtrl initializes the diffusion process by combining a confidence-weighted projected point cloud latent with noise as the conditioning input. It then applies a Kalman-inspired predict-update mechanism, treating the projected point cloud as a noisy measurement and using learned residual corrections to balance pose-driven predictions with noisy geometric observations. This allows the model to rely on reliable projections while down-weighting uncertain regions, yielding stable, geometry-aware generation. Experiments on multiple datasets show that ConfCtrl produces geometrically consistent and visually plausible novel views, effectively reconstructing occluded regions under large viewpoint changes.
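Abstract (reference translation): We address the task of novel view synthesis from only two input images under large viewpoint changes. Existing regression-based methods cannot reconstruct unseen regions, while camera-guided diffusion models often deviate from the intended trajectory because of noisy point cloud projections or insufficient conditioning on camera poses. To address these problems, we propose ConfCtrl, a confidence-aware video interpolation framework that lets diffusion models follow prescribed camera poses while completing unseen regions. ConfCtrl initializes the diffusion process by combining a confidence-weighted projected point cloud latent with noise as the conditioning input. It then applies a Kalman-inspired predict-update mechanism that treats the projected point cloud as a noisy measurement and uses learned residual corrections to balance pose-driven predictions against noisy geometric observations. As a result, the model relies on reliable projections and down-weights uncertain regions, giving stable, geometry-aware generation. Experiments on multiple datasets show that ConfCtrl produces geometrically consistent and visually plausible novel views and effectively reconstructs occluded regions under large viewpoint changes.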
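
Illustrative note: the confidence-weighted initialization and the Kalman-inspired predict-update step described in the abstract can be pictured with a minimal per-element sketch. This is a textbook-style blend and gain update, not the ConfCtrl implementation. The function names `confidence_init` and `predict_update`, the flat-list representation of latents, and the use of the confidence value directly as the gain are all assumptions made for illustration; the abstract says only that the corrections are learned.

```python
# Illustrative sketch only: a textbook per-element Kalman-style update,
# NOT the ConfCtrl implementation. All names here are hypothetical.


def confidence_init(projected, noise, confidence):
    """Blend a projected latent with noise, element by element.

    Each confidence value lies in [0, 1]:
    1 keeps the projection, 0 keeps the noise.
    """
    return [c * p + (1.0 - c) * n
            for p, n, c in zip(projected, noise, confidence)]


def predict_update(prediction, measurement, gain):
    """Correct a pose-driven prediction with a noisy measurement.

    Assumption: the gain is given directly (here, the confidence map).
    The abstract describes learned residual corrections instead.
    """
    return [x + k * (z - x)
            for x, z, k in zip(prediction, measurement, gain)]


# Toy values: three latent elements with high, medium, and zero confidence.
projected = [1.0, 2.0, 3.0]
noise = [0.0, 0.0, 0.0]
confidence = [1.0, 0.5, 0.0]

init = confidence_init(projected, noise, confidence)

prediction = [2.0, 2.0, 2.0]
measurement = [1.0, 4.0, 3.0]
updated = predict_update(prediction, measurement, confidence)

print(init)     # [1.0, 1.0, 0.0]
print(updated)  # [1.0, 3.0, 2.0]
```

In this toy setting, an element with confidence 1 follows the measurement exactly, an element with confidence 0 keeps the prediction, and intermediate values interpolate between the two, which is the down-weighting behavior the abstract attributes to uncertain regions.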

Related papers