Dynamic View Synthesis as an Inverse Problem
- URL: http://arxiv.org/abs/2506.08004v1
- Date: Mon, 09 Jun 2025 17:59:47 GMT
- Title: Dynamic View Synthesis as an Inverse Problem
- Authors: Hidir Yesiltepe, Pinar Yanardag
- Abstract summary: We address dynamic view synthesis from monocular videos as an inverse problem in a training-free setting. We introduce a novel noise representation, termed K-order Recursive Noise Representation. To synthesize regions newly visible under camera motion, we introduce Stochastic Latent Modulation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we address dynamic view synthesis from monocular videos as an inverse problem in a training-free setting. By redesigning the noise initialization phase of a pre-trained video diffusion model, we enable high-fidelity dynamic view synthesis without any weight updates or auxiliary modules. We begin by identifying a fundamental obstacle to deterministic inversion arising from zero-terminal signal-to-noise ratio (SNR) schedules and resolve it by introducing a novel noise representation, termed K-order Recursive Noise Representation. We derive a closed-form expression for this representation, enabling precise and efficient alignment between the VAE-encoded and the DDIM-inverted latents. To synthesize newly visible regions resulting from camera motion, we introduce Stochastic Latent Modulation, which performs visibility-aware sampling over the latent space to complete occluded regions. Comprehensive experiments demonstrate that dynamic view synthesis can be effectively performed through structured latent manipulation in the noise initialization phase.
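The inversion obstacle the abstract names can be seen directly in the deterministic DDIM inversion update. The following is a minimal numeric sketch, not the paper's implementation: the schedule values are made up and a perfect noise predictor (the true noise) is assumed. It illustrates that when the terminal cumulative signal coefficient is zero (a zero-terminal-SNR schedule), every input collapses onto the predicted noise, so the inverted latent no longer identifies the signal.

```python
import numpy as np

def ddim_invert_step(x_t, eps, abar_t, abar_next):
    # Predict the clean sample implied by the current latent (requires abar_t > 0).
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    # Deterministic DDIM inversion update toward the next (noisier) timestep.
    return np.sqrt(abar_next) * x0_pred + np.sqrt(1.0 - abar_next) * eps

eps = 0.7  # assume a perfect noise predictor returning the true noise
for abar_T in (0.01, 0.0):  # nonzero vs. zero terminal SNR
    x_T_a = ddim_invert_step(1.0, eps, abar_t=0.5, abar_next=abar_T)
    x_T_b = ddim_invert_step(-1.0, eps, abar_t=0.5, abar_next=abar_T)
    print(f"abar_T={abar_T}: {x_T_a:.4f} vs {x_T_b:.4f}")
```

With `abar_T = 0.0`, both inputs map to exactly `eps` (0.7), so no deterministic rule can invert the final step; with any nonzero `abar_T`, the two latents remain distinct. This is the failure mode the paper's K-order Recursive Noise Representation is designed to work around.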
Related papers
- DynaSplat: Dynamic-Static Gaussian Splatting with Hierarchical Motion Decomposition for Scene Reconstruction [9.391616497099422]
We present DynaSplat, an approach that extends Gaussian Splatting to dynamic scenes. We classify scene elements as static or dynamic through a novel fusion of deformation offset statistics and 2D motion flow consistency. We then introduce a hierarchical motion modeling strategy that captures both coarse global transformations and fine-grained local movements.
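A static/dynamic fusion of this kind can be sketched as a simple thresholded combination. The function below is an illustrative toy, not DynaSplat's code: the array shapes, threshold values, and the use of temporal standard deviation as the "offset statistic" are all assumptions.

```python
import numpy as np

def classify_dynamic(offsets, flow_consistency, offset_thresh=0.05, flow_thresh=0.5):
    """offsets: (N, T, 3) per-Gaussian deformation offsets across T frames;
    flow_consistency: (N,) agreement with 2D motion flow in [0, 1].
    Returns a boolean mask where True marks a dynamic Gaussian."""
    # Temporal variation of the offset magnitude: static points barely move.
    variation = np.linalg.norm(offsets, axis=-1).std(axis=1)
    return (variation > offset_thresh) & (flow_consistency > flow_thresh)

# Two Gaussians: one drifting along x with consistent 2D flow, one static.
offsets = np.zeros((2, 4, 3))
offsets[0, :, 0] = [0.0, 0.2, 0.4, 0.6]
mask = classify_dynamic(offsets, np.array([0.9, 0.1]))
print(mask)  # → [ True False]
```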
arXiv Detail & Related papers (2025-06-11T15:13:35Z) - SHaDe: Compact and Consistent Dynamic 3D Reconstruction via Tri-Plane Deformation and Latent Diffusion [0.0]
We present a novel framework for dynamic 3D scene reconstruction that integrates three key components: an explicit tri-plane deformation field, a view-conditioned canonical field with spherical harmonics (SH) attention, and a temporally-aware latent diffusion prior. Our method encodes 4D scenes using three 2D feature planes that evolve over time, enabling an efficient, compact representation.
arXiv Detail & Related papers (2025-05-22T11:25:38Z) - JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation [13.168628936598367]
JointTuner is a novel adaptive joint training framework. We develop Adaptive LoRA, which incorporates a context-aware gating mechanism. An Appearance-independent Temporal Loss is introduced to decouple motion patterns from intrinsic appearance.
arXiv Detail & Related papers (2025-03-31T11:04:07Z) - Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction [50.873820265165975]
We introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction. We propose a GS-Threshold Joint Modeling strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling. We contribute the first event-inclusive 4D benchmark with synthetic and real-world dynamic scenes, on which our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-11-25T08:23:38Z) - Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling [70.34875558830241]
We present a way of learning a spatio-temporal (4D) embedding, based on semantic gears, to allow for stratified modeling of the dynamic regions of the scene.
At the same time, almost for free, our tracking approach enables free-viewpoint tracking of objects of interest - a functionality not yet achieved by existing NeRF-based methods.
arXiv Detail & Related papers (2024-06-06T03:37:39Z) - Enhancing Dynamic CT Image Reconstruction with Neural Fields and Optical Flow [0.0]
We show the benefits of introducing explicit motion regularizers for dynamic inverse problems based on partial differential equations. We also compare neural fields against a grid-based solver and show that the former outperforms the latter in terms of PSNR.
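A PDE-based motion regularizer of the kind described here can be illustrated with the optical-flow transport equation, penalizing the residual of du/dt + v · du/dx = 0. This is a hedged 1D+time sketch under assumed finite-difference discretizations, not the paper's solver.

```python
import numpy as np

def motion_regularizer(u, v, dt, dx):
    """Mean squared residual of the transport PDE  du/dt + v * du/dx = 0.
    u: (T, X) image sequence; v: (T-1, X) horizontal flow field."""
    du_dt = (u[1:] - u[:-1]) / dt            # forward difference in time
    du_dx = np.gradient(u[:-1], dx, axis=1)  # central difference in space
    return float(np.mean((du_dt + v * du_dx) ** 2))

# A profile translating at speed v0 satisfies the PDE, so the true flow
# should incur a much smaller penalty than a wrong (zero) flow.
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
v0 = 0.2
u = np.stack([np.sin(x - v0 * t) for t in range(8)])
penalty_true = motion_regularizer(u, np.full((7, 128), v0), dt=1.0, dx=x[1] - x[0])
penalty_zero = motion_regularizer(u, np.zeros((7, 128)), dt=1.0, dx=x[1] - x[0])
print(penalty_true < penalty_zero)  # → True
```

In a reconstruction setting, such a penalty would be added to the data-fidelity loss so that the recovered sequence and flow are jointly encouraged to satisfy the motion model.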
arXiv Detail & Related papers (2024-06-03T13:07:29Z) - NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer [48.57740681957145]
We propose a new novel view synthesis (NVS) paradigm that operates without the need for training. NVS-Solver adaptively modulates the diffusion sampling process with the given views to enable the creation of remarkable visual experiences.
arXiv Detail & Related papers (2024-05-24T08:56:19Z) - Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model to a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z) - IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis [90.03590032170169]
We present intrinsic neural radiance fields, dubbed IntrinsicNeRF, which introduce intrinsic decomposition into the NeRF-based neural rendering method.
Our experiments and editing samples on both object-specific/room-scale scenes and synthetic/real-world data demonstrate that we can obtain consistent intrinsic decomposition results.
arXiv Detail & Related papers (2022-10-02T22:45:11Z) - Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.