InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem
- URL: http://arxiv.org/abs/2512.05672v1
- Date: Fri, 05 Dec 2025 12:31:09 GMT
- Title: InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem
- Authors: Yeobin Hong, Suhyeon Lee, Hyungjin Chung, Jong Chul Ye
- Abstract summary: InverseCrafter is an efficient inpainting inverse solver that reformulates the 4D generation task as an inpainting problem solved in the latent space. InverseCrafter achieves comparable novel view generation and superior measurement consistency in camera control tasks with near-zero computational overhead.
- Score: 57.18573487248607
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent approaches to controllable 4D video generation often rely on fine-tuning pre-trained Video Diffusion Models (VDMs). This dominant paradigm is computationally expensive, requiring large-scale datasets and architectural modifications, and frequently suffers from catastrophic forgetting of the model's original generative priors. Here, we propose InverseCrafter, an efficient inpainting inverse solver that reformulates the 4D generation task as an inpainting problem solved in the latent space. The core of our method is a principled mechanism to encode the pixel space degradation operator into a continuous, multi-channel latent mask, thereby bypassing the costly bottleneck of repeated VAE operations and backpropagation. InverseCrafter not only achieves comparable novel view generation and superior measurement consistency in camera control tasks with near-zero computational overhead, but also excels at general-purpose video inpainting with editing. Code is available at https://github.com/yeobinhong/InverseCrafter.
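As a rough illustration of the idea in the abstract, the sketch below shows one simple way a pixel-space inpainting mask could be turned into a continuous latent-space mask and then used to blend measured and generated latents. It is not the authors' implementation: the function names, the average-pooling rule, the downsampling factor of 8, and the 4 latent channels are all assumptions made for illustration, and the same mask value is copied to every channel, whereas the paper describes a multi-channel mask derived from the degradation operator.

```python
import numpy as np

def pixel_mask_to_latent_mask(pixel_mask, factor=8, channels=4):
    """Average-pool a binary pixel mask (H, W) into a continuous mask (C, h, w).

    Hypothetical helper: `factor` and `channels` stand in for a VAE's spatial
    downsampling rate and latent channel count.
    """
    H, W = pixel_mask.shape
    h, w = H // factor, W // factor
    # Group pixels into factor x factor blocks and take each block's mean,
    # so partially observed blocks get a value strictly between 0 and 1.
    blocks = pixel_mask[:h * factor, :w * factor].reshape(h, factor, w, factor)
    pooled = blocks.mean(axis=(1, 3))
    # Assumption: reuse the same mask for every latent channel.
    return np.broadcast_to(pooled, (channels, h, w)).copy()

def blend_latents(z_gen, z_meas, latent_mask):
    """Keep measured latents where the mask is 1, generated latents where it is 0."""
    return latent_mask * z_meas + (1.0 - latent_mask) * z_gen

# Usage: a 16x16 frame whose first four columns are observed.
pixel_mask = np.zeros((16, 16))
pixel_mask[:, :4] = 1.0
latent_mask = pixel_mask_to_latent_mask(pixel_mask)
blended = blend_latents(np.zeros((4, 2, 2)), np.ones((4, 2, 2)), latent_mask)
```

Because the blend happens entirely on latents, a step of this kind needs no VAE encode or decode call and no backpropagation, which is the cost the abstract says the method avoids.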
Related papers
- LaVR: Scene Latent Conditioned Generative Video Trajectory Re-Rendering using Large 4D Reconstruction Models [52.656349227001925]
Given a monocular video, the goal of video re-rendering is to generate views of the scene from a novel camera trajectory. Existing methods face two distinct challenges. We propose to address these challenges by using the implicit geometric knowledge embedded in the latent space of a large 4D reconstruction model.
arXiv Detail & Related papers (2026-01-21T05:46:03Z)
- Efficiently Reconstructing Dynamic Scenes One D4RT at a Time [54.67332582569525]
This paper introduces D4RT, a simple yet powerful feedforward model designed to efficiently solve this task. Our decoding interface allows the model to independently and flexibly probe the 3D position of any point in space and time. We demonstrate that our approach sets a new state of the art, outperforming previous methods across a wide spectrum of 4D reconstruction tasks.
arXiv Detail & Related papers (2025-12-09T18:57:21Z)
- UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction [73.29048162438797]
We introduce UniVerse, a unified framework for robust reconstruction based on a video diffusion model. Specifically, UniVerse first converts inconsistent images into initial videos, then uses a specially designed video diffusion model to restore them into consistent images. Experiments on both synthetic and real-world datasets demonstrate the strong generalization capability and superior performance of our method in robust reconstruction.
arXiv Detail & Related papers (2025-10-02T04:50:18Z)
- DiTPainter: Efficient Video Inpainting with Diffusion Transformers [35.1896530415315]
We present DiTPainter, an end-to-end video inpainting model based on Diffusion Transformer (DiT). DiTPainter uses an efficient transformer network designed for video inpainting, which is trained from scratch instead of initializing from any large pretrained models. Experiments show that DiTPainter outperforms existing video inpainting algorithms with higher quality and better spatial-temporal consistency.
arXiv Detail & Related papers (2025-04-22T07:36:45Z)
- SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models [10.66567645920237]
Given an input video of a person and a new garment, the objective of this paper is to synthesize a new video where the person is wearing the garment while maintaining temporal consistency. We reconceptualize video virtual try-on as a conditional video inpainting task, with garments serving as input conditions. Specifically, our approach enhances image diffusion models by incorporating temporal attention layers to improve temporal coherence.
arXiv Detail & Related papers (2024-12-13T14:50:26Z)
- EG4D: Explicit Generation of 4D Object without Score Distillation [105.63506584772331]
EG4D is a novel framework that generates high-quality and consistent 4D assets without score distillation.
Our framework outperforms the baselines in generation quality by a considerable margin.
arXiv Detail & Related papers (2024-05-28T12:47:22Z)
- EfficientSCI: Densely Connected Network with Space-time Factorization for Large-scale Video Snapshot Compressive Imaging [6.8372546605486555]
We show that an UHD color video with high compression ratio can be reconstructed from a snapshot 2D measurement using a single end-to-end deep learning model with PSNR above 32 dB.
Our method significantly outperforms all previous SOTA algorithms with better real-time performance.
arXiv Detail & Related papers (2023-05-17T07:28:46Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.