One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control
- URL: http://arxiv.org/abs/2511.18922v1
- Date: Mon, 24 Nov 2025 09:31:23 GMT
- Title: One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control
- Authors: Zhenxing Mi, Yuxin Wang, Dan Xu
- Abstract summary: One4D is a unified framework for 4D generation and reconstruction. It produces dynamic 4D content as synchronized RGB frames and pointmaps. One4D is trained on a mixture of synthetic and real 4D datasets under modest computational budgets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present One4D, a unified framework for 4D generation and reconstruction that produces dynamic 4D content as synchronized RGB frames and pointmaps. By consistently handling varying sparsities of conditioning frames through a Unified Masked Conditioning (UMC) mechanism, One4D can seamlessly transition between 4D generation from a single image, 4D reconstruction from a full video, and mixed generation and reconstruction from sparse frames. Our framework adapts a powerful video generation model for joint RGB and pointmap generation, with carefully designed network architectures. The commonly used diffusion finetuning strategies for depthmap or pointmap reconstruction often fail on joint RGB and pointmap generation, quickly degrading the base video model. To address this challenge, we introduce Decoupled LoRA Control (DLC), which employs two modality-specific LoRA adapters to form decoupled computation branches for RGB frames and pointmaps, connected by lightweight, zero-initialized control links that gradually learn mutual pixel-level consistency. Trained on a mixture of synthetic and real 4D datasets under modest computational budgets, One4D produces high-quality RGB frames and accurate pointmaps across both generation and reconstruction tasks. This work represents a step toward general, high-quality geometry-based 4D world modeling using video diffusion models. Project page: https://mizhenxing.github.io/One4D
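The abstract describes Unified Masked Conditioning (UMC) and Decoupled LoRA Control (DLC) only at a high level. The two PyTorch sketches below illustrate one plausible reading; every class, function, and parameter name is hypothetical and not taken from the One4D code.

The first sketch assumes UMC amounts to a per-frame binary mask that keeps conditioning frames clean while the remaining frames start from noise, so a single mechanism covers generation from one image, reconstruction from a full video, and mixed sparse-frame inputs:

```python
import torch

def unified_masked_conditioning(frames, cond_indices, noise):
    """Hedged sketch of UMC: mix clean conditioning frames with noise.

    frames: (T, C, H, W) video tensor; cond_indices: which frames are
    given as conditioning. One index corresponds to image-to-4D
    generation, all indices to full-video 4D reconstruction, and any
    sparse subset to mixed generation and reconstruction.
    """
    T = frames.shape[0]
    mask = torch.zeros(T, 1, 1, 1, device=frames.device)
    mask[cond_indices] = 1.0                       # 1 = conditioning frame
    model_input = mask * frames + (1.0 - mask) * noise
    return model_input, mask                       # mask is also fed to the model
```

The second sketch assumes DLC wraps each adapted layer of the frozen base model with two modality-specific LoRA branches joined by zero-initialized linear control links, so cross-modal influence grows from zero during finetuning instead of degrading the base video model:

```python
import torch.nn as nn

class LoRA(nn.Module):
    """Standard low-rank adapter; the up-projection starts at zero."""
    def __init__(self, dim: int, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)
        self.scale = alpha / rank

    def forward(self, x):
        return self.up(self.down(x)) * self.scale

class DecoupledLoRABlock(nn.Module):
    """Hedged sketch of DLC around one frozen base layer."""
    def __init__(self, dim: int, rank: int = 16):
        super().__init__()
        self.base = nn.Linear(dim, dim)            # pretrained weight, frozen
        self.base.requires_grad_(False)
        self.lora_rgb = LoRA(dim, rank)            # modality-specific adapters
        self.lora_pts = LoRA(dim, rank)
        # Zero-initialized control links: at init the two branches are
        # fully decoupled; pixel-level consistency is learned gradually.
        self.link_rgb_to_pts = nn.Linear(dim, dim, bias=False)
        self.link_pts_to_rgb = nn.Linear(dim, dim, bias=False)
        nn.init.zeros_(self.link_rgb_to_pts.weight)
        nn.init.zeros_(self.link_pts_to_rgb.weight)

    def forward(self, x_rgb, x_pts):
        h_rgb = self.base(x_rgb) + self.lora_rgb(x_rgb)
        h_pts = self.base(x_pts) + self.lora_pts(x_pts)
        out_rgb = h_rgb + self.link_pts_to_rgb(h_pts)
        out_pts = h_pts + self.link_rgb_to_pts(h_rgb)
        return out_rgb, out_pts
```

Because both the LoRA up-projections and the control links start at zero, the block initially reproduces the frozen base layer exactly, which matches the abstract's motivation that naive diffusion finetuning quickly degrades the base video model.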
Related papers
- 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere [77.83037497484366]
We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics.
arXiv Detail & Related papers (2026-02-10T18:57:04Z) - Any4D: Unified Feed-Forward Metric 4D Reconstruction [39.62006179006032]
We present Any4D, a scalable multi-view transformer for metric-scale, dense feed-forward 4D reconstruction. Any4D directly generates per-pixel motion and geometry predictions for N frames. We achieve superior performance across diverse setups, both in terms of accuracy (2-3X lower error) and compute efficiency (15X faster).
arXiv Detail & Related papers (2025-12-11T18:57:39Z) - Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image [88.71287865590273]
We introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories. We propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) to jointly generate geometrically consistent and motion-plausible 4D trajectories. We then propose a 4D View Synthesis Module (4D-Vi) to render videos with arbitrary camera trajectories from 4D point track representations.
arXiv Detail & Related papers (2025-12-04T17:59:10Z) - 4DNeX: Feed-Forward 4D Generative Modeling Made Easy [51.79072580042173]
We present 4DNeX, the first feed-forward framework for generating 4D (i.e., dynamic 3D) scene representations from a single image. In contrast to existing methods that rely on computationally intensive optimization or require multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D generation.
arXiv Detail & Related papers (2025-08-18T17:59:55Z) - Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation [3.1852855132066673]
Current approaches often struggle to maintain view consistency while handling complex scene dynamics. This framework is the first to leverage both the rich temporal priors of video diffusion models and the geometric awareness of reconstruction models. It significantly facilitates 4D generation and achieves higher quality (e.g., mPSNR, mSSIM) than existing methods.
arXiv Detail & Related papers (2025-08-11T08:55:47Z) - MVG4D: Image Matrix-Based Multi-View and Motion Generation for 4D Content Creation from a Single Image [8.22464804794448]
We propose MVG4D, a novel framework that generates dynamic 4D content from a single still image. At its core, MVG4D employs an image matrix module that synthesizes temporally coherent and spatially diverse multi-view images. Our method effectively enhances temporal consistency, geometric fidelity, and visual realism, addressing key challenges in motion discontinuity and background degradation.
arXiv Detail & Related papers (2025-07-24T12:48:14Z) - 4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation [66.20991603309054]
We propose the first framework capable of computing a 4D spatio-temporal grid of video frames and 3D Gaussian particles for each time step using a feed-forward architecture. In the first part, we analyze current 4D video diffusion architectures that perform spatial and temporal attention either sequentially or in parallel within a two-stream design. In the second part, we extend existing 3D reconstruction algorithms by introducing a Gaussian head, a camera token replacement algorithm, and additional dynamic layers and training.
arXiv Detail & Related papers (2025-06-18T23:44:59Z) - TesserAct: Learning 4D Embodied World Models [66.8519958275311]
We learn a 4D world model by training on RGB-DN (RGB, Depth, and Normal) videos. This not only surpasses traditional 2D models by incorporating detailed shape, configuration, and temporal changes into its predictions, but also allows us to effectively learn accurate inverse dynamics models for an embodied agent.
arXiv Detail & Related papers (2025-04-29T17:59:30Z) - Can Video Diffusion Model Reconstruct 4D Geometry? [66.5454886982702]
Sora3R is a novel framework that taps into the rich spatio-temporal priors of large dynamic video diffusion models to infer 4D pointmaps from casual videos. Experiments demonstrate that Sora3R reliably recovers both camera poses and detailed scene geometry, achieving performance on par with state-of-the-art methods for dynamic 4D reconstruction.
arXiv Detail & Related papers (2025-03-27T01:44:46Z) - Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image. Our key insight is to distill pre-trained foundation models for consistent 4D scene representation. The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z) - CT4D: Consistent Text-to-4D Generation with Animatable Meshes [53.897244823604346]
We present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user-supplied prompts.
Our framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes.
Our experimental results, both qualitative and quantitative, demonstrate that our CT4D framework surpasses existing text-to-4D techniques in maintaining interframe consistency and preserving global geometry.
arXiv Detail & Related papers (2024-08-15T14:41:34Z)