Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis
- URL: http://arxiv.org/abs/2601.14253v1
- Date: Tue, 20 Jan 2026 18:59:48 GMT
- Title: Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis
- Authors: Hongyuan Chen, Xingyu Chen, Youjia Zhang, Zexiang Xu, Anpei Chen,
- Abstract summary: Motion 3-to-4 is a feed-forward framework for synthesising high-quality 4D dynamic objects from a single monocular video. Our model learns a compact motion latent representation and predicts per-frame vertex trajectories to recover complete, temporally coherent geometry.
- Score: 53.48281548500864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Motion 3-to-4, a feed-forward framework for synthesising high-quality 4D dynamic objects from a single monocular video and an optional 3D reference mesh. While recent advances have significantly improved 2D, video, and 3D content generation, 4D synthesis remains difficult due to limited training data and the inherent ambiguity of recovering geometry and motion from a monocular viewpoint. Motion 3-to-4 addresses these challenges by decomposing 4D synthesis into static 3D shape generation and motion reconstruction. Using a canonical reference mesh, our model learns a compact motion latent representation and predicts per-frame vertex trajectories to recover complete, temporally coherent geometry. A scalable frame-wise transformer further enables robustness to varying sequence lengths. Evaluations on both standard benchmarks and a new dataset with accurate ground-truth geometry show that Motion 3-to-4 delivers superior fidelity and spatial consistency compared to prior work. Project page is available at https://motion3-to-4.github.io/.
Related papers
- SWiT-4D: Sliding-Window Transformer for Lossless and Parameter-Free Temporal 4D Generation [30.72482055095692]
SWiT-4D is a Sliding-Window Transformer for lossless, parameter-free temporal 4D mesh generation. SWiT-4D integrates seamlessly with any Diffusion Transformer (DiT)-based image-to-3D generator. It achieves high-fidelity geometry and stable temporal consistency, indicating practical deployability under extremely limited 4D supervision.
arXiv Detail & Related papers (2025-12-11T17:54:31Z)
- Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image [88.71287865590273]
We introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories. We propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) to jointly generate geometrically consistent and motion-plausible 4D trajectories. We then propose a 4D View Synthesis Module (4D-Vi) to render videos with arbitrary camera trajectories from 4D point track representations.
arXiv Detail & Related papers (2025-12-04T17:59:10Z)
- Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z)
- C4D: 4D Made from 3D through Dual Correspondences [77.04731692213663]
We introduce C4D, a framework that leverages temporal correspondences to extend existing 3D reconstruction formulations to 4D. C4D captures two types of correspondences: short-term optical flow and long-term point tracking. We train a dynamic-aware point tracker that provides additional mobility information.
arXiv Detail & Related papers (2025-10-16T17:59:06Z)
- ShapeGen4D: Towards High Quality 4D Shape Generation from Videos [85.45517487721257]
We introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Our method accurately captures non-rigid motion, volume changes, and even topological transitions without per-frame optimization.
arXiv Detail & Related papers (2025-10-07T17:58:11Z)
- MagicPose4D: Crafting Articulated Models with Appearance and Motion Control [17.161695123524563]
We propose MagicPose4D, a framework for refined control over both appearance and motion in 4D generation. Unlike current 4D generation methods, MagicPose4D accepts monocular videos or mesh sequences as motion prompts. We demonstrate that MagicPose4D significantly improves the accuracy and consistency of 4D content generation.
arXiv Detail & Related papers (2024-05-22T21:51:01Z)
- SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance.
Our method surpasses existing methods in both quality and efficiency.
We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv Detail & Related papers (2024-04-04T18:05:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.