Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions
- URL: http://arxiv.org/abs/2507.07978v1
- Date: Thu, 10 Jul 2025 17:54:27 GMT
- Title: Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions
- Authors: Longfei Li, Zhiwen Fan, Wenyan Cong, Xinhang Liu, Yuyang Yin, Matt Foutter, Panwang Pan, Chenyu You, Yue Wang, Zhangyang Wang, Yao Zhao, Marco Pavone, Yunchao Wei
- Abstract summary: We propose a data curation pipeline that reconstructs 3D Martian environments from real stereo navigation images. A Martian terrain video generator, MarsGen, synthesizes novel videos that are visually realistic and geometrically consistent with the 3D structure encoded in the data. Our approach outperforms video synthesis models trained on terrestrial datasets, achieving superior visual fidelity and 3D structural consistency.
- Score: 116.56517155163716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthesizing realistic Martian landscape videos is crucial for mission rehearsal and robotic simulation. However, this task poses unique challenges due to the scarcity of high-quality Martian data and the significant domain gap between Martian and terrestrial imagery. To address these challenges, we propose a holistic solution composed of two key components: 1) a data curation pipeline, Multimodal Mars Synthesis (M3arsSynth), which reconstructs 3D Martian environments from real stereo navigation images sourced from NASA's Planetary Data System (PDS) and renders high-fidelity multiview 3D video sequences; 2) a Martian terrain video generator, MarsGen, which synthesizes novel videos that are visually realistic and geometrically consistent with the 3D structure encoded in the data. Our M3arsSynth engine spans a wide range of Martian terrains and acquisition dates, enabling the generation of physically accurate 3D surface models at metric-scale resolution. MarsGen, fine-tuned on M3arsSynth data, synthesizes videos conditioned on an initial image frame and, optionally, camera trajectories or textual prompts, allowing for video generation in novel environments. Experimental results show that our approach outperforms video synthesis models trained on terrestrial datasets, achieving superior visual fidelity and 3D structural consistency.
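The listing includes no code, but the stereo-reconstruction step that M3arsSynth is described as performing rests on the standard rectified-stereo relation Z = f * B / d (depth from focal length, baseline, and disparity). The sketch below is a minimal illustration of that step, assuming rectified grayscale navcam frames with known calibration; OpenCV's semi-global matcher and all parameter values are stand-ins for illustration, not the authors' pipeline.

```python
# Minimal sketch: metric-scale point cloud from a rectified stereo pair.
# Matcher choice and parameters are illustrative assumptions; the actual
# M3arsSynth pipeline is not released in this listing.
import cv2
import numpy as np

def stereo_to_point_cloud(left_gray, right_gray, focal_px, baseline_m):
    """left_gray/right_gray: (H, W) uint8 rectified images;
    focal_px: focal length in pixels; baseline_m: baseline in meters."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,   # search range; must be divisible by 16
        blockSize=7,
        P1=8 * 7 * 7,         # smoothness penalties (small / large jumps)
        P2=32 * 7 * 7,
    )
    # OpenCV returns fixed-point disparity scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    valid = disparity > 0

    # Metric depth from the rectified-stereo relation Z = f * B / d.
    depth = np.zeros_like(disparity)
    depth[valid] = focal_px * baseline_m / disparity[valid]

    # Back-project to 3D camera coordinates (pinhole model; the principal
    # point is assumed to sit at the image center for this sketch).
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - w / 2.0) * depth / focal_px
    y = (v - h / 2.0) * depth / focal_px
    return np.stack([x, y, depth], axis=-1)[valid]  # (N, 3), in meters
```

The abstract also notes that MarsGen can be conditioned on camera trajectories, without specifying the encoding. One common choice in camera-conditioned video diffusion is a per-pixel Plucker ray map for each frame; the sketch below assumes that representation (K_inv is the inverse intrinsics matrix, c2w the 4x4 camera-to-world pose), and is not confirmed as MarsGen's format.

```python
def plucker_ray_map(K_inv, c2w, h, w):
    """Encode one camera pose as an (h, w, 6) Plucker ray map:
    per-pixel world-space ray direction d and moment o x d."""
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)   # (h, w, 3)
    dirs = pixels @ K_inv.T @ c2w[:3, :3].T               # rays in world frame
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origin = np.broadcast_to(c2w[:3, 3], dirs.shape)
    return np.concatenate([dirs, np.cross(origin, dirs)], axis=-1)
```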
Related papers
- Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation [66.95956271144982]
We present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames.
arXiv Detail & Related papers (2025-06-04T17:59:04Z)
- Seeing World Dynamics in a Nutshell [132.79736435144403]
NutWorld is a framework that transforms monocular videos into dynamic 3D representations in a single forward pass. We demonstrate that NutWorld achieves high-fidelity video reconstruction quality while enabling real-time downstream applications.
arXiv Detail & Related papers (2025-02-05T18:59:52Z)
- Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text [61.9973218744157]
We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories.
Experiments demonstrate that Director3D outperforms existing methods, offering superior performance in real-world 3D generation.
arXiv Detail & Related papers (2024-06-25T14:42:51Z)
- MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis [26.710960922302124]
We propose a real-world Multi-Sensor Hybrid Room dataset (MuSHRoom). Our dataset presents exciting challenges and requires state-of-the-art methods to be cost-effective and robust to noisy data and devices. We benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis.
arXiv Detail & Related papers (2023-11-05T21:46:12Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- GINA-3D: Learning to Generate Implicit Neural Assets in the Wild [38.51391650845503]
GINA-3D is a generative model that uses real-world driving data from camera and LiDAR sensors to create 3D implicit neural assets of diverse vehicles and pedestrians.
We construct a large-scale object-centric dataset containing over 1.2M images of vehicles and pedestrians.
We demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.
arXiv Detail & Related papers (2023-04-04T23:41:20Z)
- MaRF: Representing Mars as Neural Radiance Fields [1.4680035572775534]
MaRF is a framework that synthesizes the Martian environment from several collections of rover camera images.
It addresses key challenges in planetary surface exploration such as planetary geology, simulated navigation, and shape analysis.
In the experimental section, we demonstrate the environments created from actual Mars datasets captured by the Curiosity rover, the Perseverance rover, and the Ingenuity helicopter.
arXiv Detail & Related papers (2022-12-03T18:58:00Z)
- UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model [58.70130563417079]
We introduce a new 3D human-body model with a series of decoupled parameters that can freely control the generation of the body.
Compared to the existing manually annotated DensePose-COCO dataset, the synthetic UltraPose provides ultra-dense image-to-surface correspondences without annotation cost or error.
arXiv Detail & Related papers (2021-10-28T16:24:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.