WildSmoke: Ready-to-Use Dynamic 3D Smoke Assets from a Single Video in the Wild
- URL: http://arxiv.org/abs/2509.11114v1
- Date: Sun, 14 Sep 2025 06:06:42 GMT
- Title: WildSmoke: Ready-to-Use Dynamic 3D Smoke Assets from a Single Video in the Wild
- Authors: Yuqiu Liu, Jialin Song, Manolis Savva, Wuyang Chen
- Abstract summary: We propose a pipeline to extract and reconstruct dynamic 3D smoke assets from a single in-the-wild video. Our method outperforms previous reconstruction and generation methods with high-quality smoke reconstructions.
- Score: 15.941164647083696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a pipeline to extract and reconstruct dynamic 3D smoke assets from a single in-the-wild video, and further integrate interactive simulation for smoke design and editing. Recent developments in 3D vision have significantly improved the reconstruction and rendering of fluid dynamics, supporting realistic and temporally consistent view synthesis. However, current fluid reconstructions rely heavily on carefully controlled clean lab environments, whereas real-world videos captured in the wild remain largely underexplored. We pinpoint three key challenges of reconstructing smoke in real-world videos and design targeted techniques for each: smoke extraction with background removal, initialization of smoke particles and camera poses, and inference of multi-view videos. Our method not only outperforms previous reconstruction and generation methods with high-quality smoke reconstructions (+2.22 average PSNR on wild videos), but also enables diverse and realistic editing of fluid dynamics by simulating our smoke assets. We provide our models, data, and 4D smoke assets at [https://autumnyq.github.io/WildSmoke](https://autumnyq.github.io/WildSmoke).
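The first stage of the pipeline, smoke extraction with background removal, can be illustrated with a classical stand-in. The sketch below is not the paper's (presumably learned) extractor; it assumes a static camera and uses a temporal median as the smoke-free background plate, and `extract_smoke` with its threshold are illustrative names only.

```python
import numpy as np

def extract_smoke(frames: np.ndarray, thresh: float = 0.08):
    """Classical stand-in for a smoke-extraction stage.

    frames: (T, H, W, 3) float32 video in [0, 1] from a static camera.
    Returns (background, alpha), where alpha is a per-frame soft smoke matte.
    """
    # A temporal median is robust to transient smoke, so it approximates
    # the smoke-free background plate.
    background = np.median(frames, axis=0)
    # Per-pixel deviation from the background acts as a soft smoke matte.
    diff = np.abs(frames - background[None]).mean(axis=-1)  # (T, H, W)
    alpha = np.clip((diff - thresh) / (1.0 - thresh), 0.0, 1.0)
    return background, alpha
```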
Related papers
- Coarse-to-Real: Generative Rendering for Populated Dynamic Scenes [22.450051108066216]
We present C2R (Coarse-to-Real), a generative framework that synthesizes real-style urban crowd videos. Our approach uses coarse 3D renderings to explicitly control scene layout, camera motion, and human trajectories. It produces temporally consistent, controllable, and realistic urban scene videos from minimal 3D input.
arXiv Detail & Related papers (2026-01-29T20:29:04Z)
- MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos [31.168481928653748]
MoCapAnything is a reference-guided, factorized framework for 3D motion capture. It reconstructs a rotation-based animation that directly drives the specific asset. It delivers high-quality skeletal animations and meaningful cross-species animations.
arXiv Detail & Related papers (2025-12-11T18:09:48Z)
- SViM3D: Stable Video Material Diffusion for Single Image 3D Generation [48.986972061812004]
Video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view, based on explicit camera control. This unique setup allows for relighting and generating a 3D asset using our model as a neural prior.
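As a rough illustration of why per-view PBR parameters enable relighting, the sketch below applies a simple Lambertian shading term to predicted albedo and normal maps; SViM3D's actual shading model is not specified in this summary, and `relight_lambertian` is a hypothetical helper.

```python
import numpy as np

def relight_lambertian(albedo, normals, light_dir, light_rgb=(1.0, 1.0, 1.0)):
    """albedo: (H, W, 3), normals: (H, W, 3) unit vectors, light_dir: (3,).
    Returns a relit image under a single distant light (Lambertian only)."""
    l = np.asarray(light_dir, dtype=np.float32)
    l = l / np.linalg.norm(l)
    # Clamped cosine term between surface normal and light direction.
    ndotl = np.clip(normals @ l, 0.0, None)[..., None]  # (H, W, 1)
    return albedo * np.asarray(light_rgb, dtype=np.float32) * ndotl
```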
arXiv Detail & Related papers (2025-10-09T14:29:47Z)
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation [87.91642226587294]
Current learning-based 3D reconstruction methods rely on the availability of captured real-world multi-view data. We propose a self-distillation framework that distills the implicit 3D knowledge in video diffusion models into an explicit 3D Gaussian Splatting (3DGS) representation. Our framework achieves state-of-the-art performance in static and dynamic 3D scene generation.
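A minimal toy version of such self-distillation, assuming stand-ins for both models: a frozen network plays the video diffusion "teacher" and a trainable network plays the explicit "student" representation, optimized with a photometric-style loss on sampled camera poses.

```python
import torch
import torch.nn as nn

# Frozen "teacher" (stand-in for the video diffusion model) supervises an
# explicit, differentiable "student" scene representation (stand-in for 3DGS).
teacher = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 128))
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher stays frozen throughout

student = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 128))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(200):
    poses = torch.randn(32, 6)          # random camera poses (toy 6-vector)
    with torch.no_grad():
        target = teacher(poses)         # "generated views" used as supervision
    loss = nn.functional.mse_loss(student(poses), target)  # photometric stand-in
    opt.zero_grad(); loss.backward(); opt.step()
```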
arXiv Detail & Related papers (2025-09-23T17:58:01Z)
- SmokeSeer: 3D Gaussian Splatting for Smoke Removal and Scene Reconstruction [14.475461616365346]
Smoke in real-world scenes can severely degrade the quality of images and hamper visibility. We introduce SmokeSeer, a method for simultaneous 3D scene reconstruction and smoke removal from a video. Our method uses thermal and RGB images, leveraging the fact that the reduced scattering in thermal images enables us to see through the smoke.
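A heuristic sketch of the thermal cue, assuming registered RGB and thermal video: pixels that fluctuate in RGB but stay stable in thermal are flagged as smoke, since scattering barely affects long-wave thermal imagery. This is an illustrative stand-in, not SmokeSeer's Gaussian-splatting formulation.

```python
import numpy as np

def smoke_alpha_from_thermal(rgb, thermal):
    """rgb: (T, H, W, 3), thermal: (T, H, W), both float in [0, 1], registered.
    Returns a normalized (H, W) smoke likelihood map."""
    rgb_var = rgb.std(axis=0).mean(axis=-1)  # temporal variation in RGB
    th_var = thermal.std(axis=0)             # temporal variation in thermal
    # Smoke moves in RGB but is nearly invisible in thermal.
    alpha = np.clip(rgb_var - th_var, 0.0, None)
    return alpha / (alpha.max() + 1e-8)
```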
arXiv Detail & Related papers (2025-09-22T03:05:22Z)
- UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting [54.883935964137706]
We introduce UAV4D, a framework for enabling photorealistic rendering of dynamic real-world scenes captured by UAVs. We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans. Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.
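For reference, the PSNR figure quoted here (and the +2.22 number in the main abstract) is the standard peak signal-to-noise ratio in decibels:

```python
import numpy as np

def psnr(img, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((np.asarray(img) - np.asarray(ref)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```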
arXiv Detail & Related papers (2025-06-05T13:21:09Z)
- VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step [13.168559963356952]
VideoScene distills a video diffusion model to generate 3D scenes in a single step. It achieves faster and higher-quality 3D scene generation than previous video diffusion models.
arXiv Detail & Related papers (2025-04-02T17:59:21Z)
- UVRM: A Scalable 3D Reconstruction Model from Unposed Videos [68.34221167200259]
Training 3D reconstruction models with 2D visual data traditionally requires prior knowledge of camera poses for the training samples. We introduce UVRM, a novel 3D reconstruction model that can be trained and evaluated on monocular videos without requiring any pose information.
arXiv Detail & Related papers (2025-01-16T08:00:17Z)
- Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning [41.30923253467854]
Temporal features can be complex and diverse. Spatiotemporal models often lean heavily on one type of artifact and ignore the other. Videos are naturally resource-intensive.
arXiv Detail & Related papers (2024-08-30T07:49:57Z)
- Self-Supervised Video Desmoking for Laparoscopic Surgery [48.83900673665993]
We introduce self-supervised surgery video desmoking (SelfSVD).
We observe that the frame captured before the activation of high-energy devices is generally clear (named the pre-smoke frame, or PS frame).
We further feed the valuable information from the PS frame into models, where a masking strategy and a regularization term are presented to avoid trivial solutions.
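The abstract does not detail the masking strategy or regularization term; a hypothetical loss in that spirit might mask the PS-frame supervision to reliable regions and tie the remainder to the input, discouraging the trivial solution of simply copying the PS frame.

```python
import torch
import torch.nn.functional as F

def selfsvd_style_loss(pred, smoky, ps_frame, mask, reg_weight=0.1):
    """Hypothetical loss in the spirit of SelfSVD (names are illustrative).

    pred: desmoked output; smoky: smoky input frame; ps_frame: pre-smoke
    reference; mask: 1 where the PS frame is a reliable target (e.g. static
    regions), 0 elsewhere. All tensors share the same shape.
    """
    # Supervise only where the PS frame is trustworthy.
    recon = F.l1_loss(pred * mask, ps_frame * mask)
    # Elsewhere, keep the output anchored to the input so the model cannot
    # trivially copy the PS frame everywhere.
    reg = F.l1_loss(pred * (1 - mask), smoky * (1 - mask))
    return recon + reg_weight * reg
```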
arXiv Detail & Related papers (2024-03-17T12:38:58Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature renderer.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
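The "point clouds from RGBD frames" step corresponds to standard pinhole unprojection, sketched below (the neural feature renderer and depth inpainting are beyond this snippet):

```python
import numpy as np

def rgbd_to_points(rgb, depth, fx, fy, cx, cy):
    """Unproject one RGBD frame into a colored point cloud (pinhole model).

    rgb: (H, W, 3) colors; depth: (H, W) metric depth, 0 where missing
    (the values a depth-inpainting module would fill in).
    Returns an (N, 6) array of xyz + rgb for valid pixels.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    valid = z > 0
    pts = np.stack([x, y, z], axis=-1)[valid]
    return np.concatenate([pts, rgb[valid]], axis=-1)
```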
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.