MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
- URL: http://arxiv.org/abs/2510.11107v1
- Date: Mon, 13 Oct 2025 07:56:19 GMT
- Title: MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
- Authors: Jiahui Lei, Kyle Genova, George Kopanas, Noah Snavely, Leonidas Guibas,
- Abstract summary: This paper addresses the challenge of learning semantically and functionally meaningful 3D motion priors from real-world videos. We propose a pixel-aligned Motion Map representation for 3D scene motion, which can be generated from existing generative image models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the challenge of learning semantically and functionally meaningful 3D motion priors from real-world videos, in order to enable prediction of future 3D scene motion from a single input image. We propose a novel pixel-aligned Motion Map (MoMap) representation for 3D scene motion, which can be generated from existing generative image models to facilitate efficient and effective motion prediction. To learn meaningful distributions over motion, we create a large-scale database of MoMaps from over 50,000 real videos and train a diffusion model on these representations. Our motion generation not only synthesizes trajectories in 3D but also suggests a new pipeline for 2D video synthesis: first generate a MoMap, then warp an image accordingly and complete the warped point-based renderings. Experimental results demonstrate that our approach generates plausible and semantically consistent 3D scene motion.
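Since no code accompanies the abstract, the snippet below is only a minimal sketch of the warp step in the proposed pipeline (generate a MoMap, warp the image, complete the warped point-based rendering). It assumes a known depth map and intrinsics `K`, and treats the MoMap as a single per-pixel 3D displacement rather than the full trajectory the paper describes; `momap_diffusion.sample` and `completion_model` are hypothetical stand-ins, not the authors' API.

```python
import numpy as np

def unproject(depth, K):
    """Lift every pixel to a 3D point using its depth and camera intrinsics K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # (H*W, 3) homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                                  # camera-frame ray directions
    return rays * depth.reshape(-1, 1)                               # (H*W, 3) 3D points

def warp_with_momap(image, depth, momap, K):
    """Warp an image by the per-pixel 3D displacements stored in a MoMap slice."""
    H, W, _ = image.shape
    pts = unproject(depth, K) + momap.reshape(-1, 3)                 # displace each pixel's 3D point
    proj = pts @ K.T
    uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-6, None)             # reproject to the image plane
    warped = np.zeros_like(image, dtype=np.float32)
    zbuf = np.full((H, W), np.inf)
    for (x, y), z, c in zip(np.round(uv).astype(int), pts[:, 2], image.reshape(-1, 3)):
        if 0 <= x < W and 0 <= y < H and 0 < z < zbuf[y, x]:         # nearest surface wins
            zbuf[y, x], warped[y, x] = z, c
    return warped, np.isinf(zbuf)                                    # warped frame + hole mask

# Pipeline sketch: momap = momap_diffusion.sample(image)            (hypothetical model call)
#                  warped, holes = warp_with_momap(image, depth, momap, K)
#                  next_frame = completion_model(warped, holes)     (hypothetical inpainting step)
```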
Related papers
- DIMO: Diverse 3D Motion Generation for Arbitrary Objects [57.14954351767432]
DIMO is a generative approach capable of generating diverse 3D motions for arbitrary objects from a single image. We leverage the rich priors in well-trained video models to extract common motion patterns. At inference time, we can instantly sample diverse 3D motions from the learned latent space in a single forward pass.
arXiv Detail & Related papers (2025-11-10T18:56:49Z)
- Drag4D: Align Your Motion with Text-Driven 3D Scene Generation [77.79131321983677]
Drag4D is an interactive framework that integrates object motion control within text-driven 3D scene generation. This framework enables users to define 3D trajectories for the 3D objects generated from a single image, seamlessly integrating them into a high-quality 3D background.
arXiv Detail & Related papers (2025-09-26T05:23:45Z)
- DreamJourney: Perpetual View Generation with Video Diffusion Models [91.88716097573206]
Perpetual view generation aims to synthesize a long-term video corresponding to an arbitrary camera trajectory solely from a single input image. Recent methods commonly utilize a pre-trained text-to-image diffusion model to synthesize new content of previously unseen regions along camera movement. We present DreamJourney, a two-stage framework that leverages the world simulation capacity of video diffusion models to trigger a new perpetual scene view generation task.
arXiv Detail & Related papers (2025-06-21T12:51:34Z)
- Recovering Dynamic 3D Sketches from Videos [30.87733869892925]
Liv3Stroke is a novel approach for abstracting objects in motion with deformable 3D strokes. We first extract noisy 3D point cloud motion guidance from video frames using semantic features. Our approach deforms a set of curves to abstract essential motion features as a set of explicit 3D representations.
arXiv Detail & Related papers (2025-03-26T08:43:21Z)
- Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models [71.78723353724493]
Animation of humanoid characters is essential in various graphics applications. We propose an approach to synthesize 4D animated sequences of input static 3D humanoid meshes.
arXiv Detail & Related papers (2025-03-20T10:00:22Z)
- Articulate That Object Part (ATOP): 3D Part Articulation via Text and Motion Personalization [9.231848716070257]
ATOP (Articulate That Object Part) is a novel few-shot method based on motion personalization to articulate a static 3D object. We show that our method is capable of generating realistic motion videos and predicting 3D motion parameters in a more accurate and generalizable way.
arXiv Detail & Related papers (2025-02-11T05:47:16Z)
- Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation [43.915871360698546]
2D human videos offer a vast and accessible source of motion data, covering a wider range of styles and activities. We introduce a novel framework that disentangles local joint motion from global movements, enabling efficient learning of local motion priors from 2D data. Our method efficiently utilizes 2D data, enabling realistic 3D human motion generation and broadening the range of supported motion types.
arXiv Detail & Related papers (2024-12-17T17:34:52Z)
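The local/global factorization can be illustrated with a short sketch under an assumed root-joint convention (not necessarily the paper's exact parameterization): trajectories split into a global root path plus root-relative local motion, which can be modeled separately and recombined.

```python
import numpy as np

def split_local_global(joints, root_index=0):
    """Factor joint trajectories (T, J, 3) into a global root path and local motion."""
    root = joints[:, root_index:root_index + 1]   # (T, 1, 3) global root trajectory
    local = joints - root                         # (T, J, 3) root-relative local motion
    return root, local

def recompose(root, local):
    """Recombine a (possibly re-generated) root trajectory with local joint motion."""
    return root + local                           # (T, J, 3)
```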
- Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation [54.60804602905519]
Rather than learning an entangled representation that models layered scene geometry, motion forecasting and novel view synthesis together, our approach disentangles scene geometry from scene motion by lifting the 2D scene to 3D point clouds.
To model future 3D scene motion, we propose a disentangled two-stage approach that initially forecasts ego-motion and subsequently the residual motion of dynamic objects.
arXiv Detail & Related papers (2024-07-31T08:54:50Z)
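The two-stage decomposition can be illustrated with a small sketch under assumed tensor shapes; `ego_T` and `residual` stand in for the outputs of the paper's learned forecasting modules, so this is an illustration of the idea rather than the authors' implementation.

```python
import numpy as np

def forecast_points(points, dynamic_mask, ego_T, residual):
    """Two-stage scene-motion forecast on a lifted 3D point cloud.

    points       : (N, 3) 3D points lifted from the current frame
    dynamic_mask : (N,) bool, True for points on moving objects
    ego_T        : (4, 4) predicted camera/ego motion as an SE(3) matrix
    residual     : (N, 3) predicted residual displacements for dynamic points
    """
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    moved = (homo @ ego_T.T)[:, :3]                 # stage 1: ego-motion applied to all points
    moved[dynamic_mask] += residual[dynamic_mask]   # stage 2: residual motion of dynamic objects
    return moved
```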
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases.
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
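The motion-basis idea admits a compact linear-blend sketch: each point carries weights over K shared SE(3) bases, and its position at a target time is the weighted blend of the basis transforms applied to its canonical position. This illustrates the general technique under assumed shapes, not Shape of Motion's exact formulation.

```python
import numpy as np

def blend_se3_bases(points, weights, rotations, translations):
    """Per-point motion as a linear blend of K shared SE(3) motion bases.

    points       : (N, 3) canonical 3D positions
    weights      : (N, K) per-point blending weights (each row sums to 1)
    rotations    : (K, 3, 3) rotation of each basis at the target time
    translations : (K, 3)   translation of each basis at the target time
    """
    # Apply every basis transform to every point: (K, N, 3)
    candidates = np.einsum('kij,nj->kni', rotations, points) + translations[:, None, :]
    # Blend the K candidate positions with the per-point weights: (N, 3)
    return np.einsum('nk,kni->ni', weights, candidates)

# Tiny example: two points, two bases (identity and a pure +z translation).
pts = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]])
w = np.array([[1.0, 0.0], [0.5, 0.5]])
R = np.stack([np.eye(3), np.eye(3)])
t = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.5]])
print(blend_se3_bases(pts, w, R, t))  # the second point is shifted by the blended translation
```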
- DEMOS: Dynamic Environment Motion Synthesis in 3D Scenes via Local Spherical-BEV Perception [54.02566476357383]
We propose the first Dynamic Environment MOtion Synthesis framework (DEMOS) to predict future motion instantly according to the current scene.
We then use it to dynamically update the latent motion for final motion synthesis.
The results show our method significantly outperforms previous works and handles dynamic environments well.
arXiv Detail & Related papers (2024-03-04T05:38:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.