Related papers: Motion Modes: What Could Happen Next?

URL: http://arxiv.org/abs/2412.00148v1
Date: Fri, 29 Nov 2024 01:51:08 GMT
Title: Motion Modes: What Could Happen Next?
Authors: Karran Pandey, Matheus Gadelha, Yannick Hold-Geoffroy, Karan Singh, Niloy J. Mitra, Paul Guerrero
Abstract summary: Current video generation models often entangle object movement with camera motion and other scene changes. We introduce Motion Modes, a training-free approach that explores a pre-trained image-to-video generator's latent distribution. We achieve this by employing a flow generator guided by energy functions designed to disentangle object and camera motion.
Score: 45.24111039863531
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Predicting diverse object motions from a single static image remains challenging, as current video generation models often entangle object movement with camera motion and other scene changes. While recent methods can predict specific motions from motion arrow input, they rely on synthetic data and predefined motions, limiting their application to complex scenes. We introduce Motion Modes, a training-free approach that explores a pre-trained image-to-video generator's latent distribution to discover various distinct and plausible motions focused on selected objects in static images. We achieve this by employing a flow generator guided by energy functions designed to disentangle object and camera motion. Additionally, we use an energy inspired by particle guidance to diversify the generated motions, without requiring explicit training data. Experimental results demonstrate that Motion Modes generates realistic and varied object animations, surpassing previous methods and even human predictions regarding plausibility and diversity. Project Webpage: https://motionmodes.github.io/
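To make the idea of energy-guided, diversity-seeking sampling in the abstract concrete, here is a minimal toy sketch. It is not the Motion Modes method: the quadratic "plausibility" energy, the RBF repulsion term standing in for a particle-guidance-style diversity energy, and all function and parameter names are assumptions chosen for illustration. It only shows how a pairwise repulsive energy keeps jointly updated samples from collapsing onto a single mode.

```python
import numpy as np


def diversity_energy_grad(x, bandwidth=1.0):
    """Gradient of sum_{i != j} exp(-||x_i - x_j||^2 / (2 h^2)) w.r.t. each x_i.

    Hypothetical stand-in for a particle-guidance-style diversity energy.
    """
    diff = x[:, None, :] - x[None, :, :]            # (n, n, d) pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)            # (n, n) squared distances
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(kernel, 0.0)                   # ignore self-interaction
    # Each unordered pair appears twice in the sum, hence the factor 2.
    return -2.0 * np.sum(kernel[:, :, None] * diff, axis=1) / bandwidth ** 2


def guided_samples(n=4, dim=2, weight=1.0, steps=200, step_size=0.05, seed=0):
    """Jointly descend a toy plausibility energy plus a weighted diversity energy."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=0.1, size=(n, dim))        # samples start close together
    for _ in range(steps):
        plausibility_grad = x                       # gradient of 0.5 * ||x||^2 (assumed toy energy)
        grad = plausibility_grad + weight * diversity_energy_grad(x)
        x = x - step_size * grad
    return x


def min_pairwise_distance(x):
    """Smallest distance between any two distinct samples."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)
    return dist.min()


if __name__ == "__main__":
    collapsed = guided_samples(weight=0.0)
    diverse = guided_samples(weight=1.0)
    print("min distance without diversity energy:", min_pairwise_distance(collapsed))
    print("min distance with diversity energy:   ", min_pairwise_distance(diverse))
```

With the diversity weight set to zero, all samples drift toward the same minimum of the toy energy. With a positive weight, the repulsion term holds them apart, which is the qualitative behaviour a diversity energy is meant to provide.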

Related papers

What Happens Next? Anticipating Future Motion by Generating Point Trajectories [76.16266402727643]
We consider the problem of forecasting motion from a single image, predicting how objects in the world are likely to move. We formulate this task as conditional generation of dense trajectory grids with a model that closely follows the architecture of modern video generators. This approach captures scene-wide dynamics and uncertainty, yielding more accurate and diverse predictions than prior regressors and generators.
arXiv Detail & Related papers (2025-09-25T21:03:56Z)
Move-in-2D: 2D-Conditioned Human Motion Generation [54.067588636155115]
We propose Move-in-2D, a novel approach to generate human motion sequences conditioned on a scene image. Our approach accepts both a scene image and text prompt as inputs, producing a motion sequence tailored to the scene.
arXiv Detail & Related papers (2024-12-17T18:58:07Z)
Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
Motion Dreamer: Boundary Conditional Motion Reasoning for Physically Coherent Video Generation [27.690736225683825]
We introduce Motion Dreamer, a two-stage framework that explicitly separates motion reasoning from visual synthesis. Our approach introduces instance flow, a sparse-to-dense motion representation enabling effective integration of partial user-defined motions. Experiments demonstrate that Motion Dreamer significantly outperforms existing methods, achieving superior motion plausibility and visual realism.
arXiv Detail & Related papers (2024-11-30T17:40:49Z)
ViMo: Generating Motions from Casual Videos [34.19904765033005]
We propose a novel Video-to-Motion-Generation framework (ViMo). ViMo could leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions. Striking experimental results demonstrate the proposed model could generate natural motions even for videos where rapid movements, varying perspectives, or frequent occlusions might exist.
arXiv Detail & Related papers (2024-08-13T03:57:35Z)
Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer [55.109778609058154]
Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models. We uncover the roles and interactions of attention elements in capturing and representing motion patterns. We integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer.
arXiv Detail & Related papers (2024-06-10T17:47:14Z)
MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method. MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model. During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
MotionDirector: Motion Customization of Text-to-Video Diffusion Models [24.282240656366714]
Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion. We propose MotionDirector, with a dual-path LoRAs architecture to decouple the learning of appearance and motion. Our method also supports various downstream applications, such as the mixing of different videos with their appearance and motion respectively, and animating a single image with customized motions.
arXiv Detail & Related papers (2023-10-12T16:26:18Z)
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework. It excels at modeling complicated data distributions and generating vivid motion sequences. It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction. Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.