Self-supervised Motion Learning from Static Images
- URL: http://arxiv.org/abs/2104.00240v1
- Date: Thu, 1 Apr 2021 03:55:50 GMT
- Title: Self-supervised Motion Learning from Static Images
- Authors: Ziyuan Huang, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Rong Jin,
Marcelo Ang
- Abstract summary: Motion from Static Images (MoSI) learns to encode motion information.
We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
- Score: 36.85209332144106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motions are reflected in videos as the movement of pixels, and actions are
essentially patterns of inconsistent motions between the foreground and the
background. To distinguish actions well, especially those with complicated
spatio-temporal interactions, correctly locating the prominent motion areas is
of crucial importance. However, most motion information in existing videos is
difficult to label, and training a model to learn good motion representations
under supervision would thus require a large amount of human labour for annotation. In
this paper, we address this problem by self-supervised learning. Specifically,
we propose to learn Motion from Static Images (MoSI). The model learns to
encode motion information by classifying pseudo motions generated by MoSI. We
furthermore introduce a static mask in pseudo motions to create local motion
patterns, which forces the model to additionally locate notable motion areas
for the correct classification. We demonstrate that MoSI can discover regions
with large motion even without fine-tuning on the downstream datasets. As a
result, the learned motion representations boost the performance of tasks
requiring understanding of complex scenes and motions, i.e., action
recognition. Extensive experiments show the consistent and transferable
improvements achieved by MoSI. Code will be released soon.
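As a rough illustration of the idea described in the abstract, the Python sketch below constructs a pseudo-motion clip from a single static image by sliding a crop window along one axis, with the (axis, speed) pair serving as the self-supervised classification label, and applies a simple static mask so that motion appears only in a local region. This is a minimal sketch under assumptions: the function names, crop size, and masking scheme are illustrative and are not the authors' released implementation.

```python
import numpy as np

def make_pseudo_motion(image, num_frames=8, crop=112, axis=0, speed=1):
    """Slide a crop window over a static image to fake motion.

    image: H x W x C array, assumed larger than `crop` in both dims.
    axis: 0 -> vertical motion, 1 -> horizontal motion.
    speed: signed step in pixels per frame; 0 yields a static clip.
    The pseudo label for classification would be the (axis, speed) class.
    """
    h, w, _ = image.shape
    max_y, max_x = h - crop, w - crop
    # Centre the trajectory so the whole sweep stays inside the image.
    start = [max_y // 2, max_x // 2]
    start[axis] -= speed * (num_frames - 1) // 2
    frames = []
    for t in range(num_frames):
        y, x = start
        if axis == 0:
            y += speed * t
        else:
            x += speed * t
        y = int(np.clip(y, 0, max_y))
        x = int(np.clip(x, 0, max_x))
        frames.append(image[y:y + crop, x:x + crop])
    return np.stack(frames)  # (num_frames, crop, crop, C)

def apply_static_mask(clip, keep=0.5):
    """Freeze the outer bands of every frame to frame 0 so that only a
    central region appears to move (a rough analogue of the static mask)."""
    t, hh, ww, _ = clip.shape
    masked = clip.copy()
    border = int(hh * (1 - keep)) // 2
    masked[:, :border] = clip[0, :border]
    masked[:, hh - border:] = clip[0, hh - border:]
    return masked
```

In a training setup along these lines, each image would yield several clips with different (axis, speed) labels, and the network would be optimized to classify the pseudo motion of each clip.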
Related papers
- Motion meets Attention: Video Motion Prompts [34.429192862783054]
We propose a modified Sigmoid function with learnable slope and shift parameters as an attention mechanism to modulate motion signals from frame differencing maps.
This approach generates a sequence of attention maps that enhance the processing of motion-related video content.
We show that our lightweight, plug-and-play motion prompt layer seamlessly integrates into models like SlowFast, X3D, and TimeSformer (a minimal sketch of the sigmoid-attention idea appears after this list).
arXiv Detail & Related papers (2024-07-03T14:59:46Z) - Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer [55.109778609058154]
Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models.
We uncover the roles and interactions of attention elements in capturing and representing motion patterns.
We integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer.
arXiv Detail & Related papers (2024-06-10T17:47:14Z) - MotionLLM: Understanding Human Behaviors from Human Motions and Videos [40.132643319573205]
This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding.
We present MotionLLM, a framework for human motion understanding, captioning, and reasoning.
arXiv Detail & Related papers (2024-05-30T17:59:50Z) - Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z) - Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z) - MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z) - Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z) - Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning [16.094271750354835]
Motion information is critical to a robust and generalized video representation.
Recent works have adopted frame difference as the source of motion information in video contrastive learning.
We present a framework capable of introducing well-aligned and significant motion information.
arXiv Detail & Related papers (2023-09-01T07:03:27Z) - Masked Motion Encoding for Self-Supervised Video Representation Learning [84.24773072241945]
We present Masked Motion Encoding (MME), a new pre-training paradigm that reconstructs both appearance and motion information to explore temporal clues.
Motivated by the fact that humans are able to recognize an action by tracking objects' position changes and shape changes, we propose to reconstruct a motion trajectory that represents these two kinds of change in the masked regions.
Pre-trained with our MME paradigm, the model is able to anticipate long-term and fine-grained motion details.
arXiv Detail & Related papers (2022-10-12T11:19:55Z) - Differential Motion Evolution for Fine-Grained Motion Deformation in
Unsupervised Image Animation [41.85199775016731]
We introduce DiME, an end-to-end unsupervised motion transfer framework.
By capturing the motion transfer with an ordinary differential equation (ODE), it helps to regularize the motion field.
We also propose a natural extension to the ODE idea, which is that DiME can easily leverage multiple different views of the source object whenever they are available.
arXiv Detail & Related papers (2021-10-09T22:44:30Z)
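The "Motion meets Attention: Video Motion Prompts" entry above describes a modified sigmoid with learnable slope and shift applied to frame-differencing maps. The sketch below shows one plausible way to realize that idea in PyTorch; the module name, initial parameter values, and the way the attention modulates the frames are assumptions for illustration, not that paper's published code.

```python
import torch
import torch.nn as nn

class MotionPromptLayer(nn.Module):
    """Learnable sigmoid attention over frame-difference maps (illustrative)."""

    def __init__(self, init_slope=5.0, init_shift=0.1):
        super().__init__()
        self.slope = nn.Parameter(torch.tensor(init_slope))  # learnable slope
        self.shift = nn.Parameter(torch.tensor(init_shift))  # learnable shift

    def forward(self, frames):
        # frames: (B, T, C, H, W) clip with values in [0, 1]
        diff = (frames[:, 1:] - frames[:, :-1]).abs().mean(dim=2, keepdim=True)
        attn = torch.sigmoid(self.slope * (diff - self.shift))  # (B, T-1, 1, H, W)
        # Modulate each later frame by its motion attention map.
        return frames[:, 1:] * attn, attn

# Usage: prompts, maps = MotionPromptLayer()(torch.rand(2, 8, 3, 112, 112))
```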
This list is automatically generated from the titles and abstracts of the papers in this site.