MotionSqueeze: Neural Motion Feature Learning for Video Understanding
- URL: http://arxiv.org/abs/2007.09933v1
- Date: Mon, 20 Jul 2020 08:30:14 GMT
- Title: MotionSqueeze: Neural Motion Feature Learning for Video Understanding
- Authors: Heeseung Kwon, Manjin Kim, Suha Kwak, and Minsu Cho
- Abstract summary: Motion plays a crucial role in understanding videos, and most state-of-the-art neural models for video classification incorporate motion information.
In this work, we replace external and heavy computation of optical flows with internal and light-weight learning of motion features.
We demonstrate that the proposed method provides a significant gain on four standard benchmarks for action recognition with only a small amount of additional cost.
- Score: 46.82376603090792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion plays a crucial role in understanding videos, and most state-of-the-art
neural models for video classification incorporate motion information, typically
using optical flows extracted by a separate off-the-shelf method. As the
frame-by-frame optical flows require heavy computation, incorporating motion
information has remained a major computational bottleneck for video
understanding. In this work, we replace external and heavy computation of
optical flows with internal and light-weight learning of motion features. We
propose a trainable neural module, dubbed MotionSqueeze, for effective motion
feature extraction. Inserted in the middle of any neural network, it learns to
establish correspondences across frames and convert them into motion features,
which are readily fed to the next downstream layer for better prediction. We
demonstrate that the proposed method provides a significant gain on four
standard benchmarks for action recognition with only a small amount of
additional cost, outperforming the state of the art on
Something-Something-V1&V2 datasets.
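As a rough illustration of the idea, the following PyTorch sketch shows one way such a module could work: correlate adjacent frame features over a local search window, convert the correlation scores into a displacement field via soft-argmax, and map that field to motion features for the next layer. The window radius, normalization, and layer sizes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionSqueezeSketch(nn.Module):
    """Correlate adjacent frame features and squeeze them into motion features."""

    def __init__(self, channels, radius=3):
        super().__init__()
        self.radius = radius
        k = 2 * radius + 1
        # candidate displacements covering the (k x k) search window
        dy, dx = torch.meshgrid(
            torch.arange(-radius, radius + 1, dtype=torch.float32),
            torch.arange(-radius, radius + 1, dtype=torch.float32),
            indexing="ij",
        )
        self.register_buffer("offsets", torch.stack([dx, dy]).reshape(2, k * k))
        # maps the 2-channel displacement field to motion features
        self.to_motion = nn.Conv2d(2, channels, kernel_size=3, padding=1)

    def forward(self, feat_t, feat_tp1):
        b, c, h, w = feat_t.shape
        k = 2 * self.radius + 1
        # correlation volume: dot each frame-t feature with every feature
        # in the (k x k) neighborhood of the same position in frame t+1
        neigh = F.unfold(feat_tp1, kernel_size=k, padding=self.radius)
        neigh = neigh.view(b, c, k * k, h * w)
        corr = (feat_t.view(b, c, 1, h * w) * neigh).sum(dim=1) / c ** 0.5
        # soft-argmax: expected displacement under the softmaxed correlation
        prob = corr.softmax(dim=1)                               # (b, k*k, h*w)
        disp = torch.einsum("bkn,dk->bdn", prob, self.offsets)   # (b, 2, h*w)
        return self.to_motion(disp.view(b, 2, h, w))


# usage: motion features for a pair of (B, C, H, W) backbone feature maps
ms = MotionSqueezeSketch(channels=64)
m = ms(torch.randn(2, 64, 14, 14), torch.randn(2, 64, 14, 14))  # (2, 64, 14, 14)
```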
Related papers
- Video Diffusion Models are Training-free Motion Interpreter and Controller [20.361790608772157]
This paper introduces a novel perspective to understand, localize, and manipulate motion-aware features in video diffusion models.
We present a new MOtion FeaTure (MOFT) by eliminating content correlation information and filtering motion channels.
arXiv Detail & Related papers (2024-05-23T17:59:40Z)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
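A hedged sketch of what frequency-domain regularization of a motion field could look like: compare predicted and reference motion fields in the Fourier domain so that low-frequency (whole-frame, global) components dominate the alignment. The low-pass mask and L1 spectral penalty are my assumptions, not SMA's actual formulation.

```python
import torch

def spectral_motion_loss(pred_motion, ref_motion, keep_frac=0.25):
    # pred_motion, ref_motion: (b, 2, h, w) displacement/motion fields
    pf = torch.fft.rfft2(pred_motion)
    rf = torch.fft.rfft2(ref_motion)
    h, wf = pf.shape[-2:]
    # low-pass mask: keep only the lowest `keep_frac` of frequencies,
    # emphasizing whole-frame global motion over per-pixel detail
    kh, kw = max(1, int(h * keep_frac)), max(1, int(wf * keep_frac))
    mask = torch.zeros(h, wf, device=pf.device)
    mask[:kh, :kw] = 1.0   # low positive vertical frequencies
    mask[-kh:, :kw] = 1.0  # low negative vertical frequencies (wrapped)
    return ((pf - rf).abs() * mask).mean()
```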
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
- Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation [1.551271936792451]
We propose a state-of-the-art neural network architecture for generating moving object proposals (MOPs).
We first train an unsupervised convolutional neural network (UnFlow) to estimate optical flow.
We then feed the output of the optical flow network into a fully convolutional SegNet model.
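A minimal sketch of that two-stage pipeline, with both networks as placeholders (any differentiable flow estimator and any fully convolutional segmenter would fit this interface):

```python
import torch.nn as nn

class MovingObjectProposer(nn.Module):
    def __init__(self, flow_net: nn.Module, seg_net: nn.Module):
        super().__init__()
        self.flow_net = flow_net  # frame pair -> (b, 2, h, w) optical flow
        self.seg_net = seg_net    # flow field -> (b, 1, h, w) mask logits

    def forward(self, frame_t, frame_tp1):
        flow = self.flow_net(frame_t, frame_tp1)  # stage 1: motion estimation
        return self.seg_net(flow)                 # stage 2: segmentation
```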
arXiv Detail & Related papers (2024-02-14T01:13:55Z)
- Hierarchical Graph Pattern Understanding for Zero-Shot VOS [102.21052200245457]
This paper proposes HGPU, a new hierarchical graph neural network (GNN) architecture for zero-shot video object segmentation (ZS-VOS).
Inspired by the strong ability of GNNs to capture structural relations, HGPU leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames.
arXiv Detail & Related papers (2023-12-15T04:13:21Z)
- Self-Supervised Motion Magnification by Backpropagating Through Optical Flow [16.80592879244362]
This paper presents a self-supervised method for magnifying subtle motions in video.
We manipulate the video such that its new optical flow is scaled by the desired amount.
We propose a loss function that estimates the optical flow of the generated video and penalizes how far it deviates from the given magnification factor.
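That loss lends itself to a compact sketch. Here `flow_net` stands in for any frozen, differentiable off-the-shelf flow estimator, which is an assumption about the setup rather than the paper's exact design:

```python
import torch

def magnification_loss(flow_net, orig_t, orig_tp1, gen_t, gen_tp1, alpha):
    # target: the input video's flow scaled by the magnification factor
    with torch.no_grad():
        target_flow = alpha * flow_net(orig_t, orig_tp1)
    # gradients reach the generator through the flow of the generated pair
    gen_flow = flow_net(gen_t, gen_tp1)
    return (gen_flow - target_flow).abs().mean()  # L1 deviation penalty
```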
arXiv Detail & Related papers (2023-11-28T18:59:51Z)
- Dynamic Appearance: A Video Representation for Action Recognition with Joint Training [11.746833714322154]
We introduce a new concept, Dynamic Appearance (DA), summarizing the appearance information relating to movement in a video.
We consider distilling the dynamic appearance from raw video data as a means of efficient video understanding.
We provide extensive experimental results on four action recognition benchmarks.
arXiv Detail & Related papers (2022-11-23T07:16:16Z)
- EM-driven unsupervised learning for efficient motion segmentation [3.5232234532568376]
This paper presents a CNN-based fully unsupervised method for motion segmentation from optical flow.
We use the Expectation-Maximization (EM) framework to design the loss function and the training procedure of our motion segmentation neural network.
Our method outperforms comparable unsupervised methods and is very efficient.
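One way to read that EM coupling is sketched below: the network's soft masks play the role of E-step responsibilities, a weighted least-squares fit of a per-segment motion model plays the M-step, and the loss is the masked flow-reconstruction error. The affine motion model and closed-form solve are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def em_segmentation_loss(flow, masks, eps=1e-6):
    # flow: (b, 2, h, w) optical flow; masks: (b, K, h, w), softmax over K
    b, _, h, w = flow.shape
    dev = flow.device
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=dev),
                            torch.linspace(-1, 1, w, device=dev), indexing="ij")
    X = torch.stack([torch.ones_like(xs), xs, ys], dim=-1).reshape(-1, 3)
    f = flow.reshape(b, 2, -1).transpose(1, 2)          # (b, h*w, 2)
    loss = 0.0
    for k in range(masks.shape[1]):
        wgt = masks[:, k].reshape(b, -1, 1)             # E-step responsibilities
        Xw = X.unsqueeze(0) * wgt                       # weighted design matrix
        # M-step: closed-form weighted least squares for the affine params
        A = Xw.transpose(1, 2) @ X                      # (b, 3, 3)
        theta = torch.linalg.solve(A + eps * torch.eye(3, device=dev),
                                   Xw.transpose(1, 2) @ f)   # (b, 3, 2)
        resid = ((X @ theta) - f).pow(2).sum(-1)        # per-pixel fit error
        loss = loss + (wgt.squeeze(-1) * resid).mean()  # masked reconstruction
    return loss
```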
arXiv Detail & Related papers (2022-01-06T14:35:45Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image can be of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- Hierarchical Contrastive Motion Learning for Video Action Recognition [100.9807616796383]
We present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames.
Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network.
Our motion learning module is lightweight and flexible to be embedded into various backbone networks.
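A minimal sketch of one level of such a contrastive objective, using a standard InfoNCE loss between motion embeddings from adjacent levels of the hierarchy (the temperature and in-batch negatives are assumptions, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def level_contrastive_loss(hi_feat, lo_feat, tau=0.1):
    # hi_feat, lo_feat: (b, d) motion embeddings from adjacent levels
    hi = F.normalize(hi_feat, dim=1)
    lo = F.normalize(lo_feat, dim=1)
    logits = hi @ lo.t() / tau   # (b, b) cosine similarities
    labels = torch.arange(hi.shape[0], device=hi.device)
    # diagonal pairs (same clip, adjacent levels) are the positives
    return F.cross_entropy(logits, labels)
```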
arXiv Detail & Related papers (2020-07-20T17:59:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.