Revisiting Learning-based Video Motion Magnification for Real-time
Processing
- URL: http://arxiv.org/abs/2403.01898v1
- Date: Mon, 4 Mar 2024 09:57:08 GMT
- Title: Revisiting Learning-based Video Motion Magnification for Real-time
Processing
- Authors: Hyunwoo Ha, Oh Hyun-Bin, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin,
Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh
- Abstract summary: Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye.
We introduce a real-time deep learning-based motion magnification model with 4.2X fewer FLOPs that is 2.7X faster than the prior art.
- Score: 23.148430647367224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video motion magnification is a technique to capture and amplify subtle
motion in a video that is invisible to the naked eye. Deep learning-based prior
work successfully models the motion magnification problem with outstanding
quality compared to conventional signal processing-based methods. However, it
still lags behind real-time performance,
which prevents it from being extended to various online applications. In this
paper, we investigate an efficient deep learning-based motion magnification
model that runs in real time for full-HD resolution videos. Due to the
specialized network design of the prior art, i.e., its inhomogeneous architecture, the
direct application of existing neural architecture search methods is
complicated. Instead of automatic search, we carefully investigate the
architecture module by module for its role and importance in the motion
magnification task. Two key findings are: 1) reducing the spatial resolution of
the latent motion representation in the decoder provides a good trade-off
between computational efficiency and task quality, and 2) surprisingly, only a
single linear layer and a single branch in the encoder are sufficient for the
motion magnification task. Based on these findings, we introduce a real-time
deep learning-based motion magnification model with 4.2X fewer FLOPs that is 2.7X
faster than the prior art while maintaining comparable quality.
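As a concrete illustration of the two findings, below is a minimal PyTorch-style sketch of a magnification network with a single linear (1x1 convolution), single-branch encoder and amplification/decoding performed on a spatially downsampled motion representation. All layer names, channel widths, and the pooling factor are illustrative assumptions, not the authors' released architecture.

```python
# Hypothetical sketch of an efficient motion-magnification network reflecting
# the paper's two findings; widths and layer choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMagnifier(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # Finding 2: a single linear (1x1 conv), single-branch encoder is
        # assumed sufficient to extract the shared representation.
        self.encoder = nn.Conv2d(3, channels, kernel_size=1, bias=False)
        # Lightweight decoder that reconstructs the magnified frame.
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, frame_a, frame_b, alpha):
        feat_a = self.encoder(frame_a)
        feat_b = self.encoder(frame_b)
        # Finding 1: amplify and decode at reduced spatial resolution
        # (here 1/2) to trade a little quality for a large FLOP reduction.
        feat_a = F.avg_pool2d(feat_a, 2)
        feat_b = F.avg_pool2d(feat_b, 2)
        magnified = feat_a + alpha * (feat_b - feat_a)
        out = self.decoder(magnified)
        # Restore the input resolution for the final magnified frame.
        return F.interpolate(out, size=frame_b.shape[-2:], mode="bilinear",
                             align_corners=False)


model = SimpleMagnifier()
a, b = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
print(model(a, b, alpha=20.0).shape)  # torch.Size([1, 3, 256, 256])
```

Note that in the paper the resolution reduction is applied to the latent motion representation inside the decoder; placing the pooling right after the encoder here is a simplification for brevity.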
Related papers
- Flatten: Video Action Recognition is an Image Classification task [15.518011818978074]
A novel video representation architecture, Flatten, serves as a plug-and-play module that can be seamlessly integrated into any image-understanding network.
Experiments on commonly used datasets have demonstrated that embedding Flatten provides significant performance improvements over the original model.
arXiv Detail & Related papers (2024-08-17T14:59:58Z)
- Self-Supervised Motion Magnification by Backpropagating Through Optical Flow [16.80592879244362]
This paper presents a self-supervised method for magnifying subtle motions in video.
We manipulate the video such that its new optical flow is scaled by the desired amount.
We propose a loss function that estimates the optical flow of the generated video and penalizes how far it deviates from the given magnification factor (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-11-28T18:59:51Z)
- MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition [50.345327516891615]
We develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components: a long-short contrastive objective and a motion autodecoder.
MoLo can simultaneously learn long-range temporal context and motion cues for comprehensive few-shot matching.
arXiv Detail & Related papers (2023-04-03T13:09:39Z)
- STB-VMM: Swin Transformer Based Video Motion Magnification [0.0]
This work presents a new state-of-the-art model based on the Swin Transformer.
It offers better tolerance to noisy inputs as well as higher-quality outputs that exhibit less noise and blurriness and fewer artifacts than the prior art.
arXiv Detail & Related papers (2023-02-20T14:21:56Z)
- Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, but still suffer from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
- Enhanced Quadratic Video Interpolation [56.54662568085176]
We propose an enhanced quadratic video interpolation (EQVI) model to handle more complicated scenes and motion patterns.
To further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process.
The proposed EQVI model won the first place in the AIM 2020 Video Temporal Super-Resolution Challenge.
arXiv Detail & Related papers (2020-09-10T02:31:50Z)
- MotionSqueeze: Neural Motion Feature Learning for Video Understanding [46.82376603090792]
Motion plays a crucial role in understanding videos and most state-of-the-art neural models for video classification incorporate motion information.
In this work, we replace external and heavy computation of optical flows with internal and light-weight learning of motion features.
We demonstrate that the proposed method provides a significant gain on four standard benchmarks for action recognition with only a small amount of additional cost.
arXiv Detail & Related papers (2020-07-20T08:30:14Z)
- Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention [84.83632045374155]
Attentive video modeling is essential for action recognition in unconstrained videos.
The What-Where-When (W3) video attention module models all three facets of video attention jointly.
Experiments show that our attention model brings significant improvements to existing action recognition models.
arXiv Detail & Related papers (2020-04-02T21:48:11Z)
- Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)
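For the self-supervised magnification objective referenced in the "Backpropagating Through Optical Flow" entry above, a rough sketch of the idea is a flow-consistency penalty: estimate the optical flow of the generated pair and compare it against the input pair's flow scaled by the magnification factor. Here `flow_net` stands in for any differentiable, pretrained optical-flow estimator, and the L1 form and frozen reference flow are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical flow-consistency loss for self-supervised motion magnification;
# `flow_net` is a placeholder for a pretrained, differentiable flow estimator.
import torch


def magnification_consistency_loss(flow_net, frame_a, frame_b, generated_b, alpha):
    """Penalize the generated frame when the motion it induces deviates from
    alpha times the motion of the original pair (illustrative L1 form)."""
    with torch.no_grad():
        flow_orig = flow_net(frame_a, frame_b)   # small, true motion (frozen target)
    flow_gen = flow_net(frame_a, generated_b)    # motion after magnification
    return (flow_gen - alpha * flow_orig).abs().mean()
```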