A new way of video compression via forward-referencing using deep
learning
- URL: http://arxiv.org/abs/2208.06678v1
- Date: Sat, 13 Aug 2022 16:19:11 GMT
- Title: A new way of video compression via forward-referencing using deep
learning
- Authors: S.M.A.K. Rajin, M. Murshed, M. Paul, S.W. Teng, J. Ma
- Abstract summary: This paper explores a new way of video coding by modelling human pose from the already-encoded frames.
It is expected that the proposed approach can overcome the limitations of the traditional backward-referencing frames.
Experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high-motion video sequences.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To exploit high temporal correlations in video frames of the same scene, the
current frame is predicted from the already-encoded reference frames using
block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translational motion of moving objects, it struggles with other types of affine motion and with object occlusion/deocclusion.
Recently, deep learning has been used to model the high-level structure of
human pose in specific actions from short videos and then generate virtual
frames in future time by predicting the pose using a generative adversarial
network (GAN). Modelling the high-level structure of human pose can therefore exploit semantic correlation by predicting human actions and determining their trajectories. Video surveillance applications stand to benefit, as large volumes of stored surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This
paper explores a new way of video coding by modelling human pose from the
already-encoded frames and using the generated frame at the current time as an
additional forward-referencing frame. It is expected that the proposed approach
can overcome the limitations of the traditional backward-referencing frames by
predicting the blocks containing the moving objects with lower residuals.
Experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high-motion video sequences.
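To make the mechanics concrete, here is a minimal sketch of the per-block decision the abstract describes: for every block of the current frame, the encoder compares the residual against the usual backward reference (a previously decoded frame) with the residual against a forward reference generated at the current time from predicted human pose, and keeps the cheaper one. This is an illustrative sketch under stated assumptions, not the authors' implementation; the pose-driven GAN that would produce `forward_ref` is not reproduced, and `BLOCK`, `SEARCH`, and SAD full-search are generic codec defaults assumed here.

```python
import numpy as np

BLOCK = 16   # block size in pixels (assumed; typical for block-based codecs)
SEARCH = 8   # +/- motion search range in pixels (assumed)

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def best_sad(block, ref, y, x):
    """Full-search motion estimation: smallest SAD for `block` in `ref`
    within +/-SEARCH of its position (y, x)."""
    h, w = ref.shape
    best = float("inf")
    for dy in range(-SEARCH, SEARCH + 1):
        for dx in range(-SEARCH, SEARCH + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= h - BLOCK and 0 <= xx <= w - BLOCK:
                best = min(best, sad(block, ref[yy:yy + BLOCK, xx:xx + BLOCK]))
    return best

def select_reference_per_block(cur, backward_ref, forward_ref):
    """For each block of the current frame, keep whichever reference
    (backward = previous decoded frame, forward = hypothetical
    pose-generated frame) gives the smaller prediction residual."""
    h, w = cur.shape
    total_residual, mode_map = 0, []
    for y in range(0, h - BLOCK + 1, BLOCK):
        row = []
        for x in range(0, w - BLOCK + 1, BLOCK):
            blk = cur[y:y + BLOCK, x:x + BLOCK]
            cb = best_sad(blk, backward_ref, y, x)
            cf = best_sad(blk, forward_ref, y, x)
            total_residual += min(cb, cf)
            row.append("F" if cf < cb else "B")
        mode_map.append(row)
    return total_residual, mode_map

def psnr(ref, rec, peak=255.0):
    """PSNR in dB, the quality metric quoted in the abstract."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Under this scheme the forward reference can only help: a block falls back to backward referencing whenever the generated frame predicts it poorly, which matches the abstract's framing of the generated frame as an additional reference rather than a replacement.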
Related papers
- Generative Hierarchical Temporal Transformer for Hand Pose and Action Modeling [67.94143911629143]
We propose a generative Transformer VAE architecture to model hand pose and action.
To faithfully model the semantic dependency and different temporal granularity of hand pose and action, we decompose the framework into two cascaded VAE blocks.
Results show that our joint modeling of recognition and prediction improves over isolated solutions.
arXiv Detail & Related papers (2023-11-29T05:28:39Z)
- STDepthFormer: Predicting Spatio-temporal Depth from Video with a Self-supervised Transformer Model [0.0]
A self-supervised model that simultaneously predicts a sequence of future frames from video input with a spatio-temporal attention network is proposed.
The proposed model leverages prior scene knowledge such as object shape and texture similar to single-image depth inference methods.
It is implicitly capable of forecasting the motion of objects in the scene, rather than requiring complex models involving multi-object detection, segmentation and tracking.
arXiv Detail & Related papers (2022-11-29T01:46:11Z)
- Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos [17.831839654593452]
Previous video-based human pose estimation methods have shown promising results by leveraging features of consecutive frames.
Most approaches compromise accuracy to mitigate jitter and do not fully comprehend the temporal aspects of human motion.
We design an architecture that exploits kinematic keypoint features.
arXiv Detail & Related papers (2022-03-29T10:47:06Z)
- Long-term Video Frame Interpolation via Feature Propagation [95.18170372022703]
Video frame interpolation (VFI) works generally predict intermediate frame(s) by first estimating the motion between inputs and then warping the inputs to the target time with the estimated motion.
This approach is not optimal when the temporal distance between the input frames increases.
We propose a propagation network (PNet) by extending the classic feature-level forecasting with a novel motion-to-feature approach.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2021-10-22T04:35:58Z)
- Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-03-04T12:18:25Z)
- Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-02-09T16:52:45Z)
- Robust Motion In-betweening [17.473287573543065]
We present a novel, robust transition generation technique that can serve as a new tool for 3D animators.
The system synthesizes high-quality motions that use temporally-sparse keyframes as animation constraints.
We present a custom MotionBuilder plugin that uses our trained model to perform in-betweening in production scenarios.
arXiv Detail & Related papers (2020-04-18T15:05:11Z)
- Motion Segmentation using Frequency Domain Transformer Networks [29.998917158604694]
We propose a novel end-to-end learnable architecture that predicts the next frame by modeling foreground and background separately.
Our approach can outperform some widely used video prediction methods like Video Ladder Network and Predictive Gated Pyramids on synthetic data.
arXiv Detail & Related papers (2020-04-18T15:05:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.