Generative Memorize-Then-Recall framework for low bit-rate Surveillance
Video Compression
- URL: http://arxiv.org/abs/1912.12847v3
- Date: Wed, 6 May 2020 14:28:58 GMT
- Title: Generative Memorize-Then-Recall framework for low bit-rate Surveillance
Video Compression
- Authors: Yaojun Wu, Tianyu He, Zhibo Chen
- Abstract summary: Surveillance video is disentangled into a global
spatio-temporal feature (memory) for each Group of Pictures (GoP) and a
skeleton for each frame (clue). The memory is obtained by sequentially feeding
the frames of a GoP into a recurrent neural network and describes the
appearance of objects that appear inside the GoP. Experiments indicate that
the method effectively generates realistic reconstructions based on appearance
and skeleton.
- Score: 29.716388163447345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applications of surveillance video, which often detect and
recognize objects in video sequences, have developed rapidly in recent years
to protect public safety and daily life. Traditional coding frameworks remove
temporal redundancy in surveillance video by block-wise motion compensation,
lacking the extraction and utilization of inherent structure information. In
this paper, we address this issue by disentangling surveillance video into a
global spatio-temporal feature (memory) for each Group of Pictures (GoP) and
a skeleton for each frame (clue). The memory is obtained by sequentially
feeding the frames of a GoP into a recurrent neural network and describes the
appearance of objects that appear inside the GoP, while the skeleton,
calculated by a pose estimator, is regarded as a clue to recall the memory.
Furthermore, an attention mechanism is introduced to model the relation
between appearance and skeletons. Finally, we employ a generative adversarial
network to reconstruct each frame. Experimental results indicate that our
method effectively generates realistic reconstructions based on appearance
and skeleton, achieving much higher compression performance on surveillance
video than the latest video compression standard H.265.
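To make the pipeline concrete, here is a minimal PyTorch sketch of the
memorize-then-recall idea. The module sizes, the GRU and attention choices,
and the MLP standing in for the GAN generator are illustrative assumptions,
not the authors' implementation; the intuition is that only the compact GoP
memory and the per-frame skeletons need to be transmitted.

```python
import torch
import torch.nn as nn

class MemorizeThenRecall(nn.Module):
    """Illustrative only: a GRU 'memorizes' a GoP into appearance features,
    per-frame skeletons act as 'clues', attention recalls the relevant
    appearance, and an MLP stands in for the GAN generator."""
    def __init__(self, frame_dim=512, skel_dim=34, hidden=256):
        super().__init__()
        self.memory_rnn = nn.GRU(frame_dim, hidden, batch_first=True)
        self.query = nn.Linear(skel_dim, hidden)   # skeleton -> attention query
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.generator = nn.Sequential(            # placeholder for the GAN generator
            nn.Linear(hidden, frame_dim), nn.ReLU(),
            nn.Linear(frame_dim, frame_dim))

    def forward(self, frame_feats, skeletons):
        # frame_feats: (B, T, frame_dim); skeletons: (B, T, skel_dim)
        memory, _ = self.memory_rnn(frame_feats)       # memorize the GoP
        clue = self.query(skeletons)                   # one clue per frame
        recalled, _ = self.attn(clue, memory, memory)  # recall by attention
        return self.generator(recalled)                # reconstruct frame features

feats = torch.randn(2, 8, 512)   # features of an 8-frame GoP
skels = torch.randn(2, 8, 34)    # 17 keypoints x (x, y) per frame
print(MemorizeThenRecall()(feats, skels).shape)  # torch.Size([2, 8, 512])
```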
Related papers
- Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over the corrupted test sequence, leveraging its spatio-temporal coherence and internal statistics.
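A rough sketch of this internal-learning recipe, assuming the corruption can
be written as a pixel mask; the tiny CNN, mask ratio, and step count are
placeholders rather than the paper's setup.

```python
import torch
import torch.nn as nn

# Fit a small network to the single corrupted test sequence itself: the
# architecture acts as the prior, so no external training corpus is needed.
net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 3, 3, padding=1))
corrupted = torch.rand(8, 3, 64, 64)                # observed degraded frames
mask = (torch.rand_like(corrupted) > 0.2).float()   # e.g. missing pixels

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):                             # test-time optimization only
    loss = ((net(corrupted) - corrupted) * mask).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
restored = net(corrupted).detach()                  # enhanced output
```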
arXiv Detail & Related papers (2023-12-13T01:57:11Z)
- Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video
Retrieval [67.52910255064762]
We first design a simple dual-stream structure, including a temporal layer and a hash layer.
With the help of semantic similarity knowledge obtained from self-supervision, the hash layer learns to capture information for semantic retrieval.
In this way, the model naturally preserves the disentangled semantics into binary codes.
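The dual-stream structure might look roughly like this sketch, with a GRU as
the temporal layer and a linear head as the hash layer; the tanh relaxation
for training and sign binarization at retrieval time are standard hashing
practice assumed here, not details taken from the paper.

```python
import torch
import torch.nn as nn

class DualStreamHasher(nn.Module):
    """Temporal stream summarizes the frame sequence; hash stream maps the
    summary to a binary code for retrieval (sizes are assumptions)."""
    def __init__(self, feat_dim=512, code_bits=64):
        super().__init__()
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.hash = nn.Linear(feat_dim, code_bits)

    def forward(self, frame_feats):                # (B, T, feat_dim)
        _, h = self.temporal(frame_feats)          # final state: (1, B, feat_dim)
        logits = self.hash(h.squeeze(0))
        return torch.tanh(logits), torch.sign(logits)  # train-time, test-time codes

relaxed, code = DualStreamHasher()(torch.randn(4, 16, 512))
print(code.shape)  # torch.Size([4, 64])
```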
arXiv Detail & Related papers (2023-10-12T03:21:12Z)
- A new way of video compression via forward-referencing using deep
learning [0.0]
This paper explores a new way of video coding by modelling human pose from the already-encoded frames.
It is expected that the proposed approach can overcome the limitations of traditional backward-referencing frames.
Experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% residual savings for high motion video sequences.
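One way to picture forward referencing from pose (all names and sizes below
are hypothetical): a predictor synthesizes a forward-reference feature for the
current frame from an already-encoded frame plus the current pose, and only
the residual against that prediction would need to be coded.

```python
import torch
import torch.nn as nn

pose_net = nn.Linear(34, 256)                  # encode 17 keypoints x (x, y)
predictor = nn.Sequential(nn.Linear(256 + 256, 256), nn.ReLU(),
                          nn.Linear(256, 256))

prev_feat = torch.randn(1, 256)                # feature of an encoded frame
cur_pose = torch.randn(1, 34)                  # pose estimated for the current frame
forward_ref = predictor(torch.cat([prev_feat, pose_net(cur_pose)], dim=-1))
# The residual between the true frame feature and forward_ref is what gets coded.
```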
arXiv Detail & Related papers (2022-08-13T16:19:11Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video
Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frame.
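A toy version of that gating logic, with the bin grid and change threshold as
assumed parameters; for brevity this sketch runs the full backbone and merely
blends, whereas the actual method restricts computation to the changed bins.

```python
import torch
import torch.nn.functional as F

def reuse_features(prev_feat, cur_frame, prev_frame, backbone,
                   grid=8, thresh=0.05):
    """Recompute features only where the frame changed; reuse the rest."""
    diff = (cur_frame - prev_frame).abs().mean(1, keepdim=True)  # (B,1,H,W)
    bins = F.adaptive_avg_pool2d(diff, grid)         # mean change per spatial bin
    changed = F.interpolate((bins > thresh).float(), size=prev_feat.shape[-2:])
    new_feat = backbone(cur_frame)                   # full compute, for brevity
    return changed * new_feat + (1 - changed) * prev_feat

backbone = torch.nn.Conv2d(3, 16, 3, padding=1)
prev, cur = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(reuse_features(backbone(prev), cur, prev, backbone).shape)  # (1, 16, 64, 64)
```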
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
- Recurrent Video Restoration Transformer with Guided Deformable Attention [116.1684355529431]
We propose RVRT, which processes local neighboring frames in parallel within a globally recurrent framework.
RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.
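The "locally parallel, globally recurrent" structure can be caricatured as
follows; the clip length, feature sizes, and fusion rule are all assumptions.

```python
import torch
import torch.nn as nn

clip_net = nn.Linear(64, 64)      # stands in for joint per-clip processing
fuse = nn.GRUCell(64, 64)         # carries a global state across clips

frames = torch.randn(12, 64)      # 12 frames of 64-d features
state = torch.zeros(1, 64)
out = []
for clip in frames.split(4):      # clips of 4 neighboring frames
    local = clip_net(clip)        # processed in parallel within the clip
    state = fuse(local.mean(0, keepdim=True), state)  # global recurrence
    out.append(local + state)     # inject global context into the clip
out = torch.cat(out)              # (12, 64) restored-frame features
```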
arXiv Detail & Related papers (2022-06-05T10:36:09Z)
- Recurrence-in-Recurrence Networks for Video Deblurring [58.49075799159015]
State-of-the-art video deblurring methods often adopt recurrent neural networks to model the temporal dependency between the frames.
In this paper, we propose recurrence-in-recurrence network architecture to cope with the limitations of short-ranged memory.
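A speculative reading of "recurrence in recurrence": a second recurrent cell
runs over the hidden states of the first, so information can survive longer
than one frame-to-frame chain. Cell sizes here are assumptions.

```python
import torch
import torch.nn as nn

outer = nn.GRUCell(64, 64)               # ordinary frame-to-frame recurrence
inner = nn.GRUCell(64, 64)               # recurrence over the hidden states

frames = torch.randn(10, 1, 64)          # 10 blurry-frame features
h_outer = torch.zeros(1, 64)
h_inner = torch.zeros(1, 64)
for x in frames:
    h_outer = outer(x, h_outer)
    h_inner = inner(h_outer, h_inner)    # longer-range memory of the sequence
```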
arXiv Detail & Related papers (2022-03-12T11:58:13Z)
- Unsupervised Video Summarization with a Convolutional Attentive
Adversarial Network [32.90753137435032]
We propose a convolutional attentive adversarial network (CAAN) to build a deep summarizer in an unsupervised way.
Specifically, the generator employs a fully convolutional sequence network to extract a global representation of a video, and an attention-based network to output normalized importance scores.
The results show the superiority of our proposed method against other state-of-the-art unsupervised approaches.
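A minimal stand-in for the generator's scoring path (shapes and layers
assumed): convolutional features over the frame sequence feed a 1x1 attention
head whose sigmoid output serves as a per-frame importance score.

```python
import torch
import torch.nn as nn

frames = torch.randn(1, 128, 120)         # (batch, feat, T): 120 frame features
conv = nn.Conv1d(128, 128, 3, padding=1)  # fully convolutional sequence net
head = nn.Conv1d(128, 1, 1)               # attention-style scoring head
scores = torch.sigmoid(head(torch.relu(conv(frames)))).squeeze(1)
print(scores.shape)                       # (1, 120) importance scores in [0, 1]
```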
arXiv Detail & Related papers (2021-05-24T07:24:39Z)
- Reconstructive Sequence-Graph Network for Video Summarization [107.0328985865372]
Exploiting the inner-shot and inter-shot dependencies is essential for key-shot based video summarization.
We propose a Reconstructive Sequence-Graph Network (RSGN) that hierarchically encodes frames and shots as a sequence and as a graph.
A reconstructor is developed to reward the summary generator, so that the generator can be optimized in an unsupervised manner.
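One round of the sequence-plus-graph encoding might look like this (all sizes
assumed; the actual RSGN and its reconstruction reward are more involved):

```python
import torch
import torch.nn as nn

shots = torch.randn(1, 6, 128)                           # 6 shot features
seq, _ = nn.GRU(128, 128, batch_first=True)(shots)       # sequential encoding
sim = torch.softmax(seq @ seq.transpose(1, 2), dim=-1)   # shot-affinity graph
shot_repr = seq + sim @ seq                              # fuse sequence + graph views
```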
arXiv Detail & Related papers (2021-05-10T01:47:55Z)
- Frame-rate Up-conversion Detection Based on Convolutional Neural Network
for Learning Spatiotemporal Features [7.895528973776606]
This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic features caused by FRUC in an end-to-end fashion.
FCDNet takes a stack of consecutive frames as input and effectively learns FRUC artifacts through network blocks designed to learn spatiotemporal features.
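The frame-stacking input can be illustrated as below; channel counts and the
two-class head (original vs. up-converted) are assumptions consistent with
the summary.

```python
import torch
import torch.nn as nn

frames = torch.rand(1, 5, 3, 64, 64)     # 5 consecutive RGB frames
stacked = frames.flatten(1, 2)           # (1, 15, 64, 64): channel-stacked input
net = nn.Sequential(
    nn.Conv2d(15, 32, 3, padding=1), nn.ReLU(),  # blocks learning spatiotemporal cues
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2))                            # original vs. up-converted
print(net(stacked).shape)                # torch.Size([1, 2])
```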
arXiv Detail & Related papers (2021-03-25T08:47:46Z)
- End-to-End Learning for Video Frame Compression with Self-Attention [25.23586503813838]
We propose an end-to-end learned system for compressing video frames.
Our system learns deep embeddings of frames and encodes their difference in latent space.
In our experiments, we show that the proposed system achieves high compression rates and high objective visual quality.
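Coding the frame difference in latent space can be sketched as follows; the
conv autoencoder and rounding quantizer are generic stand-ins, and the
paper's self-attention component is omitted.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 64, 4, stride=2, padding=1))
dec = nn.Sequential(nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1))

ref, cur = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
residual = torch.round(enc(cur) - enc(ref))  # quantized latent difference (the bits)
recon = dec(enc(ref) + residual)             # decode reference latent + residual
print(recon.shape)                           # torch.Size([1, 3, 64, 64])
```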
arXiv Detail & Related papers (2020-04-20T12:11:08Z)