Spatial Decomposition and Temporal Fusion based Inter Prediction for
Learned Video Compression
- URL: http://arxiv.org/abs/2401.15864v1
- Date: Mon, 29 Jan 2024 03:30:21 GMT
- Title: Spatial Decomposition and Temporal Fusion based Inter Prediction for
Learned Video Compression
- Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li
- Abstract summary: We propose a spatial decomposition and temporal fusion based inter prediction for learned video compression.
With the SDD-based motion model and long short-term temporal fusion, our proposed learned video codec can obtain more accurate inter prediction contexts.
- Score: 59.632286735304156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video compression performance is closely related to the accuracy of inter
prediction. Accurate inter prediction is difficult to obtain for local video regions with inconsistent motion and occlusion. Traditional
video coding standards propose various technologies to handle motion
inconsistency and occlusion, such as recursive partitions, geometric
partitions, and long-term references. However, existing learned video
compression schemes focus on obtaining an overall minimized prediction error
averaged over all regions while ignoring the motion inconsistency and occlusion
in local regions. In this paper, we propose a spatial decomposition and
temporal fusion based inter prediction for learned video compression. To handle
motion inconsistency, we propose to decompose the video into structure and
detail (SDD) components first. Then we perform SDD-based motion estimation and
SDD-based temporal context mining for the structure and detail components to
generate short-term temporal contexts. To handle occlusion, we propose to
propagate long-term temporal contexts by recurrently accumulating the temporal
information of each historical reference feature and fuse them with short-term
temporal contexts. With the SDD-based motion model and long short-term temporal context fusion, our proposed learned video codec can obtain more accurate
inter prediction. Comprehensive experimental results demonstrate that our codec
outperforms the reference software of H.266/VVC on all common test datasets for
both PSNR and MS-SSIM.
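The two mechanisms in the abstract, a structure/detail split for motion and a recurrent long-term memory fused with short-term contexts, can be illustrated in a few lines of PyTorch. This is a minimal sketch under assumed design choices: a Gaussian low-pass filter stands in for the paper's decomposition, and the layer widths and names (sdd_decompose, LongTermContext) are hypothetical rather than the authors' actual architecture.

import torch
import torch.nn.functional as F

def sdd_decompose(frame, kernel_size=5, sigma=1.0):
    # Split a (B, C, H, W) frame into a low-frequency "structure" component
    # and a high-frequency "detail" residual. The Gaussian low-pass split is
    # an assumption for illustration; the paper's decomposition may differ.
    coords = torch.arange(kernel_size, dtype=frame.dtype, device=frame.device)
    coords = coords - (kernel_size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    # Depthwise 2D Gaussian kernel, one filter per channel.
    kernel = torch.outer(g, g).expand(frame.shape[1], 1, kernel_size, kernel_size).contiguous()
    structure = F.conv2d(frame, kernel, padding=kernel_size // 2, groups=frame.shape[1])
    detail = frame - structure
    return structure, detail

class LongTermContext(torch.nn.Module):
    # Recurrently fold each historical reference feature into a running
    # long-term memory, then fuse the memory with the short-term context
    # (hypothetical layer sizes).
    def __init__(self, channels=64):
        super().__init__()
        self.update = torch.nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.fuse = torch.nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, short_term, memory):
        # Accumulate temporal information from the newest reference feature.
        memory = torch.relu(self.update(torch.cat([memory, short_term], dim=1)))
        # Fuse long- and short-term contexts into the final prediction context.
        context = torch.relu(self.fuse(torch.cat([memory, short_term], dim=1)))
        return context, memory

For the first frame of a sequence, the long-term memory can simply be initialized to zeros and then unrolled across decoded frames, mirroring the recurrent accumulation of historical reference features described in the abstract.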
Related papers
- Disentangle and denoise: Tackling context misalignment for video moment retrieval [16.939535169282262]
Video Moment Retrieval aims to locate in-context video moments according to a natural language query.
This paper proposes a cross-modal Context Denoising Network (CDNet) for accurate moment retrieval.
arXiv Detail & Related papers (2024-08-14T15:00:27Z)
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer-based offset estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z)
- Video Diffusion Models with Local-Global Context Guidance [17.040535240422088]
We propose a Local-Global Context guided Video Diffusion model (LGC-VD) to capture multi-perception conditions for producing high-quality videos.
Our experiments demonstrate that the proposed method achieves favorable performance on video prediction, unconditional inference, and video generation.
arXiv Detail & Related papers (2023-06-05T03:32:27Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- Temporally Consistent Transformers for Video Generation [80.45230642225913]
To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world.
No established benchmarks on complex data exist for rigorously evaluating video generation with long temporal dependencies.
We introduce the Temporally Consistent Transformer (TECO), a generative model that substantially improves long-term consistency while also reducing sampling time.
arXiv Detail & Related papers (2022-10-05T17:15:10Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video quality when a screen is filmed with a digital camera.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)