Optimal Video Compression using Pixel Shift Tracking
- URL: http://arxiv.org/abs/2406.19630v1
- Date: Fri, 28 Jun 2024 03:36:38 GMT
- Title: Optimal Video Compression using Pixel Shift Tracking
- Authors: Hitesh Saai Mananchery Panneerselvam, Smit Anand
- Abstract summary: This paper introduces the removal of redundancies across subsequent frames of a given video as its main approach to video compression.
We call this method Redundancy Removal using Shift (R²S).
In this study, we have utilized a computer vision-based pixel point tracking method to identify redundant pixels to encode video for optimal storage.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video comprises approximately 85% of all internet traffic, but video encoding/compression has historically been done with hard-coded rules, which has worked well only up to a point. In the last few years we have seen a surge of video compression algorithms using ML-based models, many of which outperform legacy codecs. These models range from encoding video end to end with an ML approach to replacing intermediate steps of legacy codecs with ML models to make those steps more efficient. Optimizing video storage is an essential aspect of video processing, and one possible way to achieve it is to avoid redundant data in each frame. In this paper, we introduce the removal of redundancies across subsequent frames of a given video as the main approach to video compression. We call this method Redundancy Removal using Shift (R²S). It can be used with various machine learning algorithms, making compression more accessible and adaptable. In this study, we use a computer vision-based pixel point tracking method to identify redundant pixels and encode the video for optimal storage.
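A minimal sketch of the shift-based redundancy idea is given below. This is a hypothetical illustration, not the authors' implementation: it approximates pixel point tracking with a brute-force search over small global shifts, marks the pixels of the current frame that the shifted previous frame already explains as redundant, and stores only the shift vector plus the remaining pixels. All function names and the max_shift/tol parameters are our own choices.
```python
import numpy as np

def find_redundant_pixels(prev_frame, curr_frame, max_shift=4, tol=2):
    """Brute-force stand-in for pixel tracking: try small global (dy, dx)
    shifts and keep the one under which the most pixels of curr_frame
    already match the shifted prev_frame within tolerance `tol`."""
    best_shift, best_mask = (0, 0), np.zeros(curr_frame.shape[:2], dtype=bool)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around at the edges; ignored here for brevity.
            shifted = np.roll(prev_frame, (dy, dx), axis=(0, 1))
            diff = np.abs(shifted.astype(np.int16) - curr_frame.astype(np.int16))
            mask = diff.max(axis=-1) <= tol  # True where the pixel is redundant
            if mask.sum() > best_mask.sum():
                best_shift, best_mask = (dy, dx), mask
    return best_shift, best_mask

def encode_frame(prev_frame, curr_frame):
    """Store one shift vector plus only the pixels tracking cannot explain."""
    shift, redundant = find_redundant_pixels(prev_frame, curr_frame)
    novel_idx = np.argwhere(~redundant)   # (N, 2) row/col positions
    novel_val = curr_frame[~redundant]    # (N, 3) pixel values
    return shift, novel_idx, novel_val

def decode_frame(prev_frame, shift, novel_idx, novel_val):
    """Shift the previous frame, then overwrite the non-redundant pixels."""
    frame = np.roll(prev_frame, shift, axis=(0, 1)).copy()
    frame[novel_idx[:, 0], novel_idx[:, 1]] = novel_val
    return frame

# Toy usage on two H x W x 3 uint8 frames, the second a shifted copy:
prev = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
curr = np.roll(prev, (2, 1), axis=(0, 1))
shift, idx, val = encode_frame(prev, curr)
assert np.array_equal(decode_frame(prev, shift, idx, val), curr)
```
In this toy form the search covers only global translations; the paper's pixel point tracking generalizes to per-pixel motion, but the storage layout is the same idea: one motion description plus the pixels the tracker could not account for.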
Related papers
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer [55.515836117658985]
We present CogVideoX, a large-scale text-to-video generation model based on a diffusion transformer.
It can generate 10-second continuous videos aligned with a text prompt, at a frame rate of 16 fps and a resolution of 768 * 1360 pixels.
arXiv Detail & Related papers (2024-08-12T11:47:11Z) - A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [54.90226700939778]
We build on the common paradigm of transferring large-scale, image-text models to video via shallow temporal fusion.
We expose two limitations of the approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
arXiv Detail & Related papers (2023-12-12T16:10:19Z) - Predictive Coding For Animation-Based Video Compression [13.161311799049978]
We propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame.
Our experiments indicate a significant gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC.
arXiv Detail & Related papers (2023-07-09T14:40:54Z) - Rate-Perception Optimized Preprocessing for Video Coding [15.808458228130261]
We propose a rate-perception optimized preprocessing (RPP) method to improve the rate-distortion performance.
Our RPP method is simple and efficient, and it requires no changes to the settings of video encoding, streaming, or decoding.
In our subjective visual quality test, 87% of users rate videos with RPP as better than or equal to videos compressed by the codec alone, while the videos with RPP save about 12% of the bitrate.
arXiv Detail & Related papers (2023-01-25T08:21:52Z) - MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z) - Compressed Vision for Efficient Video Understanding [83.97689018324732]
We propose a framework enabling research on hour-long videos with the same hardware that can now process second-long videos.
We replace standard video compression, e.g. JPEG, with neural compression and show that we can directly feed compressed videos as inputs to regular video networks.
arXiv Detail & Related papers (2022-10-06T15:35:49Z) - Speeding Up Action Recognition Using Dynamic Accumulation of Residuals in Compressed Domain [2.062593640149623]
Temporal redundancy and the sheer size of raw videos are the two most common problems for video processing algorithms.
This paper presents an approach that uses residual data, available directly in compressed videos, which can be obtained by a lightweight partial decoding procedure.
Applying neural networks exclusively to accumulated residuals in the compressed domain accelerates processing, while classification results remain highly competitive with raw-video approaches.
arXiv Detail & Related papers (2022-09-29T13:08:49Z) - Microdosing: Knowledge Distillation for GAN based Compression [18.140328230701233]
We show how to leverage knowledge distillation to obtain equally capable image decoders at a fraction of the original number of parameters.
This allows us to reduce the model size by a factor of 20 and to achieve 50% reduction in decoding time.
arXiv Detail & Related papers (2022-01-07T14:27:16Z) - ELF-VC: Efficient Learned Flexible-Rate Video Coding [61.10102916737163]
We propose several novel ideas for learned video compression which allow for improved performance for the low-latency mode.
We benchmark our method, which we call ELF-VC, on popular video test sets UVG and MCL-JCV.
Our approach runs at least 5x faster and has fewer parameters than all ML codecs which report these figures.
arXiv Detail & Related papers (2021-04-29T17:50:35Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)