LSCD: A Large-Scale Screen Content Dataset for Video Compression
- URL: http://arxiv.org/abs/2308.09332v1
- Date: Fri, 18 Aug 2023 06:27:35 GMT
- Title: LSCD: A Large-Scale Screen Content Dataset for Video Compression
- Authors: Yuhao Cheng, Siru Zhang, Yiqiang Yan, Rong Chen, Yun Zhang
- Abstract summary: We propose the Large-scale Screen Content Dataset (LSCD), which contains 714 source sequences.
We provide an analysis of the proposed dataset to highlight characteristic features of screen content videos.
We also provide a benchmark containing the performance of both traditional and learning-based methods.
- Score: 5.857003653854907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimedia compression allows us to watch videos, see pictures, and
hear sounds within a limited bandwidth, which has helped the internet flourish.
During the past decades, multimedia compression achieved great success using
hand-crafted features and systems. With the development of artificial
intelligence and video compression, a great deal of research has emerged on
applying neural networks to video compression in order to replace these
complicated hand-designed systems. Beyond producing advanced algorithms,
researchers have also extended compression to different types of content, such
as User Generated Content (UGC). With the rapid development of mobile devices,
screen content videos have become an important part of multimedia data.
However, we find that the community lacks a large-scale dataset for screen
content video compression, which impedes the fast development of the
corresponding learning-based algorithms. To fill this gap and accelerate
research on this special type of video, we propose the Large-scale Screen
Content Dataset (LSCD), which contains 714 source sequences. We also provide
an analysis of the proposed dataset that highlights characteristic features of
screen content videos, which will help researchers better understand how to
design new algorithms. Besides collecting and post-processing the data to
organize the dataset, we provide a benchmark containing the performance of
both traditional codecs and learning-based methods.
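As a rough illustration of how such a benchmark is typically reported, the
sketch below computes the two standard rate-distortion quantities, PSNR and
bits per pixel. It is a generic, hypothetical example, not the paper's
evaluation code.

```python
# Minimal sketch of rate-distortion measurement as used in compression
# benchmarks: PSNR for quality, bits per pixel (bpp) for rate.
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two frames."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def bits_per_pixel(bitstream_bytes: int, width: int, height: int, num_frames: int) -> float:
    """Average coding rate of a compressed sequence."""
    return bitstream_bytes * 8 / (width * height * num_frames)

# Example: a 1080p, 100-frame sequence compressed to 1.2 MB
print(bits_per_pixel(1_200_000, 1920, 1080, 100))  # ~0.046 bpp
```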
Related papers
- BVI-CR: A Multi-View Human Dataset for Volumetric Video Compression [14.109939177281069]
BVI-CR contains 18 multi-view RGB-D captures and their corresponding textured polygonal meshes.
Each video sequence contains 10 views at 1080p resolution, with durations of 10-15 seconds at 30 FPS.
Results show the great potential of neural representation based methods in volumetric video compression.
arXiv Detail & Related papers (2024-11-17T23:22:48Z) - DMVC: Multi-Camera Video Compression Network aimed at Improving Deep Learning Accuracy [22.871591373774802]
We introduce a cutting-edge video compression framework tailored for the age of ubiquitous video data.
Unlike traditional compression methods that prioritize human visual perception, our innovative approach focuses on preserving semantic information critical for deep learning accuracy.
Based on purpose-designed deep learning algorithms, it adeptly separates essential information from redundancy, ensuring machine learning tasks are fed data of the highest relevance.
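A common way to realize such machine-oriented compression (a hedged sketch,
not necessarily DMVC's exact formulation) is to train the codec against a rate
term plus the loss of a frozen downstream task network instead of a
pixel-fidelity distortion; all module names below are illustrative.

```python
# Hypothetical task-aware training objective: rate + downstream task loss
# instead of pixel distortion. Not DMVC's actual API.
import torch
import torch.nn.functional as F

def task_aware_loss(rate_bits, reconstructed, labels, task_model, lam=0.01):
    # rate_bits: estimated bitstream length from an entropy model
    # task_model: a frozen downstream network (e.g., a classifier)
    logits = task_model(reconstructed)
    task_loss = F.cross_entropy(logits, labels)  # semantic fidelity
    return task_loss + lam * rate_bits           # trade-off weight: assumption
```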
arXiv Detail & Related papers (2024-10-24T03:29:57Z) - Learned Compression for Images and Point Clouds [1.7404865362620803]
This thesis provides three primary contributions to this new field of learned compression.
First, we present an efficient low-complexity entropy model that dynamically adapts the encoding distribution to a specific input by compressing and transmitting the encoding distribution itself as side information.
Second, we propose a novel lightweight low-complexity point cloud codec that is highly specialized for classification, attaining significant rate reductions compared to non-specialized codecs.
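The first contribution can be illustrated with a small numerical sketch:
adapting the coding distribution to the input and paying for it as side
information beats a fixed prior whenever the input statistics are skewed. The
histogram model and the 8-bits-per-bin side-information cost below are
assumptions for illustration, not the paper's entropy model.

```python
# Numerical sketch of per-input distribution adaptation with side information.
import numpy as np

def codelength_bits(symbols, probs):
    """Ideal arithmetic-coding cost of `symbols` under distribution `probs`."""
    return float(-np.sum(np.log2(probs[symbols])))

rng = np.random.default_rng(0)
# Skewed source: 4 common symbols, 12 rare ones
symbols = rng.choice(16, size=10_000, p=np.r_[np.full(4, 0.2), np.full(12, 0.2 / 12)])

global_prior = np.full(16, 1 / 16)             # fixed, input-agnostic prior
hist = np.bincount(symbols, minlength=16) + 1  # per-input histogram (smoothed)
adapted = hist / hist.sum()

side_info_bits = 16 * 8                        # 8 bits per bin (assumption)
fixed_cost = codelength_bits(symbols, global_prior)
adaptive_cost = codelength_bits(symbols, adapted) + side_info_bits
print(fixed_cost, adaptive_cost)               # adaptive wins on skewed data
```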
arXiv Detail & Related papers (2024-09-12T19:57:44Z) - EasyVolcap: Accelerating Neural Volumetric Video Research [69.59671164891725]
Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.
EasyVolcap is a Python & PyTorch library for unifying the process of multi-view data processing, 4D scene reconstruction, and efficient dynamic volumetric video rendering.
arXiv Detail & Related papers (2023-12-11T17:59:46Z) - InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding
and Generation [90.71796406228265]
InternVid is a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations.
The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions totaling 4.1B words.
arXiv Detail & Related papers (2023-07-13T17:58:32Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
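For readers unfamiliar with INRs, a minimal coordinate-based sketch is shown
below: an MLP maps normalized (t, x, y) coordinates to RGB, so fitting it to a
video "encodes" the video in the network weights. The architecture sizes and
training loop are illustrative, not this paper's model.

```python
# Minimal coordinate-based implicit neural representation (INR) for video.
import torch
import torch.nn as nn

class VideoINR(nn.Module):
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [3] + [hidden] * layers + [3]
        blocks = []
        for i in range(len(dims) - 1):
            blocks.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                blocks.append(nn.ReLU())
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):                   # coords: (N, 3) = (t, x, y)
        return torch.sigmoid(self.net(coords))   # RGB in [0, 1]

# Fitting loop (sketch): regress sampled coordinates onto their pixel values.
model = VideoINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
coords = torch.rand(1024, 3)   # stand-in for sampled (t, x, y) coordinates
pixels = torch.rand(1024, 3)   # stand-in for ground-truth RGB values
for _ in range(10):
    loss = torch.mean((model(coords) - pixels) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
```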
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - A Feature-space Multimodal Data Augmentation Technique for Text-video
Retrieval [16.548016892117083]
Text-video retrieval methods have received increased attention over the past few years.
Data augmentation techniques have been introduced to improve performance on unseen test examples.
We propose a multimodal data augmentation technique which works in the feature space and creates new videos and captions by mixing semantically similar samples.
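A minimal sketch of such feature-space mixing (with a simplified, hypothetical
nearest-neighbor search, not the paper's exact method): interpolate the video
and caption embeddings of two semantically similar samples to create a new
virtual pair.

```python
# Feature-space mixing of semantically similar video-caption pairs.
import torch
import torch.nn.functional as F

def mix_similar_pairs(video_feats, text_feats, alpha=0.5):
    # video_feats, text_feats: (N, D) paired embeddings
    v = F.normalize(video_feats, dim=-1)
    sim = v @ v.t()
    sim.fill_diagonal_(-1.0)                    # exclude self-matches
    nn_idx = sim.argmax(dim=-1)                 # most similar other sample
    lam = torch.distributions.Beta(alpha, alpha).sample((len(v), 1))
    new_video = lam * video_feats + (1 - lam) * video_feats[nn_idx]
    new_text = lam * text_feats + (1 - lam) * text_feats[nn_idx]
    return new_video, new_text                  # virtual positive pairs
```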
arXiv Detail & Related papers (2022-08-03T14:05:20Z) - Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval [55.088635195893325]
We propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ).
HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos.
Experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods.
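Quantized retrieval representations of this kind are often built on product
quantization; the sketch below shows the basic idea of storing a vector as a
few small codes by snapping sub-vectors to their nearest codewords. Codebook
sizes are illustrative, and this is not HCQ's transformer-based model.

```python
# Product quantization: split an embedding into sub-vectors and replace each
# with the index of its nearest codeword in a per-subspace codebook.
import torch

def product_quantize(x, codebooks):
    # x: (N, D); codebooks: (M, K, D // M) -- M sub-spaces, K codewords each
    N, D = x.shape
    M, K, d = codebooks.shape
    sub = x.view(N, M, d)                                # split into sub-vectors
    dists = torch.cdist(sub.transpose(0, 1), codebooks)  # (M, N, K) distances
    codes = dists.argmin(dim=-1)                         # (M, N) integer codes
    quantized = torch.stack([codebooks[m][codes[m]] for m in range(M)], dim=1)
    return codes.t(), quantized.reshape(N, D)

x = torch.randn(8, 64)
codebooks = torch.randn(4, 256, 16)          # 4 sub-spaces, 256 codewords each
codes, xq = product_quantize(x, codebooks)   # 8 bits per sub-vector -> 4 bytes/vector
```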
arXiv Detail & Related papers (2022-02-07T18:04:10Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
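The online updating idea can be sketched as follows: at encoding time,
fine-tune only the encoder on the current content to reduce its
rate-distortion cost, while the decoder shared with the receiver stays frozen
so the bitstream remains decodable. The modules, the rate proxy, and the
lambda value below are illustrative placeholders, not the paper's system.

```python
# Online encoder updating: adapt the encoder per video, keep the decoder fixed.
import torch

def encode_with_online_update(encoder, decoder, frames, steps=10, lam=0.01):
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    for p in decoder.parameters():
        p.requires_grad_(False)                 # receiver-side model is fixed
    for _ in range(steps):
        latents = encoder(frames)
        recon = decoder(latents)
        distortion = torch.mean((recon - frames) ** 2)
        rate_proxy = torch.mean(latents ** 2)   # stand-in for an entropy model
        loss = lam * rate_proxy + distortion
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder(frames)                      # content-adapted latents
```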
arXiv Detail & Related papers (2020-03-25T09:04:24Z) - Video Coding for Machines: A Paradigm of Collaborative Compression and
Intelligent Analytics [127.65410486227007]
Video coding, which aims to compress and reconstruct whole frames, and feature compression, which preserves and transmits only the most critical information, stand at the two ends of this scale.
Recent endeavors in emerging trends of video compression, e.g., deep-learning-based coding tools and end-to-end image/video coding, as well as the MPEG-7 compact feature descriptor standards, promote sustainable and fast development in their respective directions.
In this paper, thanks to booming AI technology, e.g., prediction and generation models, we explore the new area of Video Coding for Machines (VCM), arising from the emerging MPEG standardization effort.
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.