LSCD: A Large-Scale Screen Content Dataset for Video Compression
- URL: http://arxiv.org/abs/2308.09332v1
- Date: Fri, 18 Aug 2023 06:27:35 GMT
- Title: LSCD: A Large-Scale Screen Content Dataset for Video Compression
- Authors: Yuhao Cheng, Siru Zhang, Yiqiang Yan, Rong Chen, Yun Zhang
- Abstract summary: We propose the Large-scale Screen Content Dataset (LSCD), which contains 714 source sequences.
We provide an analysis of the proposed dataset to highlight characteristic features of screen content videos.
We also provide a benchmark containing the performance of both traditional and learning-based methods.
- Score: 5.857003653854907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimedia compression allows us to watch videos, see pictures, and
hear sounds within a limited bandwidth, which has helped the internet flourish.
During the past decades, multimedia compression achieved great success using
hand-crafted features and systems. With the development of artificial
intelligence and video compression, a great deal of research has emerged on
applying neural networks to video compression in order to replace these
complicated hand-designed systems. Beyond producing advanced algorithms,
researchers have also extended compression to different types of content, such
as User Generated Content (UGC). With the rapid development of mobile devices,
screen content videos have become an important part of multimedia data.
However, we find that the community lacks a large-scale dataset for screen
content video compression, which impedes the fast development of the
corresponding learning-based algorithms. To fill this gap and accelerate
research on this special type of video, we propose the Large-scale Screen
Content Dataset (LSCD), which contains 714 source sequences. We also provide
an analysis of the proposed dataset that highlights characteristic features of
screen content videos, which will help researchers better understand how to
design new algorithms. Besides collecting and post-processing the data to
organize the dataset, we provide a benchmark containing the performance of
both traditional codecs and learning-based methods.
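As a rough illustration of how such a benchmark is typically reported, the
sketch below computes the two standard rate-distortion quantities, PSNR and
bits per pixel. It is a generic, hypothetical example, not the paper's
evaluation code.

```python
# Minimal sketch of rate-distortion measurement as used in compression
# benchmarks: PSNR for quality, bits per pixel (bpp) for rate.
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two frames."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def bits_per_pixel(bitstream_bytes: int, width: int, height: int, num_frames: int) -> float:
    """Average coding rate of a compressed sequence."""
    return bitstream_bytes * 8 / (width * height * num_frames)

# Example: a 1080p, 100-frame sequence compressed to 1.2 MB
print(bits_per_pixel(1_200_000, 1920, 1080, 100))  # ~0.046 bpp
```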
Related papers
- BVI-CR: A Multi-View Human Dataset for Volumetric Video Compression [14.109939177281069]
BVI-CR contains 18 multi-view RGB-D captures and their corresponding textured polygonal meshes.
Each video sequence contains 10 views at 1080p resolution, with durations of 10-15 seconds at 30 FPS.
Results show the great potential of neural representation based methods in volumetric video compression.
arXiv Detail & Related papers (2024-11-17T23:22:48Z) - DMVC: Multi-Camera Video Compression Network aimed at Improving Deep Learning Accuracy [22.871591373774802]
We introduce a cutting-edge video compression framework tailored for the age of ubiquitous video data.
Unlike traditional compression methods that prioritize human visual perception, our innovative approach focuses on preserving semantic information critical for deep learning accuracy.
Based on purpose-designed deep learning algorithms, it adeptly separates essential information from redundancy, ensuring machine learning tasks are fed data of the highest relevance.
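A common way to realize such machine-oriented compression (a hedged sketch,
not necessarily DMVC's exact formulation) is to train the codec against a rate
term plus the loss of a frozen downstream task network instead of a
pixel-fidelity distortion; all module names below are illustrative.

```python
# Hypothetical task-aware training objective: rate + downstream task loss
# instead of pixel distortion. Not DMVC's actual API.
import torch
import torch.nn.functional as F

def task_aware_loss(rate_bits, reconstructed, labels, task_model, lam=0.01):
    # rate_bits: estimated bitstream length from an entropy model
    # task_model: a frozen downstream network (e.g., a classifier)
    logits = task_model(reconstructed)
    task_loss = F.cross_entropy(logits, labels)  # semantic fidelity
    return task_loss + lam * rate_bits           # trade-off weight: assumption
```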
arXiv Detail & Related papers (2024-10-24T03:29:57Z) - Learned Compression for Images and Point Clouds [1.7404865362620803]
This thesis provides three primary contributions to this new field of learned compression.
First, we present an efficient low-complexity entropy model that dynamically adapts the encoding distribution to a specific input by compressing and transmitting the encoding distribution itself as side information.
Second, we propose a novel lightweight low-complexity point cloud codec that is highly specialized for classification, attaining significant rate reductions compared to non-specialized codecs.
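The first contribution can be illustrated with a small numerical sketch:
adapting the coding distribution to the input and paying for it as side
information beats a fixed prior whenever the input statistics are skewed. The
histogram model and the 8-bits-per-bin side-information cost below are
assumptions for illustration, not the paper's entropy model.

```python
# Numerical sketch of per-input distribution adaptation with side information.
import numpy as np

def codelength_bits(symbols, probs):
    """Ideal arithmetic-coding cost of `symbols` under distribution `probs`."""
    return float(-np.sum(np.log2(probs[symbols])))

rng = np.random.default_rng(0)
# Skewed source: 4 common symbols, 12 rare ones
symbols = rng.choice(16, size=10_000, p=np.r_[np.full(4, 0.2), np.full(12, 0.2 / 12)])

global_prior = np.full(16, 1 / 16)             # fixed, input-agnostic prior
hist = np.bincount(symbols, minlength=16) + 1  # per-input histogram (smoothed)
adapted = hist / hist.sum()

side_info_bits = 16 * 8                        # 8 bits per bin (assumption)
fixed_cost = codelength_bits(symbols, global_prior)
adaptive_cost = codelength_bits(symbols, adapted) + side_info_bits
print(fixed_cost, adaptive_cost)               # adaptive wins on skewed data
```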
arXiv Detail & Related papers (2024-09-12T19:57:44Z) - EasyVolcap: Accelerating Neural Volumetric Video Research [69.59671164891725]
Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.
EasyVolcap is a Python & PyTorch library for unifying the process of multi-view data processing, 4D scene reconstruction, and efficient dynamic volumetric video rendering.
arXiv Detail & Related papers (2023-12-11T17:59:46Z) - InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding
and Generation [90.71796406228265]
InternVid is a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations.
The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions totaling 4.1B words.
arXiv Detail & Related papers (2023-07-13T17:58:32Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
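For readers unfamiliar with INRs, a minimal coordinate-based sketch is shown
below: an MLP maps normalized (t, x, y) coordinates to RGB, so fitting it to a
video "encodes" the video in the network weights. The architecture sizes and
training loop are illustrative, not this paper's model.

```python
# Minimal coordinate-based implicit neural representation (INR) for video.
import torch
import torch.nn as nn

class VideoINR(nn.Module):
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [3] + [hidden] * layers + [3]
        blocks = []
        for i in range(len(dims) - 1):
            blocks.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                blocks.append(nn.ReLU())
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):                   # coords: (N, 3) = (t, x, y)
        return torch.sigmoid(self.net(coords))   # RGB in [0, 1]

# Fitting loop (sketch): regress sampled coordinates onto their pixel values.
model = VideoINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
coords = torch.rand(1024, 3)   # stand-in for sampled (t, x, y) coordinates
pixels = torch.rand(1024, 3)   # stand-in for ground-truth RGB values
for _ in range(10):
    loss = torch.mean((model(coords) - pixels) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
```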
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - A Feature-space Multimodal Data Augmentation Technique for Text-video
Retrieval [16.548016892117083]
Text-video retrieval methods have received increased attention over the past few years.
Data augmentation techniques have been introduced to improve performance on unseen test examples.
We propose a multimodal data augmentation technique which works in the feature space and creates new videos and captions by mixing semantically similar samples.
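A minimal sketch of such feature-space mixing (with a simplified, hypothetical
nearest-neighbor search, not the paper's exact method): interpolate the video
and caption embeddings of two semantically similar samples to create a new
virtual pair.

```python
# Feature-space mixing of semantically similar video-caption pairs.
import torch
import torch.nn.functional as F

def mix_similar_pairs(video_feats, text_feats, alpha=0.5):
    # video_feats, text_feats: (N, D) paired embeddings
    v = F.normalize(video_feats, dim=-1)
    sim = v @ v.t()
    sim.fill_diagonal_(-1.0)                    # exclude self-matches
    nn_idx = sim.argmax(dim=-1)                 # most similar other sample
    lam = torch.distributions.Beta(alpha, alpha).sample((len(v), 1))
    new_video = lam * video_feats + (1 - lam) * video_feats[nn_idx]
    new_text = lam * text_feats + (1 - lam) * text_feats[nn_idx]
    return new_video, new_text                  # virtual positive pairs
```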
arXiv Detail & Related papers (2022-08-03T14:05:20Z) - Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval [55.088635195893325]
We propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ).
HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos.
Experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods.
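Quantized retrieval representations of this kind are often built on product
quantization; the sketch below shows the basic idea of storing a vector as a
few small codes by snapping sub-vectors to their nearest codewords. Codebook
sizes are illustrative, and this is not HCQ's transformer-based model.

```python
# Product quantization: split an embedding into sub-vectors and replace each
# with the index of its nearest codeword in a per-subspace codebook.
import torch

def product_quantize(x, codebooks):
    # x: (N, D); codebooks: (M, K, D // M) -- M sub-spaces, K codewords each
    N, D = x.shape
    M, K, d = codebooks.shape
    sub = x.view(N, M, d)                                # split into sub-vectors
    dists = torch.cdist(sub.transpose(0, 1), codebooks)  # (M, N, K) distances
    codes = dists.argmin(dim=-1)                         # (M, N) integer codes
    quantized = torch.stack([codebooks[m][codes[m]] for m in range(M)], dim=1)
    return codes.t(), quantized.reshape(N, D)

x = torch.randn(8, 64)
codebooks = torch.randn(4, 256, 16)          # 4 sub-spaces, 256 codewords each
codes, xq = product_quantize(x, codebooks)   # 8 bits per sub-vector -> 4 bytes/vector
```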
arXiv Detail & Related papers (2022-02-07T18:04:10Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
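The online updating idea can be sketched as follows: at encoding time,
fine-tune only the encoder on the current content to reduce its
rate-distortion cost, while the decoder shared with the receiver stays frozen
so the bitstream remains decodable. The modules, the rate proxy, and the
lambda value below are illustrative placeholders, not the paper's system.

```python
# Online encoder updating: adapt the encoder per video, keep the decoder fixed.
import torch

def encode_with_online_update(encoder, decoder, frames, steps=10, lam=0.01):
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    for p in decoder.parameters():
        p.requires_grad_(False)                 # receiver-side model is fixed
    for _ in range(steps):
        latents = encoder(frames)
        recon = decoder(latents)
        distortion = torch.mean((recon - frames) ** 2)
        rate_proxy = torch.mean(latents ** 2)   # stand-in for an entropy model
        loss = lam * rate_proxy + distortion
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder(frames)                      # content-adapted latents
```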
arXiv Detail & Related papers (2020-03-25T09:04:24Z) - Video Coding for Machines: A Paradigm of Collaborative Compression and
Intelligent Analytics [127.65410486227007]
Video coding, which aims to compress and reconstruct whole frames, and feature compression, which preserves and transmits only the most critical information, stand at the two ends of this scale.
Recent endeavors in emerging trends of video compression, e.g., deep-learning-based coding tools and end-to-end image/video coding, as well as the MPEG-7 compact feature descriptor standards, promote sustainable and fast development in their respective directions.
In this paper, thanks to booming AI technology, e.g., prediction and generation models, we explore the new area of Video Coding for Machines (VCM), arising from the emerging MPEG standardization effort.
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.