ViVo: A Dataset for Volumetric Video Reconstruction and Compression
- URL: http://arxiv.org/abs/2506.00558v2
- Date: Mon, 09 Jun 2025 16:40:15 GMT
- Title: ViVo: A Dataset for Volumetric Video Reconstruction and Compression
- Authors: Adrian Azzarelli, Ge Gao, Ho Man Kwan, Fan Zhang, Nantheera Anantrasirichai, Ollie Moolan-Feroze, David Bull
- Abstract summary: We propose a new dataset, ViVo, for VolumetrIc VideO reconstruction and compression. The dataset is faithful to real-world volumetric video production and is the first to extend the definition of diversity to include both human-centric characteristics and dynamic visual phenomena. To demonstrate its use, we have benchmarked three state-of-the-art 3-D reconstruction methods and two volumetric video compression algorithms.
- Score: 13.827241444266308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As research on neural volumetric video reconstruction and compression flourishes, there is a need for diverse and realistic datasets, which can be used to develop and validate reconstruction and compression models. However, existing volumetric video datasets lack diverse content in terms of both semantic and low-level features that are commonly present in real-world production pipelines. In this context, we propose a new dataset, ViVo, for VolumetrIc VideO reconstruction and compression. The dataset is faithful to real-world volumetric video production and is the first dataset to extend the definition of diversity to include both human-centric characteristics (skin, hair, etc.) and dynamic visual phenomena (transparent, reflective, liquid, etc.). Each video sequence in this database contains raw data including fourteen multi-view RGB and depth video pairs, synchronized at 30FPS with per-frame calibration and audio data, and their associated 2-D foreground masks and 3-D point clouds. To demonstrate the use of this database, we have benchmarked three state-of-the-art (SotA) 3-D reconstruction methods and two volumetric video compression algorithms. The obtained results evidence the challenging nature of the proposed dataset and the limitations of existing datasets for both volumetric video reconstruction and compression tasks, highlighting the need to develop more effective algorithms for these applications. The database and the associated results are available at https://vivo-bvicr.github.io/
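To make the per-sequence contents listed in the abstract concrete, here is a minimal indexing sketch assuming a hypothetical directory layout (all file and folder names below are assumptions for illustration; the actual organization is documented on the project page):

```python
from dataclasses import dataclass
from pathlib import Path

NUM_VIEWS = 14  # fourteen synchronized multi-view RGB-D pairs per sequence
FPS = 30        # all streams are synchronized at 30 FPS

@dataclass
class ViVoView:
    """One camera view: RGB video, depth video, per-frame 2-D foreground masks."""
    rgb_video: Path
    depth_video: Path
    mask_dir: Path

@dataclass
class ViVoSequence:
    """Raw data shipped with one sequence, per the abstract."""
    name: str
    views: list            # NUM_VIEWS ViVoView entries
    calibration_dir: Path  # per-frame camera calibration
    audio: Path            # synchronized audio track
    pointcloud_dir: Path   # per-frame 3-D point clouds

def index_sequence(root: Path, name: str) -> ViVoSequence:
    # Hypothetical layout: root/name/view_00 .. view_13, each holding
    # rgb.mp4, depth.mp4 and a masks/ folder, with calibration/,
    # audio.wav and pointclouds/ at the sequence level.
    seq = root / name
    views = [
        ViVoView(
            rgb_video=seq / f"view_{i:02d}" / "rgb.mp4",
            depth_video=seq / f"view_{i:02d}" / "depth.mp4",
            mask_dir=seq / f"view_{i:02d}" / "masks",
        )
        for i in range(NUM_VIEWS)
    ]
    return ViVoSequence(
        name=name,
        views=views,
        calibration_dir=seq / "calibration",
        audio=seq / "audio.wav",
        pointcloud_dir=seq / "pointclouds",
    )
```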
Related papers
- CI-VID: A Coherent Interleaved Text-Video Dataset [23.93099552431937]
CI-VID is a dataset that moves beyond isolated text-to-video (T2V) generation toward text-and-video-to-video (TV2V) generation. It contains over 340,000 samples, each featuring a coherent sequence of video clips with text captions. We show that models trained on CI-VID exhibit significant improvements in both accuracy and content consistency when generating video sequences.
arXiv Detail & Related papers (2025-07-02T17:48:01Z)
- UVG-VPC: Voxelized Point Cloud Dataset for Visual Volumetric Video-based Coding [42.999580283729614]
This paper presents a new open dataset called UVG-VPC for the development, evaluation, and validation of MPEG Visual Volumetric Video-based Coding (V3C) technology. The dataset is distributed under its own non-commercial license.
arXiv Detail & Related papers (2025-04-08T10:27:53Z)
- Unified Dense Prediction of Video Diffusion [91.16237431830417]
We present a unified network for simultaneously generating videos and their corresponding entity segmentation and depth maps from text prompts. We use a colormap to represent entity masks and depth maps, tightly integrating dense prediction with RGB video generation (a sketch follows this entry).
arXiv Detail & Related papers (2025-03-12T12:41:02Z)
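The summary above does not specify the exact colormap used; purely as an illustration of the general idea of packing dense predictions into RGB frames, here is a minimal, invertible depth-to-color ramp (the function names and the ramp itself are assumptions, not taken from the paper):

```python
import numpy as np

def depth_to_colormap(depth: np.ndarray) -> np.ndarray:
    """Encode a float depth map as an 8-bit RGB image via a simple
    blue-to-red ramp, so the depth channel can ride alongside RGB
    frames in a video pipeline."""
    d = (depth - depth.min()) / max(float(depth.max() - depth.min()), 1e-8)
    r = (d * 255).astype(np.uint8)
    g = np.zeros_like(r)
    b = ((1.0 - d) * 255).astype(np.uint8)
    return np.stack([r, g, b], axis=-1)

def colormap_to_depth(rgb: np.ndarray, d_min: float, d_max: float) -> np.ndarray:
    """Invert the ramp above: recover normalized depth from the red
    channel, then rescale to the original metric range."""
    d = rgb[..., 0].astype(np.float32) / 255.0
    return d_min + d * (d_max - d_min)
```

Any fixed, invertible mapping serves the same purpose: the generator treats the encoded depth as one more RGB stream.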
- Video Set Distillation: Information Diversification and Temporal Densification [68.85010825225528]
Video sets have two dimensions of redundancy: within-sample and inter-sample. We are the first to study Video Set Distillation, which synthesizes optimized video data by addressing both.
arXiv Detail & Related papers (2024-11-28T05:37:54Z)
- BVI-CR: A Multi-View Human Dataset for Volumetric Video Compression [14.109939177281069]
BVI-CR contains 18 multi-view RGB-D captures and their corresponding textured polygonal meshes.
Each video sequence contains 10 views at 1080p resolution, with durations between 10 and 15 seconds at 30FPS.
Results show the great potential of neural-representation-based methods in volumetric video compression.
arXiv Detail & Related papers (2024-11-17T23:22:48Z)
- Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding [61.89781979702939]
This study quantitatively reveals an "impossible trinity" among data quantity, diversity, and quality in pre-training datasets.
Recent efforts use synthetic annotations to refine large-scale, diverse ASR datasets that are compromised by low quality.
We introduce the Video DataFlywheel framework, which iteratively refines video annotations with improved noise control methods.
arXiv Detail & Related papers (2024-09-29T03:33:35Z)
- Implicit-explicit Integrated Representations for Multi-view Video Compression [40.86402535896703]
We propose an implicit-explicit integrated representation for multi-view video compression.
The proposed framework combines the strengths of both implicit neural representation and explicit 2D datasets.
Our proposed framework can achieve comparable or even superior performance to the latest multi-view video compression standard MIV.
arXiv Detail & Related papers (2023-11-29T04:15:57Z)
- NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation [58.21817572577012]
Video depth estimation aims to infer temporally consistent depth.
We introduce NVDS+ that stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner.
We also construct a large-scale Video Depth in the Wild dataset, which contains 14,203 videos with over two million frames.
arXiv Detail & Related papers (2023-07-17T17:57:01Z)
- FSVVD: A Dataset of Full Scene Volumetric Video [2.9151420469958533]
In this paper, we focus on the most widely used data format, point clouds, and release the first full-scene volumetric video dataset.
A comprehensive description and analysis of the dataset are provided, along with its potential uses.
arXiv Detail & Related papers (2023-03-07T02:31:08Z)
- Multi-Attention Network for Compressed Video Referring Object Segmentation [103.18477550023513]
Referring video object segmentation aims to segment the object referred by a given language expression.
Existing works typically require the compressed video bitstream to be decoded to RGB frames before segmentation.
This may hamper their application in real-world, resource-constrained scenarios such as autonomous cars and drones.
arXiv Detail & Related papers (2022-07-26T03:00:52Z)
- PVDD: A Practical Video Denoising Dataset with Real-World Dynamic Scenes [56.4361151691284]
"Practical Video Denoising dataset" (PVDD) contains 200 noisy-clean dynamic video pairs in both sRGB and RAW format.
Compared with existing datasets consisting of limited motion information,PVDD covers dynamic scenes with varying natural motion.
arXiv Detail & Related papers (2022-07-04T12:30:22Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via neural rendering (a back-projection sketch follows this entry).
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
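As background for the RGBD-to-point-cloud step mentioned in the entry above, here is a minimal back-projection sketch assuming a pinhole camera model (the function name and intrinsics are illustrative, not from the paper):

```python
import numpy as np

def rgbd_to_pointcloud(depth: np.ndarray, rgb: np.ndarray,
                       fx: float, fy: float, cx: float, cy: float):
    """Back-project an RGB-D frame into a colored 3-D point cloud,
    assuming a pinhole camera with intrinsics (fx, fy, cx, cy) and
    depth in metres. Returns (N, 3) points and (N, 3) colors."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    valid = depth > 0  # drop pixels with missing depth
    points = np.stack([x, y, depth], axis=-1)[valid]
    colors = rgb[valid]
    return points, colors
```

Clouds from multiple calibrated views can then be fused by transforming each with its camera extrinsics before rendering.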
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.