FSVVD: A Dataset of Full Scene Volumetric Video
- URL: http://arxiv.org/abs/2303.03599v2
- Date: Mon, 17 Apr 2023 08:50:55 GMT
- Title: FSVVD: A Dataset of Full Scene Volumetric Video
- Authors: Kaiyuan Hu, Yili Jin, Haowen Yang, Junhua Liu, Fangxin Wang
- Abstract summary: In this paper, we focus on the currently most widely used data format, the point cloud, and for the first time release a full-scene volumetric video dataset.
A comprehensive dataset description and analysis are provided, along with potential usages of this dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed a rapid development of immersive multimedia which
bridges the gap between the real world and virtual space. Volumetric videos, as
an emerging representative 3D video paradigm that empowers extended reality,
stand out by providing an unprecedented immersive and interactive viewing
experience. Despite this tremendous potential, research on 3D volumetric video
is still in its infancy and relies on sufficient, complete datasets for further
exploration. However, existing volumetric video datasets mostly include only a
single object, lacking the surrounding scene and the interactions between
objects and that scene. In this paper, we focus on the currently most widely
used data format, the point cloud, and for the first time release a full-scene
volumetric video dataset that includes multiple people and their daily
activities interacting with the external environment. A comprehensive dataset
description and analysis are provided, along with potential usages of this
dataset. The dataset and additional tools can be accessed via the following
website: https://cuhksz-inml.github.io/full_scene_volumetric_video_dataset/.
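Since the dataset is released as point cloud frames, the minimal sketch below shows how a single frame might be inspected. It assumes the frames are distributed as PLY files and that Open3D is available; the file name frame_0000.ply is a hypothetical placeholder, and the actual file layout is documented on the dataset website above.

```python
# Minimal sketch for inspecting one volumetric video frame stored as a point
# cloud. Assumes a PLY file named "frame_0000.ply" (hypothetical placeholder);
# consult the dataset website for the real file naming and format.
import numpy as np
import open3d as o3d  # pip install open3d


def inspect_frame(path: str) -> None:
    pcd = o3d.io.read_point_cloud(path)        # load the point cloud frame
    points = np.asarray(pcd.points)            # (N, 3) XYZ coordinates
    print(f"number of points: {points.shape[0]}")
    if pcd.has_colors():
        colors = np.asarray(pcd.colors)        # (N, 3) RGB values in [0, 1]
        print(f"mean color: {colors.mean(axis=0)}")
    # The axis-aligned bounding box gives a rough sense of the scene extent.
    bbox = pcd.get_axis_aligned_bounding_box()
    print(f"scene extent (x, y, z): {bbox.get_extent()}")


if __name__ == "__main__":
    inspect_frame("frame_0000.ply")  # hypothetical path
```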
Related papers
- CinePile: A Long Video Question Answering Dataset and Benchmark
We present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding.
Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects.
We fine-tuned open-source Video-LLMs on the training split and evaluated both open-source and proprietary video-centric LLMs on the test split of our dataset.
arXiv Detail & Related papers (2024-05-14T17:59:02Z)
- Panonut360: A Head and Eye Tracking Dataset for Panoramic Video
We present a head and eye tracking dataset involving 50 users watching 15 panoramic videos.
The dataset provides details on the viewport and gaze attention locations of users.
Our analysis reveals a consistent downward offset in gaze fixations relative to the Field of View.
arXiv Detail & Related papers (2024-03-26T13:54:52Z)
- EasyVolcap: Accelerating Neural Volumetric Video Research
Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.
EasyVolcap is a Python & PyTorch library for unifying the process of multi-view data processing, 4D scene reconstruction, and efficient dynamic volumetric video rendering.
arXiv Detail & Related papers (2023-12-11T17:59:46Z)
- VEATIC: Video-based Emotion and Affect Tracking in Context Dataset
We introduce a brand-new large dataset, the Video-based Emotion and Affect Tracking in Context dataset (VEATIC).
VEATIC has 124 video clips from Hollywood movies, documentaries, and home videos with continuous valence and arousal ratings of each frame via real-time annotation.
Along with the dataset, we propose a new computer vision task to infer the affect of the selected character via both context and character information in each video frame.
arXiv Detail & Related papers (2023-09-13T06:31:35Z)
- MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
We present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations.
MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video and exhibits a significant reduction in the currently diagnosed biases for video-language grounding datasets.
arXiv Detail & Related papers (2021-12-01T11:47:09Z)
- Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
We present the Spoken Moments dataset of 500k spoken captions each attributed to a unique short video depicting a broad range of different events.
We show that our AMM approach consistently improves our results and that models trained on our Spoken Moments dataset generalize better than those trained on other video-caption datasets.
arXiv Detail & Related papers (2021-05-10T16:30:46Z)
- QuerYD: A video dataset with high-quality text and audio narrations
We introduce QuerYD, a new large-scale dataset for retrieval and event localisation in video.
A unique feature of our dataset is the availability of two audio tracks for each video: the original audio, and a high-quality spoken description.
The dataset is based on YouDescribe, a volunteer project that assists visually-impaired people by attaching voiced narrations to existing YouTube videos.
arXiv Detail & Related papers (2020-11-22T17:33:44Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
- Learning Disentangled Representations of Video with Missing Data
We present Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data.
Specifically, DIVE introduces a missingness latent variable and disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object.
On a moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin.
arXiv Detail & Related papers (2020-06-23T23:54:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.