FSVVD: A Dataset of Full Scene Volumetric Video
- URL: http://arxiv.org/abs/2303.03599v2
- Date: Mon, 17 Apr 2023 08:50:55 GMT
- Title: FSVVD: A Dataset of Full Scene Volumetric Video
- Authors: Kaiyuan Hu, Yili Jin, Haowen Yang, Junhua Liu, Fangxin Wang
- Abstract summary: In this paper, we focus on the most widely used data format, the point cloud, and release the first full-scene volumetric video dataset.
A comprehensive dataset description and analysis are provided, along with potential uses of the dataset.
- Score: 2.9151420469958533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed the rapid development of immersive multimedia, which bridges the gap between the real world and virtual space. Volumetric video, an emerging 3D video paradigm that empowers extended reality, stands out by providing an unprecedentedly immersive and interactive viewing experience. Despite this tremendous potential, research on 3D volumetric video is still in its infancy and relies on sufficient, complete datasets for further exploration. However, existing volumetric video datasets mostly contain only a single object, lacking the surrounding scene and the interactions between objects and their environment. In this paper, we focus on the most widely used data format, the point cloud, and for the first time release a full-scene volumetric video dataset that includes multiple people and their daily activities interacting with the external environment. A comprehensive dataset description and analysis are provided, along with potential uses of the dataset. The dataset and additional tools can be accessed via the following website: https://cuhksz-inml.github.io/full_scene_volumetric_video_dataset/.
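Since the dataset is distributed as per-frame point clouds, the sketch below shows how a single frame might be loaded and inspected. This is a minimal illustration, assuming the Open3D library and a hypothetical file name frame_0001.ply; the dataset's actual layout and naming are documented on the website above.

```python
import open3d as o3d  # pip install open3d

# Load one point-cloud frame of the volumetric video
# (the file name here is hypothetical).
pcd = o3d.io.read_point_cloud("frame_0001.ply")

# Basic per-frame statistics: point count and spatial extent.
print(f"points: {len(pcd.points)}")
bbox = pcd.get_axis_aligned_bounding_box()
print(f"extent: {bbox.get_extent()}")

# Open an interactive viewer to look around the full scene.
o3d.visualization.draw_geometries([pcd])
```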
Related papers
- Holistic Understanding of 3D Scenes as Universal Scene Description [56.69740649781989]
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI.
We introduce an expertly curated dataset in the Universal Scene Description (USD) format featuring high-quality manual annotations.
With its broad and high-quality annotations, the data provides the basis for holistic 3D scene understanding models.
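As a minimal illustration of the Universal Scene Description format mentioned above, the sketch below opens a USD stage and lists its prims using the OpenUSD Python bindings; the file name scene.usda is hypothetical.

```python
from pxr import Usd  # OpenUSD Python bindings (pip install usd-core)

# Open a USD stage (hypothetical file name) and walk its prim hierarchy.
stage = Usd.Stage.Open("scene.usda")
for prim in stage.Traverse():
    print(prim.GetPath(), prim.GetTypeName())
```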
arXiv Detail & Related papers (2024-12-02T11:33:55Z)
- CinePile: A Long Video Question Answering Dataset and Benchmark [55.30860239555001]
We present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding.
Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects.
We fine-tuned open-source Video-LLMs on the training split and evaluated both open-source and proprietary video-centric LLMs on the test split of our dataset.
arXiv Detail & Related papers (2024-05-14T17:59:02Z)
- Panonut360: A Head and Eye Tracking Dataset for Panoramic Video [0.0]
We present a head and eye tracking dataset involving 50 users watching 15 panoramic videos.
The dataset provides details on the viewport and gaze attention locations of users.
Our analysis reveals a consistent downward offset in gaze fixations relative to the Field of View.
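A minimal sketch of how such a vertical offset between gaze and viewport center could be measured from tracking logs; the field names and values below are hypothetical placeholders, not the dataset's actual format.

```python
import numpy as np

# Hypothetical per-sample pitch angles (degrees): the user's gaze point
# and the center of the viewport (head orientation).
gaze_pitch = np.array([-4.2, -6.1, -3.8, -5.5])    # placeholder values
viewport_pitch = np.array([0.5, -1.0, 0.2, -0.8])  # placeholder values

# A negative mean indicates gaze fixations sit below the viewport center,
# i.e., the downward offset reported in the paper.
offset = gaze_pitch - viewport_pitch
print(f"mean vertical gaze offset: {offset.mean():.2f} deg")
```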
arXiv Detail & Related papers (2024-03-26T13:54:52Z)
- EasyVolcap: Accelerating Neural Volumetric Video Research [69.59671164891725]
Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.
EasyVolcap is a Python & PyTorch library that unifies multi-view data processing, 4D scene reconstruction, and efficient dynamic volumetric video rendering.
arXiv Detail & Related papers (2023-12-11T17:59:46Z)
- MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions [109.84031235538002]
We present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations.
MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video and exhibits a significant reduction in the biases currently diagnosed in video-language grounding datasets.
arXiv Detail & Related papers (2021-12-01T11:47:09Z)
- Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions [75.77044856100349]
We present the Spoken Moments dataset of 500k spoken captions each attributed to a unique short video depicting a broad range of different events.
We show that our AMM approach consistently improves our results and that models trained on our Spoken Moments dataset generalize better than those trained on other video-caption datasets.
arXiv Detail & Related papers (2021-05-10T16:30:46Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
- Learning Disentangled Representations of Video with Missing Data [17.34839550557689]
We present Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data.
Specifically, DIVE introduces a missingness latent variable and disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object.
On the Moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin.
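A schematic sketch of the disentanglement idea, not the authors' architecture: an encoder with separate heads for static appearance, dynamic pose, and a per-object missingness probability (all layer sizes below are illustrative assumptions).

```python
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Maps a flattened frame to separate latent factors, mirroring the
    static / dynamic / missingness split described for DIVE."""
    def __init__(self, in_dim=784, hid=128, z=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.static_head = nn.Linear(hid, z)   # time-invariant appearance
        self.dynamic_head = nn.Linear(hid, z)  # time-varying pose
        self.missing_head = nn.Linear(hid, 1)  # missingness probability

    def forward(self, x):
        h = self.backbone(x)
        return (self.static_head(h),
                self.dynamic_head(h),
                torch.sigmoid(self.missing_head(h)))

# Toy usage on random input standing in for four flattened 28x28 frames.
enc = DisentangledEncoder()
z_static, z_dyn, p_missing = enc(torch.rand(4, 784))
```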
arXiv Detail & Related papers (2020-06-23T23:54:49Z)