EasyVolcap: Accelerating Neural Volumetric Video Research
- URL: http://arxiv.org/abs/2312.06575v1
- Date: Mon, 11 Dec 2023 17:59:46 GMT
- Title: EasyVolcap: Accelerating Neural Volumetric Video Research
- Authors: Zhen Xu, Tao Xie, Sida Peng, Haotong Lin, Qing Shuai, Zhiyuan Yu,
Guangzhao He, Jiaming Sun, Hujun Bao, Xiaowei Zhou
- Abstract summary: Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.
EasyVolcap is a Python & PyTorch library for unifying the process of multi-view data processing, 4D scene reconstruction, and efficient dynamic volumetric video rendering.
- Score: 69.59671164891725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Volumetric video is a technology that digitally records dynamic events such
as artistic performances, sporting events, and remote conversations. When
acquired, such volumography can be viewed from any viewpoint and timestamp on
flat screens, 3D displays, or VR headsets, enabling immersive viewing
experiences and more flexible content creation in a variety of applications
such as sports broadcasting, video conferencing, gaming, and movie productions.
With the recent advances and fast-growing interest in neural scene
representations for volumetric video, there is an urgent need for a unified
open-source library to streamline the process of volumetric video capturing,
reconstruction, and rendering for both researchers and non-professional users
to develop various algorithms and applications of this emerging technology. In
this paper, we present EasyVolcap, a Python & PyTorch library for accelerating
neural volumetric video research with the goal of unifying the process of
multi-view data processing, 4D scene reconstruction, and efficient dynamic
volumetric video rendering. Our source code is available at
https://github.com/zju3dv/EasyVolcap.
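As a rough illustration of the workflow the abstract describes (multi-view capture feeding a 4D scene representation that is then volume-rendered at arbitrary viewpoints and timestamps), below is a minimal PyTorch sketch of a time-conditioned radiance field with a simple ray-marching renderer. All class and function names here are hypothetical illustrations, not EasyVolcap's actual API; see the repository above for the real interfaces.

```python
# Hypothetical sketch only: a toy time-conditioned radiance field plus a
# simple ray-marching renderer. None of these names are EasyVolcap's API.
import torch

class Toy4DField(torch.nn.Module):
    """Maps a space-time query (x, y, z, t) to color and density."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(4, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 4),  # 3 rgb channels + 1 density
        )

    def forward(self, xyzt: torch.Tensor) -> torch.Tensor:
        out = self.mlp(xyzt)
        rgb = torch.sigmoid(out[..., :3])                     # colors in [0, 1]
        density = torch.nn.functional.softplus(out[..., 3:])  # non-negative
        return torch.cat([rgb, density], dim=-1)

def render_ray(field, origin, direction, t, n_samples=64):
    """Volume-render a single ray at timestamp t by alpha compositing."""
    z = torch.linspace(0.1, 4.0, n_samples)                   # depths along the ray
    pts = origin + z[:, None] * direction                     # (n_samples, 3)
    xyzt = torch.cat([pts, t.expand(n_samples, 1)], dim=-1)   # append timestamp
    out = field(xyzt)
    rgb, sigma = out[..., :3], out[..., 3]
    alpha = 1.0 - torch.exp(-sigma * (z[1] - z[0]))           # per-sample opacity
    trans = torch.cumprod(
        torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = alpha * trans                                   # compositing weights
    return (weights[:, None] * rgb).sum(dim=0)                # final pixel color

# Usage: one pixel of a novel view at time t = 0.5.
field = Toy4DField()
color = render_ray(field, torch.zeros(3),
                   torch.tensor([0.0, 0.0, 1.0]), torch.tensor(0.5))
```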
Related papers
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis [43.02778060969546]
We propose a controllable monocular dynamic view synthesis pipeline.
Our model does not require depth as input, and does not explicitly model 3D scene geometry.
We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
arXiv Detail & Related papers (2024-05-23T17:59:52Z)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling its spatiotemporal dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z)
- Deep Neural Networks in Video Human Action Recognition: A Review [21.00217656391331]
Video behavior recognition is one of the most fundamental tasks in computer vision.
Deep neural networks are built to recognize pixel-level information from inputs such as RGB, RGB-D, or optical flow frames.
As our review shows, deep neural networks surpass most other techniques on feature learning and extraction tasks.
arXiv Detail & Related papers (2023-05-25T03:54:41Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views (see the generic aggregation sketch after this list).
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- Playable Environments: Video Manipulation in Space and Time [98.0621309257937]
We present Playable Environments - a new representation for interactive video generation and manipulation in space and time.
With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions.
Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering.
arXiv Detail & Related papers (2022-03-03T18:51:05Z)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video [94.42811508809994]
We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio.
Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process.
arXiv Detail & Related papers (2021-11-21T19:26:45Z)
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
- Human action recognition with a large-scale brain-inspired photonic computer [0.774229787612056]
Recognition of human actions in video streams is a challenging task in computer vision.
Deep learning has recently shown remarkable results, but can be hard to use in practice.
We propose a scalable photonic neuro-inspired architecture, capable of recognising video-based human actions with state-of-the-art accuracy.
arXiv Detail & Related papers (2020-04-06T10:39:10Z)
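The DynIBaR entry above mentions synthesizing new viewpoints by aggregating features from nearby source views. The sketch below illustrates that generic image-based-rendering step under simplifying assumptions: project a 3D query point into each nearby view, bilinearly sample its feature map, and pool the results. The function name, the pinhole projection, and the plain mean pooling are all illustrative stand-ins; the paper itself uses learned aggregation and motion handling not shown here.

```python
# Hypothetical sketch of generic image-based rendering feature aggregation.
# This is NOT DynIBaR's code; names and mean pooling are assumptions.
import torch
import torch.nn.functional as F

def aggregate_nearby_views(point_xyz, feats, K, w2c):
    """point_xyz: (3,) world point on a target ray.
    feats: (V, C, H, W) feature maps from V nearby source views.
    K: (V, 3, 3) intrinsics; w2c: (V, 3, 4) world-to-camera extrinsics."""
    V, C, H, W = feats.shape
    p = torch.cat([point_xyz, point_xyz.new_ones(1)])      # homogeneous (4,)
    cam = torch.einsum('vij,j->vi', w2c, p)                # (V, 3) camera coords
    pix = torch.einsum('vij,vj->vi', K, cam)               # (V, 3) pixel coords
    uv = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)          # perspective divide
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1),
                        uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(feats, grid.view(V, 1, 1, 2),
                            align_corners=True)            # (V, C, 1, 1)
    sampled = sampled.view(V, C)
    # A learned, occlusion-aware weighting would go here; mean is a stand-in.
    return sampled.mean(dim=0)                             # (C,) aggregated feature
```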