EasyVolcap: Accelerating Neural Volumetric Video Research
- URL: http://arxiv.org/abs/2312.06575v1
- Date: Mon, 11 Dec 2023 17:59:46 GMT
- Title: EasyVolcap: Accelerating Neural Volumetric Video Research
- Authors: Zhen Xu, Tao Xie, Sida Peng, Haotong Lin, Qing Shuai, Zhiyuan Yu,
Guangzhao He, Jiaming Sun, Hujun Bao, Xiaowei Zhou
- Abstract summary: Volumetric video is a technology that digitally records dynamic events such as artistic performances, sporting events, and remote conversations.
EasyVolcap is a Python & PyTorch library for unifying the process of multi-view data processing, 4D scene reconstruction, and efficient dynamic volumetric video rendering.
- Score: 69.59671164891725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Volumetric video is a technology that digitally records dynamic events such
as artistic performances, sporting events, and remote conversations. When
acquired, such volumography can be viewed from any viewpoint and timestamp on
flat screens, 3D displays, or VR headsets, enabling immersive viewing
experiences and more flexible content creation in a variety of applications
such as sports broadcasting, video conferencing, gaming, and movie productions.
With the recent advances and fast-growing interest in neural scene
representations for volumetric video, there is an urgent need for a unified
open-source library to streamline the process of volumetric video capturing,
reconstruction, and rendering for both researchers and non-professional users
to develop various algorithms and applications of this emerging technology. In
this paper, we present EasyVolcap, a Python & PyTorch library for accelerating
neural volumetric video research with the goal of unifying the process of
multi-view data processing, 4D scene reconstruction, and efficient dynamic
volumetric video rendering. Our source code is available at
https://github.com/zju3dv/EasyVolcap.
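As a rough illustration of the workflow the abstract describes (multi-view capture feeding a 4D scene representation that is then volume-rendered at arbitrary viewpoints and timestamps), below is a minimal PyTorch sketch of a time-conditioned radiance field with a simple ray-marching renderer. All class and function names here are hypothetical illustrations, not EasyVolcap's actual API; see the repository above for the real interfaces.

```python
# Hypothetical sketch only: a toy time-conditioned radiance field plus a
# simple ray-marching renderer. None of these names are EasyVolcap's API.
import torch

class Toy4DField(torch.nn.Module):
    """Maps a space-time query (x, y, z, t) to color and density."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(4, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 4),  # 3 rgb channels + 1 density
        )

    def forward(self, xyzt: torch.Tensor) -> torch.Tensor:
        out = self.mlp(xyzt)
        rgb = torch.sigmoid(out[..., :3])                     # colors in [0, 1]
        density = torch.nn.functional.softplus(out[..., 3:])  # non-negative
        return torch.cat([rgb, density], dim=-1)

def render_ray(field, origin, direction, t, n_samples=64):
    """Volume-render a single ray at timestamp t by alpha compositing."""
    z = torch.linspace(0.1, 4.0, n_samples)                   # depths along the ray
    pts = origin + z[:, None] * direction                     # (n_samples, 3)
    xyzt = torch.cat([pts, t.expand(n_samples, 1)], dim=-1)   # append timestamp
    out = field(xyzt)
    rgb, sigma = out[..., :3], out[..., 3]
    alpha = 1.0 - torch.exp(-sigma * (z[1] - z[0]))           # per-sample opacity
    trans = torch.cumprod(
        torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = alpha * trans                                   # compositing weights
    return (weights[:, None] * rgb).sum(dim=0)                # final pixel color

# Usage: one pixel of a novel view at time t = 0.5.
field = Toy4DField()
color = render_ray(field, torch.zeros(3),
                   torch.tensor([0.0, 0.0, 1.0]), torch.tensor(0.5))
```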
Related papers
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis [43.02778060969546]
We propose a controllable monocular dynamic view synthesis pipeline.
Our model does not require depth as input, and does not explicitly model 3D scene geometry.
We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
arXiv Detail & Related papers (2024-05-23T17:59:52Z)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling its spatiotemporal dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z)
- Deep Neural Networks in Video Human Action Recognition: A Review [21.00217656391331]
Video behavior recognition is one of the most fundamental tasks in computer vision.
Deep neural networks are built to recognize pixel-level information from inputs such as RGB, RGB-D, or optical flow frames.
As our review shows, deep neural networks surpass most other techniques on feature learning and extraction tasks.
arXiv Detail & Related papers (2023-05-25T03:54:41Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views (see the generic aggregation sketch after this list).
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- Playable Environments: Video Manipulation in Space and Time [98.0621309257937]
We present Playable Environments - a new representation for interactive video generation and manipulation in space and time.
With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions.
Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering.
arXiv Detail & Related papers (2022-03-03T18:51:05Z)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video [94.42811508809994]
We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio.
Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process.
arXiv Detail & Related papers (2021-11-21T19:26:45Z)
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
- Human action recognition with a large-scale brain-inspired photonic computer [0.774229787612056]
Recognition of human actions in video streams is a challenging task in computer vision.
Deep learning has recently shown remarkable results, but can be hard to use in practice.
We propose a scalable photonic neuro-inspired architecture, capable of recognising video-based human actions with state-of-the-art accuracy.
arXiv Detail & Related papers (2020-04-06T10:39:10Z)
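The DynIBaR entry above mentions synthesizing new viewpoints by aggregating features from nearby source views. The sketch below illustrates that generic image-based-rendering step under simplifying assumptions: project a 3D query point into each nearby view, bilinearly sample its feature map, and pool the results. The function name, the pinhole projection, and the plain mean pooling are all illustrative stand-ins; the paper itself uses learned aggregation and motion handling not shown here.

```python
# Hypothetical sketch of generic image-based rendering feature aggregation.
# This is NOT DynIBaR's code; names and mean pooling are assumptions.
import torch
import torch.nn.functional as F

def aggregate_nearby_views(point_xyz, feats, K, w2c):
    """point_xyz: (3,) world point on a target ray.
    feats: (V, C, H, W) feature maps from V nearby source views.
    K: (V, 3, 3) intrinsics; w2c: (V, 3, 4) world-to-camera extrinsics."""
    V, C, H, W = feats.shape
    p = torch.cat([point_xyz, point_xyz.new_ones(1)])      # homogeneous (4,)
    cam = torch.einsum('vij,j->vi', w2c, p)                # (V, 3) camera coords
    pix = torch.einsum('vij,vj->vi', K, cam)               # (V, 3) pixel coords
    uv = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)          # perspective divide
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1),
                        uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(feats, grid.view(V, 1, 1, 2),
                            align_corners=True)            # (V, C, 1, 1)
    sampled = sampled.view(V, C)
    # A learned, occlusion-aware weighting would go here; mean is a stand-in.
    return sampled.mean(dim=0)                             # (C,) aggregated feature
```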