HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling
- URL: http://arxiv.org/abs/2301.02238v2
- Date: Mon, 29 May 2023 18:35:21 GMT
- Title: HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling
- Authors: Benjamin Attal, Jia-Bin Huang, Christian Richardt, Michael Zollhoefer,
Johannes Kopf, Matthew O'Toole, Changil Kim
- Abstract summary: We present HyperReel -- a novel 6-DoF video representation.
The two core components of HyperReel are: (1) a ray-conditioned sample prediction network that enables high-fidelity, high frame rate rendering at high resolutions and (2) a compact and memory-efficient dynamic volume representation.
- Score: 60.90470761333465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Volumetric scene representations enable photorealistic view synthesis for
static scenes and form the basis of several existing 6-DoF video techniques.
However, the volume rendering procedures that drive these representations
necessitate careful trade-offs in terms of quality, rendering speed, and memory
efficiency. In particular, existing methods fail to simultaneously achieve
real-time performance, small memory footprint, and high-quality rendering for
challenging real-world scenes. To address these issues, we present HyperReel --
a novel 6-DoF video representation. The two core components of HyperReel are:
(1) a ray-conditioned sample prediction network that enables high-fidelity,
high frame rate rendering at high resolutions and (2) a compact and
memory-efficient dynamic volume representation. Our 6-DoF video pipeline
achieves the best visual quality among prior and contemporary approaches while
keeping memory requirements small, and renders at up to 18 frames per second at
megapixel resolution without any custom CUDA code.
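
To make component (1) concrete, below is a minimal PyTorch sketch of the ray-conditioned sampling idea: a small network takes a ray's origin and direction and directly predicts an ordered set of sample locations along the ray, so rendering needs only a few network-chosen samples instead of dense volume queries. The layer widths, the monotonic-depth parameterization, and the `SamplePredictionNet` name are illustrative assumptions, not HyperReel's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SamplePredictionNet(nn.Module):
    """Hypothetical sketch of a ray-conditioned sample prediction network.

    Layer widths, depth parameterization, and sample count are illustrative
    assumptions, not HyperReel's exact architecture.
    """

    def __init__(self, n_samples: int = 32, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden),          # ray origin (3) + direction (3)
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_samples),  # per-sample depth increments
        )

    def forward(self, rays_o: torch.Tensor, rays_d: torch.Tensor) -> torch.Tensor:
        # Softplus keeps increments positive; cumsum keeps depths ordered.
        deltas = F.softplus(self.mlp(torch.cat([rays_o, rays_d], dim=-1)))
        depths = torch.cumsum(deltas, dim=-1)              # (N, n_samples)
        # Sample points along each ray: o + t * d.
        return rays_o[..., None, :] + depths[..., None] * rays_d[..., None, :]

net = SamplePredictionNet()
rays_o = torch.zeros(4, 3)                 # four example rays from the origin
rays_d = F.normalize(torch.randn(4, 3), dim=-1)
print(net(rays_o, rays_d).shape)           # torch.Size([4, 32, 3])
```

Under this reading, each pixel costs a single forward pass of a small MLP rather than hundreds of dense volume queries, which is one way to see how real-time megapixel rendering becomes plausible without custom CUDA kernels.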
Related papers
- Fast and Memory-Efficient Video Diffusion Using Streamlined Inference [41.505829393818274]
Current video diffusion models exhibit demanding computational requirements and high peak memory usage.
We present Streamlined Inference, which leverages the temporal and spatial properties of video diffusion models.
Our approach significantly reduces peak memory and computational overhead, making it feasible to generate high-quality videos on a single consumer GPU (see the sketch after this entry).
arXiv Detail & Related papers (2024-11-02T07:52:18Z)
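
The summary above does not spell out the mechanism, so the following is only a hedged illustration of how exploiting temporal structure can bound peak memory: the denoising pass runs over fixed-size temporal chunks of the video latent rather than the whole clip at once. The `denoise_step` stub and the chunking scheme are assumptions, not the paper's actual Streamlined Inference pipeline.

```python
import torch

def denoise_step(chunk: torch.Tensor) -> torch.Tensor:
    # Stand-in for one denoising pass (assumption, not the real model).
    return chunk * 0.99

def streamed_denoise(latent: torch.Tensor, chunk_frames: int = 4) -> torch.Tensor:
    """Process a (frames, C, H, W) video latent in temporal chunks.

    Only one chunk is in flight at a time, so peak working memory is
    bounded by the chunk size rather than the full clip length.
    """
    out = torch.empty_like(latent)
    for start in range(0, latent.shape[0], chunk_frames):
        chunk = latent[start:start + chunk_frames]
        out[start:start + chunk_frames] = denoise_step(chunk)
    return out

latent = torch.randn(16, 4, 32, 32)        # 16-frame latent video
print(streamed_denoise(latent).shape)      # torch.Size([16, 4, 32, 32])
```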
- VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction [59.40711222096875]
We present VastGaussian, the first method for high-quality reconstruction and real-time rendering of large scenes based on 3D Gaussian Splatting.
Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets.
arXiv Detail & Related papers (2024-02-27T11:40:50Z)
- Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields [42.926554334378984]
High fidelity and low bandwidth are two major objectives of video compression for video conferencing applications.
We propose a novel low bandwidth neural compression approach for high-fidelity portrait video conferencing.
arXiv Detail & Related papers (2024-02-26T14:29:13Z)
- VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams [56.00479598817949]
VideoRF is the first approach to enable real-time streaming and rendering of dynamic radiance fields on mobile platforms.
We show that the feature image stream can be efficiently compressed by 2D video codecs (see the quantization sketch after this entry).
We have developed a real-time interactive player that enables online streaming and rendering of dynamic scenes.
arXiv Detail & Related papers (2023-12-03T14:14:35Z)
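
As a hedged sketch of why a stock 2D video codec can carry radiance-field data: per-frame feature planes are quantized to 8-bit images on the encoder side and dequantized before rendering on the client. The linear min-max quantization below is an illustrative assumption, not VideoRF's exact packing scheme.

```python
import torch

def quantize_plane(feat: torch.Tensor):
    """Map a float feature plane (C, H, W) to uint8 'image' channels.

    A 2D video codec can then compress the per-frame planes; the client
    dequantizes before rendering. Linear min-max quantization here is an
    illustrative assumption.
    """
    lo, hi = feat.min(), feat.max()
    q = ((feat - lo) / (hi - lo + 1e-8) * 255.0).round().to(torch.uint8)
    return q, (lo.item(), hi.item())   # codec input + metadata for decoding

def dequantize_plane(q: torch.Tensor, meta) -> torch.Tensor:
    lo, hi = meta
    return q.float() / 255.0 * (hi - lo) + lo

feat = torch.randn(3, 64, 64)          # one frame's feature plane
q, meta = quantize_plane(feat)
rec = dequantize_plane(q, meta)
print((rec - feat).abs().max())        # small quantization error
```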
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
- SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory [20.798605661240355]
We propose a new way to speed up rendering using 2D neural networks.
A low-resolution feature map is first rendered by volume rendering; a lightweight 2D neural network then upsamples it to the target resolution (see the sketch after this entry).
We show that the proposed method achieves competitive rendering quality while reducing rendering time, enabling 30 FPS at 1080p resolution with a low memory footprint.
arXiv Detail & Related papers (2022-12-15T00:02:36Z)
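
A minimal sketch of the render-then-upsample idea referenced above: volume rendering produces only a low-resolution feature map, and a lightweight 2D convolutional network upsamples it to the target resolution. The feature-map size, channel counts, and the `render_lowres_features` stub are assumptions, not SteerNeRF's actual networks.

```python
import torch
import torch.nn as nn

def render_lowres_features(h: int = 135, w: int = 240, c: int = 8) -> torch.Tensor:
    # Stand-in for volume-rendering a low-resolution feature map (assumption).
    return torch.randn(1, c, h, w)

class FeatureUpsampler(nn.Module):
    """Lightweight 2D network: low-res features in, full-res RGB image out."""

    def __init__(self, c_in: int = 8, scale: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, 16, 3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Conv2d(16, 3, 3, padding=1),  # final RGB at target resolution
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)

feats = render_lowres_features()   # cheap: volume rendering at 135x240 only
image = FeatureUpsampler()(feats)  # fast 2D pass produces the 1080p frame
print(image.shape)                 # torch.Size([1, 3, 1080, 1920])
```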
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Each point in the 4D space is assigned probabilities of belonging to three categories (static, deforming, and new areas); see the sketch after this entry.
arXiv Detail & Related papers (2022-10-28T07:11:05Z)
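
A hedged sketch of the decomposition described above: a small head maps each 4D point (x, y, z, t) to a probability distribution over the three categories. The MLP shape is an illustrative assumption; NeRFPlayer's actual decomposition field may differ in detail.

```python
import torch
import torch.nn as nn

class DecompositionHead(nn.Module):
    """Predicts per-point probabilities over {static, deforming, new}.

    Illustrative assumption: a small MLP over (x, y, z, t) with a softmax.
    """

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3)
        )

    def forward(self, xyzt: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.mlp(xyzt), dim=-1)   # (N, 3), rows sum to 1

points = torch.rand(5, 4)              # five (x, y, z, t) samples
probs = DecompositionHead()(points)
print(probs.sum(dim=-1))               # tensor of ones
```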
- Super-Resolution Appearance Transfer for 4D Human Performances [29.361342747786164]
A common problem in the 4D reconstruction of people from multi-view video is the quality of the captured dynamic texture appearance.
We propose a solution through super-resolution appearance transfer from a static high-resolution appearance capture rig.
arXiv Detail & Related papers (2021-08-31T10:53:11Z)
- Deep Slow Motion Video Reconstruction with Hybrid Imaging System [12.340049542098148]
Current techniques increase the frame rate of standard videos through frame interpolation by assuming linear object motion, which is not valid in challenging cases.
We propose a two-stage deep learning system consisting of alignment and appearance estimation (see the sketch after this entry).
We train our model on synthetically generated hybrid videos and show high-quality results on a variety of test scenes.
arXiv Detail & Related papers (2020-02-27T14:18:12Z)
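
A minimal sketch of the two-stage structure mentioned above, under heavy assumptions: stage one (alignment) estimates a flow field and warps one input frame toward the target time, and stage two (appearance estimation) fuses the aligned frames into the intermediate frame. The single-convolution `flow_net` and `fuse_net` stand-ins are not the paper's networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a (N, C, H, W) frame by a (N, 2, H, W) flow given in pixels."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    # Convert pixel offsets (x, y) to normalized [-1, 1] coordinates.
    offs = torch.stack([flow[:, 0] / (w / 2), flow[:, 1] / (h / 2)], dim=-1)
    return F.grid_sample(frame, base + offs, align_corners=True)

# Stage 1 (alignment): stand-in flow estimator; stage 2 (appearance
# estimation): stand-in fusion net. Both are assumptions.
flow_net = nn.Conv2d(6, 2, 3, padding=1)
fuse_net = nn.Conv2d(6, 3, 3, padding=1)

f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
flow = flow_net(torch.cat([f0, f1], dim=1))                # alignment stage
mid = fuse_net(torch.cat([warp(f0, flow), f1], dim=1))     # appearance stage
print(mid.shape)                                           # torch.Size([1, 3, 64, 64])
```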
- Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split the task into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes a high-resolution (HR) slow-motion video from a low frame rate (LFR), low-resolution (LR) video (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
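
A hedged sketch of the one-stage formulation: a single network consumes two LFR, LR frames and emits three HR frames (both inputs plus the interpolated middle frame), so temporal interpolation and spatial super-resolution share features. Layer sizes and the pixel-shuffle upscaling are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class OneStageSTVSR(nn.Module):
    """Maps two low-res input frames to three high-res output frames
    (both inputs plus the interpolated middle frame) in one network.

    Layer sizes and the 4x pixel-shuffle upscaling are assumptions.
    """

    def __init__(self, scale: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),   # spatial super-resolution step
        )

    def forward(self, f0: torch.Tensor, f1: torch.Tensor) -> torch.Tensor:
        out = self.body(torch.cat([f0, f1], dim=1))            # (N, 9, sH, sW)
        return out.reshape(out.shape[0], 3, 3, *out.shape[-2:])  # (N, frames=3, C=3, sH, sW)

f0, f1 = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
print(OneStageSTVSR()(f0, f1).shape)   # torch.Size([1, 3, 3, 128, 128])
```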