Sampling Based Scene-Space Video Processing
- URL: http://arxiv.org/abs/2102.03011v1
- Date: Fri, 5 Feb 2021 05:55:04 GMT
- Title: Sampling Based Scene-Space Video Processing
- Authors: Felix Klose, Oliver Wang, Jean-Charles Bazin, Marcus Magnor, and Alexander Sorkine-Hornung
- Abstract summary: We present a novel, sampling-based framework for processing video.
It enables high-quality scene-space video effects in the presence of inevitable errors in depth and camera pose estimation.
We present results for various casually captured, hand-held, moving, compressed, monocular videos.
- Score: 89.49726406622842
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many compelling video processing effects can be achieved if per-pixel depth
information and 3D camera calibrations are known. However, the success of such
methods is highly dependent on the accuracy of this "scene-space" information.
We present a novel, sampling-based framework for processing video that enables
high-quality scene-space video effects in the presence of inevitable errors in
depth and camera pose estimation. Instead of trying to improve the explicit 3D
scene representation, the key idea of our method is to exploit the high
redundancy of approximate scene information that arises due to most scene
points being visible multiple times across many frames of video. Based on this
observation, we propose a novel pixel gathering and filtering approach. The
gathering step is general and collects pixel samples in scene-space, while the
filtering step is application-specific and computes a desired output video from
the gathered sample sets. Our approach is easily parallelizable and has been
implemented on GPU, allowing us to take full advantage of large volumes of
video data and facilitating practical runtimes on HD video using a standard
desktop computer. Our generic scene-space formulation is able to
comprehensively describe a multitude of video processing applications such as
denoising, deblurring, super resolution, object removal, computational shutter
functions, and other scene-space camera effects. We present results for various
casually captured, hand-held, moving, compressed, monocular videos depicting
challenging scenes recorded in uncontrolled environments.
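The gather-then-filter idea described in the abstract can be illustrated with a small sketch. This is not the authors' GPU implementation: it assumes axis-aligned pinhole cameras with translation-only poses, nearest-pixel lookup, grayscale colors, and a Gaussian weight on scene-space distance as one possible denoising filter. All names (`unproject`, `project`, `gather`, `filter_denoise`, `make_frame`, `sigma`) are hypothetical and chosen for illustration only.

```python
import math

def unproject(u, v, depth, f, cx, cy, cam_pos):
    """Lift pixel (u, v) with its depth to a world-space point.
    Assumption: axis-aligned pinhole camera at cam_pos (no rotation)."""
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x + cam_pos[0], y + cam_pos[1], depth + cam_pos[2])

def project(point, f, cx, cy, cam_pos):
    """Project a world-space point into a camera.
    Returns (u, v), or None if the point is not in front of the camera."""
    x, y, z = (point[i] - cam_pos[i] for i in range(3))
    if z <= 0:
        return None
    return (f * x / z + cx, f * y / z + cy)

def gather(u, v, ref, frames, f, cx, cy):
    """Gathering step: collect (color, scene_distance) samples for one
    reference pixel by reprojecting its scene point into every frame."""
    p = unproject(u, v, ref["depth"][v][u], f, cx, cy, ref["pos"])
    samples = []
    for frame in frames:
        uv = project(p, f, cx, cy, frame["pos"])
        if uv is None:
            continue
        su, sv = round(uv[0]), round(uv[1])  # nearest-pixel lookup
        height, width = len(frame["depth"]), len(frame["depth"][0])
        if not (0 <= su < width and 0 <= sv < height):
            continue
        # Scene point that the candidate pixel claims to observe.
        q = unproject(su, sv, frame["depth"][sv][su], f, cx, cy, frame["pos"])
        samples.append((frame["color"][sv][su], math.dist(p, q)))
    return samples

def filter_denoise(samples, sigma=0.1):
    """Filtering step (one possible choice): Gaussian-weighted mean, so
    samples far from the target scene point contribute almost nothing."""
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for _, d in samples]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * c for w, (c, _) in zip(weights, samples)) / total

def make_frame(color, depth, pos=(0.0, 0.0, 0.0)):
    """Build a constant 3x3 grayscale frame with constant depth."""
    return {"color": [[color] * 3 for _ in range(3)],
            "depth": [[depth] * 3 for _ in range(3)],
            "pos": pos}

# Three noisy observations of the same surface, plus one frame whose
# depth estimate is wrong and whose color is therefore unreliable.
frames = [make_frame(0.4, 2.0), make_frame(0.5, 2.0),
          make_frame(0.6, 2.0), make_frame(1.0, 5.0)]
samples = gather(1, 1, frames[0], frames, f=1.0, cx=1.0, cy=1.0)
result = filter_denoise(samples)
```

In this toy setup the sample from the frame with the inconsistent depth lies far from the target scene point, so its weight is negligible and the output stays close to the mean of the three consistent observations. Swapping `filter_denoise` for a different function is where an application-specific filter would plug in under this sketch.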
Related papers
- Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames [24.614476456145255]
We propose summarization as a new video-based scene understanding task.
It aims to summarize a long video walkthrough of a scene into a small set of frames that are spatially diverse in the scene.
Our solution is a two-stage self-supervised pipeline named SceneSum.
arXiv Detail & Related papers (2023-11-28T22:18:26Z)
- FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- Playable Environments: Video Manipulation in Space and Time [98.0621309257937]
We present Playable Environments - a new representation for interactive video generation and manipulation in space and time.
With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions.
Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering.
arXiv Detail & Related papers (2022-03-03T18:51:05Z)
- Deep 3D Mask Volume for View Synthesis of Dynamic Scenes [49.45028543279115]
We introduce a multi-view video dataset, captured with a custom 10-camera rig at 120 FPS.
The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes.
We develop a new algorithm, Deep 3D Mask Volume, which enables temporally-stable view extrapolation from binocular videos of dynamic scenes, captured by static cameras.
arXiv Detail & Related papers (2021-08-30T17:55:28Z)
- Consistent Depth of Moving Objects in Video [52.72092264848864]
We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera.
We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction over the entire input video.
We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars) as well as camera motion.
arXiv Detail & Related papers (2021-08-02T20:53:18Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Real-time dense 3D Reconstruction from monocular video data captured by low-cost UAVs [0.3867363075280543]
Real-time 3D reconstruction enables fast dense mapping of the environment which benefits numerous applications, such as navigation or live evaluation of an emergency.
In contrast to most real-time capable approaches, our approach does not need an explicit depth sensor.
By exploiting the self-motion of the unmanned aerial vehicle (UAV) flying with oblique view around buildings, we estimate both camera trajectory and depth for selected images with enough novel content.
arXiv Detail & Related papers (2021-04-21T13:12:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.