T-Code: Simple Temporal Latent Code for Efficient Dynamic View Synthesis
- URL: http://arxiv.org/abs/2312.11015v1
- Date: Mon, 18 Dec 2023 08:31:40 GMT
- Title: T-Code: Simple Temporal Latent Code for Efficient Dynamic View Synthesis
- Authors: Zhenhuan Liu, Shuai Liu, Jie Yang, Wei Liu
- Abstract summary: This paper presents T-Code, an efficient latent code decoupled along the time dimension only.
We propose highly compact hybrid neural graphics primitives (HybridNGP) for the multi-camera setting and deformation neural graphics primitives with T-Code (DNGP-T) for the monocular scenario.
- Score: 10.80308375955974
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Novel view synthesis for dynamic scenes is a prominent topic in computer
vision. The key to efficient dynamic view synthesis is to find a compact
representation that stores information across time. Although existing methods
achieve fast dynamic view synthesis through tensor decomposition or hash-grid
feature concatenation, their mixed representations ignore the structural
difference between the time domain and the spatial domain, resulting in sub-optimal
computation and storage costs. This paper presents T-Code, an efficient
latent code decoupled along the time dimension only. The decomposed feature
design makes it possible to customize modules for different scenarios, each with
its own specialty, and to obtain the desired results at lower cost. Based on
T-Code, we propose highly compact hybrid neural graphics primitives
(HybridNGP) for the multi-camera setting and deformation neural graphics primitives
with T-Code (DNGP-T) for the monocular scenario. Experiments show that HybridNGP
delivers high-fidelity results at top processing speed with far lower storage
consumption, while DNGP-T achieves state-of-the-art quality and high training
speed for monocular reconstruction.
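To make the decoupled design concrete, below is a minimal, illustrative sketch: a learnable latent code per time step is kept separate from the spatial feature encoding and only concatenated before a small decoder MLP. This is not the authors' implementation; the module names (`TimeLatentCode`, `DynamicRadianceField`), all dimensions, and the dense feature grid used as a stand-in for a multi-resolution hash grid are assumptions made for illustration.

```python
# Minimal sketch of a decoupled temporal latent code + spatial grid features.
# NOT the paper's implementation: names, sizes, and the dense grid stand-in
# for a hash-grid encoder are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeLatentCode(nn.Module):
    """One learnable latent vector per (discretized) time step."""
    def __init__(self, num_frames: int, dim: int = 8):
        super().__init__()
        self.codes = nn.Embedding(num_frames, dim)

    def forward(self, frame_idx: torch.Tensor) -> torch.Tensor:
        return self.codes(frame_idx)                               # (B, dim)


class DenseGridEncoder(nn.Module):
    """Dense 3D feature grid queried by trilinear interpolation
    (a simple stand-in for a hash-grid spatial encoder)."""
    def __init__(self, resolution: int = 64, feat_dim: int = 16):
        super().__init__()
        grid = torch.empty(1, feat_dim, resolution, resolution, resolution)
        self.grid = nn.Parameter(nn.init.uniform_(grid, -1e-2, 1e-2))

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:          # xyz in [-1, 1], (B, 3)
        pts = xyz.view(1, -1, 1, 1, 3)                             # grid_sample wants (N, D, H, W, 3)
        feats = F.grid_sample(self.grid, pts, align_corners=True)  # (1, C, B, 1, 1)
        return feats.squeeze(-1).squeeze(-1).squeeze(0).t()        # (B, C)


class DynamicRadianceField(nn.Module):
    """Concatenate spatial grid features with the per-frame time code,
    then decode density and color with a small MLP."""
    def __init__(self, num_frames: int, time_dim: int = 8, feat_dim: int = 16):
        super().__init__()
        self.time_code = TimeLatentCode(num_frames, time_dim)
        self.spatial = DenseGridEncoder(feat_dim=feat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + time_dim, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 4),                                      # (density, r, g, b)
        )

    def forward(self, xyz: torch.Tensor, frame_idx: torch.Tensor) -> torch.Tensor:
        h = torch.cat([self.spatial(xyz), self.time_code(frame_idx)], dim=-1)
        return self.mlp(h)


# Usage: query 1024 sample points at frame 5 of a 30-frame sequence.
model = DynamicRadianceField(num_frames=30)
xyz = torch.rand(1024, 3) * 2 - 1
out = model(xyz, torch.full((1024,), 5, dtype=torch.long))
print(out.shape)  # torch.Size([1024, 4])
```

In the paper's DNGP-T variant the time code presumably drives a deformation module rather than only conditioning the radiance decoder; the sketch shows just the decoupled time/space feature split suggested by the abstract.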
Related papers
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z) - Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis [31.90503003079933]
We introduce Dynamic Tetrahedra (DynTet), a novel hybrid representation that encodes explicit dynamic meshes by neural networks.
Compared with prior works, DynTet demonstrates significant improvements in fidelity, lip synchronization, and real-time performance according to various metrics.
arXiv Detail & Related papers (2024-02-27T09:56:15Z) - RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z) - Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient
Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC).
ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents.
Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z) - VNVC: A Versatile Neural Video Coding Framework for Efficient
Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z) - Fast Non-Rigid Radiance Fields from Monocularized Data [66.74229489512683]
This paper proposes a new method for full 360° inward-facing novel view synthesis of non-rigidly deforming scenes.
At the core of our method are 1) An efficient deformation module that decouples the processing of spatial and temporal information for accelerated training and inference; and 2) A static module representing the canonical scene as a fast hash-encoded neural radiance field.
In both cases, our method is significantly faster than previous methods, converging in less than 7 minutes and achieving real-time framerates at 1K resolution, while obtaining a higher visual accuracy for generated novel views.
arXiv Detail & Related papers (2022-12-02T18:51:10Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
At the heart of our approach is a novel dynamic-attention unit dedicated to handling the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - Dynamic Spatial Sparsification for Efficient Vision Transformers and
Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens; a minimal illustrative sketch of this idea is given after the list of related papers below.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
arXiv Detail & Related papers (2022-07-04T17:00:51Z) - Exemplar-based Pattern Synthesis with Implicit Periodic Field Network [21.432274505770394]
We propose an exemplar-based visual pattern synthesis framework that aims to model inner statistics of visual patterns and generate new, versatile patterns.
We design an implicit network based on a generative adversarial network (GAN) and periodic encoding, and thus call our network the Implicit Periodic Field Network (IPFN).
arXiv Detail & Related papers (2022-04-04T17:36:16Z)
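For the IPFN entry directly above, here is a minimal, illustrative sketch of the generator side of the idea: 2D coordinates pass through a periodic (sin/cos) encoding so an implicit MLP can synthesize a tileable pattern conditioned on a latent code. The frequency choices, layer sizes, and the adversarial (GAN) training loop are assumptions or are omitted; this is not the paper's architecture.

```python
# Sketch of a periodic coordinate encoding feeding an implicit MLP generator.
# All names and sizes are assumptions; GAN training is omitted.
import math
import torch
import torch.nn as nn


class PeriodicEncoding(nn.Module):
    """Map 2D coordinates to sin/cos features whose periods tile the plane."""
    def __init__(self, num_freqs: int = 4, base_period: float = 1.0):
        super().__init__()
        freqs = 2.0 * math.pi * (2.0 ** torch.arange(num_freqs).float()) / base_period
        self.register_buffer("freqs", freqs)                     # (F,)

    def forward(self, xy: torch.Tensor) -> torch.Tensor:         # (B, 2)
        phases = xy.unsqueeze(-1) * self.freqs                   # (B, 2, F)
        return torch.cat([phases.sin(), phases.cos()], dim=-1).flatten(1)  # (B, 4F)


class ImplicitPatternGenerator(nn.Module):
    """Small MLP mapping encoded coordinates plus a latent vector to RGB."""
    def __init__(self, num_freqs: int = 4, latent_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.encode = PeriodicEncoding(num_freqs)
        self.net = nn.Sequential(
            nn.Linear(4 * num_freqs + latent_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3), nn.Sigmoid(),                  # RGB in [0, 1]
        )

    def forward(self, xy: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        z = z.expand(xy.shape[0], -1)                            # share one latent per patch
        return self.net(torch.cat([self.encode(xy), z], dim=-1))


# Usage: sample a 64x64 patch of an arbitrarily extendable pattern.
gen = ImplicitPatternGenerator()
ys, xs = torch.meshgrid(torch.linspace(0, 2, 64), torch.linspace(0, 2, 64), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
rgb = gen(coords, torch.randn(1, 16)).reshape(64, 64, 3)
print(rgb.shape)  # torch.Size([64, 64, 3])
```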
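For the "Dynamic Spatial Sparsification" entry above (the sketch promised there), a minimal illustration of dynamic token sparsification: a lightweight scorer ranks tokens and only the top fraction is kept between transformer blocks. The hard top-k selection used here is an inference-style simplification; the keep ratios, block layout, and the differentiable selection needed for training are not taken from the paper.

```python
# Sketch of dynamic token pruning between transformer blocks.
# Hard top-k here is a simplification; all hyperparameters are assumptions.
import torch
import torch.nn as nn


class TokenPruner(nn.Module):
    """Score tokens with a small head and keep the top `keep_ratio` fraction."""
    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.scorer = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:      # (B, N, C)
        scores = self.scorer(tokens).squeeze(-1)                  # (B, N)
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                       # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])  # (B, k, C)
        return tokens.gather(1, idx)                              # (B, k, C)


class SparsifiedEncoder(nn.Module):
    """Alternate standard transformer blocks with token-pruning stages."""
    def __init__(self, dim: int = 192, depth: int = 4, prune_at: tuple = (1, 3)):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(depth)
        )
        self.pruners = nn.ModuleDict({str(i): TokenPruner(dim) for i in prune_at})

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            tokens = block(tokens)
            if str(i) in self.pruners:
                tokens = self.pruners[str(i)](tokens)
        return tokens


# Usage: 196 patch tokens shrink as they pass through two 0.7x pruning stages.
enc = SparsifiedEncoder()
out = enc(torch.randn(2, 196, 192))
print(out.shape)  # torch.Size([2, 95, 192])
```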