CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes
- URL: http://arxiv.org/abs/2409.05166v4
- Date: Wed, 23 Oct 2024 02:10:29 GMT
- Title: CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes
- Authors: Zhenhuan Liu, Shuai Liu, Zhiwei Ning, Jie Yang, Wei Liu,
- Abstract summary: We propose continual dynamic neural graphics primitives (CD-NGP) for view synthesis.
Our approach synergizes features from both temporal and spatial hash encodings to achieve high rendering quality.
We introduce a novel dataset comprising multi-view, exceptionally long video sequences with substantial rigid and non-rigid motion.
- Score: 9.217592165862762
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current methodologies for novel view synthesis (NVS) in dynamic scenes encounter significant challenges in harmonizing memory consumption, model complexity, training efficiency, and rendering fidelity. Existing offline techniques, while delivering high-quality results, are often characterized by substantial memory demands and limited scalability. In contrast, online methods grapple with the challenge of balancing rapid convergence with model compactness. To address these issues, we propose continual dynamic neural graphics primitives (CD-NGP). Our approach synergizes features from both temporal and spatial hash encodings to achieve high rendering quality, employs parameter reuse to enhance scalability, and leverages a continual learning framework to mitigate memory overhead. Furthermore, we introduce a novel dataset comprising multi-view, exceptionally long video sequences with substantial rigid and non-rigid motion, thereby substantiating the scalability of our method.
Related papers
- Low Resource Video Super-resolution using Memory and Residual Deformable Convolutions [3.018928786249079]
Transformer-based video super-resolution (VSR) models have set new benchmarks in recent years, but their substantial computational demands make most of them unsuitable for deployment on resource-constrained devices.
We propose a novel lightweight, parameter-efficient deep residual deformable convolution network for VSR.
With just 2.3 million parameters, our model achieves state-of-the-art SSIM of 0.9175 on the REDS4 dataset.
arXiv Detail & Related papers (2025-02-03T20:46:15Z) - SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis [52.050036778325094]
We introduce SALOVA: Segment-Augmented Video Assistant, a novel video-LLM framework designed to enhance the comprehension of lengthy video content.
We present a high-quality collection of 87.8K long videos, each densely captioned at the segment level to enable models to capture scene continuity and maintain rich context.
Our framework mitigates the limitations of current video-LMMs by allowing for precise identification and retrieval of relevant video segments in response to queries.
arXiv Detail & Related papers (2024-11-25T08:04:47Z) - Multimodal Instruction Tuning with Hybrid State Space Models [25.921044010033267]
Long context is crucial for enhancing the recognition and understanding capabilities of multimodal large language models.
We propose a novel approach using a hybrid transformer-MAMBA model to efficiently handle long contexts in multimodal applications.
Our model enhances inference efficiency for high-resolution images and high-frame-rate videos by about 4 times compared to current models.
arXiv Detail & Related papers (2024-11-13T18:19:51Z) - Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction [3.9363268745580426]
AT-GS is a novel method for reconstructing high-quality dynamic surfaces from multi-view videos through per-frame incremental optimization.
We reduce temporal jittering in dynamic surfaces by ensuring consistency in curvature maps across consecutive frames.
Our method achieves superior accuracy and temporal coherence in dynamic surface reconstruction, delivering high-fidelity space-time novel view synthesis.
arXiv Detail & Related papers (2024-11-10T21:30:16Z) - Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design [18.57172631588624]
We propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline to reduce the model number down to one.
Our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone.
arXiv Detail & Related papers (2024-07-03T05:17:26Z) - RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z) - A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [57.758863967770594]
We build on the common paradigm of transferring large-scale, image--text models to video via shallow temporal fusion.
We expose two limitations to the approach: (1) decreased spatial capabilities, likely due to poor video--language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
arXiv Detail & Related papers (2023-12-12T16:10:19Z) - Differentiable Resolution Compression and Alignment for Efficient Video
Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
arXiv Detail & Related papers (2023-09-15T05:31:53Z) - Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation [55.36617538438858]
We propose a novel approach that strengthens the interaction between spatial and temporal perceptions.
We curate a large-scale and open-source video dataset called HD-VG-130M.
arXiv Detail & Related papers (2023-05-18T11:06:15Z) - Fast Non-Rigid Radiance Fields from Monocularized Data [66.74229489512683]
This paper proposes a new method for full 360deg inward-facing novel view synthesis of non-rigidly deforming scenes.
At the core of our method are 1) An efficient deformation module that decouples the processing of spatial and temporal information for accelerated training and inference; and 2) A static module representing the canonical scene as a fast hash-encoded neural radiance field.
In both cases, our method is significantly faster than previous methods, converging in less than 7 minutes and achieving real-time framerates at 1K resolution, while obtaining a higher visual accuracy for generated novel views.
arXiv Detail & Related papers (2022-12-02T18:51:10Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.