Efficient 3D Reconstruction, Streaming and Visualization of Static and
Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale
Environments
- URL: http://arxiv.org/abs/2211.14310v3
- Date: Tue, 13 Feb 2024 15:04:42 GMT
- Title: Efficient 3D Reconstruction, Streaming and Visualization of Static and
Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale
Environments
- Authors: Leif Van Holland, Patrick Stotko, Stefan Krumpen, Reinhard Klein,
Michael Weinmann
- Abstract summary: We aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale with both static and dynamic scene entities.
Our system is able to achieve VR-based live-telepresence at close to real-time rates.
- Score: 6.543101569579952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the impressive progress of telepresence systems for room-scale
scenes with static and dynamic scene entities, extending their capabilities to
larger dynamic environments beyond a fixed size of a few square meters remains
challenging.
In this paper, we aim at sharing 3D live-telepresence experiences in
large-scale environments beyond room scale, with both static and dynamic scene
entities, at practical bandwidth requirements, based only on lightweight scene
capture with a single moving consumer-grade RGB-D camera. To this end, we
present a system built upon a novel hybrid volumetric scene representation:
a voxel-based representation of the static content, which stores not only the
reconstructed surface geometry but also object semantics and the objects'
accumulated movement over time, combined with a point-cloud-based
representation of the dynamic scene parts, which are separated from the static
parts using semantic and instance information extracted from the input frames.
Static and dynamic content are streamed independently yet simultaneously,
potentially moving but currently static scene entities are seamlessly
integrated into the static model until they become dynamic again, and static
and dynamic data are fused at the remote client; as a result, our system
achieves VR-based live-telepresence at close to real-time rates. Our
evaluation demonstrates the potential of our approach in terms of visual
quality and performance, and includes ablation studies on the involved design
choices.
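
To make the described hybrid representation more concrete, the following Python
sketch (not the authors' code; all class names, fields, and parameters are
illustrative assumptions) shows one way the static voxel model, the dynamic
point cloud, and the semantic/instance-based separation of a frame could fit
together.

```python
from dataclasses import dataclass, field
import numpy as np


@dataclass
class StaticVoxelModel:
    """Sparse voxel map for static geometry, semantics, and motion history."""
    voxel_size: float = 0.01                       # assumed 1 cm voxels
    weight: dict = field(default_factory=dict)     # voxel index -> fusion weight
    semantics: dict = field(default_factory=dict)  # voxel index -> class label
    motion: dict = field(default_factory=dict)     # voxel index -> accumulated movement

    def integrate(self, points, labels):
        """Fuse static surface samples into the voxel map.

        Only weights and semantics are updated here; a real system would also
        run a TSDF-style geometry update per voxel.
        """
        keys = np.floor(points / self.voxel_size).astype(np.int64)
        for key, label in zip(map(tuple, keys), labels):
            self.weight[key] = self.weight.get(key, 0.0) + 1.0
            self.semantics[key] = int(label)


@dataclass
class DynamicPointCloud:
    """Per-frame point cloud holding the currently dynamic scene parts."""
    points: np.ndarray = field(default_factory=lambda: np.empty((0, 3)))
    instance_ids: np.ndarray = field(default_factory=lambda: np.empty(0, dtype=np.int32))


def split_frame(points, instance_ids, moving_instances):
    """Route the points of one frame to the static model or the dynamic buffer.

    `moving_instances` is the set of instance ids currently judged to be
    moving; instances that stop moving are fused back into the static model
    until they become dynamic again, mirroring the behaviour described in the
    abstract.
    """
    is_dynamic = np.isin(instance_ids, list(moving_instances))
    return points[~is_dynamic], points[is_dynamic], instance_ids[is_dynamic]


if __name__ == "__main__":
    # Random data standing in for the back-projected points of one RGB-D frame.
    rng = np.random.default_rng(0)
    pts = rng.uniform(0.0, 2.0, size=(1000, 3))
    ids = rng.integers(0, 5, size=1000)

    static_pts, dyn_pts, dyn_ids = split_frame(pts, ids, moving_instances={3})

    static_model = StaticVoxelModel()
    static_model.integrate(static_pts, labels=np.zeros(len(static_pts)))
    dynamic_cloud = DynamicPointCloud(points=dyn_pts,
                                      instance_ids=dyn_ids.astype(np.int32))

    # The two representations would then be streamed independently and fused
    # again at the remote client.
    print(len(static_model.weight), "static voxels,",
          len(dynamic_cloud.points), "dynamic points")
```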
Related papers
- Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features [14.03066701768256]
We propose a dynamic-static feature decoupling module (DSFD).
We acquire decoupled features driven by dynamic features and current frame features.
Along the spatial axes, it adaptively selects similar information from dynamic regions.
arXiv Detail & Related papers (2025-02-12T13:08:35Z) - Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos [101.48581851337703]
We present BTimer, the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes.
Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames.
Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets.
arXiv Detail & Related papers (2024-12-04T18:15:06Z) - DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments [0.0]
We propose DENSER, a framework that significantly enhances the representation of dynamic objects.
The proposed approach outperforms state-of-the-art methods by a wide margin.
arXiv Detail & Related papers (2024-09-16T07:11:58Z) - Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z) - EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation [126.12940972028012]
We present HVC, a framework for self-supervised video object segmentation.
HVC extracts pseudo-dynamic signals from static images, enabling an efficient and scalable VOS model.
We propose a hybrid visual correspondence loss to learn joint static and dynamic consistency representations.
arXiv Detail & Related papers (2024-04-21T02:21:30Z) - DEMOS: Dynamic Environment Motion Synthesis in 3D Scenes via Local
Spherical-BEV Perception [54.02566476357383]
We propose the first Dynamic Environment MOtion Synthesis framework (DEMOS) to predict future motion instantly according to the current scene.
We then use it to dynamically update the latent motion for final motion synthesis.
The results show that our method significantly outperforms previous works and handles dynamic environments well.
arXiv Detail & Related papers (2024-03-04T05:38:16Z) - DytanVO: Joint Refinement of Visual Odometry and Motion Segmentation in
Dynamic Environments [6.5121327691369615]
We present DytanVO, the first supervised learning-based VO method that deals with dynamic environments.
Our method achieves an average improvement of 27.7% in ATE over state-of-the-art VO solutions in real-world dynamic environments.
arXiv Detail & Related papers (2022-09-17T23:56:03Z) - STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic
Cross-Modal Understanding [68.96574451918458]
We propose a framework named STVGFormer, which models visual-linguistic dependencies with a static branch and a dynamic branch.
Both the static and dynamic branches are designed as cross-modal transformers.
Our proposed method achieved 39.6% vIoU and won first place in the HC-STVG track of the Person in Context Challenge.
arXiv Detail & Related papers (2022-07-06T15:48:58Z)