Memory-based Adapters for Online 3D Scene Perception
- URL: http://arxiv.org/abs/2403.06974v1
- Date: Mon, 11 Mar 2024 17:57:41 GMT
- Title: Memory-based Adapters for Online 3D Scene Perception
- Authors: Xiuwei Xu and Chong Xia and Ziwei Wang and Linqing Zhao and Yueqi Duan and Jie Zhou and Jiwen Lu
- Abstract summary: Conventional 3D scene perception methods are offline, i.e., they take already reconstructed 3D scene geometry as input.
We propose an adapter-based plug-and-play module for the backbone of 3D scene perception models.
Our adapters can be easily inserted into mainstream offline architectures of different tasks and significantly boost their performance on online tasks.
- Score: 71.71645534899905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a new framework for online 3D scene perception.
Conventional 3D scene perception methods are offline, i.e., they take already
reconstructed 3D scene geometry as input, which is not applicable to robotic
applications where the input data is streaming RGB-D video rather than a
complete 3D scene reconstructed from pre-collected RGB-D videos. To deal with
online 3D scene perception tasks where data collection and perception should be
performed simultaneously, the model should be able to process 3D scenes frame
by frame and make use of temporal information. To this end, we propose an
adapter-based plug-and-play module for the backbone of 3D scene perception
models, which constructs a memory to cache and aggregate the extracted RGB-D
features and thus empowers offline models with temporal learning ability.
Specifically, we propose a queued memory mechanism to cache the supporting
point cloud and image features. We then devise aggregation modules that
operate directly on the memory and pass temporal information to the current frame.
We further propose a 3D-to-2D adapter to enhance image features with strong
global context. Our adapters can be easily inserted into mainstream offline
architectures of different tasks and significantly boost their performance on
online tasks. Extensive experiments on ScanNet and SceneNN datasets demonstrate
our approach achieves leading performance on three 3D scene perception tasks
compared with state-of-the-art online methods by simply finetuning existing
offline models, without any model- or task-specific designs.
Project page: https://xuxw98.github.io/Online3D/
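The abstract's core idea is a queued memory that caches per-frame point cloud and image features and aggregates them into the current frame. The sketch below illustrates that idea under simple assumptions: a fixed-length FIFO of per-frame feature tensors and an attention-style fusion step. The class name QueuedMemory, the max_frames parameter, and the fusion rule are hypothetical stand-ins, not the paper's actual adapter design.

```python
# Minimal PyTorch sketch of a queued feature memory with a simple aggregation
# step, assuming per-frame features of shape (N, C). Names and the fusion rule
# are illustrative assumptions, not the authors' implementation.
from collections import deque

import torch


class QueuedMemory:
    """Fixed-length FIFO cache of per-frame features with temporal fusion."""

    def __init__(self, max_frames: int = 8):
        # Oldest frame features are evicted automatically once the queue is full.
        self.queue = deque(maxlen=max_frames)

    def update(self, frame_features: torch.Tensor) -> None:
        # frame_features: (N, C) features extracted from the current RGB-D frame.
        self.queue.append(frame_features.detach())

    def aggregate(self, current: torch.Tensor) -> torch.Tensor:
        # Fuse cached features into the current frame; scaled dot-product
        # attention over the concatenated memory stands in for the paper's
        # aggregation module.
        if not self.queue:
            return current
        memory = torch.cat(list(self.queue), dim=0)               # (M, C)
        scale = current.shape[-1] ** 0.5
        attn = torch.softmax(current @ memory.T / scale, dim=-1)  # (N, M)
        return current + attn @ memory                            # residual fusion


# Usage: process a stream frame by frame, aggregating before caching.
memory = QueuedMemory(max_frames=8)
for feats in (torch.randn(1024, 256) for _ in range(5)):          # dummy features
    fused = memory.aggregate(feats)
    memory.update(feats)
```

Per the abstract, the full method maintains such memories for both the point cloud and image branches and adds a 3D-to-2D adapter that injects global context into the image features; the sketch above only mimics the single-branch caching and aggregation pattern.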
Related papers
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z) - ODIN: A Single Model for 2D and 3D Segmentation [34.612953668151036]
ODIN is a model that segments and labels both 2D RGB images and 3D point clouds.
It achieves state-of-the-art performance on ScanNet200, Matterport3D and AI2THOR 3D segmentation benchmarks.
arXiv Detail & Related papers (2024-01-04T18:59:25Z) - ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z) - DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - Prompt-guided Scene Generation for 3D Zero-Shot Learning [8.658191774247944]
We propose a prompt-guided 3D scene generation and supervision method to augment 3D data to learn the network better.
First, we merge point clouds of two 3D models in certain ways described by a prompt. The prompt acts like the annotation describing each 3D scene.
We have achieved state-of-the-art ZSL and generalized ZSL performance on synthetic (ModelNet40, ModelNet10) and real-scanned (ScanObjectNN) 3D object datasets.
arXiv Detail & Related papers (2022-09-29T11:24:33Z) - CoCoNets: Continuous Contrastive 3D Scene Representations [21.906643302668716]
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos.
We show the resulting 3D visual feature representations effectively scale across objects and scenes, imagine information occluded or missing from the input viewpoints, track objects over time, align semantically related objects in 3D, and improve 3D object detection.
arXiv Detail & Related papers (2021-04-08T15:50:47Z) - Interactive Annotation of 3D Object Geometry using 2D Scribbles [84.51514043814066]
In this paper, we propose an interactive framework for annotating 3D object geometry from point cloud data and RGB imagery.
Our framework targets naive users without artistic or graphics expertise.
arXiv Detail & Related papers (2020-08-24T21:51:29Z)