Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D
Image Representations
- URL: http://arxiv.org/abs/2209.03494v1
- Date: Wed, 7 Sep 2022 23:24:09 GMT
- Title: Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D
Image Representations
- Authors: Vadim Tschernezki, Iro Laina, Diane Larlus and Andrea Vedaldi
- Abstract summary: We present a method that improves dense 2D image feature extractors when the latter are applied to the analysis of multiple images reconstructible as a 3D scene.
We show that our method not only enables semantic understanding in the context of scene-specific neural fields without the use of manual labels, but also consistently improves over the self-supervised 2D baselines.
- Score: 92.88108411154255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Neural Feature Fusion Fields (N3F), a method that improves dense
2D image feature extractors when the latter are applied to the analysis of
multiple images reconstructible as a 3D scene. Given an image feature
extractor, for example pre-trained using self-supervision, N3F uses it as a
teacher to learn a student network defined in 3D space. The 3D student network
is similar to a neural radiance field that distills said features and can be
trained with the usual differentiable rendering machinery. As a consequence,
N3F is readily applicable to most neural rendering formulations, including
vanilla NeRF and its extensions to complex dynamic scenes. We show that our
method not only enables semantic understanding in the context of scene-specific
neural fields without the use of manual labels, but also consistently improves
over the self-supervised 2D baselines. This is demonstrated by considering
various tasks, such as 2D object retrieval, 3D segmentation, and scene editing,
in diverse sequences, including long egocentric videos in the EPIC-KITCHENS
benchmark.
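The abstract describes the core mechanism: a NeRF-like student field gains an extra feature output that is volume-rendered with the same weights as colour and supervised by a frozen 2D teacher at the corresponding pixels. Below is a minimal sketch of that idea, not the authors' code; the network sizes, the near/far sampling bounds, and the `teacher_feat` / ray-batch interface are illustrative assumptions.

```python
# Minimal sketch of feature distillation into a NeRF-style field (N3F idea).
# Assumptions: a frozen 2D teacher (e.g. a self-supervised ViT) has already
# produced per-pixel features `teacher_feat` for the sampled rays.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureField(nn.Module):
    """NeRF-style MLP that outputs density, colour, and a distilled feature."""

    def __init__(self, feat_dim: int = 384, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)        # volume density
        self.rgb_head = nn.Linear(hidden, 3)          # colour
        self.feat_head = nn.Linear(hidden, feat_dim)  # distilled teacher feature

    def forward(self, xyz: torch.Tensor):
        h = self.trunk(xyz)
        sigma = F.softplus(self.sigma_head(h))
        rgb = torch.sigmoid(self.rgb_head(h))
        feat = self.feat_head(h)
        return sigma, rgb, feat


def volume_render(sigma, values, deltas):
    """Standard NeRF quadrature: alpha-composite `values` along each ray.

    sigma: (R, S, 1) densities, values: (R, S, C), deltas: (R, S, 1) step sizes.
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
        dim=1,
    )[:, :-1]
    weights = alpha * trans
    return (weights * values).sum(dim=1)  # (R, C)


def n3f_loss(field, rays_o, rays_d, gt_rgb, teacher_feat, n_samples=64):
    """Losses for one batch of rays: photometric + feature distillation.

    rays_o/rays_d: (R, 3) ray origins/directions; gt_rgb: (R, 3) pixel colours;
    teacher_feat: (R, D) frozen 2D teacher features at the same pixels.
    """
    # Uniform samples along each ray (toy near/far bounds of 0.1 and 4.0).
    t = torch.linspace(0.1, 4.0, n_samples, device=rays_o.device)
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]  # (R, S, 3)
    deltas = torch.full((rays_o.shape[0], n_samples, 1),
                        (4.0 - 0.1) / n_samples, device=rays_o.device)

    sigma, rgb, feat = field(pts)
    rendered_rgb = volume_render(sigma, rgb, deltas)
    rendered_feat = volume_render(sigma, feat, deltas)

    photometric = F.mse_loss(rendered_rgb, gt_rgb)
    # Distillation: rendered 3D features must match the frozen 2D teacher.
    distill = F.mse_loss(rendered_feat, teacher_feat.detach())
    return photometric + distill
```

Because the feature head shares the density field with the colour head, it plugs into any differentiable-rendering formulation the same way the RGB branch does, which is why the approach transfers to vanilla NeRF and its dynamic-scene extensions.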
Related papers
- ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images [19.02348585677397]
Open-vocabulary 3D object detection (OV-3Det) aims to generalize beyond the limited number of base categories labeled during the training phase.
The biggest bottleneck is the scarcity of annotated 3D data, whereas 2D image datasets are abundant and richly annotated.
We propose a novel framework, ImOV3D, that leverages a pseudo-multimodal representation containing both images and point clouds (PC) to close the modality gap.
arXiv Detail & Related papers (2024-10-31T15:02:05Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields [57.617972778377215]
We show how to generate effective 3D representations from posed RGB images.
We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.8 million images.
Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks.
arXiv Detail & Related papers (2024-04-01T17:59:55Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations.
PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks for the first time, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections [49.802462165826554]
We present SceneDreamer, an unconditional generative model for unbounded 3D scenes.
Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations.
arXiv Detail & Related papers (2023-02-02T18:59:16Z)
- 3inGAN: Learning a 3D Generative Model from Images of a Self-similar Scene [34.2144933185175]
3inGAN is an unconditional 3D generative model trained from 2D images of a single self-similar 3D scene.
We show results on semi-stochastic scenes of varying scale and complexity, obtained from real and synthetic sources.
arXiv Detail & Related papers (2022-11-27T18:03:21Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.