Unsupervised Continual Semantic Adaptation through Neural Rendering
- URL: http://arxiv.org/abs/2211.13969v2
- Date: Fri, 24 Mar 2023 12:11:49 GMT
- Title: Unsupervised Continual Semantic Adaptation through Neural Rendering
- Authors: Zhizheng Liu, Francesco Milano, Jonas Frey, Roland Siegwart, Hermann
Blum, Cesar Cadena
- Abstract summary: We study continual multi-scene adaptation for the task of semantic segmentation.
We propose training a Semantic-NeRF network for each scene by fusing the predictions of a segmentation model.
We evaluate our approach on ScanNet, where we outperform both a voxel-based baseline and a state-of-the-art unsupervised domain adaptation method.
- Score: 32.099350613956716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An increasing amount of applications rely on data-driven models that are
deployed for perception tasks across a sequence of scenes. Due to the mismatch
between training and deployment data, adapting the model on the new scenes is
often crucial to obtain good performance. In this work, we study continual
multi-scene adaptation for the task of semantic segmentation, assuming that no
ground-truth labels are available during deployment and that performance on the
previous scenes should be maintained. We propose training a Semantic-NeRF
network for each scene by fusing the predictions of a segmentation model and
then using the view-consistent rendered semantic labels as pseudo-labels to
adapt the model. Through joint training with the segmentation model, the
Semantic-NeRF model effectively enables 2D-3D knowledge transfer. Furthermore,
due to its compact size, it can be stored in a long-term memory and
subsequently used to render data from arbitrary viewpoints to reduce
forgetting. We evaluate our approach on ScanNet, where we outperform both a
voxel-based baseline and a state-of-the-art unsupervised domain adaptation
method.
Related papers
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models [4.157013247909771]
We propose to leverage the recent advancements in state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer)
We aim to develop a cost-effective labeling approach to obtain pseudo-labels for semantic segmentation and object instance detection in indoor environments.
We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset.
arXiv Detail & Related papers (2023-11-17T21:58:26Z) - Self-supervised Pre-training for Semantic Segmentation in an Indoor
Scene [8.357801312689622]
We propose RegConsist, a method for self-supervised pre-training of a semantic segmentation model.
We use a variant of contrastive learning to train a DCNN model for predicting semantic segmentation from RGB views in the target environment.
The proposed method outperforms models pre-trained on ImageNet and achieves competitive performance when using models that are trained for exactly the same task but on a different dataset.
arXiv Detail & Related papers (2022-10-04T20:10:14Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z) - Self-Supervised Contrastive Learning for Unsupervised Phoneme
Segmentation [37.054709598792165]
The model is a convolutional neural network that operates directly on the raw waveform.
It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle.
At test time, a peak detection algorithm is applied over the model outputs to produce the final boundaries.
arXiv Detail & Related papers (2020-07-27T12:10:21Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.