Few-Shot Panoptic Segmentation With Foundation Models
- URL: http://arxiv.org/abs/2309.10726v3
- Date: Fri, 1 Mar 2024 13:48:34 GMT
- Title: Few-Shot Panoptic Segmentation With Foundation Models
- Authors: Markus Käppeler, Kürsat Petek, Niclas Vödisch, Wolfram Burgard, Abhinav Valada
- Abstract summary: We propose to leverage task-agnostic image features to enable few-shot panoptic segmentation by presenting Segmenting Panoptic Information with Nearly 0 labels (SPINO).
In detail, our method combines a DINOv2 backbone with lightweight network heads for semantic segmentation and boundary estimation.
We show that our approach, despite being trained with only ten annotated images, predicts high-quality pseudo-labels that can be used with any existing panoptic segmentation method.
- Score: 23.231014713335664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current state-of-the-art methods for panoptic segmentation require an immense
amount of annotated training data that is both arduous and expensive to obtain,
posing a significant challenge for their widespread adoption. Concurrently,
recent breakthroughs in visual representation learning have sparked a paradigm
shift leading to the advent of large foundation models that can be trained with
completely unlabeled images. In this work, we propose to leverage such
task-agnostic image features to enable few-shot panoptic segmentation by
presenting Segmenting Panoptic Information with Nearly 0 labels (SPINO). In
detail, our method combines a DINOv2 backbone with lightweight network heads
for semantic segmentation and boundary estimation. We show that our approach,
despite being trained with only ten annotated images, predicts high-quality
pseudo-labels that can be used with any existing panoptic segmentation method.
Notably, we demonstrate that SPINO achieves competitive results compared to
fully supervised baselines while using less than 0.3% of the ground truth
labels, paving the way for learning complex visual recognition tasks leveraging
foundation models. To illustrate its general applicability, we further deploy
SPINO on real-world robotic vision systems for both outdoor and indoor
environments. To foster future research, we make the code and trained models
publicly available at http://spino.cs.uni-freiburg.de.
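
To make the pipeline above concrete, here is a minimal PyTorch sketch of such an architecture: a frozen DINOv2 backbone feeding two lightweight heads for semantic segmentation and boundary estimation. The torch.hub entry point, head widths, and input resolution are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FewShotPanopticNet(nn.Module):
    """Frozen foundation-model backbone + two lightweight trainable heads."""

    def __init__(self, num_classes: int, feat_dim: int = 384):
        super().__init__()
        # Task-agnostic backbone (DINOv2 ViT-S/14, embed dim 384), kept frozen.
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Lightweight heads: the only parameters trained on the few labeled images.
        self.semantic_head = nn.Sequential(
            nn.Conv2d(feat_dim, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, num_classes, 1),
        )
        self.boundary_head = nn.Sequential(
            nn.Conv2d(feat_dim, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 1, 1),  # per-pixel boundary logit
        )

    def forward(self, x: torch.Tensor):
        b, _, h, w = x.shape  # h, w must be multiples of the 14-pixel patch size
        with torch.no_grad():
            tokens = self.backbone.forward_features(x)["x_norm_patchtokens"]
        # Patch tokens (B, N, C) -> 2D feature map (B, C, h/14, w/14).
        feats = tokens.transpose(1, 2).reshape(b, -1, h // 14, w // 14)
        return self.semantic_head(feats), self.boundary_head(feats)

net = FewShotPanopticNet(num_classes=19)  # e.g. the Cityscapes label set
sem_logits, bnd_logits = net(torch.randn(1, 3, 224, 224))
```
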
Related papers
- A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation [22.440065488051047]
A key challenge for the widespread application of learning-based models in robotic perception is to significantly reduce the required amount of annotated training data.
We exploit the groundwork laid by visual foundation models to train two lightweight network heads for semantic segmentation and object boundary detection, which together form our method, PASTEL.
We demonstrate that PASTEL significantly outperforms previous methods for label-efficient segmentation even when using fewer annotations.
arXiv Detail & Related papers (2024-05-29T12:23:29Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages an existing pretrained vision-language (VL) model to train semantic segmentation models without human labels.
ZeroSeg distills the visual concepts learned by the VL model into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
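
A short, hedged sketch of the distillation step: align each learned segment token with the frozen VL teacher's embedding of the image region it summarizes. The cosine loss and tensor shapes are assumptions for illustration, not ZeroSeg's exact formulation.

```python
import torch
import torch.nn.functional as F

def segment_distillation_loss(segment_tokens: torch.Tensor,
                              teacher_region_emb: torch.Tensor) -> torch.Tensor:
    """segment_tokens: (B, K, D) learned tokens, one per predicted segment.
    teacher_region_emb: (B, K, D) frozen VL-model embeddings of the matching
    (e.g. masked or cropped) image regions."""
    s = F.normalize(segment_tokens, dim=-1)
    t = F.normalize(teacher_region_emb.detach(), dim=-1)  # teacher stays frozen
    # Push each segment token toward its teacher embedding (cosine similarity).
    return (1.0 - (s * t).sum(dim=-1)).mean()

loss = segment_distillation_loss(torch.randn(2, 8, 512, requires_grad=True),
                                 torch.randn(2, 8, 512))
loss.backward()
```
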
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments to effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass its point, voxel, and multi-view fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks.
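
To see what a range view is, the self-contained sketch below performs the standard spherical projection of a LiDAR point cloud into a 2D range image; the image size and vertical field of view are illustrative, roughly Velodyne-like assumptions. It also exhibits the "many-to-one" mapping, since several points can fall into the same pixel.

```python
import numpy as np

def to_range_view(points: np.ndarray, h: int = 64, w: int = 2048,
                  fov_up: float = 3.0, fov_down: float = -25.0) -> np.ndarray:
    """Project (N, 3) xyz points into an (h, w) range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)                    # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                  # elevation
    u = ((1.0 - (yaw / np.pi + 1.0) / 2.0) * w).astype(int) % w
    fov = np.radians(fov_up) - np.radians(fov_down)
    v = ((np.radians(fov_up) - pitch) / fov * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w), dtype=np.float32)
    img[v, u] = r                             # "many-to-one": later points overwrite
    return img

range_img = to_range_view(np.random.randn(10000, 3) * 10.0)
```
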
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
- Image Understands Point Cloud: Weakly Supervised 3D Semantic Segmentation via Association Learning [59.64695628433855]
We propose a novel cross-modality weakly supervised method for 3D segmentation, incorporating complementary information from unlabeled images.
Specifically, we design a dual-branch network equipped with an active labeling strategy to make the most of a tiny fraction of labels.
Our method even outperforms the state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
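
The cross-modality association can be pictured as projecting each 3D point into the camera image and sampling 2D features at its footprint, letting the image branch supervise the 3D branch. The pinhole model, shapes, and function name below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def gather_image_features(points: torch.Tensor, feats_2d: torch.Tensor,
                          K: torch.Tensor) -> torch.Tensor:
    """points: (N, 3) in camera coordinates (z > 0); feats_2d: (C, H, W);
    K: (3, 3) intrinsics. Returns (N, C) per-point image features."""
    _, h, w = feats_2d.shape
    uvw = (K @ points.T).T                    # pinhole projection
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    grid = torch.stack([2 * u / (w - 1) - 1, 2 * v / (h - 1) - 1], dim=-1)
    sampled = F.grid_sample(feats_2d[None], grid.view(1, 1, -1, 2),
                            align_corners=True)
    return sampled[0, :, 0].T                 # (N, C)

pts = torch.rand(100, 3) + torch.tensor([0.0, 0.0, 1.0])  # points in front of camera
K = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
point_feats = gather_image_features(pts, torch.randn(64, 480, 640), K)
```
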
arXiv Detail & Related papers (2022-09-16T07:59:04Z)
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses the previous state of the art by a large margin.
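
A hedged sketch of the two-stage recipe such a baseline follows: given class-agnostic mask proposals (assumed available here), classify each masked or cropped region with an off-the-shelf CLIP model via image-text similarity. The prompt template and class list are illustrative choices.

```python
import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["road", "car", "person", "vegetation"]
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

def classify_region(region: Image.Image) -> int:
    """Assign a class to one proposal crop via CLIP image-text similarity."""
    with torch.no_grad():
        img_emb = model.encode_image(preprocess(region).unsqueeze(0).to(device))
        txt_emb = model.encode_text(text)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
        return int((img_emb @ txt_emb.T).argmax())

label = classify_region(Image.new("RGB", (224, 224)))  # dummy proposal crop
print(class_names[label])
```
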
arXiv Detail & Related papers (2021-12-29T18:56:18Z)
- A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation [40.27705176115985]
Few-shot semantic segmentation addresses the learning task in which only a few images with ground-truth pixel-level labels are available for the novel classes of interest.
We propose a novel meta-learning framework, which predicts pseudo pixel-level segmentation masks from a limited amount of data and their semantic labels.
Our proposed learning model can be viewed as a pixel-level meta-learner.
arXiv Detail & Related papers (2021-11-02T08:28:11Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
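
The core idea can be sketched as a generator whose two branches decode the same latent features into an image and a pixel-aligned label map, so sampling the GAN yields (image, segmentation) pairs. Layer sizes and the layout below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointGenerator(nn.Module):
    """Map one latent code to a paired image and label map."""

    def __init__(self, z_dim: int = 128, num_classes: int = 19, size: int = 16):
        super().__init__()
        self.size = size
        self.shared = nn.Sequential(nn.Linear(z_dim, 256 * size * size), nn.ReLU())
        # Both branches decode the *same* features, which keeps the generated
        # image and its label map spatially consistent.
        self.to_image = nn.Conv2d(256, 3, 3, padding=1)
        self.to_label = nn.Conv2d(256, num_classes, 3, padding=1)

    def forward(self, z: torch.Tensor):
        h = self.shared(z).view(-1, 256, self.size, self.size)
        return torch.tanh(self.to_image(h)), self.to_label(h)  # image, label logits

gen = JointGenerator()
fake_img, fake_label_logits = gen(torch.randn(4, 128))
```
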
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
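
A minimal sketch of the group-wise mechanism: treat each image in a group as a graph node (via a pooled feature) and pass messages over a similarity graph so that semantics shared across the group reinforce each other. The single soft-attention aggregation round below is an illustrative simplification of the paper's GNN.

```python
import torch
import torch.nn as nn

class GroupGNNLayer(nn.Module):
    """One round of message passing over a group of image features."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        """node_feats: (N, D), one pooled feature per image in the group."""
        # Soft adjacency from pairwise similarity, with self-loops removed.
        sim = torch.softmax(node_feats @ node_feats.T, dim=-1)
        sim = sim * (1.0 - torch.eye(node_feats.size(0)))
        agg = sim @ self.msg(node_feats)      # aggregate neighbor messages
        return self.update(torch.cat([node_feats, agg], dim=-1))

layer = GroupGNNLayer()
refined = layer(torch.randn(8, 256))  # a group of 8 images
```
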
arXiv Detail & Related papers (2020-12-09T12:40:13Z)