Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
- URL: http://arxiv.org/abs/2108.03032v2
- Date: Mon, 9 Aug 2021 14:53:33 GMT
- Title: Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
- Authors: Zhihe Lu, Sen He, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang
- Abstract summary: A few-shot semantic segmentation model is typically composed of a CNN encoder, a CNN decoder and a simple classifier.
Most existing methods meta-learn all three model components for fast adaptation to a new class.
In this work we propose to simplify the meta-learning task by focusing solely on the simplest component, the classifier.
- Score: 112.95747173442754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A few-shot semantic segmentation model is typically composed of a CNN
encoder, a CNN decoder and a simple classifier (separating foreground and
background pixels). Most existing methods meta-learn all three model components
for fast adaptation to a new class. However, given that as few as a single
support set image is available, effective model adaptation of all three
components to the new class is extremely challenging. In this work we propose
to simplify the meta-learning task by focusing solely on the simplest
component, the classifier, whilst leaving the encoder and decoder to
pre-training. We hypothesize that if we pre-train an off-the-shelf segmentation
model over a set of diverse training classes with sufficient annotations, the
encoder and decoder can capture rich discriminative features applicable for any
unseen classes, rendering the subsequent meta-learning stage unnecessary. For
the classifier meta-learning, we introduce a Classifier Weight Transformer
(CWT) designed to dynamically adapt the support-set trained classifier's weights
to each query image in an inductive way. Extensive experiments on two standard
benchmarks show that despite its simplicity, our method outperforms the
state-of-the-art alternatives, often by a large margin. Code is available at
https://github.com/zhiheLu/CWT-for-FSS.
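
The abstract describes the core mechanism: the fg/bg classifier weights learned on the support set act as attention queries against the features of each individual query image, and the attended output updates the weights before pixel classification. Below is a minimal PyTorch sketch of that idea; the projection layout, residual update, and tensor shapes are assumptions made for illustration, not the authors' exact design (see the linked repository for the official implementation).

```python
# Sketch of a CWT-style classifier weight adapter: support-set trained
# fg/bg weights attend over one query image's features and are updated
# before pixel classification. Projection names, the residual update,
# and all shapes are assumptions; see the official code at
# https://github.com/zhiheLu/CWT-for-FSS for the real implementation.
import torch
import torch.nn as nn


class ClassifierWeightTransformer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)  # projects classifier weights (attention queries)
        self.to_k = nn.Linear(dim, dim)  # projects query-image features (keys)
        self.to_v = nn.Linear(dim, dim)  # projects query-image features (values)
        self.out = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, cls_w: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # cls_w: (2, d) fg/bg weights trained on the support set.
        # feats: (HW, d) flattened features of a single query image.
        # Inductive: each query image is adapted to on its own.
        q = self.to_q(cls_w)                                  # (2, d)
        k = self.to_k(feats)                                  # (HW, d)
        v = self.to_v(feats)                                  # (HW, d)
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)  # (2, HW)
        return cls_w + self.out(attn @ v)                     # adapted (2, d) weights


# Usage: classify every pixel of the query image with the adapted weights.
d, hw = 512, 60 * 60
cwt = ClassifierWeightTransformer(d)
w = torch.randn(2, d)          # classifier weights after support-set training
q_feats = torch.randn(hw, d)   # decoder features of one query image
logits = q_feats @ cwt(w, q_feats).t()  # (HW, 2) fg/bg scores per pixel
```
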
Related papers
- Class Incremental Learning with Pre-trained Vision-Language Models [59.15538370859431]
We propose an approach to exploiting pre-trained vision-language models (e.g. CLIP) that enables further adaptation.
Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
arXiv Detail & Related papers (2023-10-31T10:45:03Z)
- Multi-Class Unlearning for Image Classification via Weight Filtering [44.707144011189335]
Machine Unlearning is an emerging paradigm for selectively removing the impact of training datapoints from a network.
We achieve this by modulating the network's components using memory matrices, enabling the network to demonstrate selective unlearning behavior for any class after training.
We test the proposed framework on small- and medium-scale image classification datasets, with both convolution- and Transformer-based backbones.
arXiv Detail & Related papers (2023-04-04T18:01:59Z)
- Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, decent performance gain has been achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z)
- Prediction Calibration for Generalized Few-shot Semantic Segmentation [101.69940565204816]
Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class.
We build a cross-attention module that guides the classifier's final prediction using the fused multi-level features.
Our PCN outperforms the state-of-the-art alternatives by large margins.
arXiv Detail & Related papers (2022-10-15T13:30:12Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z)
- Learning What Not to Segment: A New Perspective on Few-Shot Segmentation [63.910211095033596]
Recently, few-shot segmentation (FSS) has been extensively developed.
This paper proposes a fresh and straightforward insight to alleviate the problem.
In light of the unique nature of the proposed approach, we also extend it to a more realistic but challenging setting.
arXiv Detail & Related papers (2022-03-15T03:08:27Z)
- Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning [41.07029317930986]
We propose a variance-sensitive class of models that operates in a low-label regime.
The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier (see the sketch after this list).
We further extend this approach to a transductive learning setting, proposing Transductive CNAPS.
arXiv Detail & Related papers (2022-01-13T18:59:02Z)
- A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark [33.86872697028233]
We present an in-depth study on few-shot video classification by making three contributions.
First, we perform a consistent comparative study on the existing metric-based methods to figure out their limitations in representation learning.
Second, we discover that there is a high correlation between the novel action class and the ImageNet object class, which is problematic in the few-shot recognition setting.
Third, we present a new benchmark with more base data to facilitate future few-shot video classification without pre-training.
arXiv Detail & Related papers (2021-10-24T06:01:46Z)
- Few-Shot Temporal Action Localization with Query Adaptive Transformer [105.84328176530303]
Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation.
Few-shot TAL aims to adapt a model to a new class represented by as few as a single video.
arXiv Detail & Related papers (2021-10-20T13:18:01Z)
- Meta Learning for Few-Shot One-class Classification [0.0]
We formulate the learning of meaningful features for one-class classification as a meta-learning problem.
To learn these representations, we require only multiclass data from similar tasks.
We validate our approach by adapting few-shot classification datasets to the few-shot one-class classification scenario.
arXiv Detail & Related papers (2020-09-11T11:35:28Z)
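
On the Simple CNAPS entry above: a Mahalanobis-distance based few-shot classifier can be sketched in a few lines. The covariance regularization below, blending class-level and task-level estimates plus an identity term, is an assumption standing in for the paper's actual hierarchical scheme.

```python
# Sketch of a Mahalanobis-distance few-shot classifier in the spirit of
# Simple CNAPS. The shrinkage weight and covariance blend are assumptions;
# consult the paper for the exact hierarchical regularization.
import torch


def mahalanobis_classify(support: torch.Tensor, support_y: torch.Tensor,
                         query: torch.Tensor, n_classes: int) -> torch.Tensor:
    # support: (n, d) embedded support samples; support_y: (n,) labels;
    # query: (m, d) embedded queries. Returns (m, n_classes) logits.
    d = support.size(1)
    task_cov = torch.cov(support.t()) + torch.eye(d)  # task-level estimate
    logits = []
    for k in range(n_classes):
        xk = support[support_y == k]                  # class-k support samples
        mu = xk.mean(0)
        lam = xk.size(0) / (xk.size(0) + 1.0)         # shrink when shots are few
        cls_cov = torch.cov(xk.t()) if xk.size(0) > 1 else torch.zeros(d, d)
        cov = lam * cls_cov + (1 - lam) * task_cov
        diff = query - mu                             # (m, d)
        m2 = (diff @ torch.linalg.inv(cov) * diff).sum(-1)  # squared Mahalanobis
        logits.append(-m2)                            # closer class -> higher score
    return torch.stack(logits, dim=-1)


# Toy usage: a 3-way, 5-shot episode with 64-d embeddings.
s, y = torch.randn(15, 64), torch.arange(3).repeat_interleave(5)
print(mahalanobis_classify(s, y, torch.randn(4, 64), 3).shape)  # torch.Size([4, 3])
```
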