Prototype Mixture Models for Few-shot Semantic Segmentation
- URL: http://arxiv.org/abs/2008.03898v2
- Date: Tue, 1 Sep 2020 11:23:17 GMT
- Title: Prototype Mixture Models for Few-shot Semantic Segmentation
- Authors: Boyu Yang, Chang Liu, Bohao Li, Jianbin Jiao, and Qixiang Ye
- Abstract summary: Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose.
We propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation.
PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost for model size and inference speed.
- Score: 50.866870384596446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot segmentation is challenging because objects within the support and
query images could significantly differ in appearance and pose. Using a single
prototype acquired directly from the support image to segment the query image
causes semantic ambiguity. In this paper, we propose prototype mixture models
(PMMs), which correlate diverse image regions with multiple prototypes to
enforce the prototype-based semantic representation. Estimated by an
Expectation-Maximization algorithm, PMMs incorporate rich channel-wise and
spatial semantics from limited support images. Utilized as representations as
well as classifiers, PMMs fully leverage the semantics to activate objects in
the query image while suppressing background regions in a duplex manner.
Extensive experiments on Pascal VOC and MS-COCO datasets show that PMMs
significantly improve upon state-of-the-art methods. In particular, PMMs improve
5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost
for model size and inference speed.
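The EM-based prototype estimation and duplex query activation described in the abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the function names, the number of prototypes `k`, the temperature `tau`, and the cosine-similarity responsibilities are all assumptions chosen for clarity.

```python
import numpy as np

def em_prototypes(feats, k=3, iters=10, tau=20.0, seed=0):
    """Estimate k prototypes from masked support features via EM.

    feats: (n, c) array of L2-normalized foreground feature vectors.
    Returns (k, c) L2-normalized prototype vectors. A simplified sketch
    of PMM-style estimation (cosine responsibilities, uniform priors).
    """
    rng = np.random.default_rng(seed)
    # Initialize prototypes from randomly chosen support features.
    protos = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        # E-step: soft-assign each pixel feature to each prototype.
        sim = feats @ protos.T                      # (n, k) cosine similarities
        resp = np.exp(tau * (sim - sim.max(axis=1, keepdims=True)))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: prototypes become responsibility-weighted feature means.
        protos = resp.T @ feats                     # (k, c)
        protos /= np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8
    return protos

def activate_query(query_feats, protos):
    """Per-pixel activation: max cosine similarity to any prototype."""
    return (query_feats @ protos.T).max(axis=1)

# Toy usage with random normalized features standing in for CNN features.
rng = np.random.default_rng(1)
support = rng.normal(size=(50, 8))
support /= np.linalg.norm(support, axis=1, keepdims=True)
protos = em_prototypes(support, k=3)
query = rng.normal(size=(20, 8))
query /= np.linalg.norm(query, axis=1, keepdims=True)
act = activate_query(query, protos)
print(protos.shape, act.shape)  # (3, 8) (20,)
```

In the paper's duplex scheme, the same matching would be run with background prototypes as well, and the foreground and background activations combined for classification; the sketch above shows only the foreground half.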
Related papers
- Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset [66.15872913664407]
This study introduces RS-4M, a large-scale dataset designed to enable highly efficient MIM training on RS images.
We propose an efficient MIM method, termed SelectiveMAE, which dynamically encodes and reconstructs a subset of patch tokens selected based on their semantic richness.
Experiments show that SelectiveMAE significantly boosts training efficiency by 2.2-2.7 times and enhances the classification, detection, and segmentation performance of the baseline MIM model.
arXiv Detail & Related papers (2024-06-17T15:41:57Z) - Matryoshka Multimodal Models [92.41824727506751]
We propose M3: Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens.
We find that COCO-style benchmarks only need around 9 visual tokens to obtain accuracy similar to that of using all 576 tokens.
arXiv Detail & Related papers (2024-05-27T17:59:56Z) - Mask Matching Transformer for Few-Shot Segmentation [71.32725963630837]
Mask Matching Transformer (MM-Former) is a new paradigm for the few-shot segmentation task.
First, the MM-Former follows the paradigm of decompose first and then blend, allowing our method to benefit from the advanced potential objects segmenter.
We conduct extensive experiments on the popular COCO-20i and PASCAL-5i benchmarks.
arXiv Detail & Related papers (2022-12-05T11:00:32Z) - Breaking Immutable: Information-Coupled Prototype Elaboration for Few-Shot Object Detection [15.079980293820137]
We propose an Information-Coupled Prototype Elaboration (ICPE) method to generate specific and representative prototypes for each query image.
Our method achieves state-of-the-art performance in almost all settings.
arXiv Detail & Related papers (2022-11-27T10:33:11Z) - Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z) - SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most advanced solutions exploit a metric learning framework that performs segmentation by matching each pixel to a learned foreground prototype.
This framework suffers from biased classification because sample pairs are constructed from the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z) - SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation [14.419517737536706]
Recent deep neural network based FSS methods leverage high-dimensional feature similarity between the foreground features of the support images and the query image features.
We propose to jointly predict the support and query masks to force the support features to share characteristics with the query features.
Our method achieves state-of-the-art results for one-shot and five-shot segmentation on the PASCAL-5i dataset.
arXiv Detail & Related papers (2020-04-30T17:56:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.