Masked Cross-image Encoding for Few-shot Segmentation
- URL: http://arxiv.org/abs/2308.11201v1
- Date: Tue, 22 Aug 2023 05:36:39 GMT
- Title: Masked Cross-image Encoding for Few-shot Segmentation
- Authors: Wenbo Xu, Huaxi Huang, Ming Cheng, Litao Yu, Qiang Wu, Jian Zhang
- Abstract summary: Few-shot segmentation (FSS) is a dense prediction task that aims to infer the pixel-wise labels of unseen classes using only a limited number of annotated images.
We propose a joint learning method termed Masked Cross-Image Encoding (MCE), which is designed to capture common visual properties that describe object details and to learn bidirectional inter-image dependencies that enhance feature interaction.
- Score: 16.445813548503708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot segmentation (FSS) is a dense prediction task that aims to infer the
pixel-wise labels of unseen classes using only a limited number of annotated
images. The key challenge in FSS is to classify the labels of query pixels
using class prototypes learned from the few labeled support exemplars. Prior
approaches to FSS have typically focused on learning class-wise descriptors
independently from support images, thereby ignoring the rich contextual
information and mutual dependencies among support-query features. To address
this limitation, we propose a joint learning method termed Masked Cross-Image
Encoding (MCE), which is designed to capture common visual properties that
describe object details and to learn bidirectional inter-image dependencies
that enhance feature interaction. MCE is more than a visual representation
enrichment module; it also considers cross-image mutual dependencies and
implicit guidance. Experiments on FSS benchmarks PASCAL-$5^i$ and COCO-$20^i$
demonstrate the advanced meta-learning ability of the proposed method.
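The abstract describes masking features of one image and re-encoding them through attention on a paired image. The paper does not give implementation details here, so the following is only a minimal, hypothetical NumPy sketch of one direction of such a scheme: randomly masked query tokens are reconstructed by cross-attending to support tokens (the function name, masking ratio, and single-head attention are all illustrative assumptions, not the authors' method).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_cross_image_encoding(query_feats, support_feats, mask_ratio=0.5, seed=0):
    """Toy sketch (not the paper's MCE): randomly mask query tokens and
    re-express the masked ones as attention-weighted mixtures of support
    tokens, leaving unmasked tokens untouched.

    query_feats:   (n, d) token features of the query image
    support_feats: (m, d) token features of a support image
    """
    rng = np.random.default_rng(seed)
    n, d = query_feats.shape
    masked = rng.random(n) < mask_ratio                  # which query tokens to mask
    scores = query_feats @ support_feats.T / np.sqrt(d)  # (n, m) scaled dot products
    attn = softmax(scores, axis=-1)                      # rows sum to 1
    encoded = attn @ support_feats                       # query tokens rebuilt from support
    out = np.where(masked[:, None], encoded, query_feats)
    return out, masked
```

In a full bidirectional version, the same operation would also run with the roles of query and support swapped, so that each image's masked tokens are encoded from the other image's features.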
Related papers
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- Multi-Label Self-Supervised Learning with Scene Images [21.549234013998255]
This paper shows that quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem.
The proposed method is named Multi-Label Self-supervised learning (MLS).
arXiv Detail & Related papers (2023-08-07T04:04:22Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Iterative Few-shot Semantic Segmentation from Image Label Text [36.53926941601841]
Few-shot semantic segmentation aims to learn to segment unseen class objects with the guidance of only a few support images.
We propose a general framework to generate coarse masks with the help of the powerful vision-language model CLIP.
Our method owns an excellent generalization ability for the images in the wild and uncommon classes.
arXiv Detail & Related papers (2023-03-10T01:48:14Z)
- A Joint Framework Towards Class-aware and Class-agnostic Alignment for Few-shot Segmentation [11.47479526463185]
Few-shot segmentation aims to segment objects of unseen classes given only a few annotated support images.
Most existing methods simply stitch query features with independent support prototypes and segment the query image by feeding the mixed features to a decoder.
We propose a joint framework that combines more valuable class-aware and class-agnostic alignment guidance to facilitate the segmentation.
arXiv Detail & Related papers (2022-11-02T17:33:25Z)
- Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z)
- MFNet: Multi-class Few-shot Segmentation Network with Pixel-wise Metric Learning [34.059257121606336]
This work focuses on few-shot semantic segmentation, which is still a largely unexplored field.
We first present a novel multi-way encoding and decoding architecture which effectively fuses multi-scale query information and multi-class support information into one query-support embedding.
Experiments on standard benchmarks PASCAL-5i and COCO-20i show clear benefits of our method over the state of the art in few-shot segmentation.
arXiv Detail & Related papers (2021-10-30T11:37:36Z)
- Learning Meta-class Memory for Few-Shot Semantic Segmentation [90.28474742651422]
We introduce the concept of meta-class, which is the meta information shareable among all classes.
We propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), where we introduce a set of learnable memory embeddings.
Our proposed MM-Net achieves 37.5% mIoU on the COCO dataset in the 1-shot setting, which is 5.1% higher than the previous state-of-the-art.
arXiv Detail & Related papers (2021-08-06T06:29:59Z)
- Semantically Meaningful Class Prototype Learning for One-Shot Image Semantic Segmentation [58.96902899546075]
One-shot semantic image segmentation aims to segment the object regions for the novel class with only one annotated image.
Recent works adopt the episodic training strategy to mimic the expected situation at testing time.
We propose to leverage the multi-class label information during episodic training, which encourages the network to generate more semantically meaningful features for each category.
arXiv Detail & Related papers (2021-02-22T12:07:35Z)
- Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
A meta-learning challenge, known as the low-shot image recognition task, arises when only a few annotated images are available for learning a recognition model for one category.
Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem.
Experiments for few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z)
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy by expanding the views generated by a single image to cross-samples and multi-level representations.
Our method, termed as CsMl, has the ability to integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.