Multi-Content Interaction Network for Few-Shot Segmentation
- URL: http://arxiv.org/abs/2303.06304v2
- Date: Tue, 2 May 2023 15:45:47 GMT
- Title: Multi-Content Interaction Network for Few-Shot Segmentation
- Authors: Hao Chen, Yunlong Yu, Yonghan Dong, Zheming Lu, Yingming Li, and
Zhongfei Zhang
- Abstract summary: Few-Shot Segmentation (FSS) is challenging because of limited support images and large intra-class appearance discrepancies.
We propose a Multi-Content Interaction Network (MCINet) to remedy this issue.
MCINet improves FSS by incorporating the low-level structural information from another query branch into the high-level semantic features.
- Score: 37.80624074068096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-Shot Segmentation (FSS) is challenging for limited support images and
large intra-class appearance discrepancies. Most existing approaches focus on
extracting high-level representations of the same layers for support-query
correlations, neglecting the shift issue between different layers and scales,
due to the huge difference between support and query samples. In this paper, we
propose a Multi-Content Interaction Network (MCINet) to remedy this issue by
fully exploiting and interacting with the multi-scale contextual information
contained in the support-query pairs to supplement the same-layer correlations.
Specifically, MCINet improves FSS from the perspectives of boosting the query
representations by incorporating the low-level structural information from
another query branch into the high-level semantic features, enhancing the
support-query correlations by exploiting both the same-layer and adjacent-layer
features, and refining the predicted results by a multi-scale mask prediction
strategy, with which the different scale contents have bidirectionally
interacted. Experiments on two benchmarks demonstrate that our approach reaches
SOTA performances and outperforms the best competitors with many desirable
advantages, especially on the challenging COCO dataset.
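The abstract describes supplementing same-layer support-query correlations with adjacent-layer ones. The sketch below illustrates that general idea only and is not the MCINet implementation: it assumes a masked cosine-similarity prior as the correlation measure and assumes every layer has already been projected to a common channel dimension. The function names (`resize_mask`, `correlation_prior`, `multi_layer_priors`) are hypothetical.

```python
import numpy as np


def resize_mask(mask, h, w):
    """Nearest-neighbour resize of a 2-D mask to (h, w)."""
    rows = np.arange(h) * mask.shape[0] // h
    cols = np.arange(w) * mask.shape[1] // w
    return mask[np.ix_(rows, cols)]


def correlation_prior(query_feat, support_feat, support_mask, eps=1e-8):
    """For each query location, the max cosine similarity to masked support locations.

    query_feat:   (C, Hq, Wq)
    support_feat: (C, Hs, Ws), same channel count C (an assumption of this sketch)
    support_mask: (Hs, Ws), 1 for foreground and 0 for background
    """
    c, hq, wq = query_feat.shape
    q = query_feat.reshape(c, -1)                            # (C, Hq*Wq)
    s = (support_feat * support_mask[None]).reshape(c, -1)   # background zeroed
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    corr = q.T @ s                                           # (Hq*Wq, Hs*Ws)
    return corr.max(axis=1).reshape(hq, wq)


def multi_layer_priors(query_feats, support_feats, support_mask):
    """Priors for same-layer pairs (l, l) and adjacent-layer pairs (l, l-1), (l, l+1)."""
    priors = {}
    for lq, q in enumerate(query_feats):
        for ls in (lq - 1, lq, lq + 1):
            if 0 <= ls < len(support_feats):
                s = support_feats[ls]
                m = resize_mask(support_mask, s.shape[1], s.shape[2])
                priors[(lq, ls)] = correlation_prior(q, s, m)
    return priors


# Toy usage: two feature levels with 8 channels each (random stand-ins for backbone features).
rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, 8, 8)), rng.standard_normal((8, 4, 4))]
mask = np.ones((16, 16))
priors = multi_layer_priors(feats, feats, mask)
print(sorted(priors))
```

Because the correlation matrix is computed between flattened locations, the query and support maps do not need the same spatial size, which is what lets a query layer be compared against an adjacent support layer in this sketch.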
Related papers
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
- M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$^3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z)
- Few-shot Semantic Segmentation with Support-induced Graph Convolutional Network [28.46908214462594]
Few-shot semantic segmentation (FSS) aims to achieve novel object segmentation with only a few annotated samples.
We propose a Support-induced Graph Convolutional Network (SiGCN) to explicitly excavate latent context structure in query images.
arXiv Detail & Related papers (2023-01-09T08:00:01Z)
- Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantic
arXiv Detail & Related papers (2022-06-21T17:40:55Z)
- Progressive Multi-scale Consistent Network for Multi-class Fundus Lesion Segmentation [28.58972084293778]
We propose a progressive multi-scale consistent network (PMCNet) that integrates the proposed progressive feature fusion (PFF) block and dynamic attention block (DAB).
PFF block progressively integrates multi-scale features from adjacent encoding layers, facilitating feature learning of each layer by aggregating fine-grained details and high-level semantics.
DAB is designed to dynamically learn the attentive cues from the fused features at different scales, thus aiming to smooth the essential conflicts existing in multi-scale features.
arXiv Detail & Related papers (2022-05-31T12:10:01Z)
- CATrans: Context and Affinity Transformer for Few-Shot Segmentation [36.802347383825705]
Few-shot segmentation (FSS) aims to segment novel categories given scarce annotated support images.
In this work, we effectively integrate the context and affinity information via the proposed novel Context and Affinity Transformer.
We conduct experiments to demonstrate the effectiveness of the proposed model, outperforming the state-of-the-art methods.
arXiv Detail & Related papers (2022-04-27T10:20:47Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation [56.694330303488435]
We propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework.
In a nutshell, a knowledge graph is constructed on the prototypes of various domains to propagate information among semantically adjacent representations.
Our approach outperforms existing methods by a remarkable margin.
arXiv Detail & Related papers (2020-07-17T07:52:44Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.