Multi-Content Interaction Network for Few-Shot Segmentation
- URL: http://arxiv.org/abs/2303.06304v2
- Date: Tue, 2 May 2023 15:45:47 GMT
- Title: Multi-Content Interaction Network for Few-Shot Segmentation
- Authors: Hao Chen, Yunlong Yu, Yonghan Dong, Zheming Lu, Yingming Li, and
Zhongfei Zhang
- Abstract summary: Few-Shot Segmentation (FSS) is challenging due to limited support images and large intra-class appearance discrepancies.
We propose a Multi-Content Interaction Network (MCINet) to remedy this issue.
MCINet improves FSS by incorporating the low-level structural information from another query branch into the high-level semantic features.
- Score: 37.80624074068096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-Shot Segmentation (FSS) is challenging due to limited support
images and large intra-class appearance discrepancies. Most existing approaches
focus on extracting high-level representations from the same layers for
support-query correlations, neglecting the shift between different layers and
scales caused by the huge difference between support and query samples. In this
paper, we propose a Multi-Content Interaction Network (MCINet) to remedy this
issue by fully exploiting and interacting the multi-scale contextual
information contained in the support-query pairs to supplement the same-layer
correlations. Specifically, MCINet improves FSS from three perspectives:
boosting the query representations by incorporating low-level structural
information from another query branch into the high-level semantic features,
enhancing the support-query correlations by exploiting both same-layer and
adjacent-layer features, and refining the predicted results with a multi-scale
mask prediction strategy in which contents at different scales interact
bidirectionally. Experiments on two benchmarks demonstrate that our approach
achieves SOTA performance and outperforms the best competitors with many
desirable advantages, especially on the challenging COCO dataset.
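The same-layer and adjacent-layer support-query correlation idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names are ours, and it assumes for simplicity that all backbone layers share the same channel count and spatial size, so that correlations from different layer pairs can be averaged directly.

```python
import numpy as np

def cosine_correlation(query_feat, support_feat):
    """Dense cosine similarity between every query location and every
    support location. Both features are (C, H, W) arrays."""
    c = query_feat.shape[0]
    q = query_feat.reshape(c, -1)   # (C, HW_q)
    s = support_feat.reshape(c, -1) # (C, HW_s)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
    return q.T @ s                  # (HW_q, HW_s)

def multi_layer_correlation(query_feats, support_feats):
    """Aggregate same-layer and adjacent-layer support-query correlations.

    query_feats / support_feats: lists of (C, H, W) arrays, one per
    backbone layer, assumed to share dimensions (an illustrative
    simplification)."""
    corrs = []
    n = len(query_feats)
    for i in range(n):
        # same-layer correlation
        corrs.append(cosine_correlation(query_feats[i], support_feats[i]))
        # adjacent-layer correlation: query layer i vs. support layer i+1
        if i + 1 < n:
            corrs.append(cosine_correlation(query_feats[i],
                                            support_feats[i + 1]))
    # fuse the correlation maps; a simple mean stands in for the
    # learned interaction used in the paper
    return np.mean(corrs, axis=0)
```

In the actual method the fusion is learned rather than a plain mean, and adjacent layers would typically differ in resolution, requiring resampling before correlation.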
Related papers
- One for all: A novel Dual-space Co-training baseline for Large-scale
Multi-View Clustering [42.92751228313385]
We propose a novel multi-view clustering model, named Dual-space Co-training Large-scale Multi-view Clustering (DSCMC).
The main objective of our approach is to enhance the clustering performance by leveraging co-training in two distinct spaces.
Our algorithm has an approximate linear computational complexity, which guarantees its successful application on large-scale datasets.
arXiv Detail & Related papers (2024-01-28T16:30:13Z) - M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot
Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z) - Few-shot Semantic Segmentation with Support-induced Graph Convolutional
Network [28.46908214462594]
Few-shot semantic segmentation (FSS) aims to segment novel objects with only a few annotated samples.
We propose a Support-induced Graph Convolutional Network (SiGCN) to explicitly excavate latent context structure in query images.
arXiv Detail & Related papers (2023-01-09T08:00:01Z) - Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of
Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantics.
arXiv Detail & Related papers (2022-06-21T17:40:55Z) - Progressive Multi-scale Consistent Network for Multi-class Fundus Lesion
Segmentation [28.58972084293778]
We propose a progressive multi-scale consistent network (PMCNet) that integrates the proposed progressive feature fusion (PFF) block and dynamic attention block (DAB).
The PFF block progressively integrates multi-scale features from adjacent encoding layers, facilitating feature learning of each layer by aggregating fine-grained details and high-level semantics.
The DAB is designed to dynamically learn attentive cues from the fused features at different scales, aiming to smooth the essential conflicts existing in multi-scale features.
arXiv Detail & Related papers (2022-05-31T12:10:01Z) - CATrans: Context and Affinity Transformer for Few-Shot Segmentation [36.802347383825705]
Few-shot segmentation (FSS) aims to segment novel categories given scarce annotated support images.
In this work, we effectively integrate the context and affinity information via the proposed novel Context and Affinity Transformer.
We conduct experiments to demonstrate the effectiveness of the proposed model, outperforming the state-of-the-art methods.
arXiv Detail & Related papers (2022-04-27T10:20:47Z) - CoADNet: Collaborative Aggregation-and-Distribution Networks for
Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z) - Learning to Combine: Knowledge Aggregation for Multi-Source Domain
Adaptation [56.694330303488435]
We propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework.
In a nutshell, a knowledge graph is constructed on the prototypes of various domains to realize information propagation among semantically adjacent representations.
Our approach outperforms existing methods by a remarkable margin.
arXiv Detail & Related papers (2020-07-17T07:52:44Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.