AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot Semantic Segmentation
- URL: http://arxiv.org/abs/2412.17601v2
- Date: Wed, 25 Dec 2024 01:42:31 GMT
- Title: AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot Semantic Segmentation
- Authors: Jiaqi Ma, Guo-Sen Xie, Fang Zhao, Zechao Li
- Abstract summary: Few-shot learning aims to recognize novel concepts by leveraging prior knowledge learned from a few samples. We propose an adaptive frequency-aware network (AFANet) for weakly-supervised few-shot semantic segmentation.
- Score: 37.9826204492371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning aims to recognize novel concepts by leveraging prior knowledge learned from a few samples. However, for visually intensive tasks such as few-shot semantic segmentation, pixel-level annotations are time-consuming and costly. Therefore, in this paper, we utilize the more challenging image-level annotations and propose an adaptive frequency-aware network (AFANet) for weakly-supervised few-shot semantic segmentation (WFSS). Specifically, we first propose a cross-granularity frequency-aware module (CFM) that decouples RGB images into high-frequency and low-frequency distributions and further optimizes semantic structural information by realigning them. Unlike most existing WFSS methods using the textual information from the multi-modal language-vision model, e.g., CLIP, in an offline learning manner, we further propose a CLIP-guided spatial-adapter module (CSM), which performs spatial domain adaptive transformation on textual information through online learning, thus providing enriched cross-modal semantic information for CFM. Extensive experiments on the Pascal-5^i and COCO-20^i datasets demonstrate that AFANet has achieved state-of-the-art performance. The code is available at https://github.com/jarch-ma/AFANet.
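The frequency decoupling that the CFM performs on RGB images can be illustrated with a minimal sketch. This is not AFANet's actual module (the abstract does not specify its implementation); it assumes a simple FFT-based low-pass split, and the `radius_ratio` cutoff parameter is hypothetical:

```python
import numpy as np

def frequency_decouple(image: np.ndarray, radius_ratio: float = 0.1):
    """Split a 2-D image into low- and high-frequency components using a
    circular low-pass mask in the Fourier domain. `radius_ratio` is an
    assumed cutoff, not a value taken from the paper."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= radius_ratio * min(h, w)  # True inside the low-pass disc
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = image - low                       # residual carries high frequencies
    return low, high

img = np.random.rand(64, 64)
low, high = frequency_decouple(img)
```

By construction the two components sum back to the input exactly, so any realignment applied to one band leaves the other untouched.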
Related papers
- TASeg: Text-aware RGB-T Semantic Segmentation based on Fine-tuning Vision Foundation Models [26.983562312613877]
We propose a text-aware RGB-T segmentation framework by using Low-Rank Adaptation (LoRA) fine-tuning technology to adapt vision foundation models. Specifically, we propose a Dynamic Feature Fusion Module (DFFM) in the image encoder, which effectively merges features from multiple visual modalities while freezing SAM's original transformer blocks.
arXiv Detail & Related papers (2025-06-27T07:34:28Z) - FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification [56.925103708982164]
We present a novel perspective from the frequency domain and identify three advantages for downstream classification: global, independent, and compact. We propose the lightweight yet effective Frequency Refined Augmentation (FreRA) tailored for time series contrastive learning on classification tasks. FreRA consistently outperforms ten leading baselines on time series classification, anomaly detection, and transfer learning tasks.
arXiv Detail & Related papers (2025-05-29T07:18:28Z) - DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation [2.7624021966289605]
Few-shot semantic segmentation (FSS) aims to enable models to segment novel/unseen object classes using only a limited number of labeled examples.
We propose a novel framework that utilizes large language models (LLMs) to adapt general class semantic information to the query image.
Our framework achieves state-of-the-art performance by a significant margin, demonstrating superior generalization to novel classes and robustness across diverse scenarios.
arXiv Detail & Related papers (2025-03-06T01:42:28Z) - LMS-Net: A Learned Mumford-Shah Network For Few-Shot Medical Image Segmentation [16.384916751377794]
We propose a novel deep unfolding network, called the Learned Mumford-Shah Network (LMS-Net).
We leverage our prototypical learned Mumford-Shah model (LMS model) as a mathematical foundation to integrate insights into a unified framework.
Comprehensive experiments on three publicly available medical segmentation datasets verify the effectiveness of our method.
arXiv Detail & Related papers (2025-02-08T07:15:44Z) - DiffCLIP: Few-shot Language-driven Multimodal Classifier [19.145645804307566]
DiffCLIP is a novel framework that extends Contrastive Language-Image Pretraining. It conveys comprehensive language-driven semantic information for accurate classification of high-dimensional multimodal remote sensing images. DiffCLIP achieves an overall accuracy improvement of 10.65% across three remote sensing datasets compared with CLIP.
arXiv Detail & Related papers (2024-12-10T02:21:39Z) - Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning [86.99944014645322]
We introduce a novel framework, Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning.
We decompose each query image into its high-frequency and low-frequency components and incorporate them into the feature embedding network in parallel.
Our framework establishes new state-of-the-art results on multiple cross-domain few-shot learning benchmarks.
arXiv Detail & Related papers (2024-11-03T04:02:35Z) - FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background [9.970265640589966]
Existing deep learning approaches overlook the semantic cues that are crucial for semantic segmentation in complex scenarios.
We propose a feature amplification network (FANet) as a backbone network that incorporates semantic information using a novel feature enhancement module at multiple stages.
Our experimental results demonstrate the state-of-the-art performance compared to existing methods.
arXiv Detail & Related papers (2024-07-12T15:57:52Z) - Open-Vocabulary Semantic Segmentation with Image Embedding Balancing [33.69721994194684]
We propose a novel framework for open-vocabulary semantic segmentation called EBSeg.
AdaB Decoder is designed to generate different image embeddings for both training and new classes.
SSC Loss aligns the inter-classes affinity in the image feature space with that in the text feature space of CLIP.
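As a hedged illustration of what such an affinity alignment could look like (an assumption, not EBSeg's exact SSC Loss), one can penalize the squared difference between the cosine-similarity matrices of per-class image and text embeddings:

```python
import numpy as np

def ssc_loss_sketch(img_feats: np.ndarray, txt_feats: np.ndarray) -> float:
    """Mean squared difference between the inter-class affinity (cosine
    similarity) matrices of image and text embeddings, each of shape
    (num_classes, dim). A plausible sketch, not the paper's formulation."""
    def affinity(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalize rows
        return x @ x.T                                    # pairwise cosine similarity
    diff = affinity(img_feats) - affinity(txt_feats)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
image_embeds = rng.normal(size=(5, 16))
text_embeds = rng.normal(size=(5, 16))
loss = ssc_loss_sketch(image_embeds, text_embeds)
```

The loss is zero exactly when the two modalities agree on all pairwise class similarities, which is the alignment property the summary describes.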
arXiv Detail & Related papers (2024-06-14T08:34:20Z) - Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves comparable performance with SOTA as well as being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z) - Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation [19.20874993309959]
Vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks.
We propose a baseline for training-free OVSS, termed Neighbour-Aware CLIP (NACLIP).
Our method enforces localization of patches in the self-attention of CLIP's vision transformer which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature.
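One way to picture such patch localization in self-attention (a hedged sketch in the spirit of the idea above; the Gaussian form and `sigma` are assumptions, not NACLIP's published design) is to bias attention logits toward spatially nearby patches before the softmax:

```python
import numpy as np

def neighbour_biased_attention(scores: np.ndarray, grid_hw: tuple, sigma: float = 1.0):
    """Add a Gaussian spatial prior so each patch attends mostly to its
    spatial neighbours. scores: (N, N) attention logits for N = H*W patches."""
    h, w = grid_hw
    ys, xs = np.divmod(np.arange(h * w), w)          # patch grid coordinates
    d2 = (ys[:, None] - ys[None, :]) ** 2 + (xs[:, None] - xs[None, :]) ** 2
    biased = scores - d2 / (2 * sigma ** 2)          # log-space Gaussian bias
    biased -= biased.max(axis=1, keepdims=True)      # numerically stable softmax
    e = np.exp(biased)
    return e / e.sum(axis=1, keepdims=True)

scores = np.zeros((4, 4))                            # uniform logits, 2x2 patch grid
attn = neighbour_biased_attention(scores, (2, 2))
```

With uniform logits, the bias alone makes each patch weight its immediate neighbours more heavily than distant ones, which is the localization behaviour dense prediction needs.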
arXiv Detail & Related papers (2024-04-12T01:08:04Z) - FLIP: Cross-domain Face Anti-spoofing with Language Guidance [19.957293190322332]
Face anti-spoofing (FAS) or presentation attack detection is an essential component of face recognition systems.
Recent vision transformer (ViT) models have been shown to be effective for the FAS task.
We propose a novel approach for robust cross-domain FAS by grounding visual representations with the help of natural language.
arXiv Detail & Related papers (2023-09-28T17:53:20Z) - Multi-spectral Class Center Network for Face Manipulation Detection and Localization [52.569170436393165]
We propose a novel Multi-Spectral Class Center Network (MSCCNet) for face manipulation detection and localization.
Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations.
Applying multi-spectral class-level representations suppresses the semantic information of visual concepts that is insensitive to the manipulated regions of forgery images.
arXiv Detail & Related papers (2023-05-18T08:09:20Z) - Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement [69.47143451986067]
Low-light image enhancement (LLIE) investigates how to improve illumination and produce normal-light images.
The majority of existing methods improve low-light images via a global and uniform manner, without taking into account the semantic information of different regions.
We propose a novel semantic-aware knowledge-guided framework that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model.
arXiv Detail & Related papers (2023-04-14T10:22:28Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z) - Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation [87.01669173673288]
We propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network.
A co-attention mechanism is embedded in the EFN to realize the parallel update of multi-modal features.
The experimental results on four benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-05-05T02:27:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.