Object-level Correlation for Few-Shot Segmentation
- URL: http://arxiv.org/abs/2509.07917v1
- Date: Tue, 09 Sep 2025 16:58:28 GMT
- Title: Object-level Correlation for Few-Shot Segmentation
- Authors: Chunlin Wen, Yu Zhang, Jie Fan, Hongyuan Zhu, Xiu-Shen Wei, Yijun Wang, Zhiqiang Kou, Shuzhou Sun,
- Abstract summary: Few-shot semantic segmentation (FSS) aims to segment objects of novel categories in the query images given only a few annotated support samples.<n>We propose an Object-level Correlation Network (OCNet) by establishing the object-level correlation between the support target object and query general objects.<n>The generated object-level correlation can mine the query target feature and suppress the hard pixel noise for the final prediction.
- Score: 37.247888341920984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot semantic segmentation (FSS) aims to segment objects of novel categories in the query images given only a few annotated support samples. Existing methods primarily build the image-level correlation between the support target object and the entire query image. However, this correlation contains the hard pixel noise, \textit{i.e.}, irrelevant background objects, that is intractable to trace and suppress, leading to the overfitting of the background. To address the limitation of this correlation, we imitate the biological vision process to identify novel objects in the object-level information. Target identification in the general objects is more valid than in the entire image, especially in the low-data regime. Inspired by this, we design an Object-level Correlation Network (OCNet) by establishing the object-level correlation between the support target object and query general objects, which is mainly composed of the General Object Mining Module (GOMM) and Correlation Construction Module (CCM). Specifically, GOMM constructs the query general object feature by learning saliency and high-level similarity cues, where the general objects include the irrelevant background objects and the target foreground object. Then, CCM establishes the object-level correlation by allocating the target prototypes to match the general object feature. The generated object-level correlation can mine the query target feature and suppress the hard pixel noise for the final prediction. Extensive experiments on PASCAL-${5}^{i}$ and COCO-${20}^{i}$ show that our model achieves the state-of-the-art performance.
Related papers
- PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning [49.66437612420291]
PoseGAM is a geometry-aware multi-view framework that directly predicts object pose from a query image and multiple template images.<n>We construct a large-scale synthetic dataset containing more than 190k objects under diverse environmental conditions.
arXiv Detail & Related papers (2025-12-11T17:29:25Z) - Object-Centric Cropping for Visual Few-Shot Classification [5.199807441687141]
In the domain of Few-Shot Image Classification, operating with as little as one example per class, the presence of image ambiguities stemming from multiple objects or complex backgrounds can significantly deteriorate performance.<n>Our research demonstrates that incorporating additional information about the local positioning of an object within its image markedly enhances classification across established benchmarks.
arXiv Detail & Related papers (2025-07-31T23:44:06Z) - Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation [47.047267066525265]
We introduce a novel approach that incorporates object-level contextual knowledge within images.<n>Our proposed approach achieves state-of-the-art performance with strong generalizability across diverse datasets.
arXiv Detail & Related papers (2024-11-26T06:34:48Z) - Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework that learns spatial-semantic features and discriminative object queries.<n>The proposed method achieves state-of-the-art performance on benchmark data sets, including the DAVIS 2017 test (textbf87.8%), YoutubeVOS 2019 (textbf88.1%), MOSE val (textbf74.0%), and LVOS test (textbf73.0%)
arXiv Detail & Related papers (2024-07-10T15:36:00Z) - 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (textbf84.45%) in the test set of Complex Video Object Challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefited from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Referring Camouflaged Object Detection [88.63744004901655]
Ref-COD aims to segment specified camouflaged objects based on a small set of referring images with salient target objects.<n>We first assemble a large-scale dataset, called R2C7K, which consists of 7K images covering 64 object categories in real-world scenarios.
arXiv Detail & Related papers (2023-06-13T04:15:37Z) - An Object SLAM Framework for Association, Mapping, and High-Level Tasks [12.62957558651032]
We present a comprehensive object SLAM framework that focuses on object-based perception and object-oriented robot tasks.
A range of public datasets and real-world results have been used to evaluate the proposed object SLAM framework for its efficient performance.
arXiv Detail & Related papers (2023-05-12T08:10:14Z) - LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation [69.70498875887611]
LocPoseNet is able to robustly learn location prior for unseen objects.
Our method outperforms existing works by a large margin on LINEMOD and GenMOP.
arXiv Detail & Related papers (2022-11-29T15:21:34Z) - A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model surpasses the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fr'echet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.