Related papers: HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

URL: http://arxiv.org/abs/2312.17492v2
Date: Thu, 4 Jan 2024 05:57:15 GMT
Title: HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping
Authors: Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan
Abstract summary: Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP)
Score: 29.678756772610797
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, their scopes only build upon patch-level features within an image, neglecting region/image-level and cross-image relationships at a broader scale. Moreover, these methods cannot differentiate various semantics from multiple instances. To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP). Specifically, a novel lightweight head with cross-attention mechanism is designed to adaptively group intra-image patches into semantically coherent regions based on correlation among self-supervised features. Further, to ensure the distinguishability among various regions, we introduce a region-level contrastive clustering loss to pull closer similar regions across images. Also, an image-level contrastive loss is present to push foreground and background representations apart, with which foreground objects and background are accordingly discovered. HEAP facilitates efficient hierarchical image decomposition, which contributes to more accurate object discovery while also enabling differentiation among objects of various classes. Extensive experimental results on semantic segmentation retrieval, unsupervised object discovery, and saliency detection tasks demonstrate that HEAP achieves state-of-the-art performance.

Related papers

Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing [11.626527403157922]
We present the Pattern Integration and Enhancement Vision Transformer (PIEViT), a novel self-supervised learning framework for remote sensing imagery. PIEViT enhances the representation of internal patch features, providing significant improvements over existing self-supervised baselines. It achieves excellent results in object detection, land cover classification, and change detection, underscoring its robustness, generalization, and transferability for remote sensing image interpretation tasks.
arXiv Detail & Related papers (2024-11-09T07:06:31Z)
Improving Object Detection via Local-global Contrastive Learning [27.660633883387753]
We present a novel image-to-image translation method that specifically targets cross-domain object detection. We learn to represent objects by contrasting local-global information. This affords investigation of an under-explored challenge: obtaining performant detection, under domain shifts.
arXiv Detail & Related papers (2024-10-07T14:18:32Z)
Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection. The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features. Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation [5.476136494434766]
We introduce EiCue, a technique providing semantic and structural cues through an eigenbasis derived from semantic similarity matrix. We guide our model to learn object-level representations with intra- and inter-image object-feature consistency. Experiments on COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate the state-of-the-art USS results.
arXiv Detail & Related papers (2024-03-03T11:24:16Z)
ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent object (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios. We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and camouflaged zooming in and out. Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
Instance-aware Remote Sensing Image Captioning with Cross-hierarchy Attention [11.23821696220285]
spatial attention is a straightforward approach to enhance the performance for remote sensing image captioning. We propose a remote sensing image caption generator with instance-awareness and cross-hierarchy attention.
arXiv Detail & Related papers (2021-05-11T12:59:07Z)
Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera. We exploit a self-supervised loss function that we exploit to train a proposal-based segmentation network. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences. In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference. Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation [62.29076080124199]
This paper proposes a novel coarse-to-fine feature adaptation approach to cross-domain object detection. At the coarse-grained stage, foreground regions are extracted by adopting the attention mechanism, and aligned according to their marginal distributions. At the fine-grained stage, we conduct conditional distribution alignment of foregrounds by minimizing the distance of global prototypes with the same category but from different domains.
arXiv Detail & Related papers (2020-03-23T13:40:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.