Related papers: Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection

URL: http://arxiv.org/abs/2510.18437v1
Date: Tue, 21 Oct 2025 09:12:26 GMT
Title: Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Authors: Ji Du, Xin Wang, Fangwei Hao, Mingyang Yu, Chunyuan Chen, Jiesheng Wu, Bin Wang, Jing Xu, Ping Li
Abstract summary: We propose RISE, a paradigm that exploits the entire training dataset to generate pseudo-labels for single images. It is important to recognize that using only training images without annotations exerts a pronounced challenge in crafting high-quality prototype libraries. In the KNN retrieval stage, to alleviate the effect of artifacts in feature maps, we propose Multi-View KNN Retrieval.
Score: 18.382178646073474
License: http://creativecommons.org/licenses/by/4.0/
Abstract: At the core of Camouflaged Object Detection (COD) lies segmenting objects from their highly similar surroundings. Previous efforts navigate this challenge primarily through image-level modeling or annotation-based optimization. Despite advancing considerably, this commonplace practice hardly taps valuable dataset-level contextual information or relies on laborious annotations. In this paper, we propose RISE, a RetrIeval SElf-augmented paradigm that exploits the entire training dataset to generate pseudo-labels for single images, which could be used to train COD models. RISE begins by constructing prototype libraries for environments and camouflaged objects using training images (without ground truth), followed by K-Nearest Neighbor (KNN) retrieval to generate pseudo-masks for each image based on these libraries. It is important to recognize that using only training images without annotations exerts a pronounced challenge in crafting high-quality prototype libraries. In this light, we introduce a Clustering-then-Retrieval (CR) strategy, where coarse masks are first generated through clustering, facilitating subsequent histogram-based image filtering and cross-category retrieval to produce high-confidence prototypes. In the KNN retrieval stage, to alleviate the effect of artifacts in feature maps, we propose Multi-View KNN Retrieval (MVKR), which integrates retrieval results from diverse views to produce more robust and precise pseudo-masks. Extensive experiments demonstrate that RISE outperforms state-of-the-art unsupervised and prompt-based methods. Code is available at https://github.com/xiaohainku/RISE.
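
Illustrative sketch (not the authors' code): the KNN retrieval step described in the abstract can be pictured as a majority vote over two prototype libraries. The function names, the cosine-similarity metric, the value of k, and the toy feature values below are all assumptions made for illustration; the abstract does not specify them, and the Clustering-then-Retrieval and Multi-View steps are omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_pseudo_mask(features, object_protos, env_protos, k=3):
    """Label each feature 1 (object) or 0 (environment) by majority
    vote among its k most similar prototypes (hypothetical helper)."""
    library = [(p, 1) for p in object_protos] + [(p, 0) for p in env_protos]
    mask = []
    for row in features:
        mask_row = []
        for f in row:
            # Rank all prototypes by similarity to this feature vector.
            ranked = sorted(library, key=lambda item: cosine(f, item[0]),
                            reverse=True)
            votes = sum(label for _, label in ranked[:k])
            mask_row.append(1 if votes * 2 > k else 0)
        mask.append(mask_row)
    return mask


# Toy 2x2 feature map with 2-D features (made-up values).
object_protos = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
env_protos = [[0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
features = [[[1.0, 0.05], [0.1, 0.8]],
            [[0.05, 1.0], [0.95, 0.1]]]
mask = knn_pseudo_mask(features, object_protos, env_protos, k=3)
print(mask)
```

In this toy setup, features close to the object prototypes are marked 1 and the rest 0. A real pipeline would presumably operate on dense features from a pretrained backbone and much larger libraries.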

Related papers

Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction [47.01100029571904]
We study the task of establishing object-level visual correspondence across different viewpoints in videos, focusing on the challenging egocentric-to-exocentric and exocentric-to-egocentric scenarios. We propose a simple yet effective framework based on conditional binary segmentation, where an object query mask is encoded into a latent representation to guide the localization of the corresponding object in a target video. Experiments on the Ego-Exo4D and HANDAL-X benchmarks demonstrate the effectiveness of our optimization objective and TTT strategy, achieving state-of-the-art performance.
arXiv Detail & Related papers (2026-02-22T00:53:03Z)
Leveraging Hierarchical Image-Text Misalignment for Universal Fake Image Detection [58.927873049646024]
We show that fake images cannot be properly aligned with corresponding captions compared to real images. We propose a simple yet effective ITEM by leveraging the image-text misalignment in a joint visual-language space as discriminative clues.
arXiv Detail & Related papers (2025-11-01T06:51:14Z)
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection. CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
Transformer-based Clipped Contrastive Quantization Learning for Unsupervised Image Retrieval [15.982022297570108]
Unsupervised image retrieval aims to learn important visual characteristics without any given labels in order to retrieve images similar to a given query image. In this paper, we propose the TransClippedCLR model, which encodes the global context of an image using a Transformer and captures local context through patch-based processing. Results with the proposed clipped contrastive learning improve substantially on all datasets compared with the same backbone network trained with vanilla contrastive learning.
arXiv Detail & Related papers (2024-01-27T09:39:11Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
ASIC: Aligning Sparse in-the-wild Image Collections [86.66498558225625]
We present a method for joint alignment of sparse in-the-wild image collections of an object category. We use pairwise nearest neighbors obtained from deep features of a pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches. Experiments on CUB and SPair-71k benchmarks demonstrate that our method can produce globally consistent and higher quality correspondences.
arXiv Detail & Related papers (2023-03-28T17:59:28Z)
Detecting Images Generated by Diffusers [12.986394431694206]
We consider images generated from captions in the MSCOCO and Wikimedia datasets using two state-of-the-art models: Stable Diffusion and GLIDE. Our experiments show that it is possible to detect the generated images using simple Multi-Layer Perceptrons. We find that incorporating the associated textual information with the images rarely leads to significant improvement in detection results.
arXiv Detail & Related papers (2023-03-09T14:14:29Z)
Single-pass Object-adaptive Data Undersampling and Reconstruction for MRI [6.599344783327054]
We propose a data-driven sampler using a convolutional neural network, MNet, to provide object-specific sampling patterns adaptive to each scanned object. The network observes very limited low-frequency k-space data for each object and rapidly predicts the desired undersampling pattern. Experimental results on the fastMRI knee dataset demonstrate the ability of the proposed learned undersampling network to generate object-specific masks at fourfold and eightfold acceleration.
arXiv Detail & Related papers (2021-11-17T16:06:06Z)
Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images. We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image. We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization. We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning. Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.