${D}^{3}$ETOR: Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations
- URL: http://arxiv.org/abs/2512.20260v1
- Date: Tue, 23 Dec 2025 11:16:16 GMT
- Title: ${D}^{3}$ETOR: Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations
- Authors: Jiawei Ge, Jiuxin Cao, Xinyi Li, Xuelin Zhu, Chang Liu, Bo Liu, Chen Feng, Ioannis Patras
- Abstract summary: ${D}^{3}$ETOR is a two-stage WSCOD framework consisting of Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing. We introduce an adaptive entropy-driven point sampling method and a multi-agent debate mechanism to enhance the capability of SAM for COD. In the second stage, we design FADeNet, which fuses multi-level frequency-aware features to balance global semantic understanding with local detail modeling.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly-Supervised Camouflaged Object Detection (WSCOD) aims to locate and segment objects that are visually concealed within their surrounding scenes, relying solely on sparse supervision such as scribble annotations. Despite recent progress, existing WSCOD methods still lag far behind fully supervised ones due to two major limitations: (1) the pseudo masks generated by general-purpose segmentation models (e.g., SAM) and filtered via rules are often unreliable, as these models lack the task-specific semantic understanding required for effective pseudo labeling in COD; and (2) the neglect of inherent annotation bias in scribbles, which hinders the model from capturing the global structure of camouflaged objects. To overcome these challenges, we propose ${D}^{3}$ETOR, a two-stage WSCOD framework consisting of Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing. In the first stage, we introduce an adaptive entropy-driven point sampling method and a multi-agent debate mechanism to enhance the capability of SAM for COD, improving the interpretability and precision of pseudo masks. In the second stage, we design FADeNet, which progressively fuses multi-level frequency-aware features to balance global semantic understanding with local detail modeling, while dynamically reweighting supervision strength across regions to alleviate scribble bias. By jointly exploiting the supervision signals from both the pseudo masks and scribble semantics, ${D}^{3}$ETOR significantly narrows the gap between weakly and fully supervised COD, achieving state-of-the-art performance on multiple benchmarks.
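The adaptive entropy-driven point sampling idea from the first stage can be illustrated with a minimal sketch: given a coarse foreground probability map, score each pixel by how confidently it is foreground (low binary entropy, high probability) and select the top-scoring locations as point prompts for a SAM-style segmenter. The function name, scoring formula, and deterministic top-k selection below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def entropy_point_prompts(prob_map, num_points=8, eps=1e-8):
    """Select point prompts from a coarse foreground probability map.

    prob_map : (H, W) array of foreground probabilities in [0, 1].
    Returns a (num_points, 2) array of (row, col) coordinates at
    confident foreground pixels (high probability, low binary entropy).
    """
    p = np.clip(prob_map, eps, 1.0 - eps)
    # Binary entropy per pixel: maximal at p = 0.5, near zero at p in {0, 1}.
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    # Confidence is the entropy deficit; weight it by foreground probability
    # so confident *foreground* pixels score highest.
    weight = (p * (np.log(2.0) - entropy)).ravel()
    idx = np.argpartition(weight, -num_points)[-num_points:]
    rows, cols = np.unravel_index(idx, prob_map.shape)
    return np.stack([rows, cols], axis=1)
```

In practice one would likely mix in a few confident background points as negative prompts; the sketch keeps only the positive case for brevity.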
Related papers
- Learning Accurate Segmentation Purely from Self-Supervision [87.78965637247107]
Selfment is a fully self-supervised framework that segments objects directly from raw images without human labels. Selfment sets new state-of-the-art (SoTA) results across multiple benchmarks.
arXiv Detail & Related papers (2026-02-27T07:36:32Z) - Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking [10.223105883919278]
We introduce a Granularity-Consistent automatic 2D Mask Tracking approach that maintains temporal correspondences across frames. Our method effectively generates consistent and accurate 3D segmentations.
arXiv Detail & Related papers (2025-11-02T03:52:42Z) - BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation [28.97274092946373]
3D instance segmentation is crucial for understanding complex 3D environments, yet fully supervised methods require dense point-level annotations. Box-level annotations inherently introduce ambiguity in overlapping regions, making accurate point-to-instance assignment challenging. Recent methods address this ambiguity by generating pseudo-masks through training a dedicated pseudo-labeler in an additional training stage. We propose BEEP3D, Box-supervised End-to-End Pseudo-mask generation for 3D instance segmentation.
arXiv Detail & Related papers (2025-10-14T06:23:18Z) - First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection [14.070196423996045]
Existing approaches often rely on heavy training and large computational resources. We propose RAG-SEG, a training-free paradigm that decouples COD into two stages: Retrieval-Augmented Generation (RAG) for generating coarse masks as prompts, followed by SAM-based segmentation (SEG) for refinement. RAG-SEG constructs a compact retrieval database via unsupervised clustering, enabling fast and effective feature retrieval. Experiments on benchmark COD datasets demonstrate that RAG-SEG performs on par with or surpasses state-of-the-art methods.
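A compact retrieval database built via unsupervised clustering, as described above, can be sketched with plain k-means: reference features are compressed into a small set of centroids, and a query retrieves its nearest centroid. The function names, Lloyd-style iteration, and Euclidean metric are assumptions for illustration, not RAG-SEG's actual pipeline:

```python
import numpy as np

def build_retrieval_db(features, k=16, iters=25, seed=0):
    """Compress (N, D) reference features into a (k, D) centroid database
    via Lloyd's k-means iterations."""
    feats = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct reference features.
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature to its nearest centroid.
        dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = feats[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def retrieve(query, centroids):
    """Return the index of the centroid nearest to a query feature."""
    q = np.asarray(query, dtype=float)
    return int(np.linalg.norm(centroids - q, axis=1).argmin())
```

The payoff of the compression is that lookup cost scales with k rather than the full reference set size, which is what makes training-free retrieval fast.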
arXiv Detail & Related papers (2025-08-21T07:14:18Z) - Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z) - Seamless Detection: Unifying Salient Object Detection and Camouflaged Object Detection [73.85890512959861]
We propose a task-agnostic framework to unify Salient Object Detection (SOD) and Camouflaged Object Detection (COD). We design a simple yet effective contextual decoder involving the interval-layer and global context, which achieves an inference speed of 67 fps. Experiments on public SOD and COD datasets demonstrate the superiority of our proposed framework in both supervised and unsupervised settings.
arXiv Detail & Related papers (2024-12-22T03:25:43Z) - Just a Hint: Point-Supervised Camouflaged Object Detection [4.38858748263547]
Camouflaged Object Detection (COD) demands models to expeditiously and accurately distinguish objects that blend seamlessly into their environment.
We propose to fulfill this task with the help of only one point supervision.
Specifically, by swiftly clicking on each object, we first adaptively expand the original point-based annotation to a reasonable hint area.
Then, to avoid partial localization around discriminative parts, we propose an attention regulator to scatter model attention to the whole object.
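The point-to-hint expansion described above can be illustrated with a minimal sketch: a single click is grown into a soft hint region via a Gaussian falloff around the clicked pixel. The fixed radius and threshold here are illustrative simplifications; the paper's expansion is adaptive rather than fixed-size:

```python
import numpy as np

def expand_point_hint(shape, point, sigma=12.0, thresh=0.5):
    """Expand a single click into a binary 'hint area' mask.

    shape : (H, W) of the image.
    point : (row, col) of the annotator's click.
    Returns a boolean mask where a Gaussian centered on the click
    exceeds `thresh`, i.e. a disk-shaped hint region.
    """
    H, W = shape
    rr, cc = np.mgrid[0:H, 0:W]
    # Squared distance from every pixel to the click, under a Gaussian falloff.
    g = np.exp(-((rr - point[0]) ** 2 + (cc - point[1]) ** 2) / (2.0 * sigma ** 2))
    return g >= thresh
```

With sigma = 12 and thresh = 0.5 the hint disk has radius sigma * sqrt(2 ln 2), roughly 14 pixels, so the supervision covers far more of the object than the single clicked pixel.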
arXiv Detail & Related papers (2024-08-20T12:17:25Z) - 3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior [62.80458034704989]
Generative 3D face models featuring disentangled controlling factors hold immense potential for diverse applications in computer vision and computer graphics.
Previous 3D face modeling methods face a challenge as they demand specific labels to effectively disentangle these factors.
This paper introduces a Weakly-Supervised Disentanglement Framework, denoted as WSDF, to facilitate the training of controllable 3D face models without an overly stringent labeling requirement.
arXiv Detail & Related papers (2024-04-25T11:50:47Z) - DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation [6.775785126617824]
We propose a dual student framework with trustworthy progressive learning (DuPL).
Experiment results demonstrate the superiority of the proposed DuPL over the recent state-of-the-art alternatives on the PASCAL VOC 2012 and MS COCO datasets.
arXiv Detail & Related papers (2024-03-17T12:14:34Z) - SOOD: Towards Semi-Supervised Oriented Object Detection [57.05141794402972]
This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework.
Our experiments show that when trained with the two proposed losses, SOOD surpasses the state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark.
arXiv Detail & Related papers (2023-04-10T11:10:42Z) - Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective 'Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements of over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.