A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection
- URL: http://arxiv.org/abs/2501.06038v1
- Date: Fri, 10 Jan 2025 15:17:02 GMT
- Title: A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection
- Authors: Tsui Qin Mok, Shuyong Gao, Haozhe Xing, Miaoyang He, Yan Wang, Wenqiang Zhang,
- Abstract summary: Weakly-Supervised Camouflaged Object Detection (WSCOD) has gained popularity for its promise to train models with weak labels.
This paper introduces a novel holistically point-guided text framework for WSCOD by decomposing into three phases: segment, choose, train.
- Score: 23.606879684161957
- License:
- Abstract: Weakly-Supervised Camouflaged Object Detection (WSCOD) has gained popularity for its promise to train models with weak labels to segment objects that visually blend into their surroundings. Recently, some methods using sparsely-annotated supervision shown promising results through scribbling in WSCOD, while point-text supervision remains underexplored. Hence, this paper introduces a novel holistically point-guided text framework for WSCOD by decomposing into three phases: segment, choose, train. Specifically, we propose Point-guided Candidate Generation (PCG), where the point's foreground serves as a correction for the text path to explicitly correct and rejuvenate the loss detection object during the mask generation process (SEGMENT). We also introduce a Qualified Candidate Discriminator (QCD) to choose the optimal mask from a given text prompt using CLIP (CHOOSE), and employ the chosen pseudo mask for training with a self-supervised Vision Transformer (TRAIN). Additionally, we developed a new point-supervised dataset (P2C-COD) and a text-supervised dataset (T-COD). Comprehensive experiments on four benchmark datasets demonstrate our method outperforms state-of-the-art methods by a large margin, and also outperforms some existing fully-supervised camouflaged object detection methods.
Related papers
- Augment and Criticize: Exploring Informative Samples for Semi-Supervised
Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements for over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z) - CamoFormer: Masked Separable Attention for Camouflaged Object Detection [94.2870722866853]
We present a simple masked separable attention (MSA) for camouflaged object detection.
We first separate the multi-head self-attention into three parts, which are responsible for distinguishing the camouflaged objects from the background using different mask strategies.
We propose to capture high-resolution semantic representations progressively based on a simple top-down decoder with the proposed MSA to attain precise segmentation results.
arXiv Detail & Related papers (2022-12-10T10:03:27Z) - Pointly-Supervised Panoptic Segmentation [106.68888377104886]
We propose a new approach to applying point-level annotations for weakly-supervised panoptic segmentation.
Instead of the dense pixel-level labels used by fully supervised methods, point-level labels only provide a single point for each target as supervision.
We formulate the problem in an end-to-end framework by simultaneously generating panoptic pseudo-masks from point-level labels and learning from them.
arXiv Detail & Related papers (2022-10-25T12:03:51Z) - ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object
Detection [114.54835359657707]
ProposalContrast is an unsupervised point cloud pre-training framework.
It learns robust 3D representations by contrasting region proposals.
ProposalContrast is verified on various 3D detectors.
arXiv Detail & Related papers (2022-07-26T04:45:49Z) - DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in
Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the used point label form implies the reading order of humans, which affects the robustness of Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z) - Boundary-Guided Camouflaged Object Detection [20.937071658007255]
We propose a novel boundary-guided network (BGNet) for camouflaged object detection.
Our method explores valuable and extra object-related edge semantics to guide representation learning of COD.
Our method promotes camouflaged object detection of accurate boundary localization.
arXiv Detail & Related papers (2022-07-02T10:48:35Z) - Weakly-Supervised Salient Object Detection Using Point Supervison [17.88596733603456]
Current state-of-the-art saliency detection models rely heavily on large datasets of accurate pixel-wise annotations.
We propose a novel weakly-supervised salient object detection method using point supervision.
Our method outperforms the previous state-of-the-art methods trained with the stronger supervision.
arXiv Detail & Related papers (2022-03-22T12:16:05Z) - Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly
Supervised Semantic Segmentation [16.560870740946275]
Explicit Pseudo-pixel Supervision (EPS) learns from pixel-level feedback by combining two weak supervisions.
We devise a joint training strategy to fully utilize the complementary relationship between both information.
Our method can obtain accurate object boundaries and discard co-occurring pixels, thereby significantly improving the quality of pseudo-masks.
arXiv Detail & Related papers (2021-05-19T07:31:11Z) - Unsupervised Object Detection with LiDAR Clues [70.73881791310495]
We present the first practical method for unsupervised object detection with the aid of LiDAR clues.
In our approach, candidate object segments based on 3D point clouds are firstly generated.
Then, an iterative segment labeling process is conducted to assign segment labels and to train a segment labeling network.
The labeling process is carefully designed so as to mitigate the issue of long-tailed and open-ended distribution.
arXiv Detail & Related papers (2020-11-25T18:59:54Z) - DGST : Discriminator Guided Scene Text detector [11.817428636084305]
This paper proposes a detector framework based on the conditional generative adversarial networks to improve the segmentation effect of scene text detection.
Experiments on standard datasets demonstrate that the proposed D GST brings noticeable gain and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-02-28T01:47:36Z) - Learning Object Scale With Click Supervision for Object Detection [29.421113887739413]
We propose a simple yet effective method which incorporatesCNN visualization with click supervision to generate the pseudoground-truths.
These pseudo ground-truthscans be used to train a fully-supervised detector.
Experimental results on the PASCAL VOC2007 and VOC 2012 datasets show that the proposed methodcan obtain much higher accuracy for estimating the object scale.
arXiv Detail & Related papers (2020-02-20T03:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.