Distribution-Specific Learning for Joint Salient and Camouflaged Object Detection
- URL: http://arxiv.org/abs/2508.06063v1
- Date: Fri, 08 Aug 2025 06:52:54 GMT
- Title: Distribution-Specific Learning for Joint Salient and Camouflaged Object Detection
- Authors: Chao Hao, Zitong Yu, Xin Liu, Yuhao Wang, Weicheng Xie, Jingang Shi, Huanjing Yue, Jingyu Yang
- Abstract summary: Salient object detection (SOD) and camouflaged object detection (COD) are two closely related but distinct computer vision tasks. Previous works have mostly believed that joint learning of these two tasks would confuse the network, reducing its performance on both tasks. We propose SCJoint, a joint learning scheme for SOD and COD tasks, assuming that the decoding processes of SOD and COD have different distribution characteristics.
- Score: 36.80522951291785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Salient object detection (SOD) and camouflaged object detection (COD) are two closely related but distinct computer vision tasks. Although both are class-agnostic segmentation tasks that map from RGB space to binary space, the former aims to identify the most salient objects in the image, while the latter focuses on detecting perfectly camouflaged objects that blend into the background in the image. These two tasks exhibit strong contradictory attributes. Previous works have mostly believed that joint learning of these two tasks would confuse the network, reducing its performance on both tasks. However, here we present an opposite perspective: with the correct approach to learning, the network can simultaneously possess the capability to find both salient and camouflaged objects, allowing both tasks to benefit from joint learning. We propose SCJoint, a joint learning scheme for SOD and COD tasks, assuming that the decoding processes of SOD and COD have different distribution characteristics. The key to our method is to learn the respective means and variances of the decoding processes for both tasks by inserting a minimal amount of task-specific learnable parameters within a fully shared network structure, thereby decoupling the contradictory attributes of the two tasks at a minimal cost. Furthermore, we propose a saliency-based sampling strategy (SBSS) to sample the training set of the SOD task to balance the training set sizes of the two tasks. In addition, SBSS improves the training set quality and shortens the training time. Based on the proposed SCJoint and SBSS, we train a powerful generalist network, named JoNet, which has the ability to simultaneously capture both "salient" and "camouflaged". Extensive experiments demonstrate the competitive performance and effectiveness of our proposed method. The code is available at https://github.com/linuxsino/JoNet.
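The abstract's central mechanism, learning separate means and variances for the SOD and COD decoding processes inside a fully shared network, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption made for illustration: the class names (TaskSpecificNorm, SharedDecoderBlock), the choice of instance normalization, and the two-task indexing are not taken from the released JoNet code; the sketch only shows how a few per-task affine parameters (a per-channel shift and scale) can be slotted into otherwise shared decoder layers.

```python
import torch
import torch.nn as nn

class TaskSpecificNorm(nn.Module):
    """Shared normalization with one (gamma, beta) pair per task (e.g. SOD / COD)."""
    def __init__(self, channels: int, num_tasks: int = 2):
        super().__init__()
        # Normalization itself is shared and parameter-free.
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Task-specific scale (variance) and shift (mean) parameters.
        self.gamma = nn.Parameter(torch.ones(num_tasks, channels))
        self.beta = nn.Parameter(torch.zeros(num_tasks, channels))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        x = self.norm(x)
        g = self.gamma[task_id].view(1, -1, 1, 1)
        b = self.beta[task_id].view(1, -1, 1, 1)
        return x * g + b

class SharedDecoderBlock(nn.Module):
    """Convolutions are fully shared; only the per-task affine parameters differ."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.tsn = TaskSpecificNorm(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.act(self.tsn(self.conv(x), task_id))

if __name__ == "__main__":
    block = SharedDecoderBlock(64)
    feat = torch.randn(2, 64, 32, 32)
    sod_out = block(feat, task_id=0)  # decode the feature for the SOD task
    cod_out = block(feat, task_id=1)  # decode the same feature for the COD task
    print(sod_out.shape, cod_out.shape)
```

In this sketch the task-specific cost is only 2 x num_tasks x channels scalars per block, which is in the spirit of the abstract's claim that the contradictory attributes of the two tasks can be decoupled with a minimal amount of task-specific parameters.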
Related papers
- Seamless Detection: Unifying Salient Object Detection and Camouflaged Object Detection [73.85890512959861]
We propose a task-agnostic framework to unify Salient Object Detection (SOD) and Camouflaged Object Detection (COD). We design a simple yet effective contextual decoder involving the interval-layer and global context, which achieves an inference speed of 67 fps. Experiments on public SOD and COD datasets demonstrate the superiority of our proposed framework in both supervised and unsupervised settings.
arXiv Detail & Related papers (2024-12-22T03:25:43Z) - A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection [33.30644598646274]
We propose a simple yet effective network (SENet) based on vision Transformer (ViT).
To enhance the Transformer's ability to model local information, we propose a local information capture module (LICM).
We also propose a dynamic weighted loss (DW loss) based on Binary Cross-Entropy (BCE) and Intersection over Union (IoU) loss, which guides the network to pay more attention to those smaller and more difficult-to-find target objects.
arXiv Detail & Related papers (2024-02-29T07:29:28Z) - VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning [104.74705190239119]
We introduce VSCode, a model with novel 2D prompt learning to jointly address four SOD tasks and three COD tasks.
We utilize VST as the foundation model and introduce 2D prompts within the encoder-decoder architecture to learn domain and task-specific knowledge.
VSCode outperforms state-of-the-art methods across six tasks on 26 datasets.
arXiv Detail & Related papers (2023-11-25T12:34:02Z) - Pre-train, Adapt and Detect: Multi-Task Adapter Tuning for Camouflaged Object Detection [38.5505943598037]
We propose a novel 'pre-train, adapt and detect' paradigm to detect camouflaged objects.
By introducing a large pre-trained model, abundant knowledge learned from massive multi-modal data can be directly transferred to COD.
Our method outperforms existing state-of-the-art COD models by large margins.
arXiv Detail & Related papers (2023-07-20T08:25:38Z) - Improving Long-tailed Object Detection with Image-Level Supervision by Multi-Task Collaborative Learning [18.496765732728164]
We propose a novel framework, CLIS, which leverages image-level supervision to enhance the detection ability in a multi-task collaborative way.
CLIS achieves an overall AP of 31.1 with a 10.1-point improvement on tail categories, establishing a new state-of-the-art.
arXiv Detail & Related papers (2022-10-11T16:02:14Z) - Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z) - TOOD: Task-aligned One-stage Object Detection [41.43371563426291]
One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization.
We propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner.
Experiments are conducted on MS-COCO, where TOOD achieves a 51.1 AP at single-model single-scale testing.
arXiv Detail & Related papers (2021-08-17T17:00:01Z) - Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z) - Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)