A Visual Representation-guided Framework with Global Affinity for Weakly
Supervised Salient Object Detection
- URL: http://arxiv.org/abs/2302.10697v2
- Date: Fri, 9 Jun 2023 01:30:00 GMT
- Title: A Visual Representation-guided Framework with Global Affinity for Weakly
Supervised Salient Object Detection
- Authors: Binwei Xu, Haoran Liang, Weihua Gong, Ronghua Liang, Peng Chen
- Abstract summary: We propose a framework guided by general visual representations with rich contextual semantic knowledge for scribble-based SOD.
These general visual representations are generated by self-supervised learning based on large-scale unlabeled datasets.
Our method achieves comparable or even superior performance to the state-of-the-art fully supervised models.
- Score: 8.823804648745487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully supervised salient object detection (SOD) methods have made
considerable progress in performance, yet these models rely heavily on
expensive pixel-wise labels. Recently, to achieve a trade-off between labeling
burden and performance, scribble-based SOD methods have attracted increasing
attention. Because previous scribble-based models implement the SOD task using
only SOD training data, which carries limited information, it is extremely
difficult for them to understand images and thus achieve superior SOD
performance. In
this paper, we propose a simple yet effective framework guided by general
visual representations with rich contextual semantic knowledge for
scribble-based SOD. These general visual representations are generated by
self-supervised learning based on large-scale unlabeled datasets. Our framework
consists of a task-related encoder, a general visual module, and an information
integration module to efficiently combine the general visual representations
with task-related features to perform the SOD task based on understanding the
contextual connections of images. Meanwhile, we propose a novel global semantic
affinity loss to guide the model to perceive the global structure of the
salient objects. Experimental results on five public benchmark datasets
demonstrate that our method, which only utilizes scribble annotations without
introducing any extra label, outperforms the state-of-the-art weakly supervised
SOD methods. Specifically, it outperforms the previous best scribble-based
method on all datasets, with average gains of 5.5% in max F-measure, 5.8% in
mean F-measure, and 3.1% in E-measure, and an average improvement of 24% in
MAE. Moreover, our method
achieves comparable or even superior performance to the state-of-the-art fully
supervised models.
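To make the described pipeline concrete, below is a minimal PyTorch-style sketch of how the three named components and the global semantic affinity loss could fit together. Everything here is an assumption-laden illustration rather than the authors' code: the class name ScribbleSODNet, the layer shapes, the interface of the frozen self-supervised backbone (assumed to return a (B, C, h, w) feature map with C equal to feat_dim), and the exact affinity-loss formulation are all hypothetical readings of the abstract.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScribbleSODNet(nn.Module):
    """Hypothetical wiring of the three components named in the abstract."""

    def __init__(self, ssl_backbone: nn.Module, feat_dim: int = 256):
        super().__init__()
        # Task-related encoder: trainable, learns SOD-specific features
        # from scribble-annotated data.
        self.task_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # General visual module: a frozen self-supervised backbone trained on
        # large-scale unlabeled data, supplying contextual semantic features.
        self.general_visual = ssl_backbone.eval()
        for p in self.general_visual.parameters():
            p.requires_grad = False
        # Information integration module: fuses the two feature streams and
        # predicts a single-channel saliency map.
        self.integration = nn.Sequential(
            nn.Conv2d(2 * feat_dim, feat_dim, 1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, 1, 1),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        task_feat = self.task_encoder(images)            # (B, C, H', W')
        with torch.no_grad():                            # backbone stays frozen
            general_feat = self.general_visual(images)   # assumed (B, C, h, w)
        general_feat = F.interpolate(general_feat, size=task_feat.shape[-2:],
                                     mode="bilinear", align_corners=False)
        logits = self.integration(torch.cat([task_feat, general_feat], dim=1))
        return F.interpolate(logits, size=images.shape[-2:],
                             mode="bilinear", align_corners=False)

def global_semantic_affinity_loss(feat: torch.Tensor,
                                  logits: torch.Tensor) -> torch.Tensor:
    """One plausible reading of a global affinity objective: pixel pairs with
    similar semantic features should receive similar saliency scores, pushing
    the model to perceive the global structure of salient objects."""
    f = F.normalize(feat.flatten(2), dim=1)                   # (B, C, N)
    affinity = torch.bmm(f.transpose(1, 2), f).clamp_min(0)   # (B, N, N)
    s = torch.sigmoid(logits).flatten(2)                      # (B, 1, N)
    disagreement = (s.transpose(1, 2) - s).abs()              # (B, N, N)
    # Penalize saliency disagreement between semantically similar pixels.
    return (affinity * disagreement).mean()
```
In training, a term like this would typically be combined with a partial cross-entropy loss over the scribbled pixels, with the logits resized to the feature resolution so that the N x N affinity matrix stays tractable.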
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z)
- The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning [6.17147517649596]
We present HUME, a model-agnostic framework for inferring human labeling of a given dataset without any external supervision.
HUME guides a search over all possible labelings of a dataset to discover an underlying human labeling.
We show that the proposed optimization objective is strikingly well-correlated with the ground truth labeling of the dataset.
arXiv Detail & Related papers (2023-11-06T08:16:41Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions; a minimal sketch of this idea appears after this entry.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
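The PCA localization step mentioned above is concrete enough to sketch. This is a hedged illustration, assuming a (C, H, W) feature map from a self-supervised backbone; projecting pixels onto the first principal component and thresholding is one common realization of the idea, not necessarily that paper's exact procedure, and the function name pca_object_map is hypothetical.
```python
import torch

def pca_object_map(features: torch.Tensor) -> torch.Tensor:
    """features: (C, H, W) feature map; returns a coarse binary object map."""
    c, h, w = features.shape
    x = features.flatten(1).T                  # (H*W, C): one vector per pixel
    x = x - x.mean(dim=0, keepdim=True)        # center pixels before PCA
    # First principal component of the per-pixel features.
    _, _, v = torch.pca_lowrank(x, q=1, center=False)
    proj = (x @ v).reshape(h, w)               # per-pixel projection score
    # The component's sign is arbitrary; orient it so the positive side is the
    # smaller region, on the heuristic that objects cover less area than background.
    if (proj > 0).float().mean() > 0.5:
        proj = -proj
    return (proj > 0).float()
```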
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Semantic Distillation Guided Salient Object Detection [17.653600212923223]
CNN-based salient object detection methods often misinterpret the real saliency due to the subjectiveness of the SOD task and the locality of convolution layers.
We propose a semantic distillation guided SOD (SDG-SOD) method that produces accurate results by fusing semantically distilled knowledge from generated image captioning into the Vision-Transformer-based SOD framework.
arXiv Detail & Related papers (2022-03-08T13:40:51Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence [14.79639149658596]
We propose a one-round end-to-end training approach for weakly supervised salient object detection via scribble annotations.
Our method achieves a new state-of-the-art performance on six benchmarks.
arXiv Detail & Related papers (2020-12-08T12:49:40Z)