Rethinking of the Image Salient Object Detection: Object-level Semantic
Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter
- URL: http://arxiv.org/abs/2008.05397v1
- Date: Mon, 10 Aug 2020 07:12:43 GMT
- Title: Rethinking of the Image Salient Object Detection: Object-level Semantic
Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter
- Authors: Zhenyu Wu, Shuai Li, Chenglizhao Chen, Aimin Hao, Hong Qin
- Abstract summary: We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then selectively fuse multiple off-the-shelf deep models on these semantically salient regions for pixel-wise saliency refinement.
Our method is simple yet effective, and it is the first attempt to treat salient object detection mainly as an object-level semantic re-ranking problem.
- Score: 62.26677215668959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real human attention is an interactive process between our visual system
and our brain, drawing on both low-level visual stimuli and high-level semantic
information. Previous image salient object detection (SOD) methods conduct their
saliency predictions in a multi-task manner, i.e., performing pixel-wise
saliency regression and segmentation-like saliency refinement at the same time,
which degrades the ability of their feature backbones to reveal semantic
information. However, given an image, we tend to pay more attention to regions
that are semantically salient, even when they are not perceptually the most
salient ones at first glance. In this paper, we divide the SOD problem into two
sequential tasks: 1) we propose a lightweight, weakly supervised deep network
to coarsely locate semantically salient regions first; 2) then, as a
post-processing procedure, we selectively fuse multiple off-the-shelf deep
models on these semantically salient regions for pixel-wise saliency
refinement. In sharp contrast to state-of-the-art (SOTA) methods, which focus
on learning pixel-wise saliency within a "single image" mainly from perceptual
cues, our method investigates the "object-level semantic ranks between multiple
images", a methodology that is more consistent with the real human attention
mechanism. Our method is simple yet effective, and it is the first attempt to
treat salient object detection mainly as an object-level semantic re-ranking
problem.
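The abstract above describes a two-stage pipeline: object-level semantic saliency re-ranking first, then pixel-wise refinement obtained by selectively fusing off-the-shelf SOD models inside the top-ranked regions. The sketch below is only a rough illustration of how such a pipeline could be wired together, assuming a hypothetical proposal-scoring network (`semantic_rank_net`), a list of pretrained SOD models, and a plain average as the fusion rule; none of these reflect the authors' actual implementation.

```python
import numpy as np

def rerank_then_refine(image, proposals, semantic_rank_net, sod_models, top_k=3):
    """Hypothetical two-stage SOD sketch: rank object proposals by semantic
    saliency, then refine pixel-wise saliency only inside the top-ranked boxes
    by fusing several off-the-shelf SOD models. All callables are assumed."""
    # Stage 1: object-level semantic saliency re-ranking.
    scores = np.array([semantic_rank_net(image, box) for box in proposals])
    order = np.argsort(-scores)                      # most semantically salient first
    selected = [proposals[i] for i in order[:top_k]]

    # Stage 2: pixel-wise refinement restricted to the selected regions.
    h, w = image.shape[:2]
    saliency = np.zeros((h, w), dtype=np.float32)
    for (x0, y0, x1, y1) in selected:
        crop = image[y0:y1, x0:x1]
        # A plain average stands in for the paper's selective fusion of
        # multiple pretrained SOD models; each model returns a [0, 1] map.
        preds = [model(crop) for model in sod_models]
        fused = np.mean(np.stack(preds, axis=0), axis=0)
        saliency[y0:y1, x0:x1] = np.maximum(saliency[y0:y1, x0:x1], fused)
    return saliency
```

The point mirrored here is the ordering of the two stages: the expensive pixel-wise refinement runs only on regions that survive the cheap object-level semantic ranking, rather than on the whole image.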
Related papers
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models [33.690846523358836]
Weakly-supervised semantic segmentation (WSSS) is an effective way to avoid costly pixel-level labels.
We focus on WSSS with image-level labels, the most challenging form of WSSS.
We investigate the applicability of visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS.
arXiv Detail & Related papers (2023-10-19T07:16:54Z)
- Learning to search for and detect objects in foveal images using deep learning [3.655021726150368]
This study employs a fixation prediction model that emulates human objective-guided attention when searching for a given class in an image.
The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene.
We present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks.
arXiv Detail & Related papers (2023-04-12T09:50:25Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
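The "Learning Contrastive Representation for Semantic Correspondence" entry above highlights image-level contrastive learning as the key component. For reference, below is a generic image-level InfoNCE-style contrastive loss; it is a standard formulation written as a sketch, not the paper's exact objective, and the two-augmented-views batch layout is an assumption.

```python
import torch
import torch.nn.functional as F

def image_level_infonce(z1, z2, temperature=0.07):
    """Generic InfoNCE loss over image-level embeddings (illustration only).

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Positive pairs are (z1[i], z2[i]); every other pair in the batch is a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)    # i-th row matches i-th column
    # Symmetric cross-entropy: each view must identify its counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

In a semantic-correspondence setting such an image-level term is typically combined with pixel- or feature-level objectives; only the image-level part is shown here.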
- Detect and Locate: A Face Anti-Manipulation Approach with Semantic and Noise-level Supervision [67.73180660609844]
We propose a conceptually simple but effective method to efficiently detect forged faces in an image.
The proposed scheme relies on a segmentation map that delivers meaningful high-level semantic clues about the image.
The proposed model achieves state-of-the-art detection accuracy and remarkable localization performance.
arXiv Detail & Related papers (2021-07-13T02:59:31Z)
- Unsupervised Image Segmentation by Mutual Information Maximization and Adversarial Regularization [7.165364364478119]
We propose a novel fully unsupervised semantic segmentation method, the so-called Information Maximization and Adversarial Regularization (InMARS).
Inspired by human perception, which parses a scene into perceptual groups, our proposed approach first partitions an input image into meaningful regions (also known as superpixels).
Next, it utilizes Mutual-Information-Maximization followed by an adversarial training strategy to cluster these regions into semantically meaningful classes.
Our experiments demonstrate that our method achieves state-of-the-art performance on two commonly used unsupervised semantic segmentation datasets.
arXiv Detail & Related papers (2021-07-01T18:36:27Z)
- Instance-aware Remote Sensing Image Captioning with Cross-hierarchy Attention [11.23821696220285]
Spatial attention is a straightforward approach to enhancing the performance of remote sensing image captioning.
We propose a remote sensing image caption generator with instance-awareness and cross-hierarchy attention.
arXiv Detail & Related papers (2021-05-11T12:59:07Z)
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular Images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, clearly demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
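The "Mining Cross-Image Semantics" entry above describes co-attention modules that relate features across two images. The module below is a minimal generic cross-image co-attention block, assuming feature maps of equal shape from a shared backbone; the single bilinear affinity and the 1x1-convolution fusion are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossImageCoAttention(nn.Module):
    """Generic co-attention between two images' feature maps (illustration only).

    Each spatial position of image A attends over all positions of image B
    (and vice versa) through a learned bilinear affinity; the attended context
    is then fused back into the original features.
    """
    def __init__(self, channels):
        super().__init__()
        self.affinity = nn.Linear(channels, channels, bias=False)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (N, C, H, W), assumed to share the same shape.
        n, c, h, w = feat_a.shape
        a = feat_a.flatten(2).transpose(1, 2)                 # (N, HW, C)
        b = feat_b.flatten(2).transpose(1, 2)                 # (N, HW, C)
        affinity = self.affinity(a) @ b.transpose(1, 2)       # (N, HW, HW) pairwise affinities
        ctx_a = (F.softmax(affinity, dim=2) @ b).transpose(1, 2).reshape(n, c, h, w)
        ctx_b = (F.softmax(affinity, dim=1).transpose(1, 2) @ a).transpose(1, 2).reshape(n, c, h, w)
        out_a = self.fuse(torch.cat([feat_a, ctx_a], dim=1))  # fuse B-context back into A
        out_b = self.fuse(torch.cat([feat_b, ctx_b], dim=1))  # fuse A-context back into B
        return out_a, out_b
```

A shared fusion convolution keeps the module symmetric in the two images, which matches the intuition of mining semantics common to both.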