Sharp Eyes: A Salient Object Detector Working The Same Way as Human
Visual Characteristics
- URL: http://arxiv.org/abs/2301.07431v1
- Date: Wed, 18 Jan 2023 11:00:45 GMT
- Title: Sharp Eyes: A Salient Object Detector Working The Same Way as Human
Visual Characteristics
- Authors: Ge Zhu, Jinbao Li and Yahong Guo
- Abstract summary: We propose a sharp eyes network (SENet) that first separates the object from the scene, and then finely segments it.
The proposed method aims to utilize the expanded objects to guide the network to obtain a complete prediction.
- Score: 3.222802562733787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current methods aggregate multi-level features or introduce edge and skeleton
to get more refined saliency maps. However, little attention is paid to how to
obtain the complete salient object in a cluttered background, where the targets
are usually similar in color and texture to the background. To handle this
complex scene, we propose a sharp eyes network (SENet) that first separates the
object from the scene, and then finely segments it, which is in line with human
visual characteristics, i.e., to look first and then focus. Different from
previous methods which directly integrate edge or skeleton to supplement the
defects of objects, the proposed method aims to utilize the expanded objects to
guide the network to obtain a complete prediction. Specifically, SENet mainly
consists of a target separation (TS) branch and an object segmentation (OS) branch
trained by minimizing a new hierarchical difference aware (HDA) loss. In the TS
branch, we construct a fractal structure to produce saliency features with
expanded boundary via the supervision of expanded ground truth, which can
enlarge the detail difference between foreground and background. In the OS
branch, we first aggregate multi-level features to adaptively select
complementary components, and then feed the saliency features with expanded
boundary into the aggregated features to guide the network to obtain a complete
prediction. Moreover, we propose the HDA loss to further improve the structural
integrity and local details of the salient objects, which assigns weight to
each pixel according to its distance from the boundary hierarchically. Hard
pixels with similar appearance in the border region will be given more attention
hierarchically to emphasize their importance in predicting the complete object.
Comprehensive experimental results on five datasets demonstrate that the
proposed approach outperforms the state-of-the-art methods both quantitatively
and qualitatively.
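The abstract describes two ideas that a small example can make concrete: supervising with an expanded ground truth, and weighting each pixel by its distance from the object boundary in discrete levels. The abstract gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. The function names (`dilate`, `boundary_distance`, `hierarchical_weights`, `weighted_bce`), the square dilation, the 8-connected distance, and the band weights `(3.0, 2.0)` are all hypothetical choices made for illustration.

```python
import math
from collections import deque

# 8-connected neighbourhood offsets (an assumption; the paper may use another metric).
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def dilate(mask, radius=1):
    """Expand a binary mask by `radius` pixels with a square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out

def boundary_distance(mask):
    """Distance (in 8-connected steps) from each pixel to the nearest boundary pixel.

    A pixel is a boundary pixel (distance 0) if any neighbour has the other label.
    Entries stay None when the mask has no boundary at all.
    """
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] != mask[y][x]:
                    dist[y][x] = 0
                    queue.append((y, x))
                    break
    # Multi-source breadth-first search outward from all boundary pixels.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def hierarchical_weights(mask, band_weights=(3.0, 2.0), base=1.0):
    """Assign band_weights[d] to pixels at distance d, and `base` beyond the last band."""
    dist = boundary_distance(mask)
    return [[band_weights[d] if d is not None and d < len(band_weights) else base
             for d in row] for row in dist]

def weighted_bce(pred, mask, weights, eps=1e-7):
    """Pixel-weighted binary cross-entropy, normalised by the total weight."""
    total, norm = 0.0, 0.0
    for p_row, t_row, w_row in zip(pred, mask, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
            norm += w
    return total / norm

# Toy example: a 3x3 object centred in a 7x7 image.
mask = [[1 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(7)] for y in range(7)]
expanded = dilate(mask, radius=1)       # hypothetical "expanded ground truth"
weights = hierarchical_weights(mask)    # larger weights near the boundary
```

In this toy case the expanded mask covers a 5x5 region, and a wrong prediction on a border pixel costs more than the same error at the object centre, which is the qualitative behaviour the abstract attributes to the HDA loss.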
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model [19.800353299691277]
This paper presents a more challenging gaze object segmentation (GOS) task, which involves inferring the pixel-level mask corresponding to the object captured by human gaze behavior.
We propose to automatically obtain head features from scene features to ensure the model's inference efficiency and flexibility in the real world.
arXiv Detail & Related papers (2024-08-02T06:32:45Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
- High-resolution Iterative Feedback Network for Camouflaged Object Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z)
- GaTector: A Unified Framework for Gaze Object Prediction [11.456242421204298]
We build a novel framework named GaTector to tackle the gaze object prediction problem in a unified way.
To better consider the specificity of inputs and tasks, GaTector introduces two input-specific blocks before the shared backbone and three task-specific blocks after the shared backbone.
In the end, we propose a novel wUoC metric that can reveal the difference between boxes even when they share no overlapping area.
arXiv Detail & Related papers (2021-12-07T07:50:03Z)
- Cross-layer Feature Pyramid Network for Salient Object Detection [102.20031050972429]
We propose a novel Cross-layer Feature Pyramid Network to improve the progressive fusion in salient object detection.
The distributed features per layer own both semantics and salient details from all other layers simultaneously, and suffer reduced loss of important information.
arXiv Detail & Related papers (2020-02-25T14:06:27Z)