Salient Object Ranking with Position-Preserved Attention
- URL: http://arxiv.org/abs/2106.05047v2
- Date: Thu, 10 Jun 2021 02:23:59 GMT
- Title: Salient Object Ranking with Position-Preserved Attention
- Authors: Hao Fang, Daoxin Zhang, Yi Zhang, Minghao Chen, Jiawei Li, Yao Hu,
Deng Cai and Xiaofei He
- Abstract summary: We study the Salient Object Ranking (SOR) task, which manages to assign a ranking order of each detected object according to its visual saliency.
We propose the first end-to-end framework of the SOR task and solve it in a multi-task learning fashion.
We also introduce a Position-Preserved Attention (PPA) module tailored for the SOR branch.
- Score: 44.94722064885407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instance segmentation can detect where the objects are in an image,
but it is hard to understand the relationship between them. We pay attention to
a typical relationship, relative saliency. A closely related task, salient
object detection, predicts a binary map highlighting a visually salient region,
but it struggles to distinguish multiple objects. Directly combining the two
tasks by post-processing also leads to poor performance. There is little
research on relative saliency at present, which limits practical applications
such as content-aware image cropping, video summarization, and image labeling.
In this paper, we study the Salient Object Ranking (SOR) task, which aims to
assign a ranking order to each detected object according to its visual
saliency. We propose the first end-to-end framework for the SOR task and solve
it in a multi-task learning fashion. The framework handles instance
segmentation and salient object ranking simultaneously. In this framework, the
SOR branch is independent and flexible enough to cooperate with different
detection methods, so it is easy to use as a plugin. We also introduce a
Position-Preserved Attention (PPA) module tailored for the SOR branch. It
consists of a position embedding stage and a feature interaction stage.
Considering the importance of position in saliency comparison, in the first
stage we preserve the absolute coordinates of objects in the ROI pooling
operation and then fuse the positional information with semantic features. In
the feature interaction stage, we apply the attention mechanism to obtain
proposals' contextualized representations and predict their relative ranking
orders. Extensive experiments have been conducted on the ASR dataset. Without
bells and whistles, our proposed method significantly outperforms the former
state-of-the-art method. The code will be released publicly.
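The two PPA stages described above can be sketched in a few lines of NumPy. This is a minimal illustrative toy, not the authors' implementation: the function names (`position_embed`, `ppa_rank`), the choice of normalized `(cx, cy, w, h)` box coordinates as the positional encoding, and the randomly initialized single-head attention weights are all assumptions made for the sketch; a real model would learn these weights and pool features from a backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_embed(boxes, image_hw):
    """Stage 1 (position embedding): encode each proposal's absolute
    position as normalized (cx, cy, w, h), keeping the global location
    that a plain ROI pooling operation would discard."""
    h, w = image_hw
    x1, y1, x2, y2 = boxes.T
    cx = (x1 + x2) / 2.0 / w
    cy = (y1 + y2) / 2.0 / h
    bw = (x2 - x1) / w
    bh = (y2 - y1) / h
    return np.stack([cx, cy, bw, bh], axis=1)        # (N, 4)

def ppa_rank(roi_feats, boxes, image_hw):
    """Toy PPA-style ranking: fuse positional and semantic features,
    run one self-attention layer over proposals (stage 2, feature
    interaction), then score each proposal for saliency."""
    pos = position_embed(boxes, image_hw)            # (N, 4)
    fused = np.concatenate([roi_feats, pos], axis=1) # (N, d + 4)

    dk = fused.shape[1]
    wq = rng.standard_normal((dk, dk)) / np.sqrt(dk)
    wk = rng.standard_normal((dk, dk)) / np.sqrt(dk)
    wv = rng.standard_normal((dk, dk)) / np.sqrt(dk)
    q, k, v = fused @ wq, fused @ wk, fused @ wv
    attn = softmax(q @ k.T / np.sqrt(dk))            # (N, N) proposal-to-proposal
    ctx = attn @ v                                   # contextualized representations

    w_score = rng.standard_normal(dk) / np.sqrt(dk)
    scores = ctx @ w_score                           # one saliency score per proposal
    order = np.argsort(-scores)                      # index 0 = most salient
    return scores, order

# Two toy proposals with random "semantic" features on a 200x200 image.
boxes = np.array([[10.0, 10.0, 60.0, 60.0],
                  [100.0, 40.0, 180.0, 160.0]])
feats = rng.standard_normal((2, 16))
scores, order = ppa_rank(feats, boxes, (200, 200))
```

The key design point the sketch mirrors is that attention alone is position-agnostic: without concatenating the absolute coordinates, two identical-looking objects at the image center and at a corner would receive identical contextualized representations, so no position-aware ranking could be learned.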
Related papers
- Order-aware Interactive Segmentation [29.695857327102647]
We propose OIS (order-aware interactive segmentation), which explicitly encodes the relative depth between objects into order maps.
We introduce a novel order-aware attention, where the order maps seamlessly guide the user interactions (in the form of clicks) to attend to the image features.
Our approach allows both dense and sparse integration of user clicks, enhancing both accuracy and efficiency as compared to prior works.
arXiv Detail & Related papers (2024-10-16T04:19:28Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection [48.429555904690595]
We introduce spatially decoupled DETR, which includes a task-aware query generation module and a disentangled feature learning process.
We demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work.
arXiv Detail & Related papers (2023-10-24T15:54:11Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- Rethinking Video Salient Object Ranking [39.091162729266294]
Salient Object Ranking (SOR) involves ranking the degree of saliency of multiple salient objects in an input image.
Recently, a method was proposed for ranking salient objects in an input video based on a predicted fixation map.
We propose an end-to-end method for video salient object ranking (VSOR), with two novel modules.
arXiv Detail & Related papers (2022-03-31T17:55:54Z)
- Object-to-Scene: Learning to Transfer Object Knowledge to Indoor Scene Recognition [19.503027767462605]
We propose an Object-to-Scene (OTS) method, which extracts object features and learns object relations to recognize indoor scenes.
OTS outperforms the state-of-the-art methods by more than 2% on indoor scene recognition without using any additional streams.
arXiv Detail & Related papers (2021-08-01T08:37:08Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)