M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient
Object Detection
- URL: http://arxiv.org/abs/2309.08365v1
- Date: Fri, 15 Sep 2023 12:46:14 GMT
- Title: M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient
Object Detection
- Authors: Yao Yuan, Pan Gao, XiaoYang Tan
- Abstract summary: M$^3$Net is an attention network for Salient Object Detection (SOD).
A cross-attention approach achieves interaction between multilevel features (an illustrative sketch follows this summary).
The Mixed Attention Block models context at both the global and local levels.
A multilevel supervision strategy optimizes the aggregated features stage by stage.
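As a concrete illustration of the cross-attention idea in the Multiscale Interaction Block, below is a minimal PyTorch sketch in which queries come from low-level (high-resolution) tokens and keys/values come from high-level (semantic) tokens, so that high-level features guide low-level feature learning. The module name `CrossLevelAttention`, the head count, and the token shapes are assumptions made for illustration; this is not the authors' released implementation.

```python
# Illustrative sketch only: cross-attention between two feature levels,
# with queries from low-level features and keys/values from high-level
# features (one plausible reading of "high-level guides low-level").
# Names and shapes are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class CrossLevelAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_low = nn.LayerNorm(dim)
        self.norm_high = nn.LayerNorm(dim)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # low:  (B, N_low, C)  tokens from a low-level (high-resolution) stage
        # high: (B, N_high, C) tokens from a high-level (semantic) stage
        q = self.norm_low(low)
        kv = self.norm_high(high)
        out, _ = self.attn(query=q, key=kv, value=kv)
        # Residual connection: low-level features refined by high-level cues.
        return low + out


# Toy usage with hypothetical token counts and channel width
low = torch.randn(2, 56 * 56, 64)
high = torch.randn(2, 14 * 14, 64)
refined = CrossLevelAttention(64)(low, high)
print(refined.shape)  # torch.Size([2, 3136, 64])
```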
- Score: 22.60675416709486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing salient object detection methods use a U-Net or feature
pyramid structure that simply aggregates feature maps of different scales,
ignoring their uniqueness, their interdependence, and their respective
contributions to the final prediction. To overcome these issues, we propose
M$^3$Net, i.e., the Multilevel, Mixed and Multistage attention network for
Salient Object Detection (SOD). Firstly, we propose the Multiscale Interaction
Block, which introduces a cross-attention approach to achieve interaction
between multilevel features, allowing high-level features to guide low-level
feature learning and thus enhancing salient regions. Secondly, considering
that previous Transformer-based SOD methods locate salient regions using only
global self-attention and thus inevitably overlook the details of complex
objects, we propose the Mixed Attention Block. This block combines global
self-attention and window self-attention, aiming at modeling context at both
the global and local levels to further improve the accuracy of the prediction
map. Finally, we propose a multilevel supervision strategy to optimize the
aggregated features stage by stage. Experiments on six challenging datasets
demonstrate that the proposed M$^3$Net surpasses recent CNN- and
Transformer-based SOD methods on four metrics. Code is available at
https://github.com/I2-Multimedia-Lab/M3Net.
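To make the Mixed Attention Block idea more tangible, here is a hedged PyTorch sketch that applies global self-attention over all tokens and then self-attention within non-overlapping local windows, combining both with residual connections. The window size, the module name `MixedAttention`, and the sequential composition order are assumptions chosen for illustration; they are not taken from the released code at the repository above.

```python
# Hedged sketch of a "mixed" attention block: global self-attention followed
# by window self-attention, each added back through a residual connection.
# Window size, ordering, and names are illustrative assumptions.
import torch
import torch.nn as nn


class MixedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, window: int = 7):
        super().__init__()
        self.window = window
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, H*W, C) flattened feature map; h and w must be divisible by window
        b, n, c = x.shape
        win = self.window

        # Global branch: plain self-attention over all tokens.
        g = self.norm1(x)
        g, _ = self.global_attn(g, g, g)
        x = x + g

        # Local branch: self-attention inside each win x win window.
        loc = self.norm2(x).view(b, h // win, win, w // win, win, c)
        loc = loc.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, c)
        loc, _ = self.local_attn(loc, loc, loc)
        loc = loc.view(b, h // win, w // win, win, win, c)
        loc = loc.permute(0, 1, 3, 2, 4, 5).reshape(b, n, c)
        return x + loc


# Toy usage on a 14x14 token grid with a window of 7
x = torch.randn(2, 14 * 14, 64)
y = MixedAttention(64, window=7)(x, 14, 14)
print(y.shape)  # torch.Size([2, 196, 64])
```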
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework that fuses information from RGB images and LiDAR point clouds at points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features through computation-friendly projection and feature computation.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for salient object detection in optical remote sensing images (ORSI-SOD).
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose aggregate interaction modules to integrate features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Bifurcated backbone strategy for RGB-D salient object detection [168.19708737906618]
We leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network.
Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent.
arXiv Detail & Related papers (2020-07-06T13:01:30Z)
- DFNet: Discriminative feature extraction and integration network for salient object detection [6.959742268104327]
We focus on two challenges in saliency detection with Convolutional Neural Networks.
Firstly, since salient objects appear at various sizes, single-scale convolution cannot capture the right scale.
Secondly, using multi-level features helps the model use both local and global context.
arXiv Detail & Related papers (2020-04-03T13:56:41Z)
- Cross-layer Feature Pyramid Network for Salient Object Detection [102.20031050972429]
We propose a novel Cross-layer Feature Pyramid Network to improve progressive fusion in salient object detection.
The distributed features of each layer contain both semantics and salient details from all other layers, reducing the loss of important information.
arXiv Detail & Related papers (2020-02-25T14:06:27Z)