MPI: Multi-receptive and Parallel Integration for Salient Object
Detection
- URL: http://arxiv.org/abs/2108.03618v1
- Date: Sun, 8 Aug 2021 12:01:44 GMT
- Title: MPI: Multi-receptive and Parallel Integration for Salient Object
Detection
- Authors: Han Sun, Jun Cen, Ningzhong Liu, Dong Liang, Huiyu Zhou
- Abstract summary: The semantic representation of deep features is essential for image context understanding.
In this paper, a novel method called MPI is proposed for salient object detection.
The proposed method outperforms state-of-the-art methods under different evaluation metrics.
- Score: 17.32228882721628
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The semantic representation of deep features is essential for image context
understanding, and effective fusion of features with different semantic
representations can significantly improve the model's performance on salient
object detection. In this paper, a novel method called MPI is proposed for
salient object detection. Firstly, a multi-receptive enhancement module (MRE)
is designed to effectively expand the receptive fields of features from
different layers and generate features with different receptive fields. MRE can
enhance the semantic representation and improve the model's perception of the
image context, which enables the model to locate the salient object accurately.
Secondly, in order to reduce the reuse of redundant information in the complex
top-down fusion method and weaken the differences between semantic features, a
relatively simple but effective parallel fusion strategy (PFS) is proposed. It
allows multi-scale features to better interact with each other, thus improving
the overall performance of the model. Experimental results on multiple datasets
demonstrate that the proposed method outperforms state-of-the-art methods under
different evaluation metrics.
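The abstract specifies what the multi-receptive enhancement module (MRE) does but not its internal layout. Below is a minimal PyTorch sketch of one plausible realization, assuming the common pattern of parallel dilated 3x3 convolutions fused by a 1x1 convolution; the class name MRESketch and the dilation rates are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class MRESketch(nn.Module):
    """Illustrative multi-receptive enhancement block: parallel dilated
    3x3 convolutions expand the receptive field at several rates, and a
    1x1 convolution fuses the branches back to the input width."""

    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding == dilation keeps the spatial size for a 3x3 kernel
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        # Concatenate the multi-receptive branches, fuse, and keep a
        # residual path so the original feature is preserved.
        return self.fuse(torch.cat(feats, dim=1)) + x

For a backbone feature map x of shape (B, 64, H, W), MRESketch(64)(x) returns a map of the same shape with an enlarged effective receptive field, matching the abstract's goal of improving context perception without changing feature resolution.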
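Likewise, the parallel fusion strategy (PFS) is described only at a high level. The sketch below assumes that "parallel" means every output scale aggregates all input scales in a single resize-and-sum step, rather than through a chained top-down pathway; PFSSketch and its details are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PFSSketch(nn.Module):
    """Illustrative parallel fusion: each output scale aggregates all
    input scales at once via bilinear resizing and summation."""

    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        # One smoothing convolution per output scale
        self.smooth = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_scales)
        )

    def forward(self, feats):
        # feats: list of (B, C, Hi, Wi) maps from different backbone stages
        fused = []
        for i, target in enumerate(feats):
            size = target.shape[-2:]
            # Resize every scale to the target resolution and sum, so all
            # scales interact in parallel instead of through a sequential
            # top-down chain.
            agg = sum(
                f if f.shape[-2:] == size else
                F.interpolate(f, size=size, mode="bilinear",
                              align_corners=False)
                for f in feats
            )
            fused.append(self.smooth[i](agg))
        return fused

Because each scale's fused output is computed independently of the others, errors at one scale do not propagate down a chain, which matches the abstract's motivation of reducing the reuse of redundant information in complex top-down fusion.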
Related papers
- EMMA: Efficient Visual Alignment in Multi-Modal LLMs [56.03417732498859]
EMMA is a lightweight cross-modality module designed to efficiently fuse visual and textual encodings.
EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations.
arXiv Detail & Related papers (2024-10-02T23:00:31Z) - Cross-domain Multi-modal Few-shot Object Detection via Rich Text [21.36633828492347]
Cross-modal feature extraction and integration have led to steady performance improvements in few-shot learning tasks.
We study the Cross-Domain few-shot generalization of MM-OD (CDMM-FSOD) and propose a meta-learning based multi-modal few-shot object detection method.
arXiv Detail & Related papers (2024-03-24T15:10:22Z) - Self-Supervised Representation Learning with Meta Comprehensive
Regularization [11.387994024747842]
We introduce a module called CompMod with Meta Comprehensive Regularization (MCR), embedded into existing self-supervised frameworks.
We update our proposed model through a bi-level optimization mechanism, enabling it to capture comprehensive features.
We provide theoretical support for our proposed method from information theory and causal counterfactual perspectives.
arXiv Detail & Related papers (2024-03-03T15:53:48Z) - ICAFusion: Iterative Cross-Attention Guided Feature Fusion for
Multispectral Object Detection [25.66305300362193]
A novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction.
This framework enhances the discriminability of object features through the query-guided cross-attention mechanism.
The proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
arXiv Detail & Related papers (2023-08-15T00:02:10Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight cross-modal module.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - Progressive Multi-scale Fusion Network for RGB-D Salient Object
Detection [9.099589602551575]
We discuss the advantages of the progressive multi-scale fusion method and propose a mask-guided feature aggregation module.
The proposed framework can effectively combine the two features of different modalities and alleviate the impact of erroneous depth features.
We further introduce a mask-guided refinement module (MGRM) to complement the high-level semantic features and reduce the irrelevant features from multi-scale fusion.
arXiv Detail & Related papers (2021-06-07T20:02:39Z) - Towards Accurate Camouflaged Object Detection with Mixture Convolution and Interactive Fusion [45.45231015502287]
We propose a novel deep learning based COD approach, which integrates the large receptive field and effective feature fusion into a unified framework.
Our method detects camouflaged objects with an effective fusion strategy, which aggregates the rich context information from a large receptive field.
arXiv Detail & Related papers (2021-01-14T16:06:08Z) - Centralized Information Interaction for Salient Object Detection [68.8587064889475]
The U-shape structure has shown its advantage in salient object detection for efficiently combining multi-scale features.
This paper shows that by centralizing these connections, we can achieve cross-scale information interaction among them.
Our approach can cooperate with various existing U-shape-based salient object detection methods by substituting the connections between the bottom-up and top-down pathways.
arXiv Detail & Related papers (2020-12-21T12:42:06Z) - Fine-Grained Dynamic Head for Object Detection [68.70628757217939]
We propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance.
Experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks.
arXiv Detail & Related papers (2020-12-07T08:16:32Z) - Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.