RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection
- URL: http://arxiv.org/abs/2306.12621v1
- Date: Thu, 22 Jun 2023 01:27:00 GMT
- Title: RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection
- Authors: Jin Ma, Jinlong Li, Qing Guo, Tianyun Zhang, Yuewei Lin, Hongkai Yu
- Abstract summary: A crucial part of two-branch RGB-X deep neural networks is how to fuse information across modalities.
We propose RXFOOD for the fusion of features across different scales within the same modality branch and from different modality branches simultaneously.
Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGB-Frequency image manipulation detection demonstrate the clear effectiveness of the proposed RXFOOD.
- Score: 22.53413063906737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy for the limited application scenarios of traditional RGB cameras. RGB-X tasks, which rely on RGB input and another type of data input to resolve specific problems, have become a popular research topic in multimedia. A crucial part of two-branch RGB-X deep neural networks is how to fuse information across modalities. Given the tremendous information inside RGB-X networks, previous works typically apply naive fusion (e.g., average or max fusion) or focus only on feature fusion at the same scale. In this paper, we propose a novel method called RXFOOD that fuses features across different scales within the same modality branch and across different modality branches simultaneously, in a unified attention mechanism. An Energy Exchange Module is designed for the interaction of each feature map's energy matrix, which reflects the inter-relationships of different positions and different channels inside a feature map. The RXFOOD method can be easily incorporated into any dual-branch encoder-decoder network as a plug-in module, helping the original backbone network better focus on important positions and channels for object of interest detection. Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGB-Frequency image manipulation detection demonstrate the clear effectiveness of the proposed RXFOOD.
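The abstract does not spell out the Energy Exchange Module's exact formulation, so the following Python/PyTorch sketch is only an illustration under stated assumptions: the "energy matrix" is taken to be a channel-wise affinity (Gram-style) matrix, and the "exchange" is taken to be a simple average of the two branches' energies before reweighting each branch. The class name EnergyExchange and all design details are hypothetical, and the cross-scale and position-wise parts of the paper's mechanism are omitted here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyExchange(nn.Module):  # hypothetical name, not from the paper
    def _channel_energy(self, x):
        # x: (B, C, H, W) -> (B, C, C) channel-affinity "energy" matrix
        b, c, h, w = x.shape
        flat = x.reshape(b, c, h * w)
        energy = torch.bmm(flat, flat.transpose(1, 2)) / (h * w)
        return F.softmax(energy, dim=-1)

    def forward(self, feat_rgb, feat_x):
        # Assumed "exchange": average the two branches' channel energies.
        e_fused = 0.5 * (self._channel_energy(feat_rgb) +
                         self._channel_energy(feat_x))
        b, c, h, w = feat_rgb.shape
        # Reweight each branch's channels by the fused energy matrix.
        out_rgb = torch.bmm(e_fused, feat_rgb.reshape(b, c, -1)).reshape(b, c, h, w)
        out_x = torch.bmm(e_fused, feat_x.reshape(b, c, -1)).reshape(b, c, h, w)
        # Residual connections keep the original per-branch signal.
        return feat_rgb + out_rgb, feat_x + out_x

fusion = EnergyExchange()
f_rgb, f_x = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
f_rgb2, f_x2 = fusion(f_rgb, f_x)

Because the block returns features of the same shape it receives, it could be dropped between the encoder stages of an existing dual-branch network, which matches the plug-in usage the abstract describes.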
Related papers
- Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection [20.12812979315803]
Object detection utilizing both visible (RGB) and thermal infrared (IR) imagery has garnered extensive attention.
Most existing multi-modal object detection methods directly input the RGB and IR images into deep neural networks.
We propose a novel coarse-to-fine perspective to purify and fuse features from both modalities.
arXiv Detail & Related papers (2024-01-19T14:49:42Z)
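The one-line summary above gives little detail, so the sketch below is a loose, hypothetical reading of "purify and fuse": a coarse per-channel gate removes noisy responses in each modality, then a fine pixel-wise weight selects between the purified RGB and IR features. The name PurifyThenFuse and every layer choice are assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class PurifyThenFuse(nn.Module):  # hypothetical name and design
    def __init__(self, channels):
        super().__init__()
        # Coarse "removal": per-channel gates that can suppress noisy responses.
        self.gate_rgb = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(channels, channels, 1),
                                      nn.Sigmoid())
        self.gate_ir = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(channels, channels, 1),
                                     nn.Sigmoid())
        # Fine "selection": a pixel-wise mixing weight between the modalities.
        self.select = nn.Conv2d(2 * channels, 1, 1)

    def forward(self, f_rgb, f_ir):
        f_rgb = f_rgb * self.gate_rgb(f_rgb)   # purify RGB features
        f_ir = f_ir * self.gate_ir(f_ir)       # purify IR features
        w = torch.sigmoid(self.select(torch.cat([f_rgb, f_ir], dim=1)))
        return w * f_rgb + (1 - w) * f_ir      # select per pixel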
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbone and fusion module to learn the multimodality and multistage features identically.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called the Interactive Context-Aware Network (ICANet).
ICANet contains three modules that effectively perform cross-modal and cross-scale fusion.
Experiments prove that our network performs favorably against the state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt the Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is then applied to these reformulated data to achieve optimal channel-wise complementary fusion between RGB and D.
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
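As a rough illustration of fusing RGB with depth at the data level, before any deep feature extraction, the sketch below builds three input streams for a (hypothetical) triple-stream network; the actual recombination rule is the paper's contribution and is not reproduced here.

import torch

def recombine(rgb, depth):
    # rgb: (B, 3, H, W), depth: (B, 1, H, W), both assumed normalized to [0, 1]
    d3 = depth.expand(-1, 3, -1, -1)   # replicate depth across 3 channels
    mixed = 0.5 * rgb + 0.5 * d3       # assumed pixel-level RGB-D blend
    return rgb, d3, mixed              # three inputs for a triple-stream net

rgb = torch.rand(2, 3, 224, 224)
depth = torch.rand(2, 1, 224, 224)
stream_rgb, stream_d, stream_mix = recombine(rgb, depth)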
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
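The summary above describes recalibrating RGB feature responses with depth and then aggregating the two recalibrated representations. The following is a rough gating sketch in that spirit, not the paper's actual Separation-and-Aggregation Gate; the module name and 1x1-convolution design are assumptions.

import torch
import torch.nn as nn

class RecalibrateAndAggregate(nn.Module):  # hypothetical name and design
    def __init__(self, channels):
        super().__init__()
        self.gate_rgb = nn.Conv2d(channels, channels, 1)  # depth -> gate for RGB
        self.gate_d = nn.Conv2d(channels, channels, 1)    # RGB -> gate for depth
        self.aggregate = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_rgb, f_d):
        f_rgb = f_rgb * torch.sigmoid(self.gate_rgb(f_d))  # recalibrate RGB with depth
        f_d = f_d * torch.sigmoid(self.gate_d(f_rgb))      # recalibrate depth with RGB
        return self.aggregate(torch.cat([f_rgb, f_d], dim=1))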
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M, and CMW-H, are developed to handle low-, middle-, and high-level cross-modal information fusion, respectively.
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)
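A generic sketch of the cross-modal weighting idea, in which each modality predicts weights that modulate the other, is given below; the real CMW-L, CMW-M, and CMW-H designs differ per level, so this single block is only an assumed stand-in.

import torch
import torch.nn as nn

class CrossModalWeighting(nn.Module):  # generic stand-in, not the paper's modules
    def __init__(self, channels):
        super().__init__()
        self.weight_rgb = nn.Conv2d(channels, 1, 3, padding=1)  # weights from depth
        self.weight_d = nn.Conv2d(channels, 1, 3, padding=1)    # weights from RGB

    def forward(self, f_rgb, f_d):
        f_rgb = f_rgb * torch.sigmoid(self.weight_rgb(f_d))  # depth modulates RGB
        f_d = f_d * torch.sigmoid(self.weight_d(f_rgb))      # RGB modulates depth
        return f_rgb + f_d                                   # simple additive fusion

# One block per encoder level (low / middle / high), echoing CMW-L/M/H:
blocks = [CrossModalWeighting(c) for c in (64, 128, 256)]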