CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation
- URL: http://arxiv.org/abs/2307.08098v2
- Date: Tue, 11 Jun 2024 14:07:59 GMT
- Title: CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation
- Authors: Jialun Pei, Tao Jiang, He Tang, Nian Liu, Yueming Jin, Deng-Ping Fan, Pheng-Ann Heng
- Abstract summary: CalibNet consists of three simple modules: a dynamic interactive kernel (DIK), a weight-sharing fusion (WSF), and a depth similarity assessment (DSA).
Experiments show that CalibNet yields a promising result, i.e., 58.0% AP with a 320×480 input size on the COME15K-N test set.
- Score: 88.50067783122559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel approach for RGB-D salient instance segmentation using a dual-branch cross-modal feature calibration architecture called CalibNet. Our method simultaneously calibrates depth and RGB features in the kernel and mask branches to generate instance-aware kernels and mask features. CalibNet consists of three simple modules: a dynamic interactive kernel (DIK), a weight-sharing fusion (WSF), and a depth similarity assessment (DSA). DIK and WSF work together to generate effective instance-aware kernels and integrate cross-modal features, while DSA is applied before them to improve the quality of the depth features. In addition, we contribute a new DSIS dataset, which contains 1,940 images with elaborate instance-level annotations. Extensive experiments on three challenging benchmarks show that CalibNet yields a promising result, i.e., 58.0% AP with a 320×480 input size on the COME15K-N test set, significantly surpassing alternative frameworks. Our code and dataset are available at: https://github.com/PJLallen/CalibNet.
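The abstract describes the architecture only at a high level. As a rough illustration (not the released implementation, which lives in the linked repository), the following minimal PyTorch sketch shows one way a DSA-gated depth stream, a weight-sharing fusion, and a dynamic-kernel head could be wired together; every internal detail here (channel width, the similarity gate, the pooled per-instance kernels, the number of candidate instances) is an assumption made for illustration.

```python
# Minimal sketch of the dual-branch cross-modal calibration idea from the abstract.
# All module internals are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DSA(nn.Module):
    """Depth similarity assessment (assumed form): gates depth features by their
    per-pixel agreement with the RGB features, suppressing unreliable depth."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, rgb, depth):
        gate = torch.sigmoid(self.proj(rgb) * depth)
        return depth * gate


class WSF(nn.Module):
    """Weight-sharing fusion (assumed form): one shared conv is applied to both
    modalities so the branches are fused with consistent weights."""
    def __init__(self, channels):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, rgb, depth):
        return F.relu(self.shared(rgb) + self.shared(depth))


class DIK(nn.Module):
    """Dynamic interactive kernel (assumed form): predicts per-instance kernels
    from the kernel-branch features and convolves them with the mask features."""
    def __init__(self, channels, num_instances=10):
        super().__init__()
        self.num_instances = num_instances
        self.kernel_head = nn.Conv2d(channels, num_instances * channels, 1)

    def forward(self, kernel_feat, mask_feat):
        b, c, h, w = mask_feat.shape
        k = self.kernel_head(kernel_feat).mean(dim=(2, 3))     # (B, N*C)
        k = k.view(b * self.num_instances, c, 1, 1)            # one 1x1 kernel per instance
        masks = F.conv2d(mask_feat.view(1, b * c, h, w), k, groups=b)
        return masks.view(b, self.num_instances, h, w)         # instance mask logits


class CalibSketch(nn.Module):
    """Dual-branch wiring: DSA cleans the depth features before WSF and DIK use them."""
    def __init__(self, channels=64):
        super().__init__()
        self.dsa, self.wsf, self.dik = DSA(channels), WSF(channels), DIK(channels)

    def forward(self, rgb_feat, depth_feat):
        depth_feat = self.dsa(rgb_feat, depth_feat)
        fused = self.wsf(rgb_feat, depth_feat)
        return self.dik(fused, fused)


# Example: masks = CalibSketch()(torch.rand(2, 64, 80, 120), torch.rand(2, 64, 80, 120))
```

Folding the batch into a grouped convolution is a standard way to apply per-image dynamic kernels in a single call; the classification and objectness heads that a real instance-segmentation model needs are omitted here.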
Related papers
- Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation [15.414518995812754]
We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment.
Our framework surpasses current state-of-the-art methods, demonstrating notable improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets.
arXiv Detail & Related papers (2024-05-28T06:16:57Z)
- Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets to perform pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module that extracts dense feature maps of the original size with multi-scale context information, and 2) an HU-LSTM (Hybrid U-LSTM) that regularizes the 3D matching volume into the predicted depth map.
Our method exhibits competitive performance to the state-of-the-art method while dramatically reducing memory consumption, costing only 19.4% of R-MVSNet's memory consumption.
arXiv Detail & Related papers (2020-07-21T14:59:59Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately (see the generic sketch after this list).
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
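Several of the entries above, most directly the last one, share one basic operation: recalibrating each modality's features with a gate computed from the other modality and then aggregating the two recalibrated maps. Below is a minimal, generic PyTorch sketch of that pattern; the module name (CrossModalGate) and all internals are assumptions for illustration rather than any particular paper's design.

```python
# Generic cross-modality recalibration/aggregation pattern (illustrative only).
import torch
import torch.nn as nn


class CrossModalGate(nn.Module):
    """Gate each modality with cues from the other, then aggregate the results."""
    def __init__(self, channels):
        super().__init__()
        self.gate_from_depth = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_from_rgb = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.aggregate = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, rgb, depth):
        rgb_recal = rgb * self.gate_from_depth(depth)    # depth cues recalibrate RGB responses
        depth_recal = depth * self.gate_from_rgb(rgb)    # RGB cues clean up the depth features
        return self.aggregate(torch.cat([rgb_recal, depth_recal], dim=1))


# Example: fused = CrossModalGate(64)(torch.rand(1, 64, 60, 80), torch.rand(1, 64, 60, 80))
```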