Mutual Information Regularization for Weakly-supervised RGB-D Salient
Object Detection
- URL: http://arxiv.org/abs/2306.03630v1
- Date: Tue, 6 Jun 2023 12:36:57 GMT
- Title: Mutual Information Regularization for Weakly-supervised RGB-D Salient
Object Detection
- Authors: Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai
- Abstract summary: We present a weakly-supervised RGB-D salient object detection model via scribble supervision.
We focus on effective multimodal representation learning via inter-modal mutual information regularization.
- Score: 33.210575826086654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a weakly-supervised RGB-D salient object detection
model via scribble supervision. Specifically, as a multimodal learning task, we
focus on effective multimodal representation learning via inter-modal mutual
information regularization. In particular, following the principle of
disentangled representation learning, we introduce a mutual information upper
bound with a mutual information minimization regularizer to encourage the
disentangled representation of each modality for salient object detection.
Based on our multimodal representation learning framework, we introduce an
asymmetric feature extractor for our multimodal data, which is proven more
effective than the conventional symmetric backbone setting. We also introduce a
multimodal variational auto-encoder as a stochastic prediction refinement
technique, which takes pseudo labels from the first training stage as
supervision and generates refined predictions. Experimental results on benchmark
RGB-D salient object detection datasets verify the effectiveness of both our
explicit multimodal disentangled representation learning method and our
stochastic prediction refinement strategy, achieving performance comparable to
state-of-the-art fully supervised models. Our code and data are
available at: https://github.com/baneitixiaomai/MIRV.
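To make the two main ingredients above concrete, here is a minimal PyTorch sketch of inter-modal mutual information minimization using a CLUB-style variational upper bound (Cheng et al., 2020), one common estimator for this kind of regularizer. All layer sizes and the training recipe are illustrative assumptions, not the authors' released implementation (see the repository above for that).

```python
import torch
import torch.nn as nn

class CLUBUpperBound(nn.Module):
    """CLUB-style variational upper bound on I(x; y).

    q(y|x) is approximated by a diagonal Gaussian whose mean and
    log-variance come from small MLPs. Minimizing the returned bound
    w.r.t. the two modality encoders discourages redundancy between
    the RGB and depth representations (illustrative sketch).
    """

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.mu_net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.logvar_net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim), nn.Tanh())

    def log_likelihood(self, x, y):
        # log q(y|x) up to an additive constant; maximized to fit q.
        mu, logvar = self.mu_net(x), self.logvar_net(x)
        return (-(y - mu) ** 2 / logvar.exp() - logvar).sum(dim=1).mean()

    def forward(self, x, y):
        # Bound = E_{p(x,y)}[log q(y|x)] - E_{p(x)p(y)}[log q(y|x)].
        mu, logvar = self.mu_net(x), self.logvar_net(x)
        positive = (-(y - mu) ** 2 / logvar.exp()).sum(dim=1)            # matched pairs
        negative = (-(y.unsqueeze(0) - mu.unsqueeze(1)) ** 2
                    / logvar.exp().unsqueeze(1)).sum(dim=2).mean(dim=1)  # shuffled pairs
        return (positive - negative).mean()

# Usage sketch, assuming hypothetical rgb_feat / depth_feat of shape (B, D):
# fit_loss   = -club.log_likelihood(rgb_feat.detach(), depth_feat.detach())
# mi_penalty =  club(rgb_feat, depth_feat)  # added to the saliency loss
```

The second-stage stochastic refinement can likewise be sketched as a small conditional VAE that encodes the fused features together with the coarse map into a Gaussian latent and decodes a refined saliency map, trained with a reconstruction loss against the stage-one pseudo labels plus a KL term. Again, the architecture below is an assumption for illustration, not the paper's exact multimodal VAE.

```python
class RefinementVAE(nn.Module):
    """Conditional-VAE refinement sketch: encode (fused features, coarse
    prediction) to a Gaussian latent, decode refined saliency logits."""

    def __init__(self, feat_ch: int, z_dim: int = 32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(feat_ch + 1, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2 * z_dim))
        self.dec = nn.Sequential(
            nn.Conv2d(feat_ch + z_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, feat, coarse):
        # feat: (B, C, H, W) fused RGB-D features; coarse: (B, 1, H, W).
        mu, logvar = self.enc(torch.cat([feat, coarse], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        z_map = z[:, :, None, None].expand(-1, -1, feat.size(2), feat.size(3))
        refined = self.dec(torch.cat([feat, z_map], dim=1))   # refined logits
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(dim=1).mean()
        return refined, kl  # train: BCE(refined, pseudo_label) + beta * kl
```

At test time, sampling several latent codes yields multiple refined maps whose mean (or spread, as an uncertainty proxy) can be used, which is the usual benefit of a stochastic refinement stage.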
Related papers
- Uniting contrastive and generative learning for event sequences models [51.547576949425604]
This study investigates the integration of two self-supervised learning techniques - instance-wise contrastive learning and a generative approach based on restoring masked events in latent space.
Experiments conducted on several public datasets, focusing on sequence classification and next-event type prediction, show that the integrated method achieves superior performance compared to individual approaches.
arXiv Detail & Related papers (2024-08-19T13:47:17Z)
- Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning [10.630297877530614]
We propose a novel Multi-Grained Contrast method (MGC) for unsupervised representation learning.
Specifically, we construct delicate multi-grained correspondences between positive views and then conduct multi-grained contrast via these correspondences to learn more general unsupervised representations.
Our method significantly outperforms existing state-of-the-art methods on extensive downstream tasks, including object detection, instance segmentation, scene parsing, semantic segmentation and keypoint detection.
arXiv Detail & Related papers (2024-07-02T07:35:21Z)
- GM-DF: Generalized Multi-Scenario Deepfake Detection [49.072106087564144]
Existing face forgery detection methods usually follow the paradigm of training models on a single domain.
In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets.
arXiv Detail & Related papers (2024-06-28T17:42:08Z)
- Self-Supervised Representation Learning with Meta Comprehensive Regularization [11.387994024747842]
We introduce a module called CompMod with Meta Comprehensive Regularization (MCR), embedded into existing self-supervised frameworks.
We update our proposed model through a bi-level optimization mechanism, enabling it to capture comprehensive features.
We provide theoretical support for our proposed method from information-theoretic and causal counterfactual perspectives.
arXiv Detail & Related papers (2024-03-03T15:53:48Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, using digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling [3.867363075280544]
Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data.
A new model is developed, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM).
The model achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.
arXiv Detail & Related papers (2023-01-06T10:08:11Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation [39.95831985522991]
We propose a framework integrated with more reliable supervision guided by semantic co-segmentation and data-augmentation.
Our proposed methods achieve state-of-the-art performance among unsupervised methods, and even compete on par with supervised methods.
arXiv Detail & Related papers (2021-04-12T11:48:54Z)
- Adaptive Object Detection with Dual Multi-Label Prediction [78.69064917947624]
We propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection.
The model exploits multi-label prediction to reveal the object category information in each image.
We introduce a prediction consistency regularization mechanism to assist object detection.
arXiv Detail & Related papers (2020-03-29T04:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.