Multi-Modal Mutual Information (MuMMI) Training for Robust
Self-Supervised Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2107.02339v1
- Date: Tue, 6 Jul 2021 01:39:21 GMT
- Title: Multi-Modal Mutual Information (MuMMI) Training for Robust
Self-Supervised Deep Reinforcement Learning
- Authors: Kaiqi Chen, Yong Lee, Harold Soh
- Abstract summary: This work focuses on learning useful and robust deep world models using multiple, possibly unreliable, sensors.
We contribute a new multi-modal deep latent state-space model, trained using a mutual information lower-bound.
Experiments show our method significantly outperforms state-of-the-art deep reinforcement learning methods.
- Score: 13.937546816302715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work focuses on learning useful and robust deep world models using
multiple, possibly unreliable, sensors. We find that current methods do not
sufficiently encourage a shared representation between modalities; this can
cause poor performance on downstream tasks and over-reliance on specific
sensors. As a solution, we contribute a new multi-modal deep latent state-space
model, trained using a mutual information lower-bound. The key innovation is a
specially-designed density ratio estimator that encourages consistency between
the latent codes of each modality. We tasked our method to learn policies (in a
self-supervised manner) on multi-modal Natural MuJoCo benchmarks and a
challenging Table Wiping task. Experiments show our method significantly
outperforms state-of-the-art deep reinforcement learning methods, particularly
in the presence of missing observations.
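The abstract describes the key pieces only at a high level: per-modality latent codes, a shared latent state, and a specially-designed density ratio estimator trained through a mutual information lower bound. The sketch below is a minimal illustration of that idea, assuming an InfoNCE-style bound with a bilinear critic per modality; it is not the authors' released implementation, and all module names, dimensions, and the batch-negative scheme are illustrative assumptions.

```python
# Hedged sketch (not the paper's released code): an InfoNCE-style mutual
# information lower bound in which a bilinear density ratio estimator scores
# agreement between a shared latent state and each modality's latent code.
# Module names, sizes, and the fusion/negative-sampling scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityCritic(nn.Module):
    """Bilinear critic f(z, e_m) approximating log p(z, e_m) / (p(z) p(e_m))."""

    def __init__(self, latent_dim: int, code_dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(latent_dim, code_dim) * 0.02)

    def forward(self, z: torch.Tensor, codes: torch.Tensor) -> torch.Tensor:
        # z: (B, latent_dim), codes: (B, code_dim).
        # Returns a (B, B) score matrix; diagonal entries are the positive pairs.
        return z @ self.W @ codes.t()


def multimodal_mi_lower_bound(z: torch.Tensor,
                              modality_codes: dict,
                              critics: dict) -> torch.Tensor:
    """Sum of per-modality InfoNCE bounds: maximizing it keeps every modality's
    latent code consistent with (predictive of) the shared latent state."""
    batch = z.size(0)
    target = torch.arange(batch, device=z.device)  # positives lie on the diagonal
    bound = z.new_zeros(())
    for name, codes in modality_codes.items():
        scores = critics[name](z, codes)                 # (B, B) density ratio scores
        bound = bound - F.cross_entropy(scores, target)  # -CE equals InfoNCE up to log B
    return bound


if __name__ == "__main__":
    B, D_Z, D_E = 32, 30, 64                            # hypothetical sizes
    critics = {m: ModalityCritic(D_Z, D_E) for m in ("rgb", "proprio")}
    z = torch.randn(B, D_Z)                              # stand-in shared latent state
    codes = {m: torch.randn(B, D_E) for m in critics}    # stand-in per-modality codes
    loss = -multimodal_mi_lower_bound(z, codes, critics) # minimize the negative bound
    loss.backward()
```

In this sketch, maximizing the bound pulls each modality's code toward agreement with the shared latent while pushing it away from the codes of other samples in the batch, which is one plausible way to realize the cross-modal consistency the abstract refers to.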
Related papers
- Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning [23.035725779568587]
We study the role and interactions of multiple modalities in mitigating forgetting in deep neural networks (DNNs).
Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations.
We propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality.
arXiv Detail & Related papers (2024-05-04T22:02:58Z)
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning [8.868945335907867]
We propose a deep modal shared information learning module to capture the shared information between modalities.
We also use a label generation module based on a self-supervised learning strategy to capture the private information of the modalities.
Our approach outperforms current state-of-the-art methods on most of the metrics of the three public datasets.
arXiv Detail & Related papers (2023-05-15T09:24:48Z)
- Distilled Mid-Fusion Transformer Networks for Multi-Modal Human Activity Recognition [34.424960016807795]
Multi-modal Human Activity Recognition could utilize the complementary information to build models that can generalize well.
Deep learning methods have shown promising results, but their potential for extracting salient multi-modal spatial-temporal features has not been fully explored.
A knowledge distillation-based Multi-modal Mid-Fusion approach, DMFT, is proposed to conduct informative feature extraction and fusion to resolve the Multi-modal Human Activity Recognition task efficiently.
arXiv Detail & Related papers (2023-05-05T19:26:06Z)
- Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition [1.869225486385596]
We explore the hypothesis that leveraging multiple modalities can lead to better recognition.
We extend a number of recent contrastive self-supervised approaches for the task of Human Activity Recognition.
We propose a flexible, general-purpose framework for performing multimodal self-supervised learning.
arXiv Detail & Related papers (2022-05-20T10:39:16Z)
- Hybrid Contrastive Learning of Tri-Modal Representation for Multimodal Sentiment Analysis [18.4364234071951]
We propose a novel framework HyCon for hybrid contrastive learning of tri-modal representation.
Specifically, we simultaneously perform intra-/inter-modal contrastive learning and semi-contrastive learning.
Our proposed method outperforms existing works.
arXiv Detail & Related papers (2021-09-04T06:04:21Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.