Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors
- URL: http://arxiv.org/abs/2410.17551v1
- Date: Wed, 23 Oct 2024 04:32:37 GMT
- Title: Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors
- Authors: Bang You, Huaping Liu
- Abstract summary: Reinforcement learning has achieved promising results on robotic control tasks but struggles to effectively leverage information from multiple sensory modalities.
Recent works construct auxiliary losses based on reconstruction or mutual information to extract joint representations from multiple sensory inputs.
We argue that compressing information in the learned joint representations about raw multimodal observations is helpful.
- Score: 10.454194186065195
- License:
- Abstract: Reinforcement learning has achieved promising results on robotic control tasks but struggles to leverage information effectively from multiple sensory modalities that differ in many characteristics. Recent works construct auxiliary losses based on reconstruction or mutual information to extract joint representations from multiple sensory inputs to improve the sample efficiency and performance of reinforcement learning algorithms. However, the representations learned by these methods could capture information irrelevant to learning a policy and may degrade the performance. We argue that compressing information in the learned joint representations about raw multimodal observations is helpful, and propose a multimodal information bottleneck model to learn task-relevant joint representations from egocentric images and proprioception. Our model compresses and retains the predictive information in multimodal observations for learning a compressed joint representation, which fuses complementary information from visual and proprioceptive feedback while filtering out task-irrelevant information in raw multimodal observations. We propose to minimize the upper bound of our multimodal information bottleneck objective for computationally tractable optimization. Experimental evaluations on several challenging locomotion tasks with egocentric images and proprioception show that our method achieves better sample efficiency and zero-shot robustness to unseen white noise than leading baselines. We also empirically demonstrate that leveraging information from egocentric images and proprioception is more helpful for learning policies on locomotion tasks than using a single modality alone.
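The abstract describes the method at a high level: fuse egocentric images and proprioception into a stochastic joint latent, keep that latent predictive of future observations, and penalize the information it carries about the raw inputs via an upper bound on the bottleneck objective. The sketch below is a minimal illustration of that general recipe, not the authors' implementation; the network shapes, the InfoNCE-style predictive term, the Gaussian KL compression term, and the weight `beta` are all illustrative assumptions.

```python
# Hypothetical sketch of a variational multimodal information bottleneck for RL.
# Not the paper's code: a stochastic joint encoder over image features and
# proprioception, an InfoNCE-style predictive term, and a KL compression term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEncoder(nn.Module):
    def __init__(self, img_dim=128, prop_dim=24, latent_dim=64):
        super().__init__()
        self.img_net = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())   # stand-in for a CNN image encoder
        self.prop_net = nn.Sequential(nn.Linear(prop_dim, 64), nn.ReLU())  # proprioception encoder
        self.fuse = nn.Linear(256 + 64, 2 * latent_dim)                    # outputs mean and log-variance

    def forward(self, img_feat, prop):
        h = torch.cat([self.img_net(img_feat), self.prop_net(prop)], dim=-1)
        mu, logvar = self.fuse(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()               # reparameterized latent sample
        return z, mu, logvar

def mib_loss(z_t, mu, logvar, z_next, beta=1e-3):
    """InfoNCE-style predictive term plus a beta-weighted KL compression term."""
    logits = z_t @ z_next.t()                                              # (B, B) similarity scores
    labels = torch.arange(z_t.size(0), device=z_t.device)                  # positives on the diagonal
    predictive = F.cross_entropy(logits, labels)                           # surrogate for retaining predictive information
    kl = -0.5 * (1.0 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()   # compression penalty against a N(0, I) prior
    return predictive + beta * kl
```

In a full agent this auxiliary loss would be optimized alongside the RL objective, with the compressed latent `z` fed to the actor and critic; `z_next` would typically come from a target or momentum copy of the encoder applied to the next observation.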
Related papers
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose a high-value data selection approach, TIVE, to eliminate redundancy within the visual instruction data and reduce the training cost.
Using only about 15% of the data, our approach achieves average performance comparable to the full-data fine-tuned model across eight benchmarks.
arXiv Detail & Related papers (2024-03-14T16:47:25Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL)
Our method achieves near-expert performance on various visual control tasks with complex observations and the more challenging tasks with different backgrounds from expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z) - Multi-view Information Bottleneck Without Variational Approximation [34.877573432746246]
We extend the information bottleneck principle to a supervised multi-view learning scenario.
We use the recently proposed matrix-based Rényi's $\alpha$-order entropy functional to optimize the resulting objective (a brief sketch of this quantity appears after the list below).
Empirical results in both synthetic and real-world datasets suggest that our method enjoys improved robustness to noise and redundant information in each view.
arXiv Detail & Related papers (2022-04-22T06:48:04Z) - Rethinking Minimal Sufficient Representation in Contrastive Learning [28.83450836832452]
We show that contrastive learning models have the risk of over-fitting to the shared information between views.
We propose to increase the mutual information between the representation and input as regularization to approximately introduce more task-relevant information.
This regularization significantly improves the performance of several classic contrastive learning models in downstream tasks.
arXiv Detail & Related papers (2022-03-14T11:17:48Z) - Single-Modal Entropy based Active Learning for Visual Question Answering [75.1682163844354]
We address Active Learning in the multi-modal setting of Visual Question Answering (VQA)
In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition.
Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks.
arXiv Detail & Related papers (2021-10-21T05:38:45Z) - Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z) - Multi-Pretext Attention Network for Few-shot Learning with Self-supervision [37.6064643502453]
We propose a novel augmentation-free method for self-supervised learning, which does not rely on any auxiliary sample.
Besides, we propose the Multi-pretext Attention Network (MAN), which exploits a specific attention mechanism to combine traditional augmentation-based methods and our GC.
We evaluate MAN extensively on the miniImageNet and tieredImageNet datasets, and the results demonstrate that the proposed method outperforms relevant state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2021-03-10T10:48:37Z)
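For the "Multi-view Information Bottleneck Without Variational Approximation" entry above: the matrix-based Rényi $\alpha$-order entropy it relies on can be estimated directly from a kernel Gram matrix, with no density estimation or variational bound. The sketch below follows the standard matrix-based definition; the RBF kernel and the bandwidth `sigma` are assumptions chosen for illustration.

```python
import numpy as np

def matrix_renyi_entropy(X, alpha=2.0, sigma=1.0):
    """Matrix-based Renyi alpha-order entropy of samples X with shape (n, d).

    Builds a trace-normalized RBF Gram matrix A and returns
    S_alpha(A) = log2(sum_i lambda_i ** alpha) / (1 - alpha),
    where lambda_i are the eigenvalues of A.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))                        # RBF kernel, so k(x, x) = 1
    A = K / np.trace(K)                                               # normalize: eigenvalues sum to 1
    eigvals = np.clip(np.linalg.eigvalsh(A), 0.0, None)               # clip tiny negatives from round-off
    return float(np.log2(np.sum(eigvals ** alpha)) / (1.0 - alpha))
```

In the matrix-based framework, a joint entropy over two views is typically defined through the trace-normalized Hadamard product of their Gram matrices, from which mutual-information terms in such objectives can be assembled.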