Learning Fused State Representations for Control from Multi-View Observations
- URL: http://arxiv.org/abs/2502.01316v1
- Date: Mon, 03 Feb 2025 12:46:02 GMT
- Title: Learning Fused State Representations for Control from Multi-View Observations
- Authors: Zeyu Wang, Yao-Hui Li, Xin Li, Hongyu Zang, Romain Laroche, Riashat Islam
- Abstract summary: Multi-view Reinforcement Learning (MVRL) seeks to provide agents with multi-view observations, enabling them to perceive the environment with greater effectiveness and precision.
Recent advancements in MVRL focus on extracting latent representations from multi-view observations and leveraging them in control tasks.
We propose Multi-view Fusion State for Control (MFSC), the first method to incorporate bisimulation metric learning into MVRL to learn task-relevant representations.
- Score: 19.862313754887648
- Abstract: Multi-View Reinforcement Learning (MVRL) seeks to provide agents with multi-view observations, enabling them to perceive the environment with greater effectiveness and precision. Recent advancements in MVRL focus on extracting latent representations from multi-view observations and leveraging them in control tasks. However, it is not straightforward to learn compact and task-relevant representations, particularly in the presence of redundancy, distracting information, or missing views. In this paper, we propose Multi-view Fusion State for Control (MFSC), the first method to incorporate bisimulation metric learning into MVRL to learn task-relevant representations. Furthermore, we propose a multi-view-based mask and latent reconstruction auxiliary task that exploits shared information across views and improves MFSC's robustness to missing views by introducing a mask token. Extensive experimental results demonstrate that our method outperforms existing approaches in MVRL tasks. Even in more realistic scenarios with interference or missing views, MFSC consistently maintains high performance.
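To make the two ideas in the abstract concrete (a fused multi-view state trained with a bisimulation-style objective, and a learned mask token standing in for missing views), here is a minimal PyTorch sketch. The encoder layout, shapes, and the random in-batch pairing are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewFusion(nn.Module):
    """Encode each view, replace missing views with a learned mask token,
    and fuse the view tokens into a single state vector."""
    def __init__(self, n_views, obs_dim, d_model=128):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Linear(obs_dim, d_model) for _ in range(n_views)])
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, views, present):
        # views: list of n_views tensors (B, obs_dim); present: (B, n_views) bool
        tokens = torch.stack([enc(v) for enc, v in zip(self.encoders, views)], 1)
        tokens = torch.where(present.unsqueeze(-1), tokens,
                             self.mask_token.expand_as(tokens))
        return self.fuser(tokens).mean(dim=1)        # fused state, (B, d_model)

def bisimulation_loss(z, reward, z_next, gamma=0.99):
    """Push distances between fused states toward the bisimulation target
    |r_i - r_j| + gamma * d(z'_i, z'_j), using a random in-batch pairing.
    z, z_next: (B, d); reward: (B, 1)."""
    perm = torch.randperm(z.size(0), device=z.device)
    dist = (z - z[perm]).abs().mean(-1)              # current distance d(z_i, z_j)
    with torch.no_grad():
        target = (reward - reward[perm]).abs().squeeze(-1) \
                 + gamma * (z_next - z_next[perm]).abs().mean(-1)
    return F.mse_loss(dist, target)
```

In training, a term like this would sit alongside the RL objective and the masked latent-reconstruction auxiliary task mentioned in the abstract.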
Related papers
- Balanced Multi-view Clustering [56.17836963920012]
Multi-view clustering (MvC) aims to integrate information from different views to enhance the capability of the model in capturing the underlying data structures.
The widely used joint training paradigm in MvC may not fully leverage the multi-view information.
We propose a novel balanced multi-view clustering (BMvC) method, which introduces a view-specific contrastive regularization (VCR) to modulate the optimization of each view.
arXiv Detail & Related papers (2025-01-05T14:42:47Z)
- Rethinking Multi-view Representation Learning via Distilled Disentangling [34.14711778177439]
Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources.
This paper presents an in-depth analysis of existing approaches in this domain, highlighting the redundancy between view-consistent and view-specific representations.
We propose an innovative framework for multi-view representation learning, which incorporates a technique we term 'distilled disentangling'.
arXiv Detail & Related papers (2024-03-16T11:21:24Z)
- Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model [83.85856356798531]
VistaLLM is a visual system that addresses coarse- and fine-grained vision-language tasks.
It employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences.
We also introduce a novel task, AttCoSeg, which boosts the model's reasoning and grounding capability over multiple input images.
arXiv Detail & Related papers (2023-12-19T18:53:01Z)
- Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios [35.32285779434823]
Multi-view clustering (MVC) aims at exploring category structures among multi-view data in a self-supervised manner.
However, the performance of existing methods might seriously degenerate when the views are noisy in practical multi-view scenarios.
We propose a theoretically grounded deep MVC method (namely MVCAN) to address this issue.
arXiv Detail & Related papers (2023-03-30T09:22:17Z)
- Robust Representation Learning by Clustering with Bisimulation Metrics for Visual Reinforcement Learning with Distractions [9.088460902782547]
Clustering with Bisimulation Metrics (CBM) learns robust representations by grouping visual observations in the latent space.
CBM alternates between two steps: (1) grouping observations by measuring their bisimulation distances to the learned prototypes; (2) learning a set of prototypes according to the current cluster assignments.
Experiments demonstrate that CBM significantly improves the sample efficiency of popular visual RL algorithms. (The two-step alternation is sketched after this entry.)
arXiv Detail & Related papers (2023-02-12T13:27:34Z)
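The two alternating steps quoted in the CBM entry above can be sketched directly. The prototype parameterization (a latent vector plus a scalar reward per prototype) and the distance form are simplifying assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def assign_to_prototypes(reward, z_next, protos, proto_reward, gamma=0.99):
    """Step (1): assign each observation to the prototype with the smallest
    bisimulation-style distance d(i, k) = |r_i - r_k| + gamma * ||z'_i - m_k||.
    reward: (B,), z_next: (B, d), protos: (K, d), proto_reward: (K,)."""
    r_diff = (reward[:, None] - proto_reward[None, :]).abs()   # (B, K)
    dist = r_diff + gamma * torch.cdist(z_next, protos)        # (B, K)
    return dist.argmin(dim=1)                                  # cluster ids (B,)

def prototype_step(z, assignments, protos):
    """Step (2): pull each prototype toward the representations currently
    assigned to it (a k-means-style update done by gradient descent)."""
    return F.mse_loss(protos[assignments], z.detach())
```

Alternating the two losses groups behaviourally similar observations together, which is what makes the learned representation robust to visual distractions.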
- A Clustering-guided Contrastive Fusion for Multi-view Representation Learning [7.630965478083513]
We propose a deep fusion network to fuse view-specific representations into the view-common representation.
We also design an asymmetrical contrastive strategy that aligns the view-common representation and each view-specific representation.
In the incomplete-view scenario, our proposed method resists noise interference better than competing methods. (The asymmetric alignment is sketched after this entry.)
arXiv Detail & Related papers (2022-12-28T07:21:05Z)
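A plausible reading of the "asymmetrical contrastive strategy" above is an InfoNCE-style term that aligns each view-specific representation with the fused view-common one while gradients flow only into the view encoders. The stop-gradient is my assumption about where the asymmetry lies, not a detail confirmed by the abstract.

```python
import torch
import torch.nn.functional as F

def asymmetric_contrastive_align(z_common, z_views, temperature=0.1):
    """Each sample's view-specific feature should match its own fused
    (view-common) feature against the rest of the batch. The common branch
    is detached, so only the view-specific encoders are updated."""
    target = F.normalize(z_common.detach(), dim=-1)            # (B, d), no grad
    labels = torch.arange(z_common.size(0), device=z_common.device)
    loss = 0.0
    for z_v in z_views:                                        # each (B, d)
        logits = F.normalize(z_v, dim=-1) @ target.t() / temperature
        loss = loss + F.cross_entropy(logits, labels)
    return loss / len(z_views)
```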
- Tackling Visual Control via Multi-View Exploration Maximization [64.8463574294237]
MEM is the first approach that combines multi-view representation learning and reward-driven exploration in reinforcement learning (RL).
We evaluate MEM on various tasks from DeepMind Control Suite and Procgen games.
arXiv Detail & Related papers (2022-11-28T11:29:56Z)
- Mask-based Latent Reconstruction for Reinforcement Learning [58.43247393611453]
Mask-based Latent Reconstruction (MLR) is proposed to predict complete state representations in the latent space from observations with spatially and temporally masked pixels.
Extensive experiments show that our MLR significantly improves the sample efficiency in deep reinforcement learning. (The masking-and-reconstruction step is sketched after this entry.)
arXiv Detail & Related papers (2022-01-28T13:07:11Z)
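The MLR recipe quoted above (mask pixels in space and time, then reconstruct the complete state representation in latent space) fits in a few lines. The patch masking, the frozen target encoder, and all shapes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def random_spacetime_mask(obs, patch=8, ratio=0.5):
    """Zero out random spatial patches independently per frame, a crude
    stand-in for spatio-temporal cube masking. obs: (B, T, C, H, W),
    with H and W assumed divisible by `patch`."""
    B, T, C, H, W = obs.shape
    gh, gw = H // patch, W // patch
    keep = (torch.rand(B * T, 1, gh, gw, device=obs.device) > ratio).float()
    keep = F.interpolate(keep, scale_factor=patch)             # (B*T, 1, H, W)
    return obs * keep.view(B, T, 1, H, W)

def mlr_loss(encoder, target_encoder, obs):
    """Predict the latent of the full observation from the masked one;
    target_encoder is a frozen (e.g. EMA) copy providing the targets."""
    pred = encoder(random_spacetime_mask(obs))                 # (B, T, d)
    with torch.no_grad():
        target = target_encoder(obs)                           # (B, T, d)
    return F.mse_loss(pred, target)
```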
- Collaborative Attention Mechanism for Multi-View Action Recognition [75.33062629093054]
We propose a collaborative attention mechanism (CAM) for solving the multi-view action recognition problem.
The proposed CAM detects the attention differences among multiple views and adaptively integrates frame-level information so that the views benefit each other.
Experiments on four action datasets show that the proposed CAM achieves better results for each single view and also boosts the multi-view performance. (A generic cross-view attention sketch follows this entry.)
arXiv Detail & Related papers (2020-09-14T17:33:10Z)
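One generic way to realize the cross-view exchange described in the CAM entry above is plain cross-attention between the per-frame features of different views. The sketch below is that generic form under assumed shapes, not the paper's specific mechanism.

```python
import torch

def cross_view_integrate(feats):
    """feats: list of per-view frame features, each (B, T, d). Each view
    attends over the frames of all other views and adds the attended
    context to its own features (illustrative cross-attention only)."""
    scale = feats[0].size(-1) ** 0.5
    out = []
    for i, q in enumerate(feats):
        ctx = torch.cat([f for j, f in enumerate(feats) if j != i], dim=1)
        attn = torch.softmax(q @ ctx.transpose(1, 2) / scale, dim=-1)
        out.append(q + attn @ ctx)            # (B, T, d) enriched features
    return out
```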
- Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)